Many websites can be archived quite well with SingleFile, but not all. Some sites are so dynamic that their content cannot be captured statically. This is where the tool capt2pdf, developed by Archivista GmbH, can help.
The tool captures an image of the screen at a chosen interval of x seconds and passes the captures to the Tesseract text recognition system. Optical character recognition (OCR) makes the text in each screen capture machine-readable. The screen captures and the recognized text are then converted into a PDF file. With this method, all web content can be recorded quite well in a kind of 'flip book'.
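The capture, OCR, and PDF steps described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the actual implementation of capt2pdf. It assumes that ImageMagick's `import`, the `tesseract` command-line program, and `pdfunite` from poppler-utils are installed and that the desktop runs under X11. All function names and file names are hypothetical.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def screenshot_cmd(png: Path) -> list[str]:
    # Assumption: ImageMagick's `import` grabs the whole X11 root window.
    return ["import", "-window", "root", str(png)]


def ocr_cmd(png: Path, out_base: Path, lang: str = "eng") -> list[str]:
    # Tesseract appends ".pdf" to out_base and embeds the recognized text
    # as an invisible layer behind the screenshot image.
    return ["tesseract", str(png), str(out_base), "-l", lang, "pdf"]


def merge_cmd(pages: list[Path], target: Path) -> list[str]:
    # pdfunite expects the input PDFs first and the output file last.
    return ["pdfunite", *map(str, pages), str(target)]


def record(interval: float, shots: int, target: Path) -> None:
    """Capture `shots` screenshots, OCR each one, and merge them into `target`."""
    with tempfile.TemporaryDirectory() as tmp:
        pages = []
        for i in range(shots):
            png = Path(tmp) / f"shot{i:04d}.png"
            base = Path(tmp) / f"page{i:04d}"
            subprocess.run(screenshot_cmd(png), check=True)
            subprocess.run(ocr_cmd(png, base), check=True)
            pages.append(base.with_suffix(".pdf"))
            if i < shots - 1:
                time.sleep(interval)  # wait before the next capture
        if len(pages) == 1:
            shutil.copy(pages[0], target)  # nothing to merge
        else:
            subprocess.run(merge_cmd(pages, target), check=True)
```

Under these assumptions, calling `record(5, 3, Path("flipbook.pdf"))` would take three screenshots five seconds apart and write them as a three-page searchable PDF. Unlike capt2pdf, this sketch stops after a fixed number of captures rather than waiting for a stop command.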
The program requires a reasonably modern Linux desktop. It is included by default on the Archivista desktop and with AVMultimedia. For (almost) all other Linux distributions it can be obtained here:
https://archivista.ch/cms/wp-content/uploads/2024/06/capt2pdf.zip
The zip file must be unpacked. The program can then be started from a terminal with 'perl capt2pdf'. With 'perl capt2pdf 5', for example, a screenshot is recorded every five seconds until recording is stopped again with 'perl capt2pdf 0'.
On the ArchivistaBox desktop or with AVMultimedia, capt2pdf can also be called via function keys. Ctrl+PrintScreen starts the recording, Shift+Ctrl+PrintScreen ends it. The PDF file is also opened directly.
PDF files created in this way can be archived very easily in ArchivistaDMS.