As of version 2021/V, the ArchivistaBox offers optional speech recognition (Vosk and Kaldi), which extracts spoken content from audio and video files as text. It is optional because it is not shipped with the ArchivistaBox itself.
Instead, it must be obtained as a separate download. Because the download is currently more than 5 GByte in size (10 GByte unpacked), the download link is provided on request. The download yields the file vosk.os, which must be copied (with root rights!) to /home/data. The ArchivistaBox must then be restarted.
This activates speech recognition on the ArchivistaBox. Once it is activated, every text recognition (OCR) run on an audio or video file also extracts the spoken text and displays it in the 'Page text' field, matched to the preview pages as closely as possible. Here is an example: 111.45: recognized text. The number 111.45 is the position within the document in seconds, and the text after the colon is the extracted text (here, recognized text).
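If the page text is to be processed further outside the ArchivistaBox (for example to jump to a spoken passage in a player), the "seconds: text" lines can be split into pairs. The following is a minimal sketch, assuming one fragment per line in exactly the form of the example above and that the page text is already available as a string; the function names parse_fragments and format_timestamp are hypothetical and not part of ArchivistaBox.

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the example: "<seconds>: <text>",
# e.g. "111.45: recognized text". The fractional part is optional.
_FRAGMENT = re.compile(r"^\s*(\d+(?:\.\d+)?):\s*(.*?)\s*$")


def parse_fragments(page_text: str) -> List[Tuple[float, str]]:
    """Return (position_in_seconds, text) pairs for every timed line.

    Lines that do not look like a timed fragment (marker lines,
    ordinary OCR text) are skipped.
    """
    fragments = []
    for line in page_text.splitlines():
        match = _FRAGMENT.match(line)
        if match:
            fragments.append((float(match.group(1)), match.group(2)))
    return fragments


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as mm:ss.ss for display."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes):02d}:{rest:05.2f}"


if __name__ == "__main__":
    for pos, text in parse_fragments("111.45: recognized text"):
        print(format_timestamp(pos), text)  # 01:51.45 recognized text
```

Note that an ordinary OCR line that happens to start with a number and a colon would also match; restricting parsing to the part of the page text that follows the speech recognition marker avoids this.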
Please note: depending on the hardware, processing one minute of speech takes between 10 and 60 seconds. After recognition, the recognized text fragments appear in the page text following the -----speech recognition----- marker.
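The 10 to 60 seconds per minute of speech can be turned into a rough estimate for longer recordings. For a 90-minute recording this gives 90 × 10 = 900 seconds (15 minutes) up to 90 × 60 = 5,400 seconds (90 minutes). A minimal sketch of that arithmetic; the function name is hypothetical, and the default bounds are simply the two figures stated above, not measured values for any particular hardware.

```python
from typing import Tuple


def estimate_processing_seconds(speech_minutes: float,
                                low: float = 10.0,
                                high: float = 60.0) -> Tuple[float, float]:
    """Return (fastest, slowest) processing time in seconds.

    low/high are seconds of processing per minute of speech; the defaults
    are the 10 to 60 second range given in the manual.
    """
    if speech_minutes < 0:
        raise ValueError("speech_minutes must not be negative")
    return speech_minutes * low, speech_minutes * high


if __name__ == "__main__":
    fastest, slowest = estimate_processing_seconds(90)
    print(f"{fastest / 60:.0f} to {slowest / 60:.0f} minutes")  # 15 to 90 minutes
```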