With form recognition it is possible to automatically process scanned pages that have the same structure or arrangement of items on the page.
The prerequisite for the use of this module is that it has been purchased and activated.
Let us assume that we want to archive suppliers' invoices. The look and structure of the invoices are the same in all cases, i.e. the invoice date and the invoice number are always found in fixed positions. We would like to extract both pieces of information from invoices of the same type (forms) and copy them into the corresponding fields of our archive, and we would like the pages to be split into documents automatically.
This means that we scan all invoices as one batch; the form recognition module then splits them into individual documents, automatically assigns invoice numbers and dates, and files them in the archive.
For ArchivistaBox to do this, the forms must first be entered in WebAdmin. As a second step, these forms must be activated in the scan definitions (see 14). The form recognition process itself starts when scanning is triggered in WebDMS (see 8.4).
The form recognition module can manage several forms. These are defined in WebAdmin under the menu item Form recognition. Individual forms are added as so-called masks; one mask corresponds to one form definition. Within a definition, objects specify exactly where on the form a piece of information can be found (e.g. the invoice number), which type it has (e.g. numerical field), and in which field of the archive it should be stored.
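The relationship between a mask and its objects can be pictured as a small data structure. The following is only an illustrative sketch: the class names, attribute names, coordinates, and archive field names are assumptions made for this example and do not reflect how ArchivistaBox stores form definitions internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormObject:
    """One area on the form (all names and units here are hypothetical)."""
    name: str            # what the area contains, e.g. "invoice number"
    x_mm: float          # left edge of the area on the page
    y_mm: float          # top edge of the area on the page
    width_mm: float
    height_mm: float
    field_type: str      # e.g. "numeric" or "date"
    archive_field: str   # archive field that receives the extracted value


@dataclass
class FormMask:
    """One mask corresponds to one form definition."""
    name: str
    objects: List[FormObject] = field(default_factory=list)


def archive_fields(mask: FormMask) -> List[str]:
    """List the archive fields that a mask fills."""
    return [obj.archive_field for obj in mask.objects]


# Example values are invented for illustration only.
invoice_mask = FormMask(
    name="Supplier invoice",
    objects=[
        FormObject("invoice number", 140.0, 60.0, 40.0, 8.0, "numeric", "InvoiceNo"),
        FormObject("invoice date", 140.0, 70.0, 40.0, 8.0, "date", "InvoiceDate"),
    ],
)

print(archive_fields(invoice_mask))
```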
Thanks to good text recognition software, extracting the information works well. However, scanning invariably involves certain inaccuracies: the invoice is not scanned straight, the paper skews, or the page was printed askew in the first place.
Logo recognition exists for all these cases. It first searches the page for a logo and then positions the objects relative to that logo. This enables the recognition process to extract the required information very precisely, no matter how much paper skew occurred during scanning.
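The idea of positioning objects relative to the logo can be sketched as a simple offset calculation. This is a minimal sketch under stated assumptions: it corrects only a shift of the page (not rotation), and the `Box` type, the `reposition` function, and all coordinates are hypothetical rather than part of ArchivistaBox.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Rectangular area on the page, in millimeters (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def reposition(box: Box,
               logo_expected: Tuple[float, float],
               logo_found: Tuple[float, float]) -> Box:
    """Shift a defined area by the distance the logo has moved.

    logo_expected: logo position in the form definition
    logo_found:    logo position detected on the scanned page
    """
    dx = logo_found[0] - logo_expected[0]
    dy = logo_found[1] - logo_expected[1]
    return Box(box.x + dx, box.y + dy, box.width, box.height)


# The logo was found 2.5 mm further right and 1 mm higher than defined,
# so the invoice number area is moved by the same amount.
invoice_number_area = Box(140.0, 60.0, 40.0, 8.0)
print(reposition(invoice_number_area, (20.0, 15.0), (22.5, 14.0)))
```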
Whenever possible, logo recognition should be used together with form recognition. If this is not possible, for example because the pages do not contain a logo, a tolerance of about 3 millimeters must be added to the areas to be recognized.
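To see what such a tolerance means in practice, the area can be enlarged and the margin converted to pixels. The sketch below assumes that the tolerance is applied on every side of the area and that the scan resolution is 300 dpi; both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def expand_area(area: Tuple[float, float, float, float],
                tolerance_mm: float = 3.0) -> Tuple[float, float, float, float]:
    """Enlarge an (x, y, width, height) area in mm by a margin on every side.

    The top-left corner is clamped so it never leaves the page.
    """
    x, y, width, height = area
    new_x = max(0.0, x - tolerance_mm)
    new_y = max(0.0, y - tolerance_mm)
    # The far edges move outward by the tolerance as well.
    new_width = (x + width + tolerance_mm) - new_x
    new_height = (y + height + tolerance_mm) - new_y
    return (new_x, new_y, new_width, new_height)


def mm_to_pixels(mm: float, dpi: int = 300) -> int:
    """Convert millimeters to whole pixels at the given scan resolution."""
    return round(mm / MM_PER_INCH * dpi)


print(expand_area((140.0, 60.0, 40.0, 8.0)))  # area with 3 mm added per side
print(mm_to_pixels(3.0))                      # 3 mm at the assumed 300 dpi
```

At the assumed 300 dpi, 3 millimeters corresponds to roughly 35 pixels, which gives a feel for how much shift the enlarged area can absorb.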
The functions for adding forms are described below. Each mask definition corresponds to one form definition.