This option is used to postprocess (further improve) the recognized text. If you enter a script name here, that script is run during the form recognition process. The script receives the file name of the recognized text as its first and only parameter. It must open this file, process the text and print the result to the console (standard output). The form recognition takes the script output and stores it as the value of the field in question in the archive.
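The contract described above (file name in, processed text out) can be sketched as a minimal skeleton. This is only an illustration: the helper name postprocess and the whitespace clean-up rule are assumptions chosen for the sketch, not part of the product.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the body with your own clean-up rule.
sub postprocess {
    my ($txt) = @_;
    $txt =~ s/\s+/ /g;     # collapse whitespace runs (incl. newlines) to one space
    $txt =~ s/^ | $//g;    # remove a leading and a trailing space
    return $txt;
}

# Main part: runs only when a file name was passed as parameter.
if (@ARGV) {
    my $file = shift @ARGV;                       # first and only parameter
    open(my $fin, '<', $file) or die "Cannot open $file: $!";
    binmode($fin);
    my $txt = do { local $/; <$fin> };            # read the whole file at once
    close($fin);
    $txt = '' unless defined $txt;
    print postprocess($txt);                      # hand the result back on the console
}
```

Keeping the text processing in a separate function makes it possible to try out the rule on sample strings before the script is used in form recognition.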
Example: Only the year should be extracted from a date that was recognized. The Perl script below does exactly that:
#!/usr/bin/perl
use strict;

my $file = shift;                  # get the file name
open(FIN, '<', $file) or die "Cannot open $file: $!";   # open the file for reading
binmode(FIN);
my @lines = <FIN>;                 # read the file
close(FIN);
my $txt = join("", @lines);        # get the whole text in one variable
$txt =~ s/\r/ /g;                  # replace all returns with a space
$txt =~ s/\n/ /g;                  # replace all newlines with a space
$txt =~ s/\t/ /g;                  # replace all tabs with a space
$txt =~ s/\s\././g;                # space followed by point becomes point
$txt =~ s/\.\s/./g;                # point followed by space becomes point
$txt =~ s/\s{2}/ /g;               # replace two spaces with one
if ($txt =~ /^(.*)([0-9]{2})(\.)([0-9]{2})(\.)([0-9]{4})(.*)$/) {
    $txt = $6;                     # if we got a day, month and year, give back the year
} else {
    $txt = "";                     # don't give back anything
}
print $txt;                        # print it out (give it back to the form recognition)
For the script to be run, it must be stored in the following path:
/home/data/archivista/cust/formrec
To copy a script into this folder, proceed in the same way as when preparing a logo for form recognition. For more information, see 18.4.2.