With this option, a script can be run before the document is processed. This is useful, for example, if files should be added to the archive only under certain conditions. It is also possible for the submitted file to be used to add keywords to another document.
The script is called with two arguments: the internal job number and the name of the file to be processed. The internal job number can be used, for example, to determine the database into which the import is to take place. The following example shows such a script.
The example creates keywords for articles downloaded from the Swiss media database (see swissdox.ch). First, the article is downloaded as a PDF file named xxxx.pdf and uploaded to the archive. Then the HTML file belonging to the article, named xxxx.html (or xxxx.htm), is downloaded and uploaded as well. The script checks whether the archive contains a document that comes from a PDF file with the same base name as the HTML file and that has no title yet. If so, it extracts the most important keyword information from the HTML file: the title, the date, the medium (e.g. NZZ) and the page on which the article was published. It also stores the swissdox article ID, writes a combined entry to the Bibliographie field and then deletes the HTML file.
#!/usr/bin/perl
use strict;
use lib qw(/home/cvs/archivista/jobs);
use AVJobs;
# Arguments: internal job number and name of the file to be processed
my $jid = shift;
my $jname = shift;
logit("$0 with $jid and $jname started");
my $dbh = MySQLOpen();
if ($dbh) {
  if (HostIsSlave($dbh)==0) {
    # Determine the database into which the job imports
    my $sql = "select db from archivista.jobs where id=$jid";
    my @rows = $dbh->selectrow_array($sql);
    my $db = $rows[0];
    if ($db ne "") {
      # Original name of the uploaded file, split into base name and extension
      $sql = "select value from archivista.jobs_data where ".
             "jid=$jid and param='WEB_FILE'";
      @rows = $dbh->selectrow_array($sql);
      my @parts = split(/\//,$rows[0]);
      my $fname = pop @parts;
      @parts = split(/\./,$fname);
      my $ext = lc(pop @parts);
      my $base = join('.',@parts);
      if (($ext eq "html" || $ext eq "htm") && $base ne "") {
        # Look for the document of the matching PDF file that has no title yet
        my $fpdf = $base.".pdf";
        $sql = "select Laufnummer from $db.archiv where ".
               "MediaName=".$dbh->quote($fpdf)." and (Titel='' or Titel is null)";
        @rows = $dbh->selectrow_array($sql);
        my $lnr = $rows[0];
        if ($lnr>0) {
          my $cont = "";
          readFile2($jname,\$cont);
          # Empty the header, footer and script elements (first occurrence each)
          foreach my $part ("header","footer","script") {
            my $hbeg = "<$part";
            my $hend = "</$part>";
            my $hto = "<$part></$part>";
            $cont =~ s/($hbeg)(.*?)($hend)/$hto/sm;
          }
          # Drop the sidebar block up to the start of the main content
          my $cl = "<div class=\"";
          my $hbeg = "LayoutSc__StyledSidebar";
          my $hend = "LayoutSc__Main";
          $cont =~ s/($cl)($hbeg)(.*?)($cl)($hend)/$cl$hend/sm;
          my $title="";
          my $date="";
          my $medium="";
          my $page="";
          my $start="ArticleContentSc__Headline";
          # Test each match result directly: after a failed match, $1..$4
          # would still hold the values of an earlier successful match
          $title = $4 if $cont =~ /($start)(.*?)(>)(.*?)(<\/div>)/s && $4 ne "";
          if ($title ne "") {
            logit("$title for $lnr in $db found, so it is meta file");
            $title=optimizer($title);
            $sql = "Titel=".$dbh->quote($title);
            my $id = 0;
            if ($cont =~ /(<article\sdata-id=\")([0-9]+)(\">)/ && $2>0) {
              $sql .= ",swissdoxID=".$dbh->quote(optimizer($2));
            }
            foreach my $val ("newspaper","date","page") {
              my $start = "ArticleMetaSc__Item";
              if ($cont =~ /($start)(.*?)($val\">)(.*?)(<\/div>)/ && $4 ne "") {
                if ($val eq "newspaper") {
                  $medium = optimizer($4);
                  $sql.=",Medium=".$dbh->quote($medium);
                } elsif ($val eq "date") {
                  # Convert "DD.MM.YYYY ..." to "YYYY-MM-DD 00:00:00"
                  $date = optimizer($4);
                  my $date1 = $date;
                  $date1 =~ s/(\s.*?)$//;
                  @parts = split(/\./,$date1);
                  my $year = pop @parts;
                  my $month = sprintf("%02d",pop @parts);
                  my $day = sprintf("%02d",pop @parts);
                  $sql.=",Datum='$year-$month-$day 00:00:00'";
                } elsif ($val eq "page") {
                  $page = optimizer($4);
                }
              }
            }
            # Combined entry: medium; title; date; page
            my $sql1 = "";
            $sql1 = "$medium" if $medium ne "";
            $sql1 .= "; $title" if $title ne "";
            $sql1 .= "; $date" if $date ne "";
            $sql1 .= "; $page" if $page ne "";
            $sql .= ",Bibliographie=".$dbh->quote($sql1);
            $sql = "update $db.archiv set $sql where Laufnummer=$lnr";
            logit($sql);
            $dbh->do($sql);
            # Remove the HTML file once its information has been stored
            unlink "$jname" if -e "$jname";
          }
        }
      }
    }
  }
}
# Strip HTML tags and convert the text from UTF-8
sub optimizer {
  my ($cont) = @_;
  $cont =~ s/<.*?>//g;
  $cont = fromUTF8($cont);
  return $cont;
}
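The two pure string steps of the example script (deriving the PDF name from the HTML file name, and converting the date) can be tried without a database. The following is only a minimal sketch: the helper names pdf_name_for and sql_date_for are hypothetical and not part of AVJobs, and the date format DD.MM.YYYY is an assumption inferred from the order in which the script reads year, month and day.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of AVJobs): return the name of the PDF file
# that belongs to an HTML file, or undef if the file is not .html/.htm.
sub pdf_name_for {
  my ($path) = @_;
  my @parts = split(/\//, $path);
  my $fname = pop @parts;
  return undef unless defined $fname;
  @parts = split(/\./, $fname);
  return undef if @parts < 2;
  my $ext = lc(pop @parts);
  my $base = join('.', @parts);
  return undef unless ($ext eq "html" || $ext eq "htm") && $base ne "";
  return $base . ".pdf";
}

# Hypothetical helper: convert "DD.MM.YYYY ..." (assumed format) into
# "YYYY-MM-DD 00:00:00"; returns undef if no such date is found.
sub sql_date_for {
  my ($text) = @_;
  return undef unless $text =~ /^\s*(\d{1,2})\.(\d{1,2})\.(\d{4})\b/;
  return sprintf("%04d-%02d-%02d 00:00:00", $3, $2, $1);
}

print pdf_name_for("/tmp/upload/article.2021.html"), "\n";  # article.2021.pdf
print sql_date_for("5.3.2021 Seite 12"), "\n";              # 2021-03-05 00:00:00
```

Unlike the inline code in the example script, the date helper rejects text that does not start with a complete date, so an unexpected page layout yields undef instead of a malformed Datum value.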
Please note: The swissdox example script expects the fields 'swissdoxID', 'Medium' and 'Bibliographie' to exist. The multimedia fields (see 25.7.14) must also be activated. The script itself can be transferred to the box in WebAdmin under 'Administrate jobs' (21). Alternatively, it can be stored in the folder
/home/data/archivista/cust/autofields
(the file must have execute permission).
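As a starting point for your own script, the following minimal sketch shows only the argument handling described above. It is an illustration under assumptions: the helper name parse_job_args is hypothetical, and the check that the job number consists of digits is inferred from the fact that the example script places it unquoted into an SQL statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept the two arguments (internal job number,
# file name) and return them, or return nothing if they are unusable.
sub parse_job_args {
  my ($jid, $jname) = @_;
  # Assumption: job numbers are plain digits; anything else is rejected
  # so that the value is safe to use in an SQL statement.
  return unless defined $jid && $jid =~ /^[0-9]+$/;
  return unless defined $jname && $jname ne "";
  return ($jid, $jname);
}

my ($jid, $jname) = parse_job_args(@ARGV);
if (defined $jid) {
  print "job $jid, file $jname\n";
  # Decide here whether and how the file should be processed.
} else {
  print STDERR "usage: $0 <job-number> <file-name>\n";
}
```

Called by hand, for example as `perl myscript.pl 42 /tmp/article.html` (file name and values are placeholders), the sketch prints the job number and file name it received.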