
Using MARC.pm to pre-process netLibrary (or any other) MARC files



Our library is a recent subscriber to netLibrary titles through our
affiliation with the main campus.  But as a law library we only wanted to
load a subset of records for the books into our system.  (We have our own
ILS.)  We also wanted to do some transformations on the records specific
to our site.

I looked around, but couldn't find a Perl script that did this, so I
created one of my own (attached).  While you certainly won't do the same
things to your records that we wanted to do, this may give you a starting
point for your own script.

The script is attached and you'll find a pod2html version of the
documentation at:

  <http://www.pandc.org/peter/work/projects/parseNetLibrary.html>
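
For a sense of the kind of transformation involved, here is a minimal
standalone sketch of the 856 subfield z replacement.  It runs against a
made-up two-line string rather than a real MARC record, so it does not
need MARC.pm.  The sample text, its "tag indicators subfields" layout,
and the variable names $sd, $record and $visible are assumptions for
illustration only; the real string form of a record may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical string form of a record: tag, indicators, then subfields,
# each introduced by the MARC subfield delimiter (0x1F, "\c_" in Perl).
my $sd = "\c_";
my $record = join "\n",
    "245 10 ${sd}aSample title",
    "856 40 ${sd}uhttp://example.org/book${sd}zOld public note",
    "";

my $subzmsg = 'Access an electronic copy of this book.';

# Replace only the text of subfield z on the 856 line.
#   $1 = everything up to and including the "z" subfield code
#   $2 = whatever follows the old subfield z value on the same line
$record =~ s/^(856\s..\s.*\c_z)[^\c_\n]+(.*)$/$1$subzmsg$2/m;

# Show the result with the delimiter made visible as "|".
(my $visible = $record) =~ s/\c_/|/g;
print $visible;
```

The negated character class excludes newline as well as the subfield
delimiter, so the match cannot run past the end of the 856 line when
subfield z happens to be the last subfield.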


Peter
--
Peter Murray, Computer Services Librarian              W: 860-570-5233
University of Connecticut Law School             Hartford, Connecticut
#!/usr/local/bin/perl -w
##
###########################################################################
##
##  Program:  parseNetLibrary.pl
##
##  Purpose:  Look at netLibrary records for those matching a regexp and
##            prepare them for processing into the INNOPAC
##
##  Version:  1.1  1-Oct-2001
##
##  Author:   Peter Murray, 
##            University of Connecticut School of Law
##            pmurray@law.uconn.edu
##
##  Legalities:
##            Copyright 2001 University of Connecticut.
##
##  Revision History:
##    29-Sep-2001  pem  Initial Release
##     1-Oct-2001  pem  Added debug flag; added 710 "authority control"
##     2-Oct-2001  pem  Released to PERL4LIB as v1.1
##
##  To-dos:
##
## DOCUMENTATION, in PERL POD format, is at the end of the program.
## Running the program `perldoc <programname>` should output the manual.
##

use strict;
use vars qw($opt_o $opt_d $opt_q);
use MARC;
use Getopt::Std;

## %searchParam holds the field, subfield, and regular expression a record must match
## to be included in the output file
my %searchParam = (field=>'050',subfield=>'a',regex=>'/^K.*/');

## %markerField is the field which will be inserted into each selected record to tag it.
## We'll search on this field later when we want to gather all of the records from this
## load together.
my %markerField = (record=>'1',field=>'590', i1=>'0', i2=>'0', 
  value=>[a=>"netLibrary insertion ".sprintf('%4.4d-%2.2d-%2.2d',
  (localtime())[5]+1900, (localtime())[4]+1, (localtime())[3])]);

## %commandField{1|2} are fields which will be inserted into the MARC file to set parameters
## for the INNOPAC record loader
my %commandField1 = (record=>'1',field=>'999', i1=>'0', i2=>'0', value=>[d=>'@'], ordered=>'n');
my %commandField2 = (record=>'1',field=>'940', i1=>'0', i2=>'0', value=>[l=>'web'], ordered=>'n');

## $subzmsg is the message we'll replace the 856 subfield z with
my $subzmsg = 'Access an electronic copy of this book.';



## Get the commandline parameters and display a usage statement if there is a problem.
getopts('o:dq');
if (((!defined $opt_o)&&(!defined $opt_d)) || (scalar(@ARGV) == 0)) {
  die "Usage:  $0 (-d | -o outputfile) inputfile [inputfile ...]\n";
}
if ((! $opt_d)&&(-s $opt_o)) {
  die "$0: This program will not overwrite existing files. $opt_o exists.\n";
}

## We'll use this variable to display the total number of records at the end
my $total_rec_found = 0;

## For each input file specified on the command line, loop...
foreach my $infile (@ARGV) {
 # Create a new MARC instance
  my $x = new MARC;
 # Try to open the MARC file
  if (!$x->openmarc({file=>$infile, format=>'usmarc'})) {
    warn "Couldn't open $infile: $!\n";
    next;
  }
 # Start both counters at zero so the summary never prints an undefined value
  my ($rec_count, $rec_found) = (0, 0);
 
 # Now that we have it open, we'll loop through one record at a time.
 # "Give me a record, Vasily.  One record only please!"  -SConnery
  while ($x->nextmarc(1)) {
    $rec_count++;
   # We found what we were looking for in this record, so we'll process it.
    if ($x->searchmarc(\%searchParam)) {
      $rec_found++;
      
     # Copy the 050 to the 090
      $x->addfield({record=>'1',field=>'090'}, $x->getupdate({record=>'1',field=>'050'}));
      
     # Get the record as a long string so we can use regexes to manipulate it.
      my $stringvar = $x->[1]->as_string();
     # Replace the 856 subfield z text
      $stringvar =~ s/^(
         856\s     # 856 fields...
         ..\s      # ...with any indicators...
         .*\c_z)   # ...and everything before subfield 'z' [hold in $1]
         [^\c_\n]+ # We'll throw away the existing subfield z value (until the next subfield marker or end of line)
         (.*)$     # And capture everything after the subfield z [hold in $2].
         /$1$subzmsg$2/xm;
     # Update the authority field for the alt author to the correct form.
     # (The 'a' after ${1} is the subfield code.)
      $stringvar =~ s/^(710\s2\s\s.).*$/${1}anetLibrary, Inc./xm;
     # Store the string format back into the record
      $x->[1]->from_string($stringvar);

     # Add the marker field and command fields
      if (!$x->addfield(\%commandField2)) { warn "problem with record $rec_count of $infile\n";}
      if (!$x->addfield(\%commandField1)) { warn "problem with record $rec_count of $infile\n";}
      if (!$x->addfield(\%markerField)) { warn "problem with record $rec_count of $infile\n";}
      
     # Now output the record, either to the screen for debugging or to the specified file.
      if (defined $opt_d) {
        print $x->output({format=>'ascii'});
      } else {
       # Send the record to the output file.  APPEND!
        $x->output({file=>">>$opt_o", format=>'usmarc'}) ||
          die "Couldn't output to $opt_o: $!\n";
      }
    }
   # Delete this record from the object so we can move on to the next.  Note!
   # this does not delete the record from the input file itself; the input file
   # is unchanged.
    $x->deletemarc();
  }
 # Done with this file.  Display messages, close it, add the record total, and move
 # on to the next.
  print "Extracted $rec_found records of $rec_count from $infile\n" if !defined $opt_q;
  $x->closemarc();
  $total_rec_found += $rec_found;
}

print "Wrote $total_rec_found records".(defined $opt_o ? " to $opt_o" : '').".\n" if !defined $opt_q;
exit 0;


=head1 NAME

parseNetLibrary.pl - Process MARC files from netLibrary

=head1 SYNOPSIS

  parseNetLibrary.pl [-q] (-d | -o outfile) inputfile [inputfile ...]

=head1 DESCRIPTION

Using the netLibrary MARC records, select the specific records we want to
load, transform them (e.g. change the 856 subfield z text), mark them in
such a way that we can collect them in a boolean list at a later time, then
output them to a file.

=head1 FLAGS

=over 4

=item -o <outfile>

All of the records selected from the input files will be written to this
output file.

=item -d

Rather than sending output to a file, send an ASCII version of the MARC 
record to the screen (useful for debugging).

=item -q

Don't output the messages that report the number of records extracted from
each file and the total number of records written.

=back

=head1 COPYRIGHT

Copyright 2001 University of Connecticut.

=head1 AUTHOR

 Peter Murray
 Computer Services Librarian
 University of Connecticut School of Law
 pmurray@law.uconn.edu
 
Updates are available from http://www.pandc.org/peter/work/projects/parseNetLibrary.html

=cut