package Marc;

use Carp;
use vars qw($VERSION $AUTOLOAD);
use strict;

$VERSION = "4.3";

##------------------------------------------------------------------------##
## Constructor

sub new {
    my $class      = shift;
    my $fields_ref = shift;
    my $self = {
        directory_labels => {},
        permitted        => $fields_ref,
        sort_function    => 'sub { $a cmp $b }',
        %$fields_ref,
    };
    $self->{permitted}->{sort_function} = 'sub { $a cmp $b }';
    bless $self, $class;
    return $self;
}

##------------------------------------------------------------------------##
## The AUTOLOAD function allows for the dynamic creation of accessor methods

sub AUTOLOAD {
    my $self = shift;
    my $type = ref($self) or croak "$self is not an object";
    my $name = $AUTOLOAD;

    # DESTROY messages should never be propagated.
    return if $name =~ /::DESTROY$/;

    # Remove the package name.
    $name =~ s/^.*://;

    unless (exists($self->{permitted}->{$name})) {
        $self->error("Can't access `$name' field in object of class $type");
    }

    if (@_) {
        return $self->{$name} = shift;
    } else {
        return $self->{$name};
    }
}

##------------------------------------------------------------------------##
## Read Configuration File (if it exists)
## PUBLIC METHOD

sub read_config_file {
    my $self = shift;
    my $dir  = $self->working_dir;
    my $code = '';

    open FH, "<$dir/.marc-search.cfg"
        or $self->error("Could not open config file $dir/.marc-search.cfg: $!");

    while (<FH>) {
        next if /^\s*$/ or /^\#/;
        chomp;
        if (m,^\s*,i) {
            while (<FH>) {
                last if m,^\s*,i;
                next if m,^\s*$, or m,^\#,;
                $self->archive_name($_);
            }
        } elsif (m,^\s*,i) {
            while (<FH>) {
                last if m,^\s*,i;
                next if m,^\s*$, or m,^\#,;
                my ($dir,$label) = split (/\s+=>\s+/,$_);
                $self->{directory_labels}->{$dir} = $label;
            }
        } elsif (m,^\s*,i) {
            $code = 'sub { ';
            while (<FH>) {
                last if m,^\s*,i;
                next if m,^\s*$, or m,^\#,;
                $code .= $_;
            }
            $code .= ' }';
            $code =~ s/\s+/ /g;
            $self->error("Bad sort function: $code")
                if ($code !~ /\$a/ or $code !~ /\$b/);
            # print $code; # debugging
            $self->sort_function ($code);
        }
    }
    close FH
        or $self->error("Could not close config file $dir/.marc-search.cfg: $!");
}

##------------------------------------------------------------------------##
## Error Handler
## PUBLIC METHOD

sub error {
    my $self = shift;
    my ($package,$filename,$line) = caller;
    print <<xxxEOFxxx;

Marc-search internal error

@_
This error occurred in the $package package of $filename at line $line.
xxxEOFxxx
    exit 1;
}

1;

__END__

=head1 NAME

marc-search.cgi - CGI Script to search MHonArc archives

=head1 AUTHOR

Eric D. Friedman friedman@uci.edu

Jason C. Lin jlin@uci.edu (earlier version, of which little remains here)

=head1 DESCRIPTION

Searches e-mail archives created by Earl Hood's MHonArc
(http://www.oac.uci.edu/indiv/ehood/mhonarc.html). Search options
include "From," "Subject," "Date," and "Message Body." Returns results
in a visually useful format, with matches printed in bold.

Search terms can be treated as a literal phrase or as words to be
joined by 'AND' or 'OR' booleans. The full suite of Perl5 regular
expressions is allowed.

Allows the user to set a limit on the number of records returned on
each page, with an option to continue the search or start a new one.

=head1 INSTALLATION

Move the marc-search.cgi script to a script-aliased (CGI) directory.

Install Marc.pm, Marc/Form.pm, and Marc/Search.pm on your filesystem
(not necessarily in your site_perl) and indicate the location of that
directory in a

    use lib '/path/to/directory';

statement in the configurable options section of marc-search.cgi.
It's a good idea to make sure perl can find the modules by testing the
script from the command line as follows:

    perl -c marc-search.cgi

Add the following line to the 'maillist.html' page (or whatever you've
changed it to using MHonArc's idxfname switch) in the directory where
your MHonArc archive is kept:

    Search

Install a reliable perl module for parsing input from HTML forms. I
recommend CGI_Lite, but others are certainly possible. See the Version
History and the marc-search.cgi source for more on this subject.

Set the configurable options as appropriate for your site. Note that
each value is enclosed in single quotes ('') and that each line must
end with a semicolon.

=over 4

=item $server

The URL for your HTTP server WITHOUT a trailing slash.

=item $help

The location (when concatenated with the value in $server) of a file
containing "help" for your users on the various options offered by
marc-search. The hyperlinks on the search form will attempt to resolve
to the specific section on this page corresponding to the link chosen.

An English "help" page is included in the distribution. Thomas M.
Stein has graciously prepared a bilingual (German-English) version of
the same text, which is also included.

=item $doc_root

The directory on your file system which corresponds to your web
server's document root. In other words, the directory you would `cd'
to if you wanted to look at the files accessible at the top level of
your web server.

=item $script

The URL for marc-search itself (when concatenated with the value in
$server).

=item $usersubdir

Tells which subdirectory of a user's home directory (file system)
corresponds to http://server/~user/. On most systems this is
`public_html'. You only need to worry about this if the URL to your
archive has a tilde in it. Thanks to Jeffrey B. Thompson for some
helpful suggestions that led to an (I think) elegant solution.

=back

=head1 CONFIGURATION FILE

Note that configuration files are optional. If you do not include them
in your archive directories, directory names will be used for your
Archive Name and for the labels in your list of subdirectories.

Each directory from which users can initiate a search may have its own
configuration file. This allows you to give specific names to parts of
your archive: My Archive - 1996, for example. The file should be named
.marc-search.cfg (note the dot!) and should look something like this:

    The Archive of Really Important Messages
    01 => January
    02 => February
    # More directories here (note that # signs at the beginning of
    # a line indicate a comment which will be ignored in the reading
    # of your configuration file.)
    # reverse alphabetical order
    $b cmp $a

In this example, the file resides in a directory with multiple
subdirectories, one for each month of the year. These directories have
numeric names on the filesystem, but marc-search will translate those
codes into the corresponding string on the right-hand side of the
arrow.

The sort function given here tells marc-search to list multiple
subdirectories in reverse alphabetical order. The default is to sort
them alphabetically. marc-search wraps the sort lines in 'sub { ... }'
and reports a "Bad sort function" error if they do not mention both $a
and $b. You are free to write your own custom sort functions, and can
make them as complicated as you wish, though naturally this will mean
learning something about sorting in Perl.

=head1 VERSION HISTORY

=over 4

=item 4.3

Fixed two bugs that caused complaints in the log files.

Fixed a bug in which marc-search would go into an infinite loop looking
for tags which MHonArc does not guarantee will always be present.

Introduced a loop into the searching code so that the version
information at the top of files generated by recent versions of MHonArc
is skipped. Thanks to Douglas Gray Stephens for his help with this.

=item 4.2

Various bug fixes.

=item 4.1

Split up into separate modules.

=item 4.0

Works with archives split over multiple directories.

Works on web sites that are tilde dependent. Make sure to set
$usersubdir to match your server settings in the configurable variables
section during installation!

Checks for a configuration file (.marc-search.cfg; note the leading
dot!) in the directory from which the search was initiated and reads
the archive name from it. Otherwise the directory name is used.

Reads in labels for subdirectories from the configuration file (if
present).

Uses Shishir Gundavaram's CGI_Lite module for reading/parsing form
data. You may modify the source to use another library if you must (the
relevant section is documented extensively), but I urge you to give
Shishir's module some serious consideration: as the name suggests, it's
lightweight and efficient, and does everything you need it to do.
Pick up a copy at .

Prints records as they are found rather than storing them in a
(possibly quite large) hash table until the end of the search. Result:
uses less memory and performs one fewer sort operation for greater
speed.

Now completely object oriented.

Eliminated separate methods for full body and partial searches.

Various other tweaks and bug fixes.

=item 3.3

Boolean AND searching scans the entire message body rather than looking
for all of the words on a single line. Results from this kind of search
include the line in which the words were found.

=item 3.2

Added multi-page output control and 'new search' option.

=item 3.1

Added context highlighting for subject, from and date.

Added option to require a match of all search terms.

Improved boolean searching using dynamic subroutines.

Fixed bug in _find_match ($str was not localized).

=item 3.0

Rewritten by EDF from the ground up in Perl5.

New interface for the search form: tables, but Lynx-friendly.

New format for results, including 3 lines of context for Body searches,
and date, subject, and author info on all searches.

Setting a limit on the number of records to search now actually has
meaning on the functional level of the program.

Added crude Boolean searching (OR only).

Added case (in)sensitive matching.

Added 'prefer new/old' messages option.

Added navigational links that point to the original archive.

Eliminated need for a FORM on the calling page (maillist.html).

Fixed several flow-of-control problems.

=item 2.1

??? (No documentation to speak of)

=back

=head1 COPYRIGHT

Copyright (C) 1996 Eric D. Friedman, friedman@uci.edu
http://www.oac.uci.edu/indiv/friedman

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

=cut