I've made a few modifications to deal with some peculiarities of the lists
that I'm archiving. I include a patch to mhonarc and lib/mhutil.pl below.
I can't guarantee that everything is implemented as cleanly as possible,
but it works for me while archiving over 10000 messages. Let me know if
anyone finds any of this useful.
Things I did:
Added a new resource element, TRIMSUBJECT, which holds a regexp; the first
match of it in the subject of each message is removed. My lists go out with
a tag like "FVWM:" at the head of each subject, which I didn't want to show
up in the archives. The match is case sensitive.
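A minimal Perl sketch of what the new substitution does to a subject line. The pattern FVWM:\s* and the helper name trim_subject are assumptions for illustration. Only the s/$TRIMSUBJECT// line mirrors the patch.

```perl
use strict;
use warnings;

# Hypothetical TRIMSUBJECT value (an assumption; use your own list's tag).
my $TRIMSUBJECT = 'FVWM:\s*';

# trim_subject is a hypothetical helper. The substitution inside it is the
# one the patch adds: no /g, so only the first match is removed, and no /i,
# so the match is case sensitive.
sub trim_subject {
    my ($subject) = @_;
    $subject =~ s/$TRIMSUBJECT// if $TRIMSUBJECT;
    return $subject;
}

print trim_subject('FVWM: window placement question'), "\n";
print trim_subject('Re: FVWM: window placement question'), "\n";
print trim_subject('fvwm: lowercase tag is left alone'), "\n";
```

Leaving the pattern unanchored also strips the tag from "Re: FVWM:" replies. Anchoring it with ^ would leave those subjects untouched.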
Added a new resource element, TRIMMSG, which holds a multiline regexp; the
first match of it in the body of each message is removed. My lists go out
with a five-line footer giving the location of the relevant web pages,
unsubscription instructions and the address of the list maintainer, and I
wanted this removed from the archived messages. The match is case sensitive
and uses the /m modifier, so ^ and $ match at the start and end of each line.
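A sketch of the body trimming using a hypothetical three-line footer. The footer text and the trim_body helper are made up for illustration. The pattern keeps a newline after each line because the patch builds TRIMMSG the same way when it reads the resource file.

```perl
use strict;
use warnings;

# Hypothetical footer pattern. Like the patch, each line keeps its trailing
# newline, so one regexp spans several lines of the message body.
my $TRIMMSG = join("\n",
    '^-- $',
    '^To unsubscribe, .*$',
    '^List archive: .*$',
) . "\n";

# trim_body is a hypothetical helper wrapping the patch's substitution.
# The patch also passes /o (compile the interpolated pattern only once);
# it is left out here because it does not change what gets matched.
sub trim_body {
    my ($data) = @_;
    $data =~ s/$TRIMMSG//m if $TRIMMSG;
    return $data;
}

my $body = "Has anyone tried the new pager?\n"
         . "-- \n"
         . "To unsubscribe, mail the list server.\n"
         . "List archive: see the web pages.\n";
print trim_body($body);    # prints only the first line of $body
```

Because TRIMMSG is an ordinary regexp, any ".", "?" or "+" in a footer URL has to be escaped with a backslash if it should match literally.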
Added a new resource variable, $OUTDIR$, which contains the current setting
of the -outdir command line flag. I wanted to include this information in
the headers of my indices.
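A simplified sketch of how a $VAR$ reference such as $OUTDIR$ gets expanded in index markup. This is not MHonArc's own replacement code: the REPLACESW block in the patch handles many more variables and supports clipping. The %vars table, the expand_vars name and the directory path are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variable table; in mhonarc, $OUTDIR holds the -outdir value.
my %vars = (OUTDIR => '/www/archives/fvwm');

# expand_vars is a hypothetical helper: it replaces each $NAME$ with its
# value from %vars and leaves names it does not know untouched.
sub expand_vars {
    my ($text) = @_;
    $text =~ s/\$(\w+)\$/exists $vars{$1} ? $vars{$1} : "\$$1\$"/ge;
    return $text;
}

print expand_vars('<H2>Archive in $OUTDIR$</H2>'), "\n";
```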
Reordered the regexp matches in extract_email_address() so that an address
in angle brackets is tried before a parenthesized comment is stripped. This
works around a bug, as suggested by Earl Hood in
<199603072339.RAA07213@imagine.convex.com> to the mailing list.
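The effect of the reordering shows up when both orders are run on a From: value that contains both a parenthesized comment and an angle-bracket address. The two subs below copy the old and new branch order from the patch, using my instead of local. The sub names and the sample address are hypothetical. This is one input where the order matters, not necessarily the exact case from Earl Hood's message.

```perl
use strict;
use warnings;

# Old order: strip a "(comment)" first, then take the first word left over.
sub extract_old {
    my ($str) = @_;
    my $ret;
    if ($str =~ s/\([^\)]+\)//) {
        $str =~ /\s*(\S+)\s*/; $ret = $1;
    } elsif ($str =~ /\<(\S+)\>/) {
        $ret = $1;
    } else {
        $str =~ /\s*(\S+)\s*/; $ret = $1;
    }
    return $ret;
}

# Patched order: prefer an address in angle brackets when there is one.
sub extract_new {
    my ($str) = @_;
    my $ret;
    if ($str =~ /\<(\S+)\>/) {
        $ret = $1;
    } elsif ($str =~ s/\([^\)]+\)//) {
        $str =~ /\s*(\S+)\s*/; $ret = $1;
    } else {
        $str =~ /\s*(\S+)\s*/; $ret = $1;
    }
    return $ret;
}

my $from = 'Jane Doe (FVWM list) <jane@example.com>';
print extract_old($from), "\n";    # Jane
print extract_new($from), "\n";    # jane@example.com
```

For the plain "address (comment)" form both orders return the same address, so the change only affects headers that carry angle brackets.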
- J<
--- mhonarc.orig Mon Mar 4 23:15:32 1996
+++ mhonarc Wed Mar 6 00:17:22 1996
@@ -1245,6 +1245,7 @@
## Get Subject ##
##-------------##
if ($fields{'subject'} !~ /^\s*$/) {
+ $fields{'subject'} =~ s/$TRIMSUBJECT// if $TRIMSUBJECT;
($sub = $fields{'subject'}) =~ s/\s*$//;
&htmlize(*sub);
} else {
@@ -1302,6 +1303,11 @@
$data .= $_;
}
return '' if $skip;
+
+ if ("$TRIMMSG") {
+ $data =~ s/$TRIMMSG//mo;
+ }
+
$fields{'content-type'} = 'text/plain'
if $fields{'content-type'} =~ /^\s*$/;
($ret, @files) = &'MAILread_body($header, $data,
@@ -1787,6 +1793,8 @@
{ $tmp = $NumOfMsgs; last REPLACESW; }
if ($var eq 'ORDNUM')
{ $tmp = $i+1; last REPLACESW; }
+ if ($var eq 'OUTDIR')
+ { $tmp = $OUTDIR; last REPLACESW; }
if ($var eq 'PREVFROM') {
$canclip = 1; $raw = 1;
$tmp = &dehtmlize($From{$previndex});
--- lib/mhutil.pl.orig Mon Mar 4 23:07:54 1996
+++ lib/mhutil.pl Tue Mar 12 22:28:35 1996
@@ -504,6 +504,24 @@
}
last FMTSW;
}
+ if ($elem eq "trimmsg") { # String to trim from msg
+ $TRIMMSG = '';
+ while ($line = <FMT>) {
+ last if $line =~ /^\s*<\/trimmsg\s*>/i;
+ $TRIMMSG .= $line;
+ }
+ last FMTSW;
+ }
+ if ($elem eq "trimsubject") { # String to trim from subj
+ $TRIMSUBJECT = '';
+ while ($line = <FMT>) {
+ last if $line =~ /^\s*<\/trimsubject\s*>/i;
+ next if $line =~ /^\s*$/;
+ chop $line;
+ $TRIMSUBJECT = $line;
+ }
+ last FMTSW;
+ }
if ($elem eq "tsubsort") {
$TSUBSORT = 1; last FMTSW;
}
@@ -543,10 +561,17 @@
local($str) = shift;
local($ret);
- if ($str =~ s/\([^\)]+\)//) {
- $str =~ /\s*(\S+)\s*/; $ret = $1;
- } elsif ($str =~ /\<(\S+)\>/) {
+# if ($str =~ s/\([^\)]+\)//) {
+# $str =~ /\s*(\S+)\s*/; $ret = $1;
+# } elsif ($str =~ /\<(\S+)\>/) {
+# $ret = $1;
+# } else {
+# $str =~ /\s*(\S+)\s*/; $ret = $1;
+# }
+ if ($str =~ /\<(\S+)\>/) {
$ret = $1;
+ } elsif ($str =~ s/\([^\)]+\)//) {
+ $str =~ /\s*(\S+)\s*/; $ret = $1;
} else {
$str =~ /\s*(\S+)\s*/; $ret = $1;
}
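To show how the mhutil.pl hunk reads the two new elements, here is a standalone sketch that runs the same inner loops over an in-memory resource file. The resource text is hypothetical, and the outer dispatch loop is a simplification of MHonArc's own parser. chomp stands in for the patch's chop, which behaves the same on newline-terminated lines. The sketch makes one difference visible: TRIMMSG keeps every line verbatim, trailing newlines included, while TRIMSUBJECT keeps only the last non-blank line with its newline stripped.

```perl
use strict;
use warnings;

# Hypothetical resource-file text; the patch reads the real one from FMT.
my $resource = <<'END';
<TRIMSUBJECT>
FVWM:\s*
</TRIMSUBJECT>
<TRIMMSG>
^-- $
^To unsubscribe, .*$
</TRIMMSG>
END

my ($TRIMSUBJECT, $TRIMMSG) = ('', '');
open(my $fmt, '<', \$resource) or die "cannot open in-memory file: $!";
while (my $line = <$fmt>) {
    if ($line =~ /^\s*<trimmsg\s*>/i) {
        # Keep every line, newline included, so the pattern spans lines.
        while ($line = <$fmt>) {
            last if $line =~ /^\s*<\/trimmsg\s*>/i;
            $TRIMMSG .= $line;
        }
    } elsif ($line =~ /^\s*<trimsubject\s*>/i) {
        # Skip blank lines; the last non-blank line wins, minus its newline.
        while ($line = <$fmt>) {
            last if $line =~ /^\s*<\/trimsubject\s*>/i;
            next if $line =~ /^\s*$/;
            chomp $line;
            $TRIMSUBJECT = $line;
        }
    }
}
close($fmt);

print "TRIMSUBJECT: $TRIMSUBJECT\n";
print "TRIMMSG:\n$TRIMMSG";
```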