mhonarc-users

Some hacks I find useful

1996-03-13 16:39:40
I've made a few modifications to deal with some peculiarities of the lists
that I'm archiving.  I include a patch to mhonarc and lib/mhutil.pl below.
I can't guarantee that everything is implemented as cleanly as possible,
but it has been working for me on an archive of over 10000 messages.  Let
me know if anyone finds any of this useful.

Things I did:

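Added a new resource element, TRIMSUBJECT, which holds a regexp.  The first
match of that regexp is removed from the subject of each message, wherever
in the subject it occurs.  My lists go out with a tag like "FVWM:" at the
head of each message's subject, which I didn't want to show up in the
archives.  The match is case sensitive.  Blank lines inside the element are
ignored, and if it contains more than one pattern line, only the last one
is used.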
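A minimal sketch of how the patch reads and applies TRIMSUBJECT, pulled out into two standalone Perl subs.  read_trimsubject and trim_subject are hypothetical names, and the "FVWM:\s*" pattern and the subject are only examples.  It assumes the pattern sits on its own line between <TRIMSUBJECT> and </TRIMSUBJECT> in the resource file, as the parsing loop in lib/mhutil.pl expects, and uses chomp where the (Perl 4 era) patch uses chop.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the TRIMSUBJECT pattern from the lines that
# follow the <TRIMSUBJECT> tag, as the patched lib/mhutil.pl does.
# Blank lines are skipped and the last non-blank line wins.
sub read_trimsubject {
    my $pat = '';
    for my $line (@_) {
        last if $line =~ /^\s*<\/trimsubject\s*>/i;
        next if $line =~ /^\s*$/;
        my $copy = $line;       # copy, since @_ may alias read-only literals
        chomp $copy;            # the patch uses chop, dropping the same "\n"
        $pat = $copy;
    }
    return $pat;
}

# Hypothetical helper: remove the first match of the pattern from a subject,
# as the patched mhonarc does (no /g, so later matches are left alone).
sub trim_subject {
    my ($subject, $pat) = @_;
    $subject =~ s/$pat// if $pat;
    return $subject;
}

# Resource-file lines after <TRIMSUBJECT>; the pattern is an example.
my $pat = read_trimsubject("\n", "FVWM:\\s*\n", "</TRIMSUBJECT>\n");
print trim_subject('Re: FVWM: icon bug', $pat), "\n";    # prints "Re: icon bug"
```

Leaving the pattern unanchored means the tag is also trimmed from replies, where it no longer sits at the very start of the subject.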

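Added a new resource element, TRIMMSG, which holds a multiline regexp.  The
first match of that regexp is removed from the body of each message,
wherever it occurs.  My lists go out with a five-line footer giving the
location of the relevant web pages, unsubscription instructions and the
address of the list maintainer, and I wanted this removed from the archived
messages.  Every line of the element, newline included, becomes part of the
pattern, so regexp metacharacters in the footer text (such as "." or "?")
must be escaped to match literally.  The match is case sensitive and uses
the /m modifier, so "^" and "$" match at the start and end of any line.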
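A minimal sketch of the TRIMMSG handling, again as standalone Perl subs with hypothetical names (read_trimmsg, trim_body) and a made-up two-line footer.  It drops the patch's /o modifier so the sub can be called with different patterns, and keeps the patch's guard against an empty pattern, since s/// with an empty pattern silently reuses the last successful regexp.

```perl
use strict;
use warnings;

# Hypothetical helper: build the TRIMMSG pattern from the lines that follow
# the <TRIMMSG> tag.  Every line, newline included, is appended verbatim.
sub read_trimmsg {
    my $pat = '';
    for my $line (@_) {
        last if $line =~ /^\s*<\/trimmsg\s*>/i;
        $pat .= $line;
    }
    return $pat;
}

# Hypothetical helper: remove the first match from a message body with /m,
# so ^ and $ can match at any line boundary.  The empty check matters:
# s/// with an empty pattern would reuse the last successful regexp.
sub trim_body {
    my ($data, $pat) = @_;
    $data =~ s/$pat//m if $pat ne '';
    return $data;
}

# Example footer pattern and message body.
my $pat  = read_trimmsg("^--\n", "Unsubscribe: .*\n", "</TRIMMSG>\n");
my $body = "Message text.\n--\nUnsubscribe: see the list home page\n";
print trim_body($body, $pat);    # prints "Message text."
```

Without /m the leading "^" could only match at the very start of the body, so a footer pattern anchored this way would never match.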

Added a new resource variable, $OUTDIR$, which contains the current setting
of the -outdir command line flag.  I wanted to include this information in
the headers of my indices.
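The third hunk of the mhonarc patch adds one more case to the variable-replacement switch.  The sketch below is a hypothetical, much simplified stand-in for that switch (expand_vars and the path are made up), just to show what $OUTDIR$ turns into inside a resource template; clipping and the other features of the real routine are not modelled.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for mhonarc's resource-variable
# replacement.  In this sketch, known $NAME$ variables are expanded and
# unknown ones are left exactly as written ($& is the whole match).
sub expand_vars {
    my ($template, $vars) = @_;
    $template =~ s/\$(\w+)\$/exists $vars->{$1} ? $vars->{$1} : $&/ge;
    return $template;
}

# Example: OUTDIR holds whatever was passed with -outdir (path made up).
my %vars = (OUTDIR => '/home/www/fvwm-archive');
print expand_vars('<H1>Archive: $OUTDIR$</H1>', \%vars), "\n";
# prints "<H1>Archive: /home/www/fvwm-archive</H1>"
```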

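Reordered the regexp matches in extract_email_address() (in lib/mhutil.pl)
to work around a bug, as suggested by Earl Hood in
<199603072339.RAA07213@imagine.convex.com> to the mailing list.  An address
in angle brackets is now preferred; stripping a parenthesized comment and
taking the first remaining word is only tried when there is none.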
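To see what the reordering changes, the sketch below copies both versions of the matching logic into standalone subs (extract_old and extract_new are hypothetical names; the real function is extract_email_address(), and my is used in place of the patch's Perl 4 local).  The From: value is made up and has both a parenthesized comment and an address in angle brackets; it is one input on which the two orders disagree, not necessarily the case from Earl Hood's message.

```perl
use strict;
use warnings;

# Original order from lib/mhutil.pl.  Input is already HTML-escaped,
# hence &lt; and &gt; instead of < and >.
sub extract_old {
    my $str = shift;
    my $ret;
    if ($str =~ s/\([^\)]+\)//) {          # strip "(comment)", take 1st word
        $str =~ /\s*(\S+)\s*/;  $ret = $1;
    } elsif ($str =~ /\&lt;(\S+)\&gt;/) {  # address in angle brackets
        $ret = $1;
    } else {                               # bare address
        $str =~ /\s*(\S+)\s*/;  $ret = $1;
    }
    return $ret;
}

# Patched order: angle brackets are checked first.
sub extract_new {
    my $str = shift;
    my $ret;
    if ($str =~ /\&lt;(\S+)\&gt;/) {
        $ret = $1;
    } elsif ($str =~ s/\([^\)]+\)//) {
        $str =~ /\s*(\S+)\s*/;  $ret = $1;
    } else {
        $str =~ /\s*(\S+)\s*/;  $ret = $1;
    }
    return $ret;
}

my $from = 'Jane Doe (FVWM list) &lt;jane@example.org&gt;';
print extract_old($from), "\n";    # prints "Jane"
print extract_new($from), "\n";    # prints "jane@example.org"
```

With the old order the comment is stripped first and the first word of the display name is taken as the address; the new order finds the bracketed address before that branch is reached, while "address (comment)" and bare addresses give the same result either way.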

 - J<

--- mhonarc.orig        Mon Mar  4 23:15:32 1996
+++ mhonarc     Wed Mar  6 00:17:22 1996
@@ -1245,6 +1245,7 @@
     ## Get Subject ##
     ##-------------##
     if ($fields{'subject'} !~ /^\s*$/) {
+       $fields{'subject'} =~ s/$TRIMSUBJECT// if $TRIMSUBJECT;
        ($sub = $fields{'subject'}) =~ s/\s*$//;
        &htmlize(*sub);
     } else {
@@ -1302,6 +1303,11 @@
        $data .= $_;
     }
     return ''  if $skip;
+
+    if ("$TRIMMSG") {
+       $data =~ s/$TRIMMSG//mo;
+    }
+
     $fields{'content-type'} = 'text/plain'
        if $fields{'content-type'} =~ /^\s*$/;
     ($ret, @files) = &'MAILread_body($header, $data,
@@ -1787,6 +1793,8 @@
            { $tmp = $NumOfMsgs; last REPLACESW; }
        if ($var eq 'ORDNUM')
            { $tmp = $i+1; last REPLACESW; }
+       if ($var eq 'OUTDIR')
+           { $tmp = $OUTDIR; last REPLACESW; }
        if ($var eq 'PREVFROM') {
            $canclip = 1; $raw = 1;
            $tmp = &dehtmlize($From{$previndex});
--- lib/mhutil.pl.orig  Mon Mar  4 23:07:54 1996
+++ lib/mhutil.pl       Tue Mar 12 22:28:35 1996
@@ -504,6 +504,24 @@
            }
            last FMTSW;
        }
+       if ($elem eq "trimmsg") {               # String to trim from msg
+            $TRIMMSG = '';
+           while ($line = <FMT>) {
+               last  if $line =~ /^\s*<\/trimmsg\s*>/i;
+               $TRIMMSG .= $line;
+           }
+           last FMTSW;
+       }
+       if ($elem eq "trimsubject") {           # String to trim from subj
+            $TRIMSUBJECT = '';
+           while ($line = <FMT>) {
+               last  if $line =~ /^\s*<\/trimsubject\s*>/i;
+               next  if $line =~ /^\s*$/;
+                chop $line;
+               $TRIMSUBJECT = $line;
+           }
+           last FMTSW;
+       }
        if ($elem eq "tsubsort") {
            $TSUBSORT = 1; last FMTSW;
        }
@@ -543,10 +561,17 @@
     local($str) = shift;
     local($ret);
 
-    if ($str =~ s/\([^\)]+\)//) {
-       $str =~ /\s*(\S+)\s*/;  $ret = $1;
-    } elsif ($str =~ /\&lt;(\S+)\&gt;/) {
+#    if ($str =~ s/\([^\)]+\)//) {
+#      $str =~ /\s*(\S+)\s*/;  $ret = $1;
+#    } elsif ($str =~ /\&lt;(\S+)\&gt;/) {
+#      $ret = $1;
+#    } else {
+#      $str =~ /\s*(\S+)\s*/;  $ret = $1;
+#    }
+    if ($str =~ /\&lt;(\S+)\&gt;/) {
        $ret = $1;
+    } elsif ($str =~ s/\([^\)]+\)//) {
+       $str =~ /\s*(\S+)\s*/;  $ret = $1;
     } else {
        $str =~ /\s*(\S+)\s*/;  $ret = $1;
     }
