procmail

Re: subject processing

1997-07-05 12:54:00
[Don't read this, Eli; it will anger you.]

Michael Helm asked,

| I'm trying to deal with a mailing list that has some subscribers
| with problem user mail agents. These prefix the subject with 
| various junk strings that trip up threading & sorting in conventional
| mailers, eg Re[8]: my subject or Sv: My subject or FW: stuff
| What's worse is that they combine like a genetics experiment:
| Often see Re: Re[2]: Sv: FW: nothing
| & similarly ugly combinations.

| I can get rid of these junk strings separately, but what I want to
| do is process an incoming message until they are *all* gone.  ...
| What I'd like to do is process the messages recursively until 
| they stop matching these rules, & then move on, but I've never been
| able to figure out how to get procmail to do that.  I'd prefer not
| to overload my brain with even more complex regular expression stuff,
| gets very difficult to understand or change.  What can I do?
| 
| Any suggestions appreciated.

Michael said the magic word: recursion.  As Era Eriksson explained to
Martin Ramsch, procmail is right-side-greedy and left-side-stingy in
assigning MATCH, so we can't do the simple thing.  [Well, we can when there
are no colons in the significant part of the subject:

    * ^Subject:.*\/[^:]+

but it isn't easy to guarantee that.]  So, here goes (untested).  Put this
into your main rcfile:

 # If one of the prefixes is Re: or an equivalent, we want to end with one Re:.
 :0 # caret, asterisk, and second left bracket are literal
 * ^Subject:(.*\>)?\/Re[[*^:].*
 { SUBJECT=$MATCH FOUND_A_RE=yes INCLUDERC=/path/to/.stripsubjectrc }

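 # A relative path is also acceptable; it will be assumed to start from
 # $MAILDIR.  In the second recipe, the bare FOUND_A_RE (no equals sign)
 # removes the variable, so no Re: gets put back on the subject.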

 :0E # Otherwise, we want to end with no prefix at all.
 * ^Subject:(.*\>)?\/(FW|Sv):.*
 { SUBJECT=$MATCH FOUND_A_RE INCLUDERC=/path/to/.stripsubjectrc }

Now, .stripsubjectrc should be a separate file, looking something like this
(the whitespace inside each pair of brackets is meant to be a space and a tab):

 :0fwh # Did the last recursion finish the job?  Then do the fix and return.
 * ! SUBJECT ?? ^^(FW|Sv|Re(\[[0-9]*]|\^[0-9]*)?):[     ]*\/[^  ].*$
 | formail -I"Subject: ${FOUND_A_RE:+Re: }$SUBJECT"

 :0E # $MATCH is now one prefix shorter, so try again.
 { SUBJECT=$MATCH INCLUDERC=$_ }

Recursion depth is limited by the number of file descriptors your kernel
will allow, but that shouldn't be a problem if the prefixes are not allowed
to build up in the first place.
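
Since the recipes above are untested, it may help to check the logic outside
procmail first.  What follows is only a rough Python sketch of the same idea,
not a translation procmail would accept: the names PREFIX, RE_LIKE, OTHER,
strip_prefixes and clean_subject are made up for illustration, and Python's
regexps only approximate procmail's (in particular \> and the stingy left
side are imitated with a leftmost search).

```python
import re

# Approximates the prefix pattern in .stripsubjectrc.  procmail matches
# case-insensitively by default, hence re.IGNORECASE.  The lookahead mirrors
# the requirement that something non-blank must follow the prefix.
PREFIX = re.compile(
    r"(?:FW|Sv|Re(?:\[[0-9]*\]|\^[0-9]*)?):[ \t]*(?=[^ \t])", re.IGNORECASE
)

# Approximate the two conditions in the main rcfile: find the leftmost
# prefix that starts the subject or follows a non-word character.
RE_LIKE = re.compile(r"(?:^|\W)(Re[\[*^:].*)", re.IGNORECASE)
OTHER = re.compile(r"(?:^|\W)((?:FW|Sv):.*)", re.IGNORECASE)


def strip_prefixes(subject):
    """Peel off one leading prefix per call, like each INCLUDERC=$_ pass."""
    m = PREFIX.match(subject)
    if m is None:
        return subject  # nothing left to strip: the recursion ends here
    return strip_prefixes(subject[m.end():])


def clean_subject(raw):
    """Return the rewritten subject text, without the 'Subject:' field name."""
    raw = raw.strip()
    m = RE_LIKE.search(raw)
    found_a_re = m is not None  # plays the role of FOUND_A_RE=yes
    if m is None:
        m = OTHER.search(raw)
    if m is None:
        return raw  # no junk prefix at all: leave the subject alone
    core = strip_prefixes(m.group(1))
    return ("Re: " if found_a_re else "") + core


if __name__ == "__main__":
    for s in ("Re: Re[2]: Sv: FW: nothing", "FW: stuff", "Re[8]: my subject"):
        print(repr(s), "->", repr(clean_subject(s)))
```

Feeding it the subjects from Michael's message is a quick way to see which
combinations the patterns catch before trusting the rcfile with real mail.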
