procmail

Re: Brain dead Re^2 killer?

1997-07-24 18:54:00
Ben Stuyts <benst(_at_)terminus(_dot_)stuyts(_dot_)nl> writes:
On Thu, 24 Jul 97, process(_at_)qz(_dot_)little-neck(_dot_)ny(_dot_)us (Eli 
the Bearded) wrote:

# Filter "Re[5]:"/"Re^3:" and the like out of Subject:
:0fh
*  ^Subject: *((Re: ?)*Re[[^]|Re: *Re:)
| perl -pe 's/^subject:\s*(?:re[][^\d]*:\s*)+/Subject: Re: /i;'

Yes, this looks like it would solve my problem. But there seems to be a  
problem. From my log file:

procmail: Match on "^Subject: *((Re: ?)*Re[[^]|Re: *Re:)"
procmail: Executing " perl -pe 's/^subject:\s*(?:re[][^\d]*:\s*)+/Subject: Re:
 /i;'"
/^subject:\s*(?:re[][^\d]*:\s*)+/: ?+* follows nothing in regexp at  
/tmp/perl-e003860 line 1.

Any idea what's happening? I'm not versed in perl. This is on FreeBSD  
2.2-Stable, Procmail 3.11p7, perl 4.0.

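The regexp was written for perl5.  perl4 doesn't know the "(?:" syntax, so
it sees the "?" as a quantifier with nothing in front of it, hence the
error.  To make it perl4 compatible, remove the "?:" that's after the open
paren, i.e.: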

        | perl -pe 's/^subject:\s*(re[][^\d]*:\s*)+/Subject: Re: /i;'


(The "?:" tells perl not to bother saving what was matched inside the parens
into $1.  It can be necessary in some situations, but here it's just an
optimization.)


There's one other problem that this script could tackle. My mail reader also  
has trouble sorting with "RE: " instead of "Re: ". But maybe that's better  
dealt with in a separate recipe, as it would need to be case sensitive.

I'd just go ahead and use the same perl action for both recipes.  That'll
save processing in cases where both problems are present.  As for which
case to check for first, the RE: case is likely to be faster to match, so
I'd test it first:

        :0 fhD
        * ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]:.*RE:
        | perl -pe 's/^subject:\s*(re[][^\d]*:\s*)+/Subject: Re: /i;'

        :0 fhE
        * ^Subject: *((Re: ?)*Re[[^]|Re: *Re:)
        | perl -pe 's/^subject:\s*(re[][^\d]*:\s*)+/Subject: Re: /i;'

The first condition is probably overkill: sendmail at least will always
canonicalize headers to all lowercase except for the first character and
any letter after a minus sign.

Yes, you could get it down to only one copy of the perl action, but it
would be no faster, and I don't think it would be any clearer, so what's
the point?


Philip Guenther
