At 22:43 2007-01-04 +0800, DR. Lee - NS3 wrote:
I had meant to critique (for the purposes of educating) some of the logic
in your originally posted recipe:
:0
* ^To:.*<.+>
* $ ? grep ^To:.* |gawk -F '<' '//{print $2}' |gawk -F \> '//{print $1}' |grep -f to_list
{
[snip]
So, these conditions run and let's say the address is successfully
extracted, but doesn't match the to_list, you'll fall through to the next
recipe - which isn't for bracketed To:, but will attempt to extract the
header just the same.
* ^From:.*<.+>
* $ ? grep ^From:.* |gawk -F '<' '//{print $2}' |gawk -F \> '//{print $1}' |grep -f from_list
This sub-recipe condition launches off assuming that the From: header will
in fact be formatted with brackets just like the To: was. If it ISN'T,
then the from_list won't be queried, and the message will drop through to
MATCH_TO even if the (unbracketed) address would have been found in from_list.
* $ ? grep ^To:.* |gawk '//{print $2}' |grep -f to_list
Now, if the address WAS in brackets, but just wasn't matched in the
to_list, we'll have fallen through to the second recipe group, which will
process the To: header as if it didn't have encapsulating brackets -- even
if in fact, it did. Which means we suffer all the extraction again, but
fail to get a token we should expect to find in the to_list file, so this
condition predictably fails (when the address is bracketed).
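
To make the bracketed/unbracketed mismatch concrete, here is a small shell
sketch of the two extraction pipelines run against both header styles. It is
an illustration only: the sample addresses are made up, plain awk stands in
for gawk (and the empty // pattern is dropped) on the assumption that these
one-liners behave the same, and the leading grep is skipped because the
header line is passed in directly.

```shell
# Bracketed extraction, as in the first recipe: split on '<', then on '>'.
extract_bracketed() {
    printf '%s\n' "$1" | awk -F '<' '{print $2}' | awk -F '>' '{print $1}'
}

# Unbracketed extraction, as in the second recipe: second whitespace field.
extract_plain() {
    printf '%s\n' "$1" | awk '{print $2}'
}

bracketed='To: "Joe User" <joe@example.com>'   # hypothetical sample header
plain='To: joe@example.com (Joe User)'         # hypothetical sample header

printf '[%s]\n' "$(extract_bracketed "$bracketed")"  # [joe@example.com]
printf '[%s]\n' "$(extract_bracketed "$plain")"      # []
printf '[%s]\n' "$(extract_plain "$plain")"          # [joe@example.com]
printf '[%s]\n' "$(extract_plain "$bracketed")"      # ["Joe]
```

Each pipeline only yields a usable address for the format it was written
for; fed the other format, it hands the final grep either an empty string or
a fragment of the display name, neither of which will be found in the list.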
* ^From:.*
* $ ? grep ^From:.* |gawk -F '<' '//{print $2}' |gawk -F \> '//{print $1}' |grep -f from_list
! $MATCH_BOTH
Curiously, this inspection of the From: again expects it to be bracketed,
even though it is within an outer condition for an unbracketed (though not
confirmed to be unbracketed) To:. Uh, so you have NO support for the
entirely legal address syntax shown on the From: header of the messages
I've been sending to the Procmail list for the past (gaak!) 11+ years.
Ok, so I haven't changed my posting style much in over a decade, and that
certainly doesn't mean it is predominant - but it does remain LEGAL
formatting. Lest you think of me as some lone loon, the following
significant procmail contributors have at some point used the same From:
formatting: DWT, TJL, and (drumroll ...) SRB.
If you review the recipe I posted, you'll see that I use an extraction
which passes through formail, which helpfully strips the address of comment
tokens and whatnot, reducing it to a simple address. No brackets, no muck.
:0
* ^From:.*
* ? grep $MATCH -f from_list
! $MATCH_FROM
Now, if neither of the To: conditions matched, we go to extract and check
the from address by its lonesome - but, er, only in its unbracketed
form. If in fact it was bracketed, this won't match against the
file. Well, and that's assuming you'd properly extracted a MATCH in the
first condition: it is devoid of the \/ match construct (which would still
grab the field complete with comments). The grep operation therefore will
be searching from_list for whatever MAY have been in $MATCH from some prior
ruleset somewhere in your procmailrc.
:0
! $MATCH_NEITHER
I expect a lot of messages would have delivered here which were not
intended to, based on the above issues.
In the worst-case scenario, your rulesets would invoke a
grep|gawk|gawk|grep, fail on that, then hit the second recipe and do a
grep|gawk|grep, fail on that, then fail on the final grep of the From:
(which itself isn't good), only to fall through to MATCH_NEITHER. OR, fail
the first sequence, match the second, and then perform a
grep|gawk|gawk|grep, and either fail or succeed with that (which would be
the most CPU intensive: 6 greps, 5 gawks, and STILL not likely to properly
match a fair number of messages).
That's a LOT of processes.
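
For anyone who wants to check the tallies, the per-pipeline counts can be
added up in a few lines of shell. This is just bookkeeping over the quoted
conditions; the variable names are mine, and shells spawned by procmail are
not counted.

```shell
# Process counts per pipeline, read off the quoted conditions.
to_br_grep=2;   to_br_gawk=2     # bracketed To:   grep|gawk|gawk|grep
to_pl_grep=2;   to_pl_gawk=1     # unbracketed To: grep|gawk|grep
from_br_grep=2; from_br_gawk=2   # bracketed From: grep|gawk|gawk|grep

# Fail the first To: check, match the second, then run the From: check.
worst_grep=$((to_br_grep + to_pl_grep + from_br_grep))
worst_gawk=$((to_br_gawk + to_pl_gawk + from_br_gawk))

# Match on the first To: check, then run the From: check.
first_grep=$((to_br_grep + from_br_grep))
first_gawk=$((to_br_gawk + from_br_gawk))

echo "fall-through path: $worst_grep greps, $worst_gawk gawks"
echo "first-form path:   $first_grep greps, $first_gawk gawks"
```

That gives 6 greps and 5 gawks for the fall-through path, and 4 of each when
the first form matches.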
My offered approach involves a pipe to formail for the CLEANFROM extraction,
and an echo|sed|tr pipe for the To: extraction and cleaning (and this
pipeline is pretty lightweight, unlike even a single invocation of
gawk). Then two singular grep operations (no shell pipelines). In my
environment, CLEANFROM is executed for all messages anyway, because it's a
useful extraction, used in various places (that's why it is in my
sandbox). Bottom line: my recipe will run about half as many processes,
and will do so quite consistently (as it isn't a series of alternate forms
to accommodate different input formats) - that is, whether an address is or
isn't in your files, the processing power necessary to check will be quite
consistent. With your approach, if they are in it with the first form of
formatting, they may match with _just_ 4 greps and 4 gawks (8 processes,
not including shells). If they're in it with the second formatting (or not
at all), it'll be more processes (never mind the accuracy of the
expressions). If you get a lot of mail, all those cycles add up.
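
The general shape of "normalize once, then do a single grep" can be sketched
in shell as follows. This is NOT the recipe from my earlier post (which is
not reproduced in this message): the function names, the sed expressions and
the sample addresses are all assumptions made for illustration, and the
formail step is left out so the sketch runs without procmail installed.

```shell
# Hypothetical cleaning step: header line in, bare lowercase address out.
clean_addr() {
    printf '%s\n' "$1" |
        sed -e 's/^[A-Za-z-]*:[[:space:]]*//' \
            -e 's/[[:space:]]*([^)]*)//g' \
            -e 's/.*<//' -e 's/>.*//' |
        tr 'A-Z' 'a-z'
}

# Hypothetical lookup: one grep, whole-line fixed-string match, no pipeline.
in_list() {
    grep -qixF -e "$1" "$2"
}

list=$(mktemp)
printf '%s\n' 'joe@example.com' 'ann@example.org' > "$list"

for hdr in 'To: "Joe User" <Joe@Example.COM>' \
           'To: joe@example.com (Joe User)' \
           'To: bob@example.net'; do
    addr=$(clean_addr "$hdr")
    if in_list "$addr" "$list"; then
        echo "$addr: listed"
    else
        echo "$addr: not listed"
    fi
done
rm -f "$list"
```

Because both header styles reduce to the same token before the lookup, the
same number of processes runs whether or not the address is in the file, and
the fixed-string, whole-line grep avoids treating dots in the address as
regex wildcards.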
The issues outlined above don't apply to the solution I offered yesterday
(though certainly in the process of testing it, you might find some other
issues). I offer the above criticisms so that you might review them and
see some of the errors in the original implementation which prevented it
from functioning as you had hoped.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.