On Fri, 7 Nov 1997 13:15:23 -0500, "Nevel, Simeon"
<Simeon(_dot_)Nevel(_at_)Schwab(_dot_)COM> wrote about ^FROM_DAEMON:
I'm having trouble understanding the RE *after* the "From "
but before the "secondary" header field (#1) (Postmaster,
daemon etc) and the one *after* the secondary header field (#2).
#1
([^>]*[^(.%@a-z0-9])?
I translate this to mean an Optional group of characters
consisting of zero or more characters that aren't a ">"
followed by a single character that isn't any of "(.%@a-z0-9"
In the older copy of the manual I have here, this is just (.*[^(...])?
which would match any run of characters, followed by any character
which is not valid in an e-mail address (but not delve into
parenthesized comments). I believe the intent here is to skip any
unparenthesized comments, as in " System Postmaster Account " in the
line "From: System Postmaster Account <postmaster@site.net>".
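For what it's worth, here is a rough Python sketch of how that optional prefix behaves on the example line. Python's re module is not procmail's matcher, so treat this as an approximation: re.IGNORECASE stands in for procmail's default case-insensitive matching, only the Post... alternative is included, and the names PREFIX, MAILER, FROM_RE and split_from are made up for illustration.

```python
import re

# Optional "fluff" before the mailer name: anything without a ">", ending in
# one character that cannot be part of an address (nor open a comment).
PREFIX = r"(?P<fluff>[^>]*[^(.%@a-z0-9])?"
# Only one of the mailer-name alternatives; the others are left out here.
MAILER = r"(?P<mailer>Post(ma?(st(e?r)?|n)|office))"
FROM_RE = re.compile(r"^From:" + PREFIX + MAILER, re.IGNORECASE)


def split_from(line):
    """Return (fluff, mailer) for a From: line, or None if there is no match."""
    m = FROM_RE.search(line)
    if m is None:
        return None
    return m.group("fluff"), m.group("mailer")
```

On "From: System Postmaster Account <postmaster@site.net>" the prefix swallows everything up to and including the "<", so the mailer name that matches is the one inside the address. "From: notpostmaster@site.net" does not match at all, because the character before "postmaster" would have to be one that is not valid in an address.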
#2
(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$)
                                ^^^
That "\t " sequence *really* has me confused. Is the "\t"
sequence supposed to represent a tab? I thought the procmail
I think this is only to make it obvious to the reader that that is a
tab. The source (config.h) actually has a literal tab here.
PS. Am I correct in assuming that in a "character class"
(things between "[" and "]") that RE metacharacters *don't*
need to be escaped?
Yes. [$[^\] would match any one of the literal characters dollar,
opening bracket, caret, and backslash.
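If you want to play with this outside procmail, here is a small Python sketch of the same class. One caveat, which is a property of Python and not of procmail: Python's re module still treats the backslash as special inside a class, so it has to be doubled there. The names CLASS_RE and in_class are just for illustration.

```python
import re

# The class from the text is [$[^\] ; in Python the backslash must be
# doubled inside the class, the other three characters need no escaping.
CLASS_RE = re.compile(r"[$[^\\]")


def in_class(ch):
    """True if the single character ch is one of: $ [ ^ backslash."""
    return CLASS_RE.fullmatch(ch) is not None
```

Each of the four characters matches on its own, while an ordinary letter or a closing bracket does not.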
And here is the entire RE (for context, with my comments):
(^
(Precedence:.*(junk|bulk|list)
|To: Multiple recipients of
|(((Resent-)?(From|Sender)|X-Envelope-From):
|>?From )
The indentation here is misleading; I'd write this as
  |To: Multiple recipients of
  |(((Resent-)?(From|Sender)|X-Envelope-From):
   |>?From )
([^>]*[^(.%@a-z0-9])?
Skip fluff, maybe. ("Maybe" as in "optionally". :-)
(Post(ma?(st(e?r)?|n)|office)     Postoffice, Postman,
                                  (and various other similar strings elided)
(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$))
)
You'll notice the trailing expression here starts with a somewhat
similar character class to the one near the beginning. Also note that
several of these expressions are optional, i.e. governed by a ? after
the closing paren.
(([^).!:a-z0-9]     End of e-mail address token
  [-_a-z0-9]*       Another alpha token
 )?                 ... or maybe not;
 [%@>\t ]           Address separator -- either
                    <address@...> or
                    <address> or a bare address with whitespace
                    around it
 [^<)]*             Skip as long as we don't run into another
                    broketed address or end of comment
                    (presumably to prevent this from matching
                    inside parenthesized comments in the first
                    place)
 (\(.*\).*)?        Skip optional parenthesized comments and
                    anything after them if found
)?                  ... or maybe not; maybe we just see an ...
$                   ... end of line instead
([^>]|$)            Uh, I should know what this is supposed to do,
                    but I can't quite remember what it's for. I
                    think it had something to do with continued
                    header lines ... Anyone?
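To see the trailing expression at work, here is a hedged Python approximation of the whole thing. Several assumptions apply: Python's re is not procmail's matcher, re.IGNORECASE stands in for procmail's case-insensitive default, only the Post... alternative is included (the elided mailer names stay elided), and the final ([^>]|$) is simply dropped since its purpose is unclear. The names TAIL, DAEMON_RE and looks_like_daemon are hypothetical.

```python
import re

# Trailing expression, minus the final "([^>]|$)" (an assumption of this sketch).
TAIL = r"(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$"

DAEMON_RE = re.compile(
    r"^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )"  # header field name
    r"([^>]*[^(.%@a-z0-9])?"                                  # optional fluff
    r"(Post(ma?(st(e?r)?|n)|office))"                         # one mailer name only
    + TAIL,
    re.IGNORECASE,
)


def looks_like_daemon(header_line):
    """True if a single header line matches the approximated pattern."""
    return DAEMON_RE.search(header_line) is not None
```

With this sketch, "From: postmaster@site.net (Mail System)" matches, since the tail skips the parenthesized comment. "From: postmasters@site.net" does not, because the mailer name has to be followed by a separator or the end of the line. "From: alice@site.net (fan of the postmaster)" does not match either, which seems to be what the [^<)]* guard against matching inside comments is for.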
Actually, it would be very nice if these expressions were in fact
documented somewhere ...
/* era */
--
Paparazzi of the Net: No matter what you do to protect your privacy,
they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>