At 10:26 2001-11-29 -0600, David W. Tamkin wrote:
> Sean suggested to Timothy,
> | * REPLYTO ?? (\<|^)(joe@aol\.com|ed@msn\.com|john@yahoo\.com)(\>|$)
> \< and \> already match newlines, so alternating them with ^ and $ is
> unnecessary.
While that applies to the trailing newline, it DOES NOT apply to the leading
anchor, since there is no _newline_ present at that end of the string
(you're welcome to write a test script to demonstrate this to
yourself). Since formail typically leaves a space in front of the
returned header value, the WORD BREAK matches (on a space, not a newline).
If you've extracted the header as a raw address without the leading space
(perhaps because you plan to do something else with it), for example as
follows:
SENDER=`formail -b -xFrom:`
# Strip leading whitespace from the sender
:0
* SENDER ?? ^[ ]*\/[^ ].*
{
SENDER=$MATCH
}
then you will need the BOL anchor in the rule to ensure you're anchoring in
the event that the match text is at the beginning of the string. FTR, a zero
or more (or alternately or'ing with a null string) defeats the purpose,
since if there IS something immediately preceding the text, it'll still match.
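
To see why an optional prefix is not a substitute for the BOL alternative,
here is a small sketch. It uses Python's re module purely as a stand-in for
procmail's matcher (an assumption: procmail's own engine is case-insensitive
and handles ^ and $ somewhat differently), and joe@example.com is a made-up
placeholder address, not one from the original rule.

```python
import re

ADDR = r"joe@example\.com"  # hypothetical address, for illustration only

# Boundary given as "start of string OR one non-address character".
anchored = re.compile(r"(^|[^-a-zA-Z0-9_.])" + ADDR)
# Boundary made optional ("zero or more" non-address characters).
optional = re.compile(r"[^-a-zA-Z0-9_.]*" + ADDR)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


samples = ["joe@example.com", " joe@example.com", "notjoe@example.com"]
results = {s: (hits(anchored, s), hits(optional, s)) for s in samples}

for text, (a, o) in results.items():
    print(f"{text!r:24} anchored={a} optional={o}")
```

The optional form still accepts "notjoe@example.com", because zero boundary
characters is a legal match; the anchored form rejects it while accepting
the bare and the space-prefixed address.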
> However, since formail -r will never extract more than one
> return address, Holger's recommendation to use ^^ at each end will work, and
> since ^^ will not match a hyphen or a period, it's preferable.
This holds true if you wish to use the regexp specifically for the envelope
address and the envelope address ONLY. As I believe I explained, other
addresses (if you extract the "From:" for instance) may have additional
crud around the address.
I *DID* point out that rolling out the \< regexps and adding additional
characters to the exclusion would improve the matching; I just didn't
expand that within the example myself, leaving it as an exercise for the
reader.
That would result in a rule similar to:
:0:
* SENDER ?? (^|[^-a-zA-Z0-9_.])($useraddrsexpression)($|[^-a-zA-Z0-9_.])
test.match
(so shoot me if I think continuing to include the EOL explicitly in the
regexp simply makes it clearer)
The astute reader will recognize that the suggested expansion of the \<
and \> macros, with the addition of the '.' and '-' characters, makes them
the SAME as the subexpression used in the ^TO_ macro. Coincidence?
This, for instance, will ensure that firstname-lastname@domain or
firstname.lastname@domain doesn't match on lastname@domain, and that
user@domain.com.net or user@domain.net-com.com doesn't match when you're
looking for user@domain.com.
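
A quick way to check those cases is sketched below. Python's re module again
stands in for procmail's matcher (an assumption, the engines are not
identical), and the alternation in USER_ADDRS is a hypothetical substitute
for $useraddrsexpression using example.com placeholders.

```python
import re

# Hypothetical stand-in for $useraddrsexpression.
USER_ADDRS = r"lastname@example\.com|user@example\.com"
# Anything outside this set counts as a boundary; '-' and '.' do not.
EDGE = r"[^-a-zA-Z0-9_.]"

rule = re.compile(rf"(^|{EDGE})({USER_ADDRS})($|{EDGE})")


def matches(sender):
    """True if one of the listed addresses appears as a whole address."""
    return rule.search(sender) is not None


cases = [
    "user@example.com",
    "<user@example.com>",
    "firstname-lastname@example.com",
    "firstname.lastname@example.com",
    "user@example.com.net",
    "user@example.com-net.org",
]
verdicts = {c: matches(c) for c in cases}

for sender, ok in verdicts.items():
    print(f"{sender:34} {'match' if ok else 'no match'}")
```

Only the bare address and the angle-bracketed one match; the hyphenated and
dotted local parts and the longer domains are rejected because '-' and '.'
are not treated as boundaries.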
My apologies if my example wasn't optimized for the limitations of the
original expression - my intent was to offer a rule which could be used
successfully on a wider variety of input data.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.