
Re: If VARIABLE = (this|orthis|orthat)

2001-11-29 12:41:28
At 10:26 2001-11-29 -0600, David W. Tamkin wrote:
Sean suggested to Timothy,

| * REPLYTO ?? (\<|^)(joe@aol\.com|ed@msn\.com|john@yahoo\.com)(\>|$)

\< and \> already match newlines, so alternating them with ^ and $ is
unnecessary.

That holds for the trailing anchor, but it DOES NOT hold for the leading one, since there is no _newline_ present at that end of the string (you're welcome to write a test script to demonstrate this to yourself). Since formail typically returns the extracted header with a leading space, the WORD BREAK matches (on that space, not on a newline). If you've extracted the header as a raw address without the leading space (perhaps because you plan to do something else with it), say, as follows:

SENDER=`formail -b -xFrom:`

# Strip leading whitespace from the sender (each bracket pair holds a space and a tab)
:0
* SENDER ?? ^[  ]*\/[^  ].*
{
        SENDER=$MATCH
}

You will need the BOL anchor in the rule so that the expression is still anchored when the match text sits at the very beginning of the string. FTR, making the leading character optional (with a zero-or-more, or alternatively by or'ing with a null string) defeats the purpose, since if there IS something immediately preceding the text, it'll still match.

However, since formail -r will never extract more than one
return address, Holger's recommendation to use ^^ at each end will work, and
since ^^ will not match a hyphen or a period, it's preferable.

This holds true if you wish to use the regexp specifically for the envelope address and the envelope address ONLY. As I believe I explained, other addresses (if you extract the "From:" for instance) may have additional crud around the address.

I *DID* point out that rolling out the \< regexps and adding additional characters to the exclusion would improve the matching; I just didn't expand that within the example myself, leaving it as an exercise for the reader.

That would result in a rule similar to:

:0:
* SENDER ?? (^|[^-a-zA-Z0-9_.])($useraddrsexpression)($|[^-a-zA-Z0-9_.])
test.match

(so shoot me if I think continuing to include the EOL explicitly in the regexp simply makes it clearer)

The astute reader will recognize that the suggested expansion of the \< and \> macros, with the addition of the '.' and '-' characters, makes them the SAME as the subexpression used in the ^TO_ macro. Coincidence?

This, for instance, will ensure that firstname-lastname@domain or firstname.lastname@domain doesn't match on lastname@domain, and that user@domain.com.net or user@domain.net-com.com doesn't match when you're looking for user@domain.com.
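
If you want to check those cases without touching a live rcfile, a rough Python sketch along the following lines may help. It assumes that Python's re module treats this particular expression (character classes, alternation, ^ and $ only) the same way procmail's regexp engine does. The helper names and the sample addresses are made up for illustration, and it is no substitute for testing the actual recipe against real mail.

```python
import re

# Characters that can be part of an address token. Anything else, or the
# start/end of a line, counts as a boundary (same class as in the rule above).
BOUNDARY = r"[^-a-zA-Z0-9_.]"


def address_pattern(addresses):
    """Build a regexp shaped like the rule above from literal addresses."""
    alternation = "|".join(re.escape(a) for a in addresses)
    # IGNORECASE and MULTILINE approximate procmail's defaults
    # (case-insensitive matching, ^ and $ anchoring at newlines).
    return re.compile(
        rf"(^|{BOUNDARY})({alternation})($|{BOUNDARY})",
        re.IGNORECASE | re.MULTILINE,
    )


def matches(addresses, header_text):
    """Return True if any of the addresses appears as a whole token."""
    return address_pattern(addresses).search(header_text) is not None


if __name__ == "__main__":
    print(matches(["lastname@domain"], "firstname-lastname@domain"))  # False
    print(matches(["lastname@domain"], "firstname.lastname@domain"))  # False
    print(matches(["user@domain.com"], "user@domain.com.net"))        # False
    print(matches(["user@domain.com"], "user@domain.com"))            # True
    print(matches(["user@domain.com"], " <User@Domain.COM>"))         # True
```

The fourth case is the one where the explicit ^ earns its keep: the address starts the string, so there is no preceding character for the bracket expression to consume.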

My apologies if my example wasn't optimized for the limitations of the original expression - my intent was to offer a rule which could be used successfully on a wider variety of input data.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

