procmail

Re: bug in ^TO_ macro: character '+' *is* allowed in emails

2004-11-26 03:49:26
On Thu, Nov 25, 2004 at 06:21:17PM -0800, Tristan Savatier wrote:

> The ^TO_ macro assumes that the '+' sign is not allowed in email addresses,
> but + is allowed in email addresses! (e.g. love+hugs@foo.com is
> legal).

Your supposition above is simply not true.  The macro makes no such assumption.
Actually, I'm not exactly sure *what* it does do (see below); but what it
*does not* do is limit certain characters in your match.


> Therefore the part [^-a-zA-Z0-9_.] in the ^TO_ macro should be replaced by
> [^-a-zA-Z0-9_.+]
>
> This way, ^TO_hugs@foo.com would not match on:
> To: love+hugs@foo.com
>
> (currently it does match, and that's a bug).

It does match, and it's not a bug insofar as it's what the regex says.
It might be a bug in that maybe Stephen didn't think it would match -- I
don't know the answer to that.  See more below.
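For anyone who wants to check this outside procmail, here is a rough Python approximation. It assumes procmail's default case-insensitive matching and treats "^" as anchoring at the start of any header line (hence re.I and re.M). The to_matches helper is hypothetical, not part of procmail. It compares the man page expansion with Tristan's proposed [^-a-zA-Z0-9_.+] class:

```python
import re

# The ^TO_ expansion as quoted from the procmailrc(5) man page.
TO_CURRENT = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)"
# Tristan's proposed variant: '+' added to the characters that may NOT
# precede the address (i.e. '+' stops being a boundary character).
TO_PROPOSED = TO_CURRENT.replace("[^-a-zA-Z0-9_.]", "[^-a-zA-Z0-9_.+]")


def to_matches(macro: str, address_regex: str, header: str) -> bool:
    """Approximate the procmail condition '* ^TO_<address_regex>' on a header.

    Assumption: procmail matches case-insensitively by default and '^'
    anchors at the start of any header line, hence re.I and re.M.
    """
    return re.search(macro + address_regex, header, re.I | re.M) is not None


header = "To: love+hugs@foo.com"
print(to_matches(TO_CURRENT, r"hugs@foo\.com", header))   # True
print(to_matches(TO_PROPOSED, r"hugs@foo\.com", header))  # False
```

With the proposed class, the full address still matches when written as love\+hugs@foo\.com (the '+' must be escaped in a regex), because the space after the colon remains a boundary.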


MISCELLANEOUS
       If the regular expression contains `^TO_' it will be substituted by
       `(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)',
       which should catch all destination specifications containing a specific address.
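To see which header names that expansion actually covers, here is a small Python sketch. The expand_to_macro helper is hypothetical (procmail performs the substitution internally), and re.I/re.M only approximate procmail's case-insensitive, line-anchored header matching. Delivered-To, X-Original-To and From should not match, since none of them appear in the expansion:

```python
import re

TO_ = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)"


def expand_to_macro(condition: str) -> str:
    """Replace the first literal '^TO_' in a condition with the man page text.

    Hypothetical helper for experimenting outside procmail.
    """
    return condition.replace("^TO_", TO_, 1)


pattern = re.compile(expand_to_macro(r"^TO_hugs@foo\.com"), re.I | re.M)

for name in ["To", "Cc", "Bcc", "Resent-To", "Original-Resent-Cc",
             "X-Envelope-To", "Apparently-To", "Apparently-Resent-To",
             "Delivered-To", "X-Original-To", "From"]:
    # Build a one-line header and report whether the expanded ^TO_ finds it.
    hit = pattern.search(f"{name}: hugs@foo.com") is not None
    print(name, "->", hit)
```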

Let's deconstruct that a bit.  I believe it says it's looking for
a header line that starts (anchored left) with any of those
header-y words; followed by a colon (marking the end of the header-name
field); followed by the optional grouping of ".*" (anything at all!  And
here is where your matching addresses come in!), followed
by a class with a caret in front of it, which means "any one character NOT
in this class."

So where the macro would have to match your idea of an address, it accepts
*anything at all*, requiring only that your string come either right after
the colon or right after one, shall we say, boundary character (the trailing
"?" makes the whole group optional: zero or one occurrence of it).
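A quick Python illustration of that optional group. It approximates procmail's matching with re.I and re.M (case-insensitive, line-anchored header search) and tests each header against the expansion followed by hugs@foo\.com:

```python
import re

TO_ = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)"
pattern = re.compile(TO_ + r"hugs@foo\.com", re.I | re.M)

cases = [
    ("To:hugs@foo.com", "group empty: string follows the colon directly"),
    ("To: Pooh <hugs@foo.com>", "'<' is outside the class, so it is a boundary"),
    ("To: love+hugs@foo.com", "'+' is outside the class, so it is a boundary"),
    ("To: lovehugs@foo.com", "'e' is inside the class, so no boundary precedes"),
    ("To: love.hugs@foo.com", "'.' is inside the class, so no boundary precedes"),
]
for header, why in cases:
    print(bool(pattern.search(header)), "|", header, "|", why)
```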

So if we have, purely as an example:

 X-Envelope-To: Pooh Bear ##***you___+++!your!address!com***###@somelist.somedom [XYZ-list]

and you write:


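 # \/ puts the text matched by "###[^o]*" into MATCH;
 # the braces block then copies MATCH into FOO.
 :0
 * ^TO_\/###[^o]*
 { FOO = $MATCH }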


It will work (it matches "###[^o]*" positively in the sample line):

 procmail: Matched "###@s"
 procmail: Match on "(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)\/###[^o]*"
 procmail: Assigning "FOO=###@s"
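The same extraction can be approximated in Python. Python has no \/ operator, so a named group stands in for the part after it; the header is the sample line above rejoined onto one line. This only illustrates the result, it is not how procmail implements MATCH:

```python
import re

TO_ = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)"

header = ("X-Envelope-To: Pooh Bear "
          "##***you___+++!your!address!com***###@somelist.somedom [XYZ-list]")

# The named group plays the role of the text after \/, which procmail
# would assign to MATCH; the greedy ".*" absorbs everything before "###".
m = re.search(TO_ + r"(?P<match>###[^o]*)", header, re.I | re.M)
FOO = m.group("match") if m else ""
print(f'FOO="{FOO}"')  # FOO="###@s"
```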

As for whether that is the result that Stephen intended, I do not know.
I essentially never use the ^TO_ or ^TO macros in my coding, and haven't
looked at them all that closely before.  But frankly, I don't see what's
better or different about the above result than simply this:

  * (^all|those|header|choices):.*yourstring

So there may be a bug insofar as the man pages misstate what can be matched.
But the match on your plus sign is not an issue.

Finally, if anything were to be added to the macro, what I'd find useful would
be searching X-Original-To and Delivered-To along with the
present plain-ol' Original-To and X-Envelope-To.  However, and this is key,
you should keep in mind that the macros are nothing more than convenient
shorthands for header patterns that commonly occur in email.  As such, they
are just a shortcut tool you may or may not find useful for your specific
regex needs.  You can always code around the macros by hand.
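As an example of coding around the macro by hand, here is a hypothetical extended pattern, sketched in Python, that adds X-Original-To and Delivered-To to the X-Envelope/Apparently alternatives. It is not procmail's built-in ^TO_; in an rc file you would paste such an expanded regex into a condition yourself:

```python
import re

# Hypothetical extension of the ^TO_ expansion (NOT procmail's macro):
# X-Original and Delivered are added alongside X-Envelope and Apparently.
TO_EXTENDED = (r"(^((Original-)?(Resent-)?(To|Cc|Bcc)"
               r"|(X-Envelope|X-Original|Delivered|Apparently(-Resent)?)-To)"
               r":(.*[^-a-zA-Z0-9_.])?)")

# re.I/re.M approximate procmail's case-insensitive, line-anchored matching.
pattern = re.compile(TO_EXTENDED + r"hugs@foo\.com", re.I | re.M)
for header in ["Delivered-To: hugs@foo.com", "X-Original-To: hugs@foo.com",
               "To: hugs@foo.com", "Subject: hugs@foo.com"]:
    print(header, "->", bool(pattern.search(header)))
```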

There have been list articles in the past similarly deconstructing
the ^TO_ and ^TO macros.

-- 
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
