procmail

Re: Et TO, procmail ?

1997-09-16 12:34:45
Vikas Agnihotri <vikas(_at_)insight(_dot_)att(_dot_)com> replied to my reply to him:
To: procmail(_at_)Informatik(_dot_)RWTH-Aachen(_dot_)DE
...
Cc: procmail(_at_)Informatik(_dot_)RWTH-Aachen(_dot_)DE

Hmmm. Standard SmartList setup will drop the second copy.

1. How does it handle the real-name part of the TO address? This can
contain pretty much any garbage according to RFC822 as long as the address
is enclosed between <...>?
    From: "jester(_at_)fun(_dot_)house" <fool(_at_)aol(_dot_)com>
Will confuse a test for "^TO_jester@". 
What about
    From: someone(_at_)somewhere(_dot_)com <another(_at_)one(_dot_)com>

Will confuse the regexp.

Per RFCs, another(_at_)one(_dot_)com is the real email address here, right?
So TO/TO_ should catch it and *not* someone(_at_)somewhere(_dot_)com, right?

If you only test for "another(_at_)one(_dot_)com", it is not a problem. If you
are checking for "someone(_at_)somewhere(_dot_)com" you are out of luck.

Well, shouldn't things like (, ), < and > be included in the regexp above,
since 99% of the time they indicate the start of an ACTUAL email address?

I don't like the ^TO and ^TO_ macros for most things and typically use
stuff like this:

        ^(Resent-)?(To|CC):.*[< ]{address}([ >]|$)

It still can be confused, but the things that will cause problems
are fairly rare in practice. You might prefer something like this:

        ^(Resent-)?(To|CC):([^(]+([(].*[)])?)*[, <]{address}([, >]|$)

Which can correctly deal with

        To: (hatter(_at_)tea(_dot_)party) {address}
        To: (fake {address}) bill(_dot_)the(_dot_)lizard(_at_)the(_dot_)jury(_dot_)box
        To: Alice <alice(_at_)the(_dot_)croquet(_dot_)game>, "W. Rabbit (late)"
                <hare(_at_)small(_dot_)hole>, Gentle Reader <{address}>
        To: jabberwocky(_at_)vorpal(_dot_)swords(_dot_)r(_dot_)us, duchess(_at_)the(_dot_)croquet(_dot_)game,
                chesire(_at_)no(_dot_)where, {address}, dinah(_at_)meow(_dot_)org

It will still fail for 

        To: (fake <{address}>) mockturtle(_at_)tortoise(_dot_)edu

If someone is malicious enough to send you such mail.
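
A rough way to try the shorter of the two patterns above outside procmail
is the Python sketch below. It is only an approximation: Python's re module
stands in for procmail's own matcher, the headers are unfolded onto single
lines by hand, and reader@looking.glass is a made-up address standing in
for {address}. The longer pattern is left out on purpose, since its nested
repetition can make a backtracking engine such as Python's very slow.

```python
import re

# Hypothetical address standing in for the {address} placeholder.
ADDRESS = "reader@looking.glass"

# The shorter pattern, with {address} substituted. procmail matches
# case-insensitively by default, hence re.IGNORECASE.
SHORT = re.compile(
    r"^(Resent-)?(To|CC):.*[< ]" + re.escape(ADDRESS) + r"([ >]|$)",
    re.IGNORECASE,
)


def short_matches(header):
    """Return True if the (already unfolded) header line matches SHORT."""
    return SHORT.search(header) is not None


# The sample headers from the message, unfolded onto single lines.
COMMENT_FIRST = "To: (hatter@tea.party) " + ADDRESS
IN_COMMENT = "To: (fake " + ADDRESS + ") bill.the.lizard@the.jury.box"
ANGLE_LAST = ('To: Alice <alice@the.croquet.game>, "W. Rabbit (late)" '
              "<hare@small.hole>, Gentle Reader <" + ADDRESS + ">")
MID_LIST = ("To: jabberwocky@vorpal.swords.r.us, duchess@the.croquet.game, "
            "chesire@no.where, " + ADDRESS + ", dinah@meow.org")
ANGLE_IN_COMMENT = "To: (fake <" + ADDRESS + ">) mockturtle@tortoise.edu"

for header in (COMMENT_FIRST, IN_COMMENT, ANGLE_LAST, MID_LIST,
               ANGLE_IN_COMMENT):
    print(short_matches(header), header)
```

Under these assumptions the shorter pattern accepts the first and third
headers and rejects the address inside the plain comment, but it misses the
fourth header (a comma directly after the address is not in its closing
character class) and it is fooled by the <{address}> inside a comment.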

3. What is the essential difference (again, in plain English, please!)
between TO and TO_?
The definition of the word boundary[...]
What is the assumption being made here?  Why will TO_address catch
everything to 'address' while TOword catches everything to 'word'?

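^TOgryphon will match both of the following, while ^TO_gryphon will only
match the first. In the second, "gryphon" is preceded by a ".", which
^TO_ treats as part of the address rather than as a boundary.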

        To: gryphon(_at_)lobster(_dot_)quadrille
        To: the(_dot_)gryphon(_at_)sleeping(_dot_)by(_dot_)a(_dot_)cliff
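
The two macros can be compared on these headers with a small Python sketch.
The expansions below are the ones given in the procmailrc man page as I
know it; that they are identical in the procmail version under discussion
is an assumption, and Python's re module is only a stand-in for procmail's
matcher.

```python
import re

# Macro expansions as documented in the procmailrc man page (assumed to
# apply to the version discussed here).
TO = (r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|"
      r"(X-Envelope|Apparently(-Resent)?)-To):(.*[^a-zA-Z])?)")
TO_ = (r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|"
       r"(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)")


def macro_matches(macro, text, header):
    """Emulate a condition such as ^TOtext or ^TO_text on one header line."""
    pattern = macro + re.escape(text)
    return re.search(pattern, header, re.IGNORECASE) is not None


PLAIN = "To: gryphon@lobster.quadrille"
DOTTED = "To: the.gryphon@sleeping.by.a.cliff"

for name, macro in (("^TO", TO), ("^TO_", TO_)):
    for header in (PLAIN, DOTTED):
        print(name + "gryphon", macro_matches(macro, "gryphon", header), header)
```

^TO accepts any non-letter before the text, so the "." in "the.gryphon"
counts as a boundary. ^TO_ excludes ".", "-", "_" and digits from its
boundary characters, so it only matches where "gryphon" starts the address.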

4. What in the world is (.*[^-a-zA-Z0-9_.])? ?
Still can't get something. Won't the <space> after the To: make both ^TO and
^TO_ end there without going any further, since the [^.....] above says end
the regexp at anything that is not alphanumeric, _, . or -?
So the .* would match 0 atoms and [^-a-zA-Z0-9_.] would match the space and
that's it! What am I missing?

If you just have "^TO" or "^TO_", then in fact the space won't even be
matched, because of procmail's non-greediness. It is the *context* of what
you put after the macro that will force the .* to slurp up a bunch of
text.

Elijah
------
I /dev/null dupes, no need to CC list posts.  It is not my responsibility to
prove to you my mail is not spam, if mail to you bounces it will not be resent.
