procmail

Re: I'm sure I'm brain dead, but I can't see the problem in this recipe

2004-08-26 01:57:58
On Thu, Aug 26, 2004 at 12:32:15AM -0400, Bob George wrote:

> Chuck Campbell wrote:
>
> > * (^List-id:.*)\<(satalk|spamassassin)\>
>
> Won't that match either:
>
> ^List-id: <satalk>
>
> ^List-id: <spamassassin>
>
> but NOT:
>
> List-Id: "SpamAssassin Users" <users@spamassassin.apache.org>
>                                ^^^^^^            ^^^^^^^^^^^
> In other words, aren't you matching <satalk> or <spamassassin> --
> missing either some wildcards between the matched words
> (satalk|spamassassin) and the angle brackets (<>)?

No, Bob.

Chuck did not use angle brackets in his regex.

He used the procmail tokens "\<" and "\>".

See "man procmailrc":

       \< or \>  Match the character  before  or  after  a  word.
                 They are merely a shorthand for `[^a-zA-Z0-9_]',
                 but can also match newlines.  Since  they  match
                 actual  characters,  they  are  only suitable to
                 delimit words, not to delimit inter-word  space.


Angle brackets, had he used them, would not have been quoted
(backslash-escaped).  One does not need to quote angle brackets
in procmail.[1]
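To see why Chuck's condition already matches the SpamAssassin header, here is a small Python model of it (Python is only the illustration language; procmail is not involved). It assumes the translation given in the man page excerpt: each "\<" or "\>" becomes exactly one character from [^a-zA-Z0-9_], newline included. Matching is case-insensitive with "^" anchored at line starts, mirroring procmail's defaults. The helper name procmail_to_python is hypothetical and handles only the constructs used in this thread.

```python
import re

# Per procmailrc(5), \< and \> each match ONE character that is not
# [a-zA-Z0-9_] (newlines included); they are not zero-width like Python's \b.
NONWORD = "[^a-zA-Z0-9_]"


def procmail_to_python(pattern):
    """Hypothetical helper: translate the small procmail subset used here."""
    translated = pattern.replace(r"\<", NONWORD).replace(r"\>", NONWORD)
    # procmail conditions are case-insensitive unless the D flag is given,
    # and ^ anchors at the start of any header line.
    return re.compile(translated, re.IGNORECASE | re.MULTILINE)


header = 'List-Id: "SpamAssassin Users" <users@spamassassin.apache.org>\n'

chuck = procmail_to_python(r"(^List-id:.*)\<(satalk|spamassassin)\>")
m = chuck.search(header)
print(bool(m))     # True
print(m.group(0))  # ends in "@spamassassin.": \< consumed "@", \> consumed "."

# Bob's reading: literal angle brackets. "<spamassassin>" never occurs
# verbatim in the header, so this finds nothing.
literal = re.compile(r"^List-id:.*<(satalk|spamassassin)>",
                     re.IGNORECASE | re.MULTILINE)
print(bool(literal.search(header)))  # False
```

The "@" and "." that Bob marked with carets are exactly the characters the two tokens consume.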


> How about:
>
> * (^List-id:.*)\<.*(satalk|spamassassin).*\>


It would work, but is no better.  Moreover, it repeats Chuck's
bad habit of sticking everything in parens, which has already
bitten him at least once (see the previous denouement of this thread).

The words "satalk" and "spamassassin" happen to be unusual
enough that your suggestion, which loses some precision by
dropping the word boundaries around them, won't suffer.  But if
the match were meant for more common strings, then your
suggestion would be a step backwards.  Example: an imaginary
list named "Archie & Veronica" whose email address contains
"archie@example.com".  If you write a regex to match

   * ^List-id:.*\<.*archie.*\>

and I send you an email from my imaginary German list about the
ancient hierarchy among the demigods, and the list uses the German
word for "hierarchy" in its address (_Hierarchie_), even in the
plural (_Hierarchien_), it is going to get misfiled.  (And _ur_ means
"ancient," more or less:)

   List-Id: "Urhierarchien unter Demigoetter" <urhierarchien@himmel.de>

(Sorry, there really is a "himmel.de", but I don't want people to go
there.  I just couldn't resist.)
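The misfiling is easy to reproduce outside procmail.  The Python sketch here is an illustration only: the exact List-Id header for the Archie & Veronica list is assumed, and compile_procmail is a hypothetical helper.  It expands "\<" and "\>" into one non-word character each, as the man page describes, then compares the loose pattern with the word-bounded one.

```python
import re

NONWORD = "[^a-zA-Z0-9_]"  # what procmail's \< and \> stand for


def compile_procmail(pattern):
    # Hypothetical helper mimicking procmail: expand \< and \>,
    # case-insensitive, ^ anchored at each line start.
    expanded = pattern.replace(r"\<", NONWORD).replace(r"\>", NONWORD)
    return re.compile(expanded, re.IGNORECASE | re.MULTILINE)


loose = compile_procmail(r"^List-id:.*\<.*archie.*\>")  # word boundaries lost
tight = compile_procmail(r"^List-Id:.*\<archie\>")      # "archie" as a word

# Assumed sample headers for the two imaginary lists.
archie_list = 'List-Id: "Archie & Veronica" <archie@example.com>\n'
german_list = ('List-Id: "Urhierarchien unter Demigoetter" '
               '<urhierarchien@himmel.de>\n')

for name, header in [("archie", archie_list), ("german", german_list)]:
    print(name, bool(loose.search(header)), bool(tight.search(header)))
```

The loose pattern matches both headers, because "hierarchien" contains "archie".  The bounded one matches only the Archie & Veronica list, since in "hierarchien" the letters on both sides of "archie" are word characters.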

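Also, you did not need or want the trailing ".*\>", because it
is pointless: ".*" can run to the end of the line and "\>" can
match the newline there, so that tail succeeds on every header
line and adds no constraint.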
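A quick way to check that claim is to run the condition with and without the tail over a few header lines.  This is a Python sketch under the same assumed translation of "\<" and "\>" as in the man page excerpt; compile_procmail and the sample headers are hypothetical.  Real header lines always end in a newline, which is what "\>" ends up consuming.

```python
import re

NONWORD = "[^a-zA-Z0-9_]"  # procmail's \< and \> (each consumes one character)


def compile_procmail(pattern):
    # Hypothetical helper: expand \< and \>, case-insensitive, ^ per line.
    expanded = pattern.replace(r"\<", NONWORD).replace(r"\>", NONWORD)
    return re.compile(expanded, re.IGNORECASE | re.MULTILINE)


with_tail = compile_procmail(r"^List-id:.*\<.*archie.*\>")
without_tail = compile_procmail(r"^List-id:.*\<.*archie")

# Assumed sample header lines, each terminated by "\n" as in a real header.
headers = [
    'List-Id: "Archie & Veronica" <archie@example.com>\n',
    'List-Id: "Urhierarchien unter Demigoetter" <urhierarchien@himmel.de>\n',
    "List-Id: <other.example.org>\n",
]

results = [(bool(with_tail.search(h)), bool(without_tail.search(h)))
           for h in headers]
for pair in results:
    print(pair)
```

Every pair agrees: dropping ".*\>" changes nothing.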


So (coming back down to Earth) in the last example, Chuck's attempt
was closer than yours to being correct.  We'd want:


   * ^List-Id:.*\<archie\>
or
   * ^List-Id:.*\<(archie|veronica)\>

or something.  For Chuck's lists, the same fix gives:

   * ^List-Id:.*\<(satalk|spamassassin)\>
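As a rough model of how these conditions would sort mail, here is a Python sketch that treats each condition as a delivering recipe.  The first match wins, and anything unmatched falls through to the default mailbox.  Folder names, sample headers, and the file_message helper are hypothetical.  The expansion of "\<" and "\>" follows the man page excerpt, and the case-insensitivity mirrors procmail's default, which is also why "List-id" and "List-Id" are interchangeable in these conditions.

```python
import re

NONWORD = "[^a-zA-Z0-9_]"  # procmail's \< and \>


def compile_procmail(pattern):
    # Hypothetical helper: expand \< and \>, case-insensitive, ^ per line.
    expanded = pattern.replace(r"\<", NONWORD).replace(r"\>", NONWORD)
    return re.compile(expanded, re.IGNORECASE | re.MULTILINE)


# Conditions from the message; folder names are hypothetical.
RULES = [
    (compile_procmail(r"^List-Id:.*\<(satalk|spamassassin)\>"), "spamassassin"),
    (compile_procmail(r"^List-Id:.*\<(archie|veronica)\>"), "archie"),
]


def file_message(header_block, default="inbox"):
    # Like a series of delivering recipes: the first matching condition
    # wins; a message no recipe claims falls through to the default.
    for condition, folder in RULES:
        if condition.search(header_block):
            return folder
    return default


sa = ('Subject: rules question\n'
      'List-Id: "SpamAssassin Users" <users@spamassassin.apache.org>\n')
av = 'Subject: hello\nList-id: "Archie & Veronica" <archie@example.com>\n'
de = ('Subject: Hallo\n'
      'List-Id: "Urhierarchien unter Demigoetter" <urhierarchien@himmel.de>\n')

for block in (sa, av, de):
    print(file_message(block))
```

With these inputs the three messages land in "spamassassin", "archie", and "inbox" respectively.  The German list is no longer misfiled.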

-- 
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
