On Thu, Aug 26, 2004 at 12:32:15AM -0400, Bob George wrote:
> Chuck Campbell wrote:
> > * (^List-id:.*)\<(satalk|spamassassin)\>
> Won't that match either:
> ^List-id: <satalk>
> ^List-id: <spamassassin>
> but NOT:
> List-Id: "SpamAssassin Users" <users(_at_)spamassassin(_dot_)apache(_dot_)org>
> ^^^^^^ ^^^^^^^^^^^
> In other words, aren't you matching <satalk> or <spamassassin> --
> missing either some wildcards between the matched words
> (satalk|spamassassin) and the angle brackets (<>)?
No, Bob.
Chuck did not use angle brackets in his regex.
He used the procmail token symbol "\<" and "\>".
See "man procmailrc":
\< or \> Match the character before or after a word.
They are merely a shorthand for `[^a-zA-Z0-9_]',
but can also match newlines. Since they match
actual characters, they are only suitable to
delimit words, not to delimit inter-word space.
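
To see the man page's description in action, here is a rough Python sketch. It is not procmail: `procmail_to_python` is a hypothetical helper that simply swaps each "\<" and "\>" for the consuming class `[^a-zA-Z0-9_]`, and it assumes procmail's default case-insensitive matching. The header is copied in the archive's obfuscated form.

```python
import re

# Assumption: each \< or \> consumes exactly one non-word character,
# per the man page's `[^a-zA-Z0-9_]` (a class that also matches newline).
NON_WORD = r"[^a-zA-Z0-9_]"

def procmail_to_python(condition):
    """Hypothetical helper: approximate a procmail condition in Python."""
    translated = condition.replace(r"\<", NON_WORD).replace(r"\>", NON_WORD)
    # procmail conditions are case-insensitive unless the D flag is given.
    return re.compile(translated, re.IGNORECASE | re.MULTILINE)

chuck = procmail_to_python(r"^List-id:.*\<(satalk|spamassassin)\>")

header = ('List-Id: "SpamAssassin Users" '
          '<users(_at_)spamassassin(_dot_)apache(_dot_)org>\n')

# No literal angle brackets are required around the word: the quote
# before "SpamAssassin" and the space after it satisfy \< and \>.
print(bool(chuck.search(header)))
```

Note that Python's own `\b` is zero-width, whereas the tokens emulated here consume a character, which is the man page's point about not using them to delimit inter-word space.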
Angle brackets, had he used them, would not have been quoted.
One does not need to quote angle brackets in procmail.[1]
> How about:
> * (^List-id:.*)\<.*(satalk|spamassassin).*\>
It would work, but is no better. Moreover, it repeats Chuck's
bad habit of sticking everything in parens, which has already
bitten him at least once (see previous denouement of thread).
The words "satalk" and "spamassassin" happen to be unusual
enough that your suggestion, which loses some precision by
not having surround-word-boundaries, won't suffer. But if the
match were meant for words that are more common strings, then
your suggestion would be a step backwards. Example:
Imaginary list named "Archie & Veronica" whose email address
contains "archie(_at_)example(_dot_)com". If you write a regex to match
* ^List-id:.*\<.*archie.*\>
and I send you an email from my imaginary German list about the
ancient hierarchy among the demigods, and the list uses the German
word for "hierarchy" in its address (_Hierarchie_), even in the
plural (_Hierarchien_), it is going to get misfiled. (And _ur_ means
"ancient," more or less:)
List-Id: "Urhierarchien unter Demigoetter"
<urhierarchien(_at_)himmel(_dot_)de>
(Sorry, there really is a "himmel.de", but I don't want people to go
there. I just couldn't resist.)
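
The misfiling can be reproduced with a small Python approximation. This is a sketch, not procmail itself: it assumes "\<" and "\>" each behave like the consuming class `[^a-zA-Z0-9_]`, that matching is case-insensitive, and that the folded List-Id header has been joined onto one line. Addresses are kept in the archive's obfuscated form.

```python
import re

# Assumption: stand-in for procmail's \< and \> tokens.
B = r"[^a-zA-Z0-9_]"

# Approximation of:  * ^List-id:.*\<.*archie.*\>
loose = re.compile(rf"^List-id:.*{B}.*archie.*{B}",
                   re.IGNORECASE | re.MULTILINE)

wanted = ('List-Id: "Archie & Veronica" '
          '<archie(_at_)example(_dot_)com>\n')
german = ('List-Id: "Urhierarchien unter Demigoetter" '
          '<urhierarchien(_at_)himmel(_dot_)de>\n')

# The ".*" on either side of "archie" lets the word boundaries float
# away from the word, so "archie" inside "Urhierarchien" also matches.
print(bool(loose.search(wanted)))
print(bool(loose.search(german)))
```

Both headers match, so the German list's mail would land in the "Archie & Veronica" folder.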
Also, you did not need or want the trailing ".*\>", because it
is pointless.
So (coming back down to Earth) in the last example, Chuck's attempt
was closer than yours to being correct. We'd want:
* ^List-Id:.*\<archie\>
or
* ^List-Id:.*\<(archie|veronica)\>
or something similar. For Chuck's lists, that means:
* ^List-Id:.*\<(satalk|spamassassin)\>
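
As a rough check of the tighter form, here is a Python approximation. It is a sketch under the same caveats as any emulation: "\<" and "\>" are assumed to act like the consuming class `[^a-zA-Z0-9_]`, matching is assumed case-insensitive, and each header is given on a single line in the archive's obfuscated form.

```python
import re

# Assumption: stand-in for procmail's \< and \> tokens.
B = r"[^a-zA-Z0-9_]"

# Approximation of:  * ^List-Id:.*\<(archie|veronica)\>
tight = re.compile(rf"^List-Id:.*{B}(archie|veronica){B}",
                   re.IGNORECASE | re.MULTILINE)

wanted = ('List-Id: "Archie & Veronica" '
          '<archie(_at_)example(_dot_)com>\n')
german = ('List-Id: "Urhierarchien unter Demigoetter" '
          '<urhierarchien(_at_)himmel(_dot_)de>\n')

# With the boundaries hugging the word, "archie" must stand alone.
# In "Urhierarchien" it is preceded by "r" and followed by "n".
print(bool(tight.search(wanted)))   # True
print(bool(tight.search(german)))   # False
```

Keeping the boundary tokens directly against the alternation is what buys the precision; the same shape applies to (satalk|spamassassin).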
--
dman