procmail

Re: Regexp problem

2003-12-08 13:26:17
At 11:54 2003-12-08 -0700, Jon Brinkmann wrote:

>         procmail: [1832] Mon Dec  8 10:11:51 2003
>         procmail: No match on "[bcdfghjklmnpqrstvwxyz]{4,}"
>
> So it seems that procmail regexp is broken, either in the interpretation
> of the recipe or in the regexp routine.

No, you're trying to use the interval regexp operator ({n,m}, here {4,}), which procmail's regular expressions do not support.

Instead, try:

ENGLISH_CONSONANTS="bcdfghjklmnpqrstvwxyz"

:0
* $ B ?? ()\/[${ENGLISH_CONSONANTS}][${ENGLISH_CONSONANTS}]\
        [${ENGLISH_CONSONANTS}][${ENGLISH_CONSONANTS}]
{
        LOG="Sequence of four or more english consonants ($MATCH)${NL}"
}

Using a variable to hold the characters makes the recipe a bit cleaner, and certainly more self-explanatory and easier to update. One might even elect to remove 'y' from the set, since it is sometimes treated as a vowel in English.
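
If you want to convince yourself that spelling the bracket expression out four times is a faithful substitute for {4,}, here is a minimal sketch in Python. It is only a stand-in: Python's re module does support intervals (procmail does not), and it is case-sensitive unless re.IGNORECASE is passed, whereas procmail conditions ignore case by default. The helper repeat_class is hypothetical, not anything procmail provides.

```python
import re

# Hypothetical helper (not part of procmail): with no {n,} operator available,
# spell the bracket expression out n times, as the recipe does by hand.
def repeat_class(char_class: str, n: int) -> str:
    return char_class * n

ENGLISH_CONSONANTS = "bcdfghjklmnpqrstvwxyz"
CONSONANT = f"[{ENGLISH_CONSONANTS}]"

expanded = repeat_class(CONSONANT, 4)  # four copies of the class
interval = CONSONANT + "{4,}"          # the form procmail cannot handle

samples = ["hello world", "strengths", "RTFM", "a quick test", "xyzzy"]
for s in samples:
    # Python's re understands {4,}, so it can check that both forms flag the
    # same strings. IGNORECASE mimics procmail's default case-insensitivity.
    hit_expanded = re.search(expanded, s, re.IGNORECASE) is not None
    hit_interval = re.search(interval, s, re.IGNORECASE) is not None
    print(f"{s!r}: expanded={hit_expanded} interval={hit_interval}")
```

Both forms flag the same strings. The four-copy form always matches exactly four characters, though, so on "strengths" it matches "ngth" where {4,} would take the whole run "ngths"; for merely detecting a run of four or more, that makes no difference.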

Note the use of the match operator (\/): procmail copies the text matched to its right into $MATCH, which allows the log message to actually report WHAT the recipe deemed to have matched. Run this in a sandbox and you'll be enlightened.
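
As a rough approximation outside procmail, a capturing group can play the role of the text to the right of \/, and the group's contents the role of $MATCH. The sketch below assumes Python's leftmost search picks the same four characters procmail would for this fixed-length pattern; log_line is a hypothetical name that just rebuilds the recipe's LOG string.

```python
import re
from typing import Optional

ENGLISH_CONSONANTS = "bcdfghjklmnpqrstvwxyz"
CONSONANT = f"[{ENGLISH_CONSONANTS}]"

# The capturing group stands in for the part right of procmail's \/ operator.
# IGNORECASE mimics procmail, whose conditions ignore case unless the D flag is set.
PATTERN = re.compile("(" + CONSONANT * 4 + ")", re.IGNORECASE)

def log_line(body: str) -> Optional[str]:
    """Return the text the recipe's LOG assignment would write, or None if no match."""
    m = PATTERN.search(body)
    if m is None:
        return None
    match = m.group(1)  # plays the role of $MATCH
    return f"Sequence of four or more english consonants ({match})\n"

print(log_line("See the HTML version."), end="")
print(log_line("Please read the FAQ first."))
```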

> I also tried the recipe:
>
> : 0 B
> * [bcdfghjklmnpqrstvwxyz][bcdfghjklmnpqrstvwxyz][bcdfghjklmnpqrstvwxyz][bcdfghjklmnpqrstvwxyz]
> test.junk
>
> But this caught all mail that contained any consonants!

This should match only a sequence of FOUR or more consonants. However, you're going to match variable names in source code, acronyms (RTFM is a favourite) and base64-encoded content up the wazoo. Don't forget hex colour codes in HTML messages, such as "#FFBB00" and the like. Or, if you want something SIMPLE to trip on: "HTML" itself qualifies, since procmail conditions are case-insensitive unless the D flag is used.
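
To see how easily such false positives turn up, here is a small sketch that runs the four-consonant pattern over one illustrative input of each kind listed above. Python's re again stands in for procmail, with re.IGNORECASE mimicking procmail's default; the sample strings are made up for illustration, and the base64 one is simply the encoding of "Hello, World!".

```python
import base64
import re

ENGLISH_CONSONANTS = "bcdfghjklmnpqrstvwxyz"
CONSONANT = f"[{ENGLISH_CONSONANTS}]"
FOUR_CONSONANTS = re.compile(CONSONANT * 4, re.IGNORECASE)  # procmail ignores case by default

# One illustrative input per kind of false positive.
samples = {
    "acronym": "RTFM",
    "markup": "<HTML>",
    "hex colour": '<font color="#FFBB00">',
    "source code": "n = strlen(buf);",
    "base64": base64.b64encode(b"Hello, World!").decode("ascii"),
}

hits = {}
for label, text in samples.items():
    m = FOUR_CONSONANTS.search(text)
    hits[label] = m.group() if m else None
    print(f"{label:12} {text!r:32} -> {hits[label]}")

# Without IGNORECASE (roughly procmail's D flag), the lowercase class misses uppercase text.
print(re.search(CONSONANT * 4, "RTFM"))
```

Every sample trips the pattern (RTFM, HTML, FFBB, strl, SGVs), and only the explicitly case-sensitive search on "RTFM" comes back empty.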

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
