procmail

Re: filtering chinese characters

2001-03-20 09:09:20
On 20 Mar, Christoph Kukulies wrote:
| I'm receiving more and more unwanted chinese/taiwanese email.
| I thought of testing the subject for the occurrence of three or more
| characters > \200 by a regular expression of something
| like
| 
| [\200-\377]{3,} 
| 
| Would that work?

Nope. First problem is that procmail doesn't support the {n,m} syntax. 
If you want to specify at least 3, generally you have to repeat the
character class 3 times (i.e. [\200-\377][\200-\377][\200-\377] ).
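
To illustrate the "repeat the class" workaround outside of procmail, here
is a rough Python sketch. The function names are made up for the example,
and Python's re module is only standing in for procmail's own regex
engine, so treat it as a check of the idea rather than of procmail itself.

```python
import re

# Bytes 0x80-0xFF, the same range as octal \200-\377.
HIGH_BIT_CLASS = rb"[\x80-\xff]"


def repeated_class(n: int) -> bytes:
    """Spell out 'n in a row' by repeating the class n times.

    This is the workaround for a regex dialect without {n,m}.
    """
    return HIGH_BIT_CLASS * n


def has_run(subject: bytes, n: int = 3) -> bool:
    """True if subject contains at least n consecutive high-bit bytes."""
    return re.search(repeated_class(n), subject) is not None
```

An unanchored search for three in a row also succeeds on any longer run,
so it answers the same yes/no question that {3,} would.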

That said, I'm not sure procmail supports the \200 octal notation
either. In a couple of quick tests here with an older version (3.13.1)
it didn't work. It's been about a year since I had a recipe like this
in use, but looking at that old rcfile, I used the actual characters
inside the brackets. I also recall a thread on this list some time back
about how to enter 8-bit characters in various editors. To avoid
putting the actual characters in this message and risking tripping
other members' filters, I'll write them in octal; you want something like:

:0
* ^Subject:.*[\200-\377][\200-\377][\200-\377]
/path-to/file-o-spam

where the \200 and \377 above (and in the two examples below) are
replaced with the "real" characters. Hopefully without starting a
religious flame fest, in vim I can do that with <ctrl><v> then 128 and
<ctrl><v> then 255.
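
If entering the raw bytes in an editor is a nuisance, one alternative is
to generate the condition line with a small script. The following is a
minimal Python sketch; the function name is hypothetical, and it assumes
your procmail accepts literal 8-bit bytes inside a bracket range the way
my old rcfile did.

```python
def subject_condition(n: int = 3) -> bytes:
    """Build a procmail condition line containing literal 8-bit bytes.

    Octal 200 is decimal 128 (0x80) and octal 377 is decimal 255 (0xFF),
    the same values entered in the editor.
    """
    low = bytes([0o200])
    high = bytes([0o377])
    char_class = b"[" + low + b"-" + high + b"]"
    # Repeat the class n times, since {n,m} is not available.
    return b"* ^Subject:.*" + char_class * n + b"\n"
```

The result is bytes, not text, so it should be written to the rcfile in
binary mode (for example with open(path, "ab")) to keep it from being
re-encoded on the way out.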

If you don't like that, or find you want to match more than 3 and it
becomes cumbersome, you could try:

:0
* ^Subject:\/.*[\200-\377]
{
   :0
   * -2^0
   *  1^1 MATCH ?? [\200-\377]
   /path-to/file-o-spam
}

Or if you are willing to drop the requirement that it be in the subject,
and count 8-bit characters anywhere in the header instead (procmail
checks the header by default), it can be simplified to:

:0
* -2^0
*  1^1 [\200-\377]
/path-to/file-o-spam
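
For anyone unsure how the scoring in the last two recipes adds up: the
"-2^0" line starts the score at -2, each match of the "1^1" condition
adds 1, and the recipe only fires if the final score is positive, so it
takes three or more 8-bit characters. Here is a rough Python model of
that arithmetic. The names are invented for the example, and it covers
only this 1^1 case, not procmail's scoring in general.

```python
import re

# Bytes 0x80-0xFF, the same range as octal \200-\377.
HIGH_BIT = re.compile(rb"[\x80-\xff]")


def score(text: bytes, base: int = -2, per_match: int = 1) -> int:
    """Model of '* -2^0' followed by '* 1^1 [high-bit class]'.

    Start at base, then add per_match for every 8-bit byte found.
    """
    return base + per_match * len(HIGH_BIT.findall(text))


def would_deliver(text: bytes) -> bool:
    """A scoring recipe matches only when the final score is above zero."""
    return score(text) > 0
```

Note that, unlike the first recipe, the characters counted here do not
have to be next to each other; any three in the searched text are enough.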


-- 
                         /"\
Don Hammond              \ /     ASCII RIBBON CAMPAIGN
Raleigh, NC US            X        AGAINST HTML MAIL,
                         / \      AND NEWS TOO

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
