procmail

Re: How to do high-bit characters in procmail regexp ?

1999-12-02 07:57:24
era eriksson <era(_at_)iki(_dot_)fi> writes: 
On Thu, 02 Dec 1999 03:02:07 -0500, Walter Dnes
<waltdnes(_at_)waltdnes(_dot_)org> wrote:
...
 >  :0HB
 >  * -40
 >  * 1^1 [X-Y]
 >  | formail -A "X-Reject: High-bit character set in email"

The correct syntax here would be -40^0 on the first condition. Other
than that, I think this ought to work, although I would perhaps prefer
something which counts the high-bit characters as a percentage of the
whole. In Finnish text, for example, accented characters can easily be
several per cent of a message and a long message might easily be more
than 4000 characters (although I find typical messages to be in the
range 1500-3000 bytes, headers included).

If you can read Finnish, then you would exclude Finnish accented characters
from the counts. (If you can't read Finnish you might want to treat
Finnish messages as spam anyway...)

To calculate percentages, how about:

:0HB
* -400^0
* -1^1 .
* 10^1 [X-Y]
| formail -A "X-Reject: High-bit character set in email"

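Each character subtracts 1 from the score and each high-bit character
adds 10, on top of the initial -400, so the score is positive only
when the high-bit count exceeds one tenth of the total plus 40. In other
words, this would reject messages with more than "10% plus 40 characters"
with the high bit set.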
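
To check the arithmetic outside procmail, here is a minimal Python sketch
of the same scoring. It is only an illustration, not how procmail works
internally: the function names are hypothetical, and it assumes that
"high-bit" means any byte in the range 0x80-0xFF and that "." counts
every byte except a newline.

```python
def highbit_score(message: bytes) -> int:
    """Mimic the weighted conditions: -400 base, -1 per character, +10 per high-bit byte."""
    # Assumption: "." matches every byte except newline (0x0A).
    total = sum(1 for b in message if b != 0x0A)
    # Assumption: "high-bit" means the top bit is set, i.e. byte >= 0x80.
    high = sum(1 for b in message if b >= 0x80)
    return -400 - total + 10 * high


def is_rejected(message: bytes) -> bool:
    """A scoring recipe only fires when the final score is strictly positive."""
    return highbit_score(message) > 0


if __name__ == "__main__":
    # 1000 characters in total: the cut-off is 1000/10 + 40 = 140 high-bit bytes.
    at_limit = b"a" * 860 + b"\xe4" * 140
    over_limit = b"a" * 859 + b"\xe4" * 141
    print(highbit_score(at_limit), is_rejected(at_limit))
    print(highbit_score(over_limit), is_rejected(over_limit))
```

With 1000 characters the cut-off is 140 high-bit bytes: exactly 140 leaves
the score at zero and the message passes, while 141 pushes it positive.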

                        Martin

Martin(_dot_)Ward(_at_)durham(_dot_)ac(_dot_)uk http://www.dur.ac.uk/~dcs0mpw/ 
Erdos number: 4
Maintainer of the G.K.Chesterton web site: http://www.dur.ac.uk/~dcs0mpw/gkc/
Shortcuts: http://i.am/mw and http://i.am/gkc -- try them!
Vote against spam: http://www.politik-digital.de/spam/en/