procmail

How to do high-bit characters in procmail regexp ?

1999-12-02 01:06:52
  With all the foreign-language/foreign-character-set spam
hitting now, I figure that a good test for use by English-
speaking people is to flag any email with lots of high-bit
characters.  So how do we do it?  Here's a try.  Note that
[X-Y] would really have the byte CHR(128) in place of "X" and
the byte CHR(255) in place of "Y".  I don't think they'll
transmit too well, so I'm doing pseudocode here.

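 # Start the score at -40 and add 1 per high-bit byte in header or body;
 # if the total ends up positive, filter the header through formail.
 :0 HBfhw
 * -40^0
 * 1^1 [X-Y]
 | formail -A "X-Reject: High-bit character set in email"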
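
  For checking the arithmetic without a shell account, here is a
rough Python model of the scoring the recipe above is aiming for.
It is only a sketch and not procmail itself: the function names
are made up for illustration, and it assumes that the recipe
starts at -40, adds 1 for each byte in the range 128..255, and
fires when the final score is positive.

```python
# Rough off-line model of the intended scoring; this is not procmail.
MARGIN = 40  # assumed starting penalty, taken from the "-40" in the recipe


def high_bit_score(message: bytes, margin: int = MARGIN) -> int:
    """Start at -margin and add 1 for every byte in the range 128..255."""
    high_bit_count = sum(1 for byte in message if byte >= 128)
    return high_bit_count - margin


def should_flag(message: bytes, margin: int = MARGIN) -> bool:
    """Model of 'the recipe matches when the final score is positive'."""
    return high_bit_score(message, margin) > 0


if __name__ == "__main__":
    # Two made-up messages: one with a couple of symbol bytes,
    # one that is mostly high-bit bytes.
    mostly_ascii = b"Subject: hello\n\nAcme\xa9 Widget\xae, all rights reserved.\n"
    mostly_high = b"Subject: \xc4\xe3\xba\xc3\n\n" + b"\xd6\xd0\xce\xc4" * 20
    print(should_flag(mostly_ascii))
    print(should_flag(mostly_high))
```

  Under that model a message needs at least 41 high-bit bytes
before it gets flagged.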

  The margin of 40 allows email through if it's regular us-ascii
but has a few "Copyright/Registered/Trademark" characters in
the text.  That should avoid false positives.  If procmail
doesn't recognize that range syntax, another option might be
the following (pseudocode again):

 # Same scoring, with every high-bit byte listed explicitly.
 :0 HBfhw
 * -40^0
 * 1^1 [abcdefg...etc]
 | formail -A "X-Reject: High-bit character set in email"

  Where "abcdefg...etc" is replaced by the 128 individual
characters in byte range 128..255.  Any comments, maybe
even from Philip, on whether procmail will choke on that?
I'm temporarily without shell access until I get shell on
my new ISP (a couple of days).  Interlog (bought out by
PSI) is getting rid of shell access.
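
  Since the raw bytes won't survive email, one way to get them
into an rc file is to generate the recipe with a small script.
The Python sketch below builds either form of the bracket
expression (the CHR(128)-CHR(255) range, or all 128 bytes listed
one by one) and writes the recipe to standard output as raw
bytes.  The function names are hypothetical, and the "-40^0"
spelling of the starting score and the "fhw" filter flags are my
assumptions about what the recipe needs.  Whether procmail
accepts these bytes inside brackets is exactly the open question
here, and this sketch is untested against procmail.

```python
import sys


def high_bit_class(ranged: bool = True) -> bytes:
    """Return a bracket expression covering bytes 128..255.

    ranged=True  -> '[' CHR(128) '-' CHR(255) ']'
    ranged=False -> '[' followed by all 128 bytes, then ']'
    """
    if ranged:
        return b"[" + bytes([128]) + b"-" + bytes([255]) + b"]"
    return b"[" + bytes(range(128, 256)) + b"]"


def build_recipe(ranged: bool = True) -> bytes:
    """Assemble the recipe as raw bytes, one line per element."""
    lines = [
        b":0 HBfhw",                      # assumed flags: filter header only
        b"* -40^0",                       # assumed spelling of the -40 margin
        b"* 1^1 " + high_bit_class(ranged),
        b'| formail -A "X-Reject: High-bit character set in email"',
    ]
    return b"\n".join(lines) + b"\n"


if __name__ == "__main__":
    # Write bytes, not text, so nothing gets re-encoded on the way out.
    sys.stdout.buffer.write(build_recipe(ranged=True))
```

  Redirecting the output into a file and pulling that file in
from .procmailrc would avoid typing the bytes by hand.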

-- 
Walter Dnes <waltdnes(_at_)waltdnes(_dot_)org>