On Thu, 02 Dec 1999 03:02:07 -0500, Walter Dnes
With all the foreign-language/foreign-character-set spam hitting
now, I figure that a good test for use by English-speaking people
is to flag any email with lots of high-bit characters. So how do we
do it? Here's a try. Note that [X-Y] would really have
byte=>CHR(128) in place of "X" and byte=>CHR(255) in place of "Y".
I don't think they'll transmit too well, so I'm doing pseudocode:
* 1^1 [X-Y]
| formail -A "X-Reject: High-bit character set in email"
The correct syntax here would be -40^0 on the first condition. Other
than that, I think this ought to work, although I would perhaps prefer
something which counts the high-bit characters as a percentage of the
whole. In Finnish text, for example, accented characters can easily be
several per cent of a message and a long message might easily be more
than 4000 characters (although I find typical messages to be in the
range 1500-3000 bytes, headers included).
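As a rough sketch of the percentage idea, outside procmail: the Python below counts the bytes with the high bit set and reports them as a share of the whole message. The function names and the 10 per cent threshold are my own assumptions for illustration, not tuned values and not part of the recipe above.

```python
def high_bit_ratio(data: bytes) -> float:
    """Return the fraction of bytes in data that have the high bit set."""
    if not data:
        return 0.0
    high = sum(1 for b in data if b >= 0x80)  # bytes in the range 128-255
    return high / len(data)


def looks_foreign(data: bytes, threshold: float = 0.10) -> bool:
    """Flag a message whose share of high-bit bytes exceeds threshold.

    The 10% default is an arbitrary illustrative value (an assumption).
    """
    return high_bit_ratio(data) > threshold
```

With a relative threshold, a long Finnish message with a few per cent of accented characters stays under the limit, whereas an absolute count of 40 would flag it.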
Another problem is that people might MIME-encode their messages. I get
some amount of Quoted-Printable Chinese spam. My University's Sendmail
setup automatically converts that to 8-bit text, so I never actually
see the QP message, but I imagine most sites don't have a setup like that.
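For sites without that kind of conversion, one possible workaround is to decode the message before counting. A minimal sketch using Python's standard email package follows; the helper name decoded_body_bytes is hypothetical, and this assumes the message declares its Content-Transfer-Encoding correctly.

```python
import email


def decoded_body_bytes(raw: bytes) -> bytes:
    """Concatenate the transfer-decoded payloads of all non-multipart parts."""
    msg = email.message_from_bytes(raw)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # decode=True undoes quoted-printable or base64 transfer encoding
        payload = part.get_payload(decode=True)
        if payload:
            chunks.append(payload)
    return b"".join(chunks)
```

The result is raw 8-bit text, so a high-bit test applied to it sees the same bytes whether or not the sender used Quoted-Printable.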
/* era */
Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition