procmail

Re: Chinese-spam filter

2000-02-12 19:46:05
On Sat, 12 Feb 2000 23:45:56 +0200, Liviu Daia <Liviu(_dot_)Daia(_at_)imar(_dot_)ro> wrote:

> You don't happen to speak French, do you? :-)
  No.  I assume you're referring to high-bit accented characters
used in French and other languages.  Non-English use of "western"
character sets can be accommodated in my filter with a little bit
of work.  That issue is addressed on my website...

  1) Draw up a list of the high-bit characters required for email
in your language.  I'll use CHR(160) as an example.

  2) Delete those high-bit characters from the binary listing:
 * 20^1 [################################]

  3) Subtract the count for their Quoted-Printable versions, e.g.
 * -20^1 =A0
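  To see what the three steps do to the score, here is a rough
Python model.  It is only a sketch and does not run inside procmail.
It assumes the binary listing covers CHR(160) through CHR(255), and
that the filter's Quoted-Printable condition matches every escape
from =A0 to =FF (the regex =[A-F][0-9A-F] is my guess, not the
actual rule).  The `allowed` set and the function name
`highbit_score` are hypothetical.  Because the exponent is 1, every
match adds the full weight of 20.

```python
import re

# Weight from the "20^1" conditions quoted above.  With an exponent of 1,
# every match adds the full weight, so each condition scores 20 * count.
WEIGHT = 20

# ASSUMED Quoted-Printable condition: one match per "=A0".."=FF" escape.
# QP hex digits are uppercase per RFC 2045.
QP_HIGHBIT = re.compile(rb"=[A-F][0-9A-F]")


def highbit_score(body: bytes, allowed: frozenset = frozenset()) -> int:
    """Rough model of the high-bit scoring after steps 1-3.

    allowed: hypothetical set of byte values (step 1) that are
    legitimate in your language, e.g. frozenset({160}).
    """
    # Step 2: the binary listing (assumed 160..255) omits allowed bytes.
    raw = sum(1 for b in body if 160 <= b <= 255 and b not in allowed)
    # The QP condition still matches every high-bit escape...
    qp_all = len(QP_HIGHBIT.findall(body))
    # ...so step 3 takes one weight back per allowed escape ("* -20^1 =A0").
    qp_allowed = sum(body.count(b"=%02X" % c) for c in allowed)
    return WEIGHT * raw + WEIGHT * qp_all - WEIGHT * qp_allowed


msg = b"Caf\xe9 \xa0 =A0 =E9"
print(highbit_score(msg))                    # 80
print(highbit_score(msg, frozenset({160})))  # 40: CHR(160) no longer counts
```

  Allowing CHR(160) removes both its raw 8-bit hit and its =A0 hit,
while CHR(233) still scores in either form.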

  When I first started getting Chinese spam, I counted all
high-bit characters from CHR(128) on up.  After accumulating a
larger sample of Chinese spam (it only took a month<g>), I saw
that characters with values between 128 and 159 weren't being
used, so I cut that range out.  I know that range is used for
accented French characters in codepage 437, so this further
reduces false positives.
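  A quick Python check (not part of the filter) shows why dropping
128-159 helps.  French text saved in codepage 437 falls entirely in
that range, while GB2312, assumed here as a typical encoding for
Chinese spam, uses only bytes from 0xA1 up.  The helper name
`count_highbit` is hypothetical.

```python
def count_highbit(body: bytes, low: int) -> int:
    """Count bytes from `low` through 255, as a high-bit bracket would."""
    return sum(1 for b in body if low <= b <= 255)


# French in codepage 437: a-grave, e-acute, e-grave are 0x85, 0x82, 0x8A.
french = "Voilà, café crème".encode("cp437")
# Chinese in GB2312 (EUC-CN): both bytes of each character are >= 0xA1.
chinese = "中文".encode("gb2312")

for name, body in (("cp437 French", french), ("GB2312 Chinese", chinese)):
    # Old rule counts from 128, new rule from 160.
    print(name, count_highbit(body, 128), count_highbit(body, 160))
# cp437 French 3 0
# GB2312 Chinese 4 4
```

  Raising the floor to 160 leaves the Chinese count unchanged but
stops the codepage 437 accents from scoring at all.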

-- 
Walter Dnes <waltdnes(_at_)waltdnes(_dot_)org> http://www.waltdnes.org
SpamDunk Project procmail spamfilters.
A picture is worth a thousand words; unfortunately,
it consumes the bandwidth of ten thousand words.
