On Sat, 12 Feb 2000 23:45:56 +0200, Liviu Daia <Liviu(_dot_)Daia(_at_)imar(_dot_)ro> wrote:
> You don't happen to speak French, do you? :-)
No. I assume you're referring to high-bit accented characters
used in French or other languages. Non-English use of "western"
character sets can be accommodated in my filter with a little bit
of work. That issue is addressed on my website...
1) draw up a list of valid highbit characters required for
email in your language. I'll use CHR(160) as an example.
2) delete those highbit characters from the binary listing
* 20^1 [################################]
3) subtract the count given for the Quoted-Printable version, e.g.
* -20^1 =A0
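
As a rough illustration of the net effect of those three steps, here is a
minimal Python sketch. It is not the procmail filter itself, only an
emulation under some assumptions: the function name highbit_score and the
allowed parameter are made up for this example, the counted range is taken
to be 160 to 255, and the filter is assumed to also add 20 for each
Quoted-Printable escape of a highbit character (which is what the -20 for
=A0 would be cancelling). The sketch skips allowed characters directly
instead of adding and then subtracting.

```python
import re

WEIGHT = 20  # per-match weight, taken from the "20^1" conditions above


def highbit_score(body: bytes, allowed=frozenset({0xA0})) -> int:
    """Emulated score: WEIGHT per high-bit byte or QP escape not in `allowed`.

    `allowed` is a hypothetical whitelist of byte values that are valid in
    your language (CHR(160) in the example above).
    """
    # Raw 8-bit bytes in the assumed counted range 160..255.
    raw_hits = sum(1 for b in body if b >= 0xA0 and b not in allowed)

    # Quoted-Printable escapes =A0 .. =FF (QP uses uppercase hex digits).
    qp_values = [int(m, 16) for m in re.findall(rb"=([A-F][0-9A-F])", body)]
    qp_hits = sum(1 for v in qp_values if v not in allowed)

    return WEIGHT * (raw_hits + qp_hits)
```

With the default whitelist, a body containing only CHR(160) or =A0 scores
zero, while every other highbit byte or escape adds 20.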
When I first started getting Chinese spam, I counted
all highbit characters from CHR(128) on up. After accumulating
a larger sample of Chinese spam (it only took a month<g>) I saw
that characters with codes between 128 and 159 weren't
being used. I cut that range out. I know it's used for
accented French characters in codepage 437. This further reduces
false positives.
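
To make the range point concrete, here is a small Python check (the helper
name counted_bytes is made up for this example, and the counted range is
assumed to be 160 to 255). It encodes the same French word in codepage 437
and in Latin-1 and counts how many bytes land in the counted range.

```python
def counted_bytes(text: str, codec: str) -> int:
    """Number of bytes that fall in the assumed counted range 160..255."""
    return sum(1 for b in text.encode(codec) if b >= 0xA0)


word = "caf\u00e9"  # "cafe" with an acute accent on the e

# In codepage 437 the accented e is byte 130, below the counted range.
print(counted_bytes(word, "cp437"))

# In Latin-1 the accented e is byte 233, inside the counted range.
print(counted_bytes(word, "latin-1"))
```

So codepage 437 text passes untouched once 128 to 159 is dropped, while
Latin-1 accents still need the whitelist steps described above.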
--
Walter Dnes <waltdnes(_at_)waltdnes(_dot_)org> http://www.waltdnes.org
SpamDunk Project procmail spamfilters.
A picture is worth a thousand words; unfortunately,
it consumes the bandwidth of ten thousand words.