procmail

Re: Chinese-spam filter

2000-02-14 03:08:03
On Sat, 12 Feb 2000 21:37:40 -0500, Walter Dnes
<waltdnes(_at_)waltdnes(_dot_)org> wrote:
  When I first started getting Chinese spam, I counted all
highbit characters from CHR(128) on up. After accumulating a larger
sample of Chinese spam (it only took a month<g>) I saw that
characters with ascii values between 128 and 159 weren't being
used. I cut that range out. I know it's used for accented French
characters in codepage 437. This further reduces false positives.
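
A minimal sketch of the count described above, in Python rather than as a procmail recipe. The function name and the idea of returning a ratio are assumptions for illustration; the quoted message only says which byte range is counted, not how the count is scored.

```python
def highbit_ratio(body: bytes) -> float:
    """Return the fraction of bytes in the range 160-255.

    Bytes 128-159 are deliberately not counted, following the
    observation that the spam samples did not use that range.
    """
    if not body:
        return 0.0
    hits = sum(1 for b in body if b >= 160)
    return hits / len(body)


if __name__ == "__main__":
    # Two bytes at 0xA4 count; 0x41 ('A') and 0x85 (in 128-159) do not.
    sample = bytes([0xA4, 0xA4, 0x41, 0x85])
    print(highbit_ratio(sample))
```

Any cutoff applied to the ratio would have to be tuned against real mail; none is suggested here.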

I don't think anybody in their right mind ever would have used CP437
for Internet mail, even in those days when some people would use CP437
on their own computer locally. The standard way to intercommunicate
back in those days was to substitute 7-bit characters inside the ASCII
range -- see the ISO-646 set of standards for some breathtakingly ugly
examples.

Th[s [s \ c]nstructed ex\mple but [t sh]uld g[ve y]u s]me [de\ ]f wh\t
pe]ple put up w[th [n th]se d\ys.

(You'd often have a localized terminal which displayed your local
accented characters properly, but then had no facility for
displaying the [\] characters at the same time.)
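
To make the constructed example above readable, here is a small Python sketch that undoes the substitution. The mapping ([ for i, \ for a, ] for o) is read off the example itself and is purely illustrative; the actual ISO-646 national variants put accented letters on those code points, not plain vowels. The names SUBST and undo_substitution are made up for this sketch.

```python
# Mapping inferred from the constructed example only, not from any
# real ISO-646 national variant.
SUBST = str.maketrans({"[": "i", "\\": "a", "]": "o"})


def undo_substitution(text: str) -> str:
    """Replace the bracket and backslash characters with the vowels they stand for."""
    return text.translate(SUBST)


if __name__ == "__main__":
    garbled = r"Th[s [s \ c]nstructed ex\mple"
    print(undo_substitution(garbled))
```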

You certainly see people emit CP437 or CP850 or some weird Mac
character set occasionally (improperly tagged, or not tagged at all)
just like you see people send text/html nowadays. It's still not the
way to do it. In the heyday of 7-bit hardware, receiving random 8-bit
data could screw things up badly. This is why the 128-159 range is
undefined in ISO8859 -- if you're using 7-bit equipment which simply
discards the 8th bit, those bytes turn into control characters.
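
The effect of discarding the 8th bit is easy to check. The following Python sketch (the function name is an assumption for illustration) masks off the high bit and shows that every byte in 128-159 lands in the control range 0-31, while a byte from 160 upward lands on a printable character or DEL.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Simulate 7-bit equipment by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    stripped = strip_eighth_bit(bytes(range(128, 160)))
    # Every result is below 0x20, i.e. an ASCII control character.
    print(all(b < 0x20 for b in stripped))
    # 0x9B becomes 0x1B (ESC), which many terminals act upon.
    print(hex(strip_eighth_bit(bytes([0x9B]))[0]))
```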

Ah, well. Back to your regularly scheduled program.

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition
