ietf-asrg

RE: [Asrg] How to defeat spam that uses encryption?

2003-03-31 18:20:02
That implies that AOL's content filter could be improved.  You must
decode QP and Base64 before computing hashes, because there are MTAs
that add and remove at least QP and I think sometimes Base64 encoding.
If you don't always decode, you can't get consistent signatures.
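
A minimal sketch of the decode-before-hash step described above, in
Python. The function names, the choice of SHA-256 and the line-ending
normalisation are assumptions made for illustration; nothing here is
taken from any particular filter.

```python
import base64
import hashlib
import quopri


def canonical_body(body: bytes, transfer_encoding: str) -> bytes:
    """Undo the Content-Transfer-Encoding so the hash sees the real content."""
    encoding = transfer_encoding.strip().lower()
    if encoding == "quoted-printable":
        body = quopri.decodestring(body)
    elif encoding == "base64":
        body = base64.b64decode(body)
    # Assumption: line endings and outer whitespace may also be rewritten
    # in transit, so normalise them before hashing.
    return body.replace(b"\r\n", b"\n").strip()


def signature(body: bytes, transfer_encoding: str = "7bit") -> str:
    """Hash the decoded body; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(canonical_body(body, transfer_encoding)).hexdigest()


# The same message as it might look after passing through different MTAs.
plain = b"Price = 100% free\n"
as_qp = quopri.encodestring(plain)    # "=" becomes "=3D"
as_b64 = base64.encodebytes(plain)
```

Hashing as_qp without decoding gives a different signature from
plain, which is the inconsistency the paragraph above warns about.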

In $#(*#$'s name why?

The more stupidity they put into the mail servers the less well the
system works. 

Base64 exists for one reason alone - because stupid MTAs had a
habit of molesting the messages they were transmitting. In most
cases for very shortsighted reasons, like wrapping the text at
80 chars so it didn't overflow the screen of the vt100 terminals
we all use today.

Another case of that problem is that if your filter system works in
both the MTA and the MUA, then you must decode QP and B64 whenever you
see it and especially in the MTA, because when you see it in the MUA,
it will be decoded.

Quite so, it should not be a biggie for a textual filter to have
something preprocess the messages to extract the text. It might
also make sense to use filtering to detect F-R-E-E and analogues;
there is actually quite a small set of these, and if present they
are almost foolproof spamdicators.

The one thing I would caution against, however, is ONLY doing the
text based analysis. Just as F-R-E-E is a spamdicator, a word
with an embedded comment is as well. So don't lose that info
in the architecture.

It should not be necessary to parse the HTML; just strip it
out with an FSR, which only takes about ten states. But instead
of passing just words to the text filter, pass tagged words:

 "Need free, F-R-E-E Vi<!--stupidstupidstuff-->agra"

Becomes (with possible scoring in square brackets):

 <Normal>         need         [score 0]
 <Normal>         free         [score 5]
 <DashDisguise>   free         [score 20]
 <SplitComment>   viagra       [score 50]

The attempted disguise increases the probability it is spam.
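
A rough sketch of the tagged-word idea above, in Python. It is a
three-state scanner (text, tag, comment) rather than a full HTML
parser. The function name, the rule that ordinary tags act as word
breaks, and the punctuation accepted between letters are all
assumptions; the scores are simply the numbers from the example
above, not real weights.

```python
import re

# Single letters joined by one punctuation mark each, e.g. "F-R-E-E".
SPACED_OUT = re.compile(r"[A-Za-z](?:[-._*][A-Za-z])+")

# Illustrative scores copied from the example above; anything else scores 0.
EXAMPLE_SCORES = {
    ("Normal", "free"): 5,
    ("DashDisguise", "free"): 20,
    ("SplitComment", "viagra"): 50,
}


def tagged_words(html):
    """Strip markup with a small state machine and tag each surviving word."""
    words = []
    current = []      # characters of the word being built
    pending = False   # a comment started right after some word characters
    split = False     # ...and more word characters followed the comment
    state = "text"    # "text", "tag" or "comment"
    i = 0

    def flush():
        nonlocal pending, split
        raw = "".join(current).strip(",.;:!?\"'()")
        current.clear()
        if raw:
            if split:
                tag = "SplitComment"
            elif SPACED_OUT.fullmatch(raw):
                tag = "DashDisguise"
                raw = re.sub(r"[^A-Za-z]", "", raw)  # F-R-E-E -> FREE
            else:
                tag = "Normal"
            words.append((tag, raw.lower()))
        pending = split = False

    while i < len(html):
        ch = html[i]
        if state == "comment":
            if html.startswith("-->", i):
                state, i = "text", i + 3
            else:
                i += 1
        elif state == "tag":
            if ch == ">":
                state = "text"
            i += 1
        elif html.startswith("<!--", i):
            pending = bool(current)   # the comment may be splitting a word
            state, i = "comment", i + 4
        elif ch == "<":
            flush()                   # assumption: ordinary tags end a word
            state, i = "tag", i + 1
        elif ch.isspace():
            flush()
            i += 1
        else:
            if pending:
                split, pending = True, False
            current.append(ch)
            i += 1
    flush()
    return words


def scored(html):
    """Attach the illustrative score to every tagged word."""
    return [(tag, word, EXAMPLE_SCORES.get((tag, word), 0))
            for tag, word in tagged_words(html)]
```

Calling scored() on the sample line above yields the four rows shown,
with the tag kept next to each word so the text filter still sees
that a disguise was attempted.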

JavaScript and such should be stripped out, converted into
canonical form and compared with a fingerprint database of
known spamdicators. Even if you don't execute it or allow
it through your mail system, you should check it for
spamdicators.
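
One way the script check could look, sketched in Python. What
"canonical form" should mean is not specified above, so the rules
here (drop comments, drop all whitespace) are an assumption, as are
the function names and the one-entry fingerprint set standing in for
a real database. The comment stripping is deliberately crude and will
also cut a "//" that appears inside a string literal.

```python
import hashlib
import re

SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>",
                          re.IGNORECASE | re.DOTALL)


def extract_scripts(html):
    """Pull the body of every <script> element out of a message."""
    return SCRIPT_BLOCK.findall(html)


def canonical_script(source):
    """Reduce a script to a rough canonical form (assumed rules, no JS parser)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    return re.sub(r"\s+", "", source)                           # all whitespace


def fingerprint(source):
    """Hash the canonical form; SHA-256 is an arbitrary choice here."""
    return hashlib.sha256(canonical_script(source).encode("utf-8")).hexdigest()


# Hypothetical database: fingerprints of scripts already seen in spam.
KNOWN_SPAM_SCRIPTS = {fingerprint("window.open('popup.html');")}


def has_script_spamdicator(html):
    """True if any script in the message matches a known fingerprint."""
    return any(fingerprint(script) in KNOWN_SPAM_SCRIPTS
               for script in extract_scripts(html))
```

The script is only read as text and never executed, so the check can
run even when the mail system strips scripts before delivery.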

Equally, it was very frustrating that at the MIT conference on
solving spam with Bayesian filtering, analysis of the headers
was virtually ignored.


                Phill