Re: [Asrg] Re: "worm spam" and SPF

Just a few points below, as the last email was becoming so long
that it's hard to follow.

Regarding HTML and MIME obfuscation tricks: 

I still think that what you are proposing is easily evaded by
determined spammers (explained below). When asked about mislabeled
attachment types, you suggest (unless I misunderstand) that you can
simply ignore the problem, scan the body directly for the kind of
regular expressions associated with HTML tags, and remove the matches.

Besides destroying (and in the process subtly breaking) the message
contents, which raises serious user privacy issues, this approach won't
stop all HTML from bypassing the filter.

For example, message parts can be encoded in several formats
(UUENCODE, Base64, Quoted-Printable etc) with arbitrary levels of
nesting. (e.g. a message/rfc822 containing a message/rfc822 containing
a message/rfc822 containing a message/rfc822 containing a
message/rfc822 containing..., with each layer encoded differently. And
the very first layer might be labeled a GIF file.)
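
To make this concrete, here is a minimal sketch in Python (standard
library only). The tag pattern and the sample message are my own
illustrative assumptions, not anyone's actual filter. A naive tag regex
applied to the raw message text misses HTML that is Base64-encoded
inside a nested message/rfc822 part, while a scan of the decoded MIME
tree finds it:

```python
import re
from email import message_from_string
from email.message import EmailMessage

# Hypothetical "naive" tag pattern of the kind a raw-body scanner might use.
TAG_RE = re.compile(r"<\s*(a|img|script|html|body)\b[^>]*>", re.IGNORECASE)


def build_nested_message():
    """Return the raw text of a message whose attachment is a
    message/rfc822 part holding a Base64-encoded HTML body."""
    inner = EmailMessage()
    inner["Subject"] = "inner"
    inner.set_content(
        '<html><body><a href="http://example.invalid/">x</a></body></html>',
        subtype="html",
        cte="base64",
    )
    outer = EmailMessage()
    outer["Subject"] = "outer"
    outer.set_content("Nothing to see here.")
    outer.add_attachment(inner)  # attached as message/rfc822
    return outer.as_string()


def decoded_text_parts(raw):
    """Yield the decoded text of every leaf part, descending into
    nested messages and undoing the transfer encoding of each part."""
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue  # container (multipart/* or message/rfc822)
        payload = part.get_payload(decode=True)
        if payload is not None:
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")


raw = build_nested_message()
raw_hit = TAG_RE.search(raw) is not None
decoded_hit = any(TAG_RE.search(text) for text in decoded_text_parts(raw))
print("raw scan found a tag:    ", raw_hit)
print("decoded scan found a tag:", decoded_hit)
```

Note that even the decoding walk trusts the declared content types and
transfer encodings, so it does nothing about a first layer that is
deliberately mislabeled, which is the case described above.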

Whatever regular expression for an HTML tag you come up with, it can
easily be made unrecognizable. Even the interpretation of HTML tags
can be redefined on the fly if it comes to that. But even if you keep
up to date with the tricks designed to make a complex payload look
innocuous to simple-minded filters, you are on the losing side of such
an arms race: a spammer need only change their email, while you need to
patch your software with new regular expressions and redeploy it to all
your customers every time.

Note also that it is straightforward for spammers to deduce the checks
made if they have access to your software (as they invariably will if
it becomes widely deployed), so there is little point in not
discussing specific parsing techniques publicly. Avoiding specifics
only makes the discussion imprecise and any flaws harder to see.

Some direct points:

You argue that perhaps the most important overall function is to block
the spread of viruses, worms and zombies, as these are the current
enabling technology. If so, you should address that problem directly,
as it has much wider scope than the "attachment" problem.  

Blocking attachments, if widespread, will only move the payload from
the email body to an external server. Users are then tricked into
opening an external connection which downloads the malware in any of a
wide variety of ways, and the infected machine still sends spam from
then on. Meanwhile, in the process you destroy the user's reasonable
expectation that their email is delivered as-is, unless they are in
some first class relationship with you.

Another issue is the use of your system in conjunction with a content
filter. If you remove/modify the mail content before passing it to a
content filter which is expected to handle the hard cases, you may be
shooting yourself in the foot. Modern content filters often have many
rules which are optimized to work together, but are not necessarily
optimized to work on mangled email.

A few points about "Bayesian" systems: 

To my knowledge, no successful attack has been performed on such
systems yet. A lot of garbage is added to mail in an attempt to get it
past statistical filtering, but just as you look for nonsense tokens
as an indicator of spam on a case by case basis, such nonsense tokens,
if present, automatically tip the balance toward spam in a statistical
filter.

In some ways, these systems are a generalization of where you are headed.
For example, where you have code such as "if rule X is triggered or rule Y
is triggered" (with rules X and Y being statements about email structure or
presence of HTML etc), a Bayesian system will put weights on rule X and rule
Y, combining the weights to obtain a belief about the message. But that is
for another discussion.
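
As a rough illustration of the difference, here is a minimal sketch in
Python. The rule names and every probability below are made-up numbers
for illustration only, and treating the rules as independent is the
usual naive Bayes simplification, not a description of any particular
filter:

```python
import math

# Hypothetical per-rule likelihoods:
#   (P(rule fires | spam), P(rule fires | ham))
RULES = {
    "X_html_only_body": (0.60, 0.05),
    "Y_mislabeled_attachment": (0.30, 0.01),
}
PRIOR_SPAM = 0.5  # assumed prior probability that a message is spam


def hard_rule(fired):
    """The 'if rule X is triggered or rule Y is triggered' approach."""
    return any(fired.get(name, False) for name in RULES)


def spam_probability(fired):
    """Combine per-rule weights (log likelihood ratios) into a belief."""
    log_odds = math.log(PRIOR_SPAM / (1 - PRIOR_SPAM))
    for name, (p_spam, p_ham) in RULES.items():
        if fired.get(name, False):
            log_odds += math.log(p_spam / p_ham)
        else:
            # A rule that did not fire is also (weaker) evidence.
            log_odds += math.log((1 - p_spam) / (1 - p_ham))
    return 1 / (1 + math.exp(-log_odds))


only_x = {"X_html_only_body": True}
both = {"X_html_only_body": True, "Y_mislabeled_attachment": True}
print(hard_rule(only_x), hard_rule(both))
print(spam_probability({}), spam_probability(only_x), spam_probability(both))
```

The hard rule gives the same answer whether one rule fires or both do,
whereas the weighted version yields a graded belief that grows with
each additional piece of evidence.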


-- 
Laird Breyer.
