ietf-asrg

Re: [Asrg] Re: "worm spam" and SPF

2004-12-09 13:09:55
Regarding HTML and MIME obfuscation tricks: 

I still don't agree that you are proposing things that aren't easily
evaded by determined spammers (explained below). 

I'll deal with those points as you raise them.

When asked about mislabeled attachment types, you suggest (unless I 
misunderstand) that you can simply ignore the problem...

I'm not IGNORING anything.  But that doesn't mean that you necessarily have to 
believe everything that E-mail tells you, either.  :-)

...and scan the body directly for the kind of regular expressions associated 
with HTML tags and remove them.

Not quite.

First of all, I am *not* using "regular expressions" (a basically braindead 
excuse for pattern matching, which dates back to the days of early, primitive 
Unix systems and mechanical teletype terminals).  SNOBOL and SPITBOL 
(programming languages;  SPITBOL is to SNOBOL sort of what Turbo Pascal is to 
ordinary Pascal) allow for MUCH more sophisticated pattern matching than is 
possible using reg-ex type patterns.

Besides destroying (and in the process subtly breaking) the message
contents, which has serious user privacy issues, 

Well, yes and no.

Agreed that if the mail is PGP-signed or something, then changing the mail 
contents will (of course) show up as a "changed" message body.  Frankly, that 
is a price that *I* am very willing to pay... a lot of the stuff that I get in 
a lot of my incoming E-mails is simply repetitive and annoying, and I'd rather 
simply have those kinds of things just disappear from the mail that arrives 
here.  Obviously, a different recipient might make different choices than I 
would, and that's part of why my system is designed for the RECIPIENT to be 
able to control, in a fine-grained way and based on who the sender was, what 
it does and doesn't do.

...it won't stop all HTML from bypassing the filter.

Maybe, although:

  1)  it will prevent MANY kinds of (recognizable) HTML from being passed 
(including, hopefully, kinds that are likely to be dangerous or even just 
confusing);

  2)  it can transform the remainder so that ANY OTHER program is unlikely to 
recognize it as HTML and thus act on it (for example, by changing pointy 
brackets to curly brackets).
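
As an illustration of point 2, here is a minimal sketch of the bracket-swapping idea. It is written in Python rather than the SNOBOL/SPITBOL the filter actually uses, and the name `neuter_markup` is hypothetical; nothing here is taken from the real implementation.

```python
# Hypothetical helper, not the author's SNOBOL/SPITBOL code.
# Maps pointy brackets to curly ones so that a mail client
# no longer sees anything that parses as an HTML tag.
NEUTER_TABLE = str.maketrans({"<": "{", ">": "}"})


def neuter_markup(text: str) -> str:
    """Return text with every '<' and '>' replaced by '{' and '}'."""
    return text.translate(NEUTER_TABLE)


print(neuter_markup('<a href="http://example.com/">click</a>'))
# prints {a href="http://example.com/"}click{/a}
```

This only touches literal brackets in text that has already been decoded; any transfer encoding would have to be undone first.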

For example, message parts can be encoded in several formats
(UUENCODE, Base64, Quoted-Printable etc) with arbitrary levels of
nesting. (e.g. a message/rfc822 containing a message/rfc822 containing
a message/rfc822 containing a message/rfc822 containing a
message/rfc822 containing..., with each layer encoded differently. And
the very first layer might be labeled a GIF file.)

Of course, and there's all manner of ways to encode information (including 
steganographic encryption of information, such as for example sending a 
pornographic JPEG photo also containing on the nightstand table a clock where 
the hands point at 11:17:32 which time maybe has some secret meaning to the 
intended recipient).  You're NEVER going to be able to prevent the transfer of 
all kinds of information.  No point in even trying.

Fortunately, one doesn't HAVE to get that paranoid to essentially solve the 
problem, since MUAs aren't all that aggressive and suicidal in ferreting out 
dangerous HTML to interpret, either.  :-)

In my current experimental incoming mail filter, I use a recursive subroutine 
to deal with nested encodings and message parts.  I suppose I could even 
(trivially, in fact) limit that routine to only support a limited nesting 
depth, but I suspect that most mail clients would crash on such mails long 
before my mail filter would.  :-)
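
The recursive approach with a nesting limit might look roughly like the sketch below. It uses Python's standard `email` package purely for illustration; the function name `collect_leaf_parts` and the default limit of 8 are assumptions, not details of the actual filter.

```python
import email
from email.message import Message


def collect_leaf_parts(msg: Message, max_depth: int = 8, _depth: int = 0):
    """Return (depth, content_type, decoded_bytes) for every leaf part.

    max_depth is an assumed safety limit; the post does not name a value.
    """
    if _depth > max_depth:
        raise ValueError("message nesting exceeds max_depth")
    if not msg.is_multipart():
        # decode=True undoes base64 / quoted-printable per the part's headers.
        return [(_depth, msg.get_content_type(), msg.get_payload(decode=True))]
    leaves = []
    # multipart/* and message/rfc822 both expose their children as a list.
    for part in msg.get_payload():
        leaves.extend(collect_leaf_parts(part, max_depth, _depth + 1))
    return leaves


RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--b--\n"
)
print(collect_leaf_parts(email.message_from_string(RAW)))
```

Note that this sketch trusts each part's declared type and encoding. A filter that refuses to believe mislabeled parts would have to inspect the decoded bytes as well.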

Whatever regular expression for an HTML tag you come up with, it can
easily be made unrecognizable. 

Sure, but it can also in the process be made unrecognizable to MUAs, too.  

...Even the interpretation of HTML tags
can be redefined on-the-fly if it comes to that. 

Probably, but again:

  1)  the USE of such types of stuff is prima facie evidence of an E-mail 
having something to hide;

  2)  such tricks are of little value if they confuse or break MUAs too;

  3)  translating pointy brackets to curly brackets (or square brackets or 
something else) will also effectively "neuter" such HTML, such that MUAs won't 
try to process it;

  4)  it's relatively easy to (again, by default) simply say "NO HTML, period" 
and divert offending mail.

But say you keep up
to date with tricks designed to make a complex payload look innocuous
to simple minded filters, then you are on the losing side of such an
arms race, because a spammer need only change their email, while you
need to patch your software with new regular expressions and redeploy
it to all the customers every time.

Well, "patch" isn't really necessary.  :-)  It's rather easy to add new stuff 
to SNOBOL/SPITBOL programs, including at run time.

But again, that's why one doesn't just look for a FIXED, limited number of 
specific things.

If one simply bans (default case, for unknown senders) *all* attachments and 
*all* HTML, then it's pretty hard to argue that they'll figure out some new 
kind of HTML (but if and when they do, then one MIGHT have to tweak the filter 
a little bit).  If it doesn't look enough like HTML to be recognized by the 
MUA, then it clearly doesn't have to be recognized by the filter, either.
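
A default-deny policy of this kind, with per-sender exceptions chosen by the recipient, could be sketched as follows. The `Policy` class, the sender addresses, and the "divert"/"deliver" labels are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allow_html: bool
    allow_attachments: bool


# Unknown senders get the strictest treatment (assumed default).
DEFAULT_POLICY = Policy(allow_html=False, allow_attachments=False)

# Hypothetical recipient-maintained exceptions, keyed by sender address.
SENDER_POLICIES = {
    "friend@example.org": Policy(allow_html=True, allow_attachments=True),
    "newsletter@example.com": Policy(allow_html=True, allow_attachments=False),
}


def decide(sender: str, has_html: bool, has_attachments: bool) -> str:
    """Return 'deliver' or 'divert' for a message from the given sender."""
    policy = SENDER_POLICIES.get(sender.lower(), DEFAULT_POLICY)
    if has_html and not policy.allow_html:
        return "divert"
    if has_attachments and not policy.allow_attachments:
        return "divert"
    return "deliver"


print(decide("stranger@example.net", has_html=True, has_attachments=False))
# prints divert
```

Because anything not explicitly allowed is diverted, a new HTML trick from an unknown sender needs no new rule as long as it is still recognized as HTML at all.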

Note also that it is straightforward for spammers to deduce the checks
made if they have access to your software (as they invariably will if
it becomes widely deployed), so there is little point in not
discussing specific parsing techniques publicly. 

Fair enough, although it's pretty extraneous to discuss them publicly at this 
time.  As I've said, the current implementation is "experimental" and like all 
such software, a work in progress.... which I modify and improve from time to 
time as that seems necessary.

...It only makes discussion imprecise and harder to see any flaws.

The important thing is NOT whether there are "flaws" at the lowest level (and 
undoubtedly there are, since all nontrivial programs contain bugs or at least 
opportunities for improvement).  At this point we ought to be talking concepts 
and approaches, rather than getting bogged down in pointless minutiae and detail 
which in any case is going to be implementation-dependent.

Some direct points:

You argue that perhaps the most important overall function... 

I don't know that I would characterize it that way, although I *do* believe 
that it's sorta silly to try to address the spam problem while ignoring the 
kinds of ridiculously suspect worm/virus stuff that clueless users naively 
click on.

...is to block the spread of viruses, worms and zombies, as these are the 
current enabling technology. 

Again, let's not say "the" (which implies one).

If so, you should address that problem directly, as it has much wider scope 
than the "attachment" problem.  

I think it makes sense to attack the attachment problem DIRECTLY, and HEAD ON, 
since it matters NOT ONLY for worms/viruses, but ALSO for spam that evades 
content filters (e.g. text-as-image or even just content-as-image).

Blocking attachments, if widespread, will only achieve that the payload
is moved from the email body to an external server. 

That's fine.

Users are then tricked to open an external connection...

OK, but at least it's not going to be something that they just click on (again, 
by denying HTML, and hopefully by implementing suitably dire-sounding warnings 
when they try to follow any other links to external executables, whether EXE 
files or SCR files or DLL files or ActiveX or whatever).
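
One way to decide when such a warning should fire is to look at the file extension in a link's path. The sketch below is only an assumption about how that check might be done; `needs_warning` and the extension set are hypothetical, and extension checks alone would not catch ActiveX content or a server that lies about what it serves.

```python
import posixpath
from urllib.parse import urlparse

# Extensions named in the post; a real list would be longer (assumption).
RISKY_EXTENSIONS = {".exe", ".scr", ".dll"}


def needs_warning(url: str) -> bool:
    """Return True if the URL path ends in a risky executable extension."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in RISKY_EXTENSIONS


print(needs_warning("http://example.com/files/Setup.EXE"))
# prints True
```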

Hey, we DO agree that "social engineering" can result in people doing pretty 
stupid things (like giving their secret passwords because someone calls on the 
phone and asks for them, etc etc) but at least we can offer REASONABLE 
safeguards and "are you SURE?" type things to at least make them have a second 
thought before proceeding with such things.

A person who is DETERMINED to sink their own ship, of course, CAN still do 
that, and at some point one simply has to cut the rope and let them go.

...which downloads the malware in any of a wide variety of ways, and still 
sends spam from then on. 

Right.  It's important to at least make that less "encouraged".  That's one 
good reason for also (by default, from unknown senders) getting rid of HTML, 
which tends to encourage (and conceal/misrepresent) external links.

Meanwhile, in the process you destroy the user's reasonable
expectation that their email is delivered as-is, unless they are in
some first class relationship with you.

If you don't know me (and even if you DO) you do NOT have any right to expect 
that I will even receive or choose to read your mail AT ALL, let alone without 
my modifying it beforehand to suit MY tastes.  You lose your right to its 
absolute integrity as soon as you seal it up and send it to someone else.

Perhaps the key to your point, though, is the word "delivered".  And I suppose 
that your point is okay, since it _is_ "delivered" (to my incoming mail 
processing filter!), and that filter (AS I HAVE INSTRUCTED IT) then chooses to 
modify the incoming mail according to rules *I* have established to help make 
it acceptable to me, before I need to look at it.  Perhaps this is another 
good reason to implement the filter at the recipient end, rather than 
somewhere en route.

Another issue is the use of your system in conjunction with a content
filter. If you remove/modify the mail content before passing it to a
content filter which is expected to handle the hard cases, you may be
shooting yourself in the foot. Modern content filters often have many
rules which are optimized to work together, but are not necessarily
optimized to work on mangled email.

Perhaps so, and what may end up happening is that content filters will be 
simplified and re-engineered to make them faster and more tailored to use 
within a framework such as I propose.  (Although some of those "hard cases" 
might still get through, from "somewhat-trusted" senders).  Current content 
filters usually presume that they are getting E-mail "raw" and therefore have 
to handle cases that might be filtered out already by the time mail would get 
to them through my filter.

A few points about "Bayesian" systems: 

To my knowledge, no successful attack has been performed on such
systems yet. 

Depends on what you call an "attack", but certainly an awful lot of spam 
contains bogus (random or unrelated) stuff that's designed to confuse or evade 
such types of filters.

There is a lot of garbage in mail to try to pass through
the statistical filtering, but just like you look for nonsense tokens
as an indicator of spam on a case by case basis, such nonsense tokens
if present easily tip the balance toward spam in a statistical filter,
automatically.

Perhaps, and we agree that a good program can detect certain types of such 
stuff, but at SOME point the spam E-mail in question is going to look EXACTLY 
like a regular E-mail that you want to get, except for the spam content (which 
might be JUST a URL or a phone number or who knows what?).

In some ways, these systems are a generalization of where you are headed.
For example, where you have code such as "if rule X is triggered or rule Y
is triggered" (with rules X and Y being statements about email structure or
presence of HTML etc), a Bayesian system will put weights on rule X and rule
Y, combining the weights to obtain a belief about the message. But that is
for another discussion.

Fine.  In any case, it is POSSIBLE to create spam E-mail that looks just like 
legitimate E-mail, at least within statistical uncertainties.  There are limits 
to what can be achieved going down that path (but that doesn't necessarily mean 
that it's not worth going there, if there's useful progress there nevertheless).
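
For readers unfamiliar with the weighting idea quoted above (rules X and Y each contributing a weight to an overall belief), here is a minimal naive-Bayes sketch. The rule names and probabilities are invented for illustration and do not come from any real filter or measured corpus.

```python
import math

# Hypothetical per-rule likelihoods:
# (P(rule fires | spam), P(rule fires | legitimate mail)).
RULES = {
    "has_html": (0.80, 0.30),
    "mislabeled_attachment": (0.20, 0.01),
}


def spam_probability(fired: set, prior_spam: float = 0.5) -> float:
    """Combine rule outcomes into P(spam), assuming rules are independent."""
    log_odds = math.log(prior_spam / (1 - prior_spam))
    for name, (p_spam, p_ham) in RULES.items():
        if name in fired:
            log_odds += math.log(p_spam / p_ham)
        else:
            # A rule that did NOT fire is evidence too.
            log_odds += math.log((1 - p_spam) / (1 - p_ham))
    return 1 / (1 + math.exp(-log_odds))


for fired in (set(), {"has_html"}, {"has_html", "mislabeled_attachment"}):
    print(sorted(fired), round(spam_probability(fired), 2))
```

A hard "if X or Y then reject" rule corresponds to the extreme case where a single rule's weight is large enough to decide the outcome alone. The limit discussed above still applies: a message that triggers no rule at all is scored like legitimate mail.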

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support free and fair US elections!  http://stickers.defend-democracy.org
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg