Re: [Asrg] Re: "worm spam" and SPF

On Dec 06 2004, gep2(_at_)terabites(_dot_)com wrote:

What my current mail filter here does is to first strip
HTML-burdened alternative attachments, and then to further strip the
majority of HTML tags that it finds in "plain text" parts of the
message as well.


How does the scheme deal with mislabeled attachments? Spam messages
don't play by the rules. And some popular mail programs virtually
ignore attachment types altogether and use the filename extension, or
rely on defaults known for the target mail reading software.
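To make the mislabeling question concrete, here is a minimal sketch (Python standard library only) that compares each part's declared MIME type with the type implied by its filename extension. The function name `mislabeled_parts` and the sample message are my own illustration, not part of the scheme under discussion, and a mismatch is only a hint, since the filename can be forged just as easily as the label.

```python
from email.message import EmailMessage
import mimetypes


def mislabeled_parts(msg):
    """Return (declared_type, filename, guessed_type) for every part whose
    declared Content-Type disagrees with its filename extension."""
    mismatches = []
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue  # no filename, nothing to compare against
        guessed, _encoding = mimetypes.guess_type(filename)
        declared = part.get_content_type()
        if guessed is not None and guessed != declared:
            mismatches.append((declared, filename, guessed))
    return mismatches


# Hypothetical example: an HTML document labeled as plain text.
msg = EmailMessage()
msg["Subject"] = "hello"
msg.set_content("see attached")
msg.add_attachment(
    "<html><body>buy now</body></html>",
    subtype="plain",
    filename="offer.html",
)

for declared, filename, guessed in mislabeled_parts(msg):
    print(f"{filename}: labeled {declared}, extension suggests {guessed}")
```

A filter that trusts only the label would pass this part as plain text, while a mail reader that trusts the extension would render it as HTML.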

How does the scheme identify HTML in plain text? Does it correctly
recognize a discussion about HTML tags from a text using HTML markup?
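As an illustration of why I ask, here is a minimal sketch of a naive tag scanner. The regular expression is my own assumption about what such a scanner might look like, not the actual filter. It cannot tell a message that uses markup from one that merely talks about markup.

```python
import re

# Assumed pattern: anything shaped like <name>, <name attr=...> or </name>.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?>")


def contains_html_tags(text):
    """Return True if the text contains anything that looks like an HTML tag."""
    return TAG_RE.search(text) is not None


markup = 'Click <a href="http://example.com/">here</a> now!'
discussion = "To make a link you write an <A> tag with an HREF attribute."
plain = "if a < b and b > c then stop"

for sample in (markup, discussion, plain):
    print(contains_html_tags(sample), "-", sample)
```

Both the markup and the discussion are flagged; only the third sample passes. Telling the first two apart needs more than pattern matching on tags.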

The only sure way to protect users against HTML attachments is to
prohibit them from using mail software which displays HTML,

There's no point in "prohibiting users" from doing most ANYTHING.
The whole point is that users should be able to do anything they CAN
do now, but encouraging them to do it selectively, for those cases
where they trust the senders.


But as I understood it, your scheme is supposed to block the receipt
of email containing HTML from unknown/untrusted senders originally,
and only allow such email through after the sender is trusted. If you
let HTML through to begin with (assuming you can always identify it),
isn't it much less effective?

What you perhaps don't realize is that an attachment marked plain
text but containing HTML tags is often displayed as HTML by mail
reading software anyway. Some software reads plain text, looking for
anything that resembles a web address and generates a clickable URL
(thereby turning the plain text into HTML).

Fine, but at least in that case the URL will NOT be spoofed or
misrepresented, right?  :-)


The art of misrepresenting URLs to the public is called phishing, and
is fairly well developed ;-)

Clearly, but the point is to force the spammers into areas that are
VERY, VERY gray (and which for most users, simply don't exist at
all).


The fact that most users aren't in the gray area is irrelevant. If
(your, any) scheme lets through too many spams, it is as useless as if
spams weren't being identified in the first place.

However, the amount of spam passing through depends solely on the spam
filter and how clever the spammers are. So discussions naturally
gravitate toward spammer tricks.

I'm not naive enough to presume that ANY one approach will work for
EVERYBODY or for ABSOLUTELY ALL possible (past/present/future)
software.  I don't think that any other approach I've seen proposed
will protect so many people, so automatically, with so little
interference with their existing methods and systems, and with so
many other advantageous effects.


I'm not suggesting you are. However, current state-of-the-art systems
achieve 99+ percent success, and for organizations with very large
userbases, even that isn't enough.  Unless your scheme can claim these
sorts of numbers for nearly all people to begin with, then it isn't
worth trying to convince people here. Most people on this list are
only interested in systems which can deal with vast numbers of
messages with little extra work.

http://www.usenix.org/events/lisa04/tech/blosser/blosser.pdf

Moreover, you might not realize that SpamAssassin is bundled with
bulk mail sending software precisely so that spammers can design
their emails against SpamAssassin and tweak them until they pass the
tests performed by SA, and only then start spamming.

Of course.  Most of those deceptions [mis-]use HTML or attachments.
That's one big reason for encouraging users to limit the use of such
features only to trusted senders.


I don't seem to remember whether you explained how to initially deal
with (untrusted) people who (unwittingly, incapably) send exclusively
HTML-burdened messages.

Well, it WOULD be sufficient to block "large" (as defined by the
recipient) messages coming from unknown senders.


It is debatable whether large attachments are in fact an indicator of
friendly (i.e. nonspam) messages. Spammers tend to send small messages,
unless they're trying to be clever, while worms are, as you pointed out,
a bit larger. But the truly large messages are sent by clueless people
who will write two lines in Word, include a BMP image, then attach the
whole document to an empty email message and send the lot...

There are two avenues for worms under your scheme. Worms can get
smaller in size, or worms can stay the same size and spread via
trusted senders, by looking for regular correspondents in address
books etc. Fifteen years ago viruses were less than 1K in size;
there is no reason beyond laziness why worms need to be around 25K.

All this use of heuristics can be evaded by various strategies.  If
worms shrink to less than 25K, it might be that a recipient will
shift their "allowed maximum unsolicited E-mail size" to 15K, or
10K, or maybe even 5K or something.  THAT IS *THEIR* CHOICE, and
they aren't constrained to maintain the SAME choice.


None of these suggestions is preventative, only reactive. You're
suggesting that attacks on your proposed spam defense should be
best handled by an arms race between the spammer and the user.
The spammers still have the initiative.

The devil is in the details, because the high level concepts in
which you describe the scheme do not map perfectly into the low
level concepts required for implementation.

There is no one SINGLE set of details required.  There is a wide
leeway in terms of how the various aspects get implemented, and in
fact this is an advantage...  it allows different companies to
differentiate their products, while at the same time creating
distinct products which probably will not share common potential
weaknesses that a spammer might exploit.


Right, various elements of what you propose (scanning for HTML,
blocking attachments) are widely implemented in many different
anti-spam products. One could argue that companies already have
several pieces of your puzzle and have mixed and matched them as they
like.

Some problems with whitelists:

Once you've whitelisted all your friends and colleagues, you depend on
how good *their* spam defenses are. If they receive a worm, it can
travel to you freely.

No, ABSOLUTELY NOT, since my proposed "whitelist" is NOT just a simple
yes/no whitelist.


You've talked about examining a limited set of properties such as existence of
attachments and of HTML tags. Your whitelisted friend may have a trojan or worm
which will send plain text advertisements to all people in his address book.

If some combination of tags and attachments can get through, it will
get through eventually.


  1)  Most of my "friends and colleagues" don't REQUIRE explicit whitelisting 
because the stuff they send me does not exceed the default (very safe) rules.


That's great for you, but these issues must be addressed when proposing a
spam solution for the masses.

  2)  The ones I *would* whitelist don't automatically receive "wide open"
permissions.  I would allow individual senders to send me the type of stuff
that I might expect THEM as an INDIVIDUAL to send.  NONE of them, for
example, are likely to be permitted to send me (say) PIF attachments.  :-)
Likewise, VERY few (if any) would be whitelisted to send me (say) ActiveX or
cookies or scripting.


How do you recognize a PIF attachment? What if it's inside a ZIP file?
Inside an LHA file? Inside a zipped HTML file containing an embedded
OBJECT tag with a particular class ID pointing to external data? There
are innumerable ways of sending malicious code with various levels of
obfuscation. Spammers don't use them because the simple stuff works
quite well for now.
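To show how quickly this gets complicated, here is a minimal sketch that looks for forbidden filenames inside nested ZIP archives. The extension list, the depth limit and the helper names are assumptions of mine for illustration. It handles ZIP only (no LHA, no encrypted archives, no embedded OBJECT tags), which is exactly the gap I am pointing at.

```python
import io
import zipfile

# Assumed blacklist for illustration; a real list would be much longer.
FORBIDDEN_EXTENSIONS = (".pif", ".scr", ".exe")


def find_forbidden(data, max_depth=3, prefix=""):
    """Return paths of forbidden files found in a (possibly nested) ZIP archive.

    Recursion stops at max_depth, so anything nested deeper is missed.
    """
    found = []
    if max_depth < 0 or not zipfile.is_zipfile(io.BytesIO(data)):
        return found
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            path = prefix + name
            lowered = name.lower()
            if lowered.endswith(FORBIDDEN_EXTENSIONS):
                found.append(path)
            elif lowered.endswith(".zip"):
                inner = archive.read(name)
                found.extend(find_forbidden(inner, max_depth - 1, path + "/"))
    return found


def make_zip(files):
    """Build an in-memory ZIP from a {name: bytes} mapping (test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


inner = make_zip({"photo.pif": b"not really a program"})
outer = make_zip({"readme.txt": b"hi", "pics.zip": inner})
print(find_forbidden(outer))
```

Note that lowering `max_depth` to 0 already hides the PIF in this example, and an inner archive that is not named `*.zip` is never opened at all.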

One of the advantages of my approach is that, in fact, people using
it would block VIRTUALLY ALL incoming worms and viruses, EVEN IF
they came (ostensibly) from people who they otherwise know and
trust.

And worm writers realize this, so it is one of their priorities.


You greatly underestimate the difficulty of evading the type of
permissions-list filter I'm proposing.  Perhaps you still
don't understand my approach; I'll be glad to try to explain it
further.


I think I sort of understand your approach. Your filter scans an
email's MIME structure, looking for particular types of
attachments. It also scans for some simple tags such as <FONT>, <A>,
etc., in the plain text attachments.

[Not all email uses MIME, and not all MIME-using email is correctly
generated. Not all MIME attachments contain what they are labeled as
containing, and some mail software ignores the labels when interpreting
attachments while other software honors them.]

If the sender is unknown, the mail is blocked if it contains the tags
you scan for or attachments labeled as forbidden. If the sender is
known, the mail is blocked if it contains tags or attachments which
are prohibited for that sender.
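Here is a minimal sketch of the blocking rule as I understand it, so you can correct me if I have it wrong. Everything in it is my assumption: the feature names, the tag list, the `PERMISSIONS` table and the example addresses are hypothetical, and the sender is taken from the From header, which is trivially forged.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Assumed list of "simple tags" the filter scans for in plain text parts.
TAG_RE = re.compile(r"</?(font|a|img|script|object)\b[^<>]*>", re.IGNORECASE)

# Unknown senders get the default: no HTML and no attachments at all.
DEFAULT_ALLOWED = frozenset()

# Hypothetical per-sender permissions (not a simple yes/no whitelist).
PERMISSIONS = {
    "alice@example.org": frozenset({"html", "attachment:application/pdf"}),
}


def features(msg):
    """Collect the restricted features a message uses, judged by its labels."""
    found = set()
    for part in msg.walk():
        if part.is_multipart():
            continue
        content_type = part.get_content_type()
        if part.get_filename():
            found.add("attachment:" + content_type)
        elif content_type == "text/html":
            found.add("html")
        elif content_type == "text/plain":
            payload = part.get_payload(decode=True) or b""
            if TAG_RE.search(payload.decode("latin-1")):
                found.add("html")
    return found


def verdict(raw):
    """Return 'block' if the message uses anything its sender may not send."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    allowed = PERMISSIONS.get(sender, DEFAULT_ALLOWED)
    return "block" if features(msg) - allowed else "deliver"


def make_raw(sender, body):
    """Build a minimal single-part test message."""
    return f"From: {sender}\nSubject: test\nContent-Type: text/plain\n\n{body}\n"


print(verdict(make_raw("Alice <alice@example.org>", "<FONT size=3>hi</FONT>")))
print(verdict(make_raw("Bob <bob@example.net>", "<FONT size=3>hi</FONT>")))
print(verdict(make_raw("Bob <bob@example.net>", "hi")))
```

The sketch judges each part by its declared type, so it inherits every problem with labels noted above.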

[It blocks mail from unknown people who use HTML or attachments or
large mail bodies, which is a sizable proportion of the email-sending
population. People need to know enough about email structure to maintain
complex sets of permissions.]

Blocked mail can be inspected individually, possibly by grouping 
according to some heuristics based on header fields or message structure.

[For spam, header fields are both faked and randomized to thwart easy
grouping, as that would also make filtering easy. If you receive 500 spams
a day and wait a week to check them at your leisure, you have to wade
through 3500 messages looking for the elusive incorrectly blocked
email - that's still a chore, once a week, and the unfortunate sender
has been waiting for a response for a week].

Based on the above objections, I'm not convinced your system is
competitive (accuracy-wise) with other systems for large
organizations. For individual use by power users, it appears to me to
be less work than custom keyword filters, but more work than personal
Bayesian filters.

This is just my opinion based on our discussion, of course.
For a serious evaluation, nothing beats physical deployment at
multiple locations, and you at least pass the test of using your own
system yourself.

-- 
Laird Breyer.
