ietf-asrg

Re: [Asrg] Re: "worm spam" and SPF

2004-12-06 15:56:44
The problem in a nutshell is this: when your filter receives a mail
and analyses it to see if it contains prohibited content such as HTML
(to take an example), what criteria does it use? 

[snip]

There are some standards for when a mail contains HTML and when it
doesn't but they are often ignored by mail reading software, and often
ignored by spammers, and also ignored by legitimate senders depending on
how the programmers who wrote their mail programs interpreted the gray
areas in standards. 

So unless your filter uses the exact same heuristics about when a
message contains HTML and when it doesn't, there will be discrepancy
as above. Either you accept that there will be an amount of false
positives, or you accept that your users will see HTML content even
though your filter thought it only contained plain text.

The choice doesn't have to be a binary one, which seems to be what you're 
talking about.  The whole point is that my permissions list approach allows the 
user to specify what sorts of things they trust a given sender to send to them. 

What my current mail filter here does is to first strip HTML-burdened 
alternative attachments, and then to further strip the majority of HTML tags 
that it finds in "plain text" parts of the message as well.
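
A minimal sketch of that two-step stripping, assuming Python's standard `email` package. The names `strip_tags` and `plain_text_only` are hypothetical, and this is an illustration of the idea rather than the filter described above. The tag pattern is deliberately crude and will also eat legitimate text such as "a < b and c > d".

```python
import re
from email import message_from_string

# Crude assumption: anything between "<" and ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)

def plain_text_only(raw: str) -> str:
    """Keep only text/plain parts, then strip tag-like text from them."""
    msg = message_from_string(raw)
    kept = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() != "text/plain":
            continue  # drops text/html alternatives and other attachments
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        kept.append(strip_tags(payload.decode(charset, errors="replace")))
    return "\n".join(kept)
```

Trusting the declared `Content-Type` is exactly the weak point raised elsewhere in this thread, which is why the second pass over the "plain text" parts is there at all.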

The only sure way to protect users against HTML attachments is to
prohibit them from using mail software which displays HTML, 

There's no point in "prohibiting users" from doing most ANYTHING.  The whole 
point is that users should still be able to do anything they CAN do now, while 
being encouraged to do it selectively, for those cases where they trust the 
senders.

It MIGHT have more to do with the proposed ASSOCIATED use of a good
antispam content filter (like Spam Assassin) but my proposal really
has only VERY limited need (itself) to interpret the contents of the
E-mail, beyond looking at attachments and HTML tags.

What you perhaps don't realize is that an attachment marked plain text
but containing HTML tags is often displayed as HTML by mail reading
software anyway. Some software reads plain text, looking for anything that
resembles a web address and generates a clickable URL (thereby turning
the plain text into HTML). 

Fine, but at least in that case the URL will NOT be spoofed or misrepresented, 
right?  :-)

Some software accepts even malformed HTML
tags, which aren't considered HTML by any standards.

My point is that when you say "looking at attachments and HTML tags",
you are glossing over a difficult problem, because while ordinary
users tend to produce mail where attachment types and HTML are clear,
spammers purposefully make those aspects as hard to identify as they can,
by exploiting gray areas.

Clearly, but the point is to force the spammers into areas that are VERY, VERY 
gray (and which for most users, simply don't exist at all).

I'm not naive enough to presume that ANY one approach will work for EVERYBODY
or for ABSOLUTELY ALL possible (past/present/future) software.  I don't think
that any other approach I've seen proposed will protect so many people, so
automatically, with so little interference with their existing methods and
systems, and with so many other advantageous effects.

Moreover, you might not realize that SpamAssassin is bundled with 
bulk mail sending software precisely so that spammers can design their
emails against SpamAssassin and tweak them until they pass the tests
performed by SA, and only then start spamming. 

Of course.  Most of those deceptions [mis-]use HTML or attachments.  That's one 
big reason for encouraging users to limit the use of such features only to 
trusted senders.

Furthermore, typical spam messages are 2-3K in length, well below
your proposed limit.

Yes, and that causes NO problems for my approach.  I'm NOT proposing
that SIZE be used to discriminate spam/nospam.  I *am* saying that,
along with antispam/antivirus/antiworm goals, it ALSO makes sense
that recipients be able to prevent abusively large messages from
being sent to them (and perhaps chewing up their limited-size
ISP-provided Inbox) from people they don't know and don't trust.
(Limiting the size DOES, however, provide an additional way of
controlling incoming worms (from untrusted/unknown senders), most of
which are significantly larger than 25K).

You are proposing that size be used for discrimination. What you are not
doing is proposing that it be used exclusively. 

Well, it WOULD be sufficient to block "large" (as defined by the recipient) 
messages coming from unknown senders.

There are two avenues
for worms under your scheme. Worms can get smaller in size, or worms can
stay the same size and spread via trusted senders, by looking for 
regular correspondents in address books etc. Fifteen years ago viruses
were less than 1K in size; there is no reason beyond laziness why worms
need to be around 25K. 

All this use of heuristics can be evaded by various strategies.  If worms
shrink to less than 25K, it might be that a recipient will shift their
"allowed maximum unsolicited E-mail size" to 15K, or 10K, or maybe even 5K or
something.  THAT IS *THEIR* CHOICE, and they aren't constrained to maintain
the SAME choice.
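
As a rough sketch of a recipient-adjustable size limit: the class name `SizePolicy`, its fields, and the 25K default are assumptions made for illustration. Only the idea of a per-recipient "allowed maximum unsolicited E-mail size" comes from the discussion above.

```python
from dataclasses import dataclass, field

@dataclass
class SizePolicy:
    # Recipient-chosen ceiling for mail from unknown senders (assumed default).
    max_unsolicited_bytes: int = 25 * 1024
    # Addresses the recipient already knows, stored in lowercase.
    known_senders: set = field(default_factory=set)

    def accepts(self, sender: str, size_bytes: int) -> bool:
        """Known senders are not size-limited here; strangers are."""
        if sender.lower() in self.known_senders:
            return True
        return size_bytes <= self.max_unsolicited_bytes
```

Tightening the limit later is a one-line change for the recipient, for example `policy.max_unsolicited_bytes = 10 * 1024`.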

As for "trusted senders", they'd have to use trusted senders who are trusted to 
send THAT type of content to THAT user.  This is a much narrower gauntlet to 
walk than at present (and MANY users will not allow *ANYBODY* to send them 
executable attachments, which means that NO trusted user exists whose name will 
get a virus or worm past the intended victim's front door).

It would be ENTIRELY possible to implement it entirely within an
E-mail client (such as Outlook or Outlook Express for example) and
wouldn't require ANY changes whatsoever from the ISPs or the wider
Internet infrastructure at large... 

Sure. I'm not arguing you shouldn't implement it, I'm arguing that
the scheme isn't as foolproof as might be hoped. 

Nothing is foolproof "because fools are SO resourceful".  :-)  But the fact is 
that this approach still gives FAR better results, faster, at a FAR lower cost 
and with fewer ugly side-effects, than any other proposed solution I've seen to 
date.

The devil is in the details, because the high level concepts in which you 
describe the scheme do not map perfectly into the low level concepts required 
for implementation.

There is no one SINGLE set of details required.  There is a wide leeway in
terms of how the various aspects get implemented, and in fact this is an
advantage... it allows different companies to differentiate their products,
while at the same time creating distinct products which probably will not
share common potential weaknesses that a spammer might exploit.

Some problems with whitelists: 

Once you've whitelisted all your friends and colleagues, you depend on
how good *their* spam defenses are. If they receive a worm, it can
travel to you freely. 

No, ABSOLUTELY NOT, since my proposed "whitelist" is NOT just a simple yes/no 
whitelist.  

  1)  Most of my "friends and colleagues" don't REQUIRE explicit whitelisting 
because the stuff they send me does not exceed the default (very safe) rules.

  2)  The ones I *would* whitelist don't automatically receive "wide open"
permissions.  I would allow individual senders to send me the type of stuff
that I might expect THEM as an INDIVIDUAL to send.  NONE of them, for example,
are likely to be permitted to send me (say) PIF attachments.  :-)   Likewise,
VERY few (if any) would be whitelisted to send me (say) ActiveX or cookies or
scripting.

One of the advantages of my approach is that, in fact, people using it would 
block VIRTUALLY ALL incoming worms and viruses, EVEN IF they came (ostensibly) 
from people who they otherwise know and trust.
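
To make the difference from a yes/no whitelist concrete, here is a minimal sketch of a per-sender permissions list. The class `PermissionsList`, the choice of MIME content types as the unit of permission, and the `text/plain` default are all assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field

# Assumed "very safe" default: plain text only, for every sender.
DEFAULT_ALLOWED = frozenset({"text/plain"})

@dataclass
class PermissionsList:
    default_allowed: frozenset = DEFAULT_ALLOWED
    # sender address (lowercase) -> extra content types granted to that sender
    per_sender: dict = field(default_factory=dict)

    def allowed_for(self, sender: str) -> frozenset:
        extra = self.per_sender.get(sender.lower(), frozenset())
        return self.default_allowed | extra

    def violations(self, sender: str, content_types) -> set:
        """Content types in the message that this sender may not send."""
        return set(content_types) - self.allowed_for(sender)
```

A worm forging a trusted sender's address still gets stopped unless that particular sender was granted that particular type: `violations("alice@example.org", ["application/x-msdownload"])` is non-empty even if Alice is allowed to send HTML and images.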

And worm writers realize this, so it is one of their priorities.

You greatly underestimate the difficulty of evading the type of
permissions-list filter I'm proposing.  Perhaps you still don't understand my
approach;  I'll be glad to try to explain it further.

If you have a finely grained scheme which requires giving permission
for all sorts of attachments on a case by case basis, you'll be investing
a lot of time into maintaining your rules and regulations, 

That depends a LOT on just how the software ends up being implemented (and that 
would depend on the specific software vendor).  But NOTHING requires that it be 
complicated or time-consuming.  (For example, it would be easy enough to say 
"let mail like this through from this sender in the future" and have the 
software determine the minimum set of permissions needed to allow such example 
mail to pass the filter, and add those permissions to the sender's existing 
permissions).
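
One way the "let mail like this through" action might work, sketched under assumptions: permissions are modeled as sets of MIME content types, the default allows only `text/plain`, and the function names `features_of` and `allow_mail_like_this` are hypothetical.

```python
from email import message_from_string

# Assumed default that every sender gets without explicit permission.
DEFAULT_ALLOWED = {"text/plain"}

def features_of(raw: str) -> set:
    """Content types of all leaf parts in the example message."""
    msg = message_from_string(raw)
    return {p.get_content_type() for p in msg.walk() if not p.is_multipart()}

def allow_mail_like_this(permissions: dict, sender: str, raw: str) -> set:
    """Grant the sender only what this example needs beyond the defaults.

    `permissions` maps a lowercase sender address to a set of content types.
    Returns the minimum set the example message required.
    """
    needed = features_of(raw) - DEFAULT_ALLOWED
    permissions.setdefault(sender.lower(), set()).update(needed)
    return needed
```

The user never edits rules directly here; one click on a quarantined message widens that sender's permissions by exactly what the message needed and nothing more.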

...even if the software can handle them in the blink of an eye. This depends 
on how much and how varied your mail is (the problem is worst for customer 
service type employees), 

Not really, because they will tend to receive quite predictable types of 
E-mail.  People in this type of job tend to actually find that the great 
majority of their incoming E-mails fit into a small group of types.

...but it's clearly a case of diminishing returns.

Indeed.

Do you intend to share whitelist data with colleagues? 

I probably wouldn't.  The people who E-mail me and those who E-mail my
colleagues probably don't overlap much.

What I've done in my experimental incoming mail processing program here,
though, is to provide several levels of permissions, which are sort of
"company/installation/department/individual" (but this could vary, of course).
Since I'm presently the only one using the system (it's a home office) I'm not
really using those multiple levels much;  but the capability infrastructure is
there in the code should I decide to make use of the software somewhere else
(or if someone with funding and marketing ability wants me to pursue it as a
potential commercial product).

If so, how does the software deal with conflicts?

The present software is designed such that "rules" about stuff like that can be 
defined at each of the various levels.  This way, it would be possible to 
implement "mandatory installation rules" that could not be overridden by lower 
level people, and/or "company/department/workgroup" defaults that individuals 
(or groups) could supplant if desired.  
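
A small sketch of how such layered rules could be resolved. The level names follow the "company/installation/department/individual" example above; the data layout (a `(value, mandatory)` pair per setting), the setting names, and the function `resolve` are assumptions for illustration.

```python
# Highest authority first; lower levels may override non-mandatory settings.
LEVELS = ["company", "installation", "department", "individual"]

def resolve(rules: dict) -> dict:
    """Compute effective settings from per-level rules.

    `rules` maps a level name to {setting: (value, mandatory)}.
    A mandatory setting is locked and cannot be changed by lower levels.
    """
    effective = {}
    locked = set()
    for level in LEVELS:
        for name, (value, mandatory) in rules.get(level, {}).items():
            if name in locked:
                continue  # a higher level made this mandatory
            effective[name] = value
            if mandatory:
                locked.add(name)
    return effective
```

With a mandatory company rule `"allow_executables": (False, True)` and an individual rule trying to set it to `True`, the company value wins, while a non-mandatory size default can still be tightened by the individual.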

Finally, what happens to the emails which are blocked by the whitelist?

At the moment, what happens is that they're routed to a "quarantine folder"
(which could be on a user-by-user basis;  presently I'm the only user of the
system here) which is simply a subdirectory with each E-mail message
represented by an individual file.  That way it's easy enough to search for
specific messages (by whatever criteria).  Again, the handling of this could be
determined by the same kind of multilevel rule configuration as is used to
control the rest of the system.
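
A file-per-message quarantine is simple enough to sketch. The hash-based file names, the `.eml` extension, and the sidecar `.reason` file are assumptions for illustration; the description above only says that each message is an individual file in a subdirectory.

```python
import hashlib
from pathlib import Path

def quarantine(raw: bytes, reason: str, folder: Path) -> Path:
    """Store one rejected message as its own file; return the file path."""
    folder.mkdir(parents=True, exist_ok=True)
    # Name derived from the content, so the same message maps to one file.
    name = hashlib.sha256(raw).hexdigest()[:16]
    path = folder / f"{name}.eml"
    path.write_bytes(raw)
    # Hypothetical sidecar file recording why the filter rejected it.
    (folder / f"{name}.reason").write_text(reason + "\n", encoding="utf-8")
    return path

def search(folder: Path, needle: bytes) -> list:
    """Return quarantined messages whose raw bytes contain `needle`."""
    return sorted(p for p in folder.glob("*.eml") if needle in p.read_bytes())
```

Because every message is an ordinary file, ordinary tools (grep, a file manager, a few lines of script) are enough to find or release a specific one.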

Are they silently thrown away, do they collect in a folder which must
be checked periodically, do they pop up a message informing the user?

They COULD be handled in any of those ways, or others if desired.  Presently
they collect into a folder.  I certainly COULD write a program to summarize
recently arrived messages, at some (specified) frequency, and alert me to those
(and perhaps allow me to flag some for delivery, or delete them, or whatever)
but up to now that simply hasn't been enough of a priority for me here that
I've felt the need to write that one.

The last two schemes don't reduce the amount of work a user must do,
as each message must still be scanned. 

Well, yes and no.  

First, one could combine quite a number of supposed "spam"/disreputable
messages and summarize them as a group.  They could be grouped by sender, or by
subject line, or by size, or perhaps by other characteristics.  In practice, I
find that looking at the logs to see WHY the message was rejected (along with
things like the sender name and/or subject, and/or To: address) usually allows
me to very quickly concur with the system's triage for most of the E-mails it's
rejected.

The reasons could be diverse:  reference to a distrusted/disreputable domain, a 
high score by my "gibberish" detector, reference to a 
distrusted/disreputable/blacklisted E-mail address or domain, or whatever.
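
The grouped review described above could look roughly like this. The log entry layout (dicts with `sender`, `subject`, and `reason` keys) and the function name `summarize` are assumptions; no such "scan/confirm" tool exists in the system as described.

```python
from collections import defaultdict

def summarize(rejected: list, key: str = "reason") -> list:
    """Group rejected-message log entries by one field.

    Returns (group value, entries) pairs, largest group first,
    with ties broken alphabetically by group value.
    """
    groups = defaultdict(list)
    for entry in rejected:
        groups[entry[key]].append(entry)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Calling `summarize(log)` groups by rejection reason, and `summarize(log, key="sender")` or `key="subject"` gives the other views, so a whole group can be confirmed or dismissed at once.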

One of the reasons why I'm mostly just presently collecting the rejected stuff 
is precisely so that when I *do* decide to sit down and write an experimental 
"scan/confirm" part of the system, I'll have a healthy universe of stuff to use 
as test material...  :-)

But anyhow, it's not nearly as annoying as having to deal with the individual 
messages... since you can deal with them later (when you're not busy with 
pressing matters) and because you're already aware of what the filter found 
about each individual E-mail that it didn't like.  That allows one to make 
pretty short work of dismissing large groups of them.

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support free and fair US elections!  http://stickers.defend-democracy.org
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg