ietf-asrg

RE: [Asrg] 2a. Analysis - Spam filled with words

2003-09-12 19:17:29
[I'm still iffy on how on-topic this is, so I've provided the following handy pointer. If you want to skip the side discussions and see the small part that I think is actually relevant to the consent discussion, search for the string "BEGIN RELEVANT CONSENT SECTION". The rest of the text is just me being unable to stop myself from responding to things. -kee]

At 3:03 PM -0400 9/12/03, Pete - Madscientist wrote:
> | It's an ongoing battle.  As such, you should pick those areas where
> | the enemy has the least flexibility and attack there.  In content
> | they have *infinite* flexibility.

> Actually, the "message" of the spammer has some rigidity to it so it's
> not without merit to attack there under this argument. The *infinite*
> flexibility you speak of is not truly there - only apparently.

No, I'm not talking about apparent flexibility. I really mean infinite. JavaScript, and CSS to a lesser extent, and HTML to some extent as well, provide the ability to present the same content in an infinite number of ways. (If you want to get picky, I'd agree that message size and presentation time are limitations, but otherwise I don't see any.) So to beat the game you either need to provide a full, bug-for-bug-compatible implementation of the end-user's email program (tough for server implementations) or start playing the virus-signature game. (Which has recently been shown to be incapable of responding as quickly as viruses, never mind spam.)

At some point you presumably stop trying to look for content per-se, and start analyzing the techniques used to present it to see if they show signs of deliberate obfuscation. Of course this introduces a whole new class of false positives since there are companies out there whose business model is "active email". On the bright side, if you fail as an anti-spam provider, you can always sell the software to universities for grading students' programs. :-)

> By the way... this obfuscation technique could easily be generalized and
> captured by many content analysis systems...

Absolutely.  And the next, and the one after that, and the one after that....

Actually that particular technique is probably of most concern to people doing checksums. And it can clearly be solved there as well. But like all of these things, it continues to be a battle because of the fantastic flexibility of the presentation software, that of the human mind viewing the final result, and that of the human mind coming up with the evasive techniques. This is not something I want to try and pit a computer program against.

> In order to define consent in a way that is executable we must define
> the "to what" part of the equation. In practical terms, I frequently
> have two users on the same system disagree about the definition of a
> single message and that definition is often only resolvable by content
> analysis...

I was with you there for a sentence and a half. I did my undergrad thesis on natural language processing. I have good friends who are still in the business (well, actually, who are out of jobs, which says something in itself). I've seen nothing that has given me any indication that the technology is at the stage where it can read an email message and make a better assessment about its content than two arbitrary humans. *Especially* content that someone is deliberately trying to obscure from the program.

> With the exception of the application of explicit consent tokens & C/R
> mechanisms (which have their ups and downs) there is no clear way to
> establish "what" a RECEIVER is giving CONSENT to receive. In the end,

I'm not sure what role C/R plays in this discussion. I suppose the challenge could specify what content is permitted, but it provides no control over what is actually sent (and some Nigerian-scam spammers *do* respond to C/R systems).

A while ago on this list I brought up the idea of having bulk-mailers categorize email at a more granular level. In particular, I suggested distinguishing between transaction-related email ("your shipment is on its way") and straight sales mail ("we have a special on"). (I confess, the idea wasn't mine, but came out of discussions I'd had with one of the legit bulk email houses.) At the time the response was rather negative. Then again, this group hadn't really gotten into the consent thing at the time.

> the best practice will be for the RECEIVER to have any and all
> mechanisms at their disposal for this purpose so that the greatest
> diversity of needs can be met. Also, as with much of the Internet, once

This is really an aside, but that's a best practice that flies in the face of market realities. Doing that without making the whole thing too complicated for the user to understand is extremely hard. And if they don't understand, they tend to give up. Messagefire's web site has complicated code for doing a lot of things, including viewing held messages safely on the web, providing webmail, and lots of other stuff. But the single most complicated and difficult page to develop, and the one which we get the most feedback on, is the one where we let them report errors and give them options on how to make sure the error doesn't occur in the future. (In other words, the one where we try to meet the diversity of their spam-blocking needs.) I'm sure it seems simple to us on this list. Depending on what we know about the message, you can block the user, block the domain, block the country, report it as abuse to an ISP, unsubscribe from the list, and (of course) tell us we made a mistake (reverse all that if we made the error in the other direction). No big deal, right? Wrong. Too many choices. We tested it, and our beta customers just got confused. They didn't *want* any options. They didn't want the options to change. They'd be happiest with a button that just said "Just deal with it." But if we took out choices, our advanced users (you know, the ones that make the purchasing decisions) felt too limited. There are obvious solutions to that problem. But the main issue is that in a competitive environment, the guy who has a "Just deal with it" button may well win over the one that lets you solve the problem correctly.

> Spammers are being used to market mainstream products more and more
> frequently (this is sad but inevitable),... and the same organizations
> that send out ink, insurance, and travel spam are just as likely to send
> you the IBM newsletter you signed up for or your latest RedHat notices.
> Need I mention McAfee and Norton who seem to have an army of spammers
> selling their wares... all without their permission of course (yeah

Those are excellent arguments against content analysis, because in fact the people *transmitting* the messages are not the same. They aren't the same because the mainstream bulk mailers do *not* want to get blocked, and they know that if there are too many complaints, they *will* get blocked. So long as governments (and here I point my finger primarily at the U.S. Congress) don't change that dynamic by legitimizing unsolicited email, that will continue to be the state of affairs. So then the question is, how do you tell the difference between the Norton Anti-Virus ad from an "unsanctioned" reseller and the one that you actually requested? Clearly not by looking at the content.

Yes, if legitimate mailers start sending illegitimate, unsolicited content, then my argument has major problems. That is very rare right now, and I don't expect that to change. Mostly it occurs when a spammer tries to go legit and gets some real customers. I've seen that happen with an online catalog I get. I explained the problem to the catalog vendor and apparently I wasn't the only one, because within a few months he had switched bulk mailers.

> right). Or what about the amazing (cough) trend of anti-spam vendors
> spamming to sell their software! (I hate those guys - we filter them and
> we look like we're being unfair, we don't filter them and we get pounded
> with customer complaints and submissions... and we're not allowed to use
> this practice!! - we refuse!!!)

Hey, the way I see it, they are the only people who actually do use spam for targeted marketing. Their target is people who receive spam!

BEGIN RELEVANT CONSENT SECTION

> That said, if RECEIVERs are going to have control of what they receive
> then content analysis will have to play a role in that mechanism -
> simply because recipients, more often than not, define what they want
> and don't want by the content of the message, not by the sender, not
> even by agreements they may have with that sender (explicit or implied).

I agree that there are lots of people who *think* they define what they want by content. But in a wide open system I think they'd change their mind very quickly. I'm looking for a new jacket right now. If I sign up at a store I like and ask them to send me mail about jackets, I've just asked *them* to send me email about jackets. I don't want to hear from their affiliated companies, their vendors or their retail locations. Nor do I want to hear from every other jacket manufacturer in the world.

To put that in consent terms:

Consent is first and foremost an agreement with a particular sender.

Secondarily it is an agreement about the content that can be sent to me by that sender.

If you're arguing that consent is first about content, and secondly about sender, then I disagree.

If you're arguing that content analysis is necessary in order to enforce a consent agreement, then I also disagree. I believe the content clause of a consent agreement is primarily enforced by trust. Because if I can't trust a sender to listen to my content request, then clearly I can't trust them to maintain the overall integrity and privacy of the consent agreement itself.
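To make the ordering concrete, here is a minimal sketch, assuming a hypothetical model in which each consent record is keyed by sender and lists the content categories the recipient agreed to. The names Consent, Recipient, grant, accepts, and "jackets.example" are all made up for illustration, and the "declared category" is assumed to be a label the sender puts on its own mail (along the lines of the transactional vs. sales split mentioned earlier). Note that the code does no content analysis at all: whether the label is honest is exactly the trust question.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Consent:
    """Hypothetical: one recipient's agreement with one particular sender."""
    sender: str
    categories: set[str] = field(default_factory=set)  # content agreed to


@dataclass
class Recipient:
    consents: dict[str, Consent] = field(default_factory=dict)

    def grant(self, sender: str, *categories: str) -> None:
        # Record (or replace) the agreement with this one sender.
        self.consents[sender] = Consent(sender, set(categories))

    def accepts(self, sender: str, declared_category: str) -> bool:
        # First: consent is an agreement with a particular sender.
        consent = self.consents.get(sender)
        if consent is None:
            return False
        # Second: the content clause, checked against the category the
        # sender *declares*. Nothing here verifies that declaration;
        # honoring it depends on trusting the sender.
        return declared_category in consent.categories
```

For the jacket example: signing up with one store covers that store's jacket mail, but not other content from it, and not other jacket sellers.

```python
me = Recipient()
me.grant("jackets.example", "jackets")
me.accepts("jackets.example", "jackets")        # True
me.accepts("jackets.example", "shoes")          # False
me.accepts("other-jackets.example", "jackets")  # False
```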

Case in point: Microsoft. Many years ago I signed up on their site for something or other, and I happened to use my personal email address since I was between companies. I requested that they not send me any content. The email address was just for logging in.

Years later, all of a sudden Microsoft is sending that address virus alerts telling me to patch my (non-existent) Microsoft system. They think that this issue is so important that it overrides any of my content requests. They think that this is so important that they provide no way for me to get off the list, or delete the login (I don't even think the original place I signed up exists any more). A content filter isn't going to fix that problem. That's a trust problem.

END RELEVANT CONSENT SECTION


> Filtering systems are now and will be as important as search engines for
> precisely the same reasons... the amount and variability of information
> that is "out there", or in the case of email on its way to your mailbox,
> is *infinite*. Selecting what you want and ignoring what you do not want
> is the critical task of the information age.

Ah. Now that hits on something close to my heart--auto-filing my email. But that's a very different class of activities. In that case you're trying to identify content that was actually constructed so as to be understood.

> Making that capability a practical reality requires technology - and
> content analysis is part of that arsenal.

Yes!


> People keep thinking of this problem in terms of abuse, attacks and
> illegal activities that should be stopped... much of that is true since
> there are few controls in place... however the deeper "meaning" in all
> of this mess is that once you can access anything you ever could want -
> you then need a mechanism to manage that ability.

It *is* abuse and attacks. And the key indicator is that when you try to fix it one way, they come up with a new way to send it. That is not a problem that normal content systems have to deal with.

You sound like someone who has a nail-gun and is trying to fire nails at the enemy. I think that nail-gun is a fantastic tool for processing my email, but a machine gun will work much better on the spammers.
--
Kee Hinckley
http://www.messagefire.com/  Anti-Spam Service for Companies and Individuals
http://commons.somewhere.com/buzz/  Writings on Technology and Society

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg