ietf-asrg

Re: [Asrg] Re: Asrg Digest, DNSBL BCP v.2.0

2007-03-04 11:54:11
I think you can do FAR BETTER from a content standpoint
(content analysis, such as SpamAssassin, following "a
priori" blocking of mail from unknown/untrusted senders
containing HTML or attachments) than you can using any
kind of IP-based blacklisting or other "reputation"
scheme.

SpamAssassin (and other content filters) don't actually work the way you think 
they do, on many levels.

The major measurable component of spam is whether or not the sender
has permission to contact the recipient.

I disagree. I have no objection at all to being contacted by someone I've never met before. I hand out business cards at trade shows and elsewhere. I have my E-mail address on my (well-indexed) personal Web site. Many companies put their E-mail address on Yellow Pages ads and other public places. The fact that I've not previously authorized contact isn't the problem. The problem is the delivery of unwanted, highly repetitive and annoying, scams and garbage.

Content filters in particular
have no view into consent; no way to measure consent.

In the absence of a fine-grained whitelist (on a per-sender basis), I agree that they don't have any way to measure consent. But adding that component changes that situation quite dramatically. Not only does the system then know WHO has established "consent", but also WHAT has been "consented" to.

The trick then is deciding which of the NEW, first-time contacts are likely to be unwanted. Certainly, there are various clues... including the presence of content commonly used to evade filtering (decryption scripting, obscured URLs, URL redirection, etc.).

There is no HTML
code or X-Header that reliably provides proof of opt-in.

If there were, and if it could be relatively easily spoofed, it would be less than terribly useful.

I think the solution involves (among other components) a tacit understanding between both ends of the communication about what the sender is sending and what the recipient expects them to send. The bar should be higher for previously unknown, and therefore unestablished or untrusted, senders.
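
As a rough illustration of that idea, here is a minimal Python sketch of a per-sender whitelist that records not only WHO is known but WHAT kind of mail is expected from them. Everything in it (the `Message` and `SenderPolicy` types, the `accept` function, the example addresses) is hypothetical and invented for illustration; it is not how any particular filter works. Unknown senders fall back to the strictest policy: plain text only, no HTML and no attachments.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    has_html: bool = False
    has_attachment: bool = False


@dataclass
class SenderPolicy:
    # What the recipient has "consented" to receive from this sender.
    allow_html: bool = False
    allow_attachments: bool = False


def accept(msg: Message, whitelist: dict[str, SenderPolicy]) -> bool:
    """Return True if the message fits what the recipient expects."""
    # Unknown senders get the default (strictest) policy: plain text only.
    policy = whitelist.get(msg.sender.lower(), SenderPolicy())
    if msg.has_html and not policy.allow_html:
        return False
    if msg.has_attachment and not policy.allow_attachments:
        return False
    return True


# Hypothetical whitelist: a colleague may send anything, a newsletter
# may send HTML but is not expected to send attachments.
whitelist = {
    "alice@example.com": SenderPolicy(allow_html=True, allow_attachments=True),
    "news@example.org": SenderPolicy(allow_html=True),
}
```

With this table, a plain-text note from a stranger is accepted, an HTML message from a stranger is not, and an attachment from the newsletter address is refused even though the sender is known.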

They do some
good things based on modeling of what looks like spam; but it's also
true that things that look like spam are not always spam.

That's true. And that's where the recognition of a familiar (to the recipient) sender enters into things.

False positive issues you rant about occur just as often with content filters. 
Some would claim, even more so!

I am likely to be far less upset about getting questionable mail if there is at least SOME arguable reason why the filter ought to have delivered it. Users ought to be able to tweak their filters so that they can change the rules whenever they desire, especially for particular cases that occur with some frequency for them.

With a blacklisting, I get a bounce back and can find somebody to argue with. 
With the common method of implementing a content filter,
my mail is quietly eaten and I get no information back regarding the
failure to deliver the mail to the end recipient. This is worse than IP
blacklisting; less transparent; less obvious; less opportunity for
feedback and investigative recourse.

The big problem with blacklisting bouncebacks is that in the general case, you cannot be sure WHO to send the bounceback TO. Once spam has gone through one or more levels of forwarding, the only way to go further back is via the Received: headers, but those are commonly counterfeit. Sending bouncebacks multiplies the wasted bandwidth due to spam.

Worse, "intentional bounceback" can be used by spammers as one way to get their spam delivered to a third party... they send mail in a way that they are confident will be bounced back, but arrange things so that the bounceback will go to the actually intended recipient... but this time, the (bounceback) message is originating from a not-blacklisted MTA.

Ultimately, I believe that the best way to deal with such spam is to at least OFFER recipients a chance to review blocked messages (ideally with rules that spare them from reviewing the same familiar spam over and over), or the choice of accepting the system's determination and just junking it.
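
A minimal Python sketch of such a review queue follows. The `Quarantine` class, its method names, and the subject-phrase rules are all assumptions made up for this illustration, not a description of any existing product. Blocked mail is held rather than bounced or silently dropped, the recipient can hide repetitive spam with simple phrase rules, and anything wrongly blocked can be released.

```python
from dataclasses import dataclass, field


@dataclass
class Quarantine:
    # Held messages, stored as (sender, subject) pairs.
    held: list[tuple[str, str]] = field(default_factory=list)
    # Recipient-defined phrases; matching subjects are hidden from review.
    hide_phrases: list[str] = field(default_factory=list)

    def hold(self, sender: str, subject: str) -> None:
        """Keep a blocked message instead of bouncing or deleting it."""
        self.held.append((sender, subject))

    def to_review(self) -> list[tuple[str, str]]:
        """Held messages the recipient has not already ruled out."""
        return [
            (sender, subject)
            for sender, subject in self.held
            if not any(p.lower() in subject.lower() for p in self.hide_phrases)
        ]

    def release(self, sender: str, subject: str) -> bool:
        """Move a wrongly blocked message out of quarantine."""
        try:
            self.held.remove((sender, subject))
        except ValueError:
            return False
        return True
```

A recipient who never wants to see "cheap meds" subjects again adds that phrase once; everything else that was blocked still shows up for review.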

But again, I'm far more willing to accept mail from someone if (1) I recognize the name of the sender, and (2) the mail "looks like" the sort of mail I would expect to receive from that sender.

The fact that you think they're better is likely based on an incomplete view on your part.

I doubt it, but I'm certainly willing to learn.

You actually probably have no idea how much of your mail has ever been 
redirected to a bulk or trash folder
by a content filter.

Actually, I tend to monitor that rather closely, in part because I use that knowledge to refine my ruleset.

And of course, not to mention that SpamAssassin, which you hold up as
the better model,

I consider it a 'respectable' example of the genre. I generally make it a point to include "like" or "-type" in references to that product.

has lovingly crafted hooks into it to allow direct
support of IP-based blacklist and other IP-based reputation
mechanisms.

Hopefully they use that as an INPUT into the rating process; I don't have a problem with that, as long as mail coming from such "blacklisted" IP addresses is not BLINDLY trashed regardless of any other considerations.
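
To show the difference between using a blacklist as an INPUT and blindly trashing on it, here is a small Python sketch. The function names, weights, and threshold are invented for illustration and are not taken from SpamAssassin or any other filter. A blacklist hit raises the score, recognition of the sender lowers it, and the final decision depends on the total.

```python
def spam_score(
    content_score: float,
    ip_blacklisted: bool,
    sender_known: bool,
    blacklist_weight: float = 2.0,    # hypothetical weight
    known_sender_bonus: float = 3.0,  # hypothetical weight
) -> float:
    """Combine several signals into one score; higher means more spam-like."""
    score = content_score
    if ip_blacklisted:
        # The blacklist contributes evidence but does not decide alone.
        score += blacklist_weight
    if sender_known:
        # A sender the recipient recognizes gets the benefit of the doubt.
        score -= known_sender_bonus
    return score


def is_spam(score: float, threshold: float = 5.0) -> bool:
    """Classify as spam only when the combined score reaches the threshold."""
    return score >= threshold
```

Under these made-up weights, a mildly suspicious message from a blacklisted IP is still delivered when the sender is known, while the same blacklist hit pushes a borderline message from a stranger over the threshold.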

Note to rest of world: I'm not anti-SpamAssassin. I've run it myself
before and likely will again. I'm just pointing out that like just
about every other kind of spam filtering or blocking mechanism, a
content filter is imperfect.

Certainly they have limitations, including some which are so severe as to be essentially crippling. HTML, embedded images, attachments, and the like make it nearly impossible for content-based spam filters to do a good and effective job. Even if (IF!), for example, a content filter had OCR abilities to try to analyze text-as-image... a linked image could be swapped for a different one (say) an hour after the E-mail was sent, so that the message was actually read AFTER the analysis had passed the (previously linked) image.

It's a bit mind-blowing to see content filtering held up as this
panacea to address the ills of IP-based blocking, since they're both
approximate models of what somebody thinks is spam,

The only opinion that MATTERS is that of the recipient... which is why they should be able to control the ruleset and the sender-by-sender whitelist, as well as what to do with spam (e.g. putting it into a spam folder that they can examine as they wish to confirm the accuracy of the filtering).

...and have flaws inherent to both technology and policy
limitations.

Again, what the USERS want is the ability to have the mail that makes it into their inboxes bear some approximation to the mail they expect and want to see. And only those users are able to make that judgement call, in the final analysis. What we need is an effective, practical tool (or toolset) to allow them to express that set of criteria.

It's clearly not enough to look just at the e-mail headers; but within that limitation (for example) I was getting relatively useful filtering using the web-based ruleset offered by my domain provider, until I ran into their limit of 200 rules...!

Regards,
Al Iverson

Gordon Peterson
http://personal.terabites.com
1977-2007 Thirty year anniversary of local area networking

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg