ietf-asrg

Re: [Asrg] Re: bounces, and anti-spam principles

2007-01-25 18:50:52
I'm grouping together responses to several individual points on this thread.

[comment #1]

In any case, I still contend that simplistic blocking by IP address or domain name is a very poor approach, and for a whole variety of reasons.

I will contend that there cannot be a content filter that can reliably separate spam from non-spam.

It doesn't NEED to be 100.000% accurate.

The bulk of mail most people receive comes from people they are familiar with, and which fits certain patterns. A given sender (mailing list etc) will typically have a signature file, for instance. I know that Aunt Matilda is NOT going to send me an E-mail containing a JavaScript decryption routine, or an ActiveX enclosure. She also is not going to send me an executable attachment. If stuff like that arrives here, it is safe to presume it is NOT from her, no matter what the From: address says (and even if it WAS sent from her computer).

If you know what mail from your Yahoogroups AfricanViolets list looks like, you could, for example, specify that mail claiming to come from that mailing list must contain that common content.
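A minimal sketch of that kind of check, in Python. The address, the marker text, and the `EXPECTED_MARKERS` table are all made up for illustration; a real rule would use whatever footer or signature text the list's mail actually carries.

```python
# Hypothetical table: claimed sender -> text that genuine mail from that
# sender is expected to contain (a list footer, a signature line, ...).
EXPECTED_MARKERS = {
    "africanviolets@lists.example": "AfricanViolets mailing list",
}

def looks_like_claimed_sender(from_addr: str, body: str) -> bool:
    """Return True if the body carries the marker expected for from_addr.

    Senders with no registered marker are not judged here (returns True),
    so that other rules can decide about them.
    """
    marker = EXPECTED_MARKERS.get(from_addr.lower())
    if marker is None:
        return True
    return marker in body
```

A plain substring test is about the crudest possible matcher; it is only meant to show the shape of a per-sender content rule.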

Any non-spam message received by one person would be spam if it were sent to 10 million harvested addresses.

Sure, and that's why the recipient knowing the sender is one of the key criteria. Stuff that you might accept from someone you know and trust might be spam if someone you'd never heard of sent it to you. It's ABSOLUTELY not enough to test subject/from/IP/domain.

OTOH, taking the definition of spam as Unsolicited Bulk Email makes detecting a spamming IP address almost trivial.

I guess that depends on what you call "bulk", and how you propose to detect it. Again, whatever rule you put into effect (on a global-type basis) is going to be discovered by spammers and they will engineer their sending patterns to avoid violating it. That's why you need a really narrow and twisty 'gauntlet' they must negotiate, with DIFFERENT RULES for different recipients, where they don't know and basically can not figure out what rules they would have to comply with to get a message through to a particular person.

That said, there should be a default set of rules which will get "safe/small" mail through from unknown senders, as long as it doesn't "look like" spam (again, SpamAssassin is not perfect, but it's pretty good once HTML, scripting, and attachment ruses are denied to them for the purpose).

The trick is to stop accepting mail from that IP address only until it has cleaned up.

Again, when you have a LOT of users (and possibly MANY servers) behind a NAT router, denying mail from that IP address results in simply too much collateral damage. More to the point, it's a very blunt instrument for the job, and it's relatively simple to do very much better.

Once the spam is gone there is no need to block the address unless it has proven to be a repeat offender without an effective process for shutting the spammers down.

What about when the flow of spam is interleaved with all sorts of good/important traffic as well?

[comment #2]

Speaking as an ISP, what's unrealistic in these utopian end-user
filtering only arguments is costs.

Suffice it to say that we can add fast, capable mail servers and see
them flooded in a matter of hours.

I'm not saying that end-user spam filtering is the ONLY approach that should be used. On the other hand, it is likely to be the most accurate and least objectionable from a user standpoint. Plus, it is the most likely to reject spam in a way that corresponds with how a USER would decide it's spam. (I will open mail from a friend with the same subject line that I would discard if it came from someone I didn't know...)

Pushing all the filtering to the end-user would make that much worse.

There is a lot of spam which is obvious. That includes messages which contain links to known-spam-promoted Web sites (at least in the absence of contradicting factors, say being from a list discussing spam senders!)

It also includes, for example, messages which are identical to messages that some number (dozens? hundreds?) of other recipients at the same ISP have already reported as being "spam". One would think that ISPs could locate and perhaps recategorize identical messages (again, perhaps tempered by a specific recipient rule) which are still queued and have not yet been delivered to their remaining customers.
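One rough way to picture this, as a Python sketch: fingerprint each message body, count distinct reporters per fingerprint, and hold back queued copies once a threshold is reached. The `ReportTracker` class, the whitespace-only normalization, and the threshold value are assumptions for the example; exact hashing only catches identical copies, not lightly varied ones.

```python
import hashlib

REPORT_THRESHOLD = 50  # assumed value; the text only says "dozens? hundreds?"

def fingerprint(body: str) -> str:
    """Hash of the body with runs of whitespace collapsed."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ReportTracker:
    def __init__(self, threshold: int = REPORT_THRESHOLD):
        self.threshold = threshold
        self.reporters = {}  # fingerprint -> set of users who reported it

    def report(self, user: str, body: str) -> None:
        """Record that this user flagged this message as spam."""
        self.reporters.setdefault(fingerprint(body), set()).add(user)

    def is_reported_spam(self, body: str) -> bool:
        """True once enough distinct users have reported an identical body."""
        return len(self.reporters.get(fingerprint(body), ())) >= self.threshold

    def split_queue(self, queued):
        """Split queued (recipient, body) pairs into (deliver, hold) lists."""
        deliver, hold = [], []
        for item in queued:
            (hold if self.is_reported_spam(item[1]) else deliver).append(item)
        return deliver, hold
```

A per-recipient rule could still override the "hold" decision for a particular user, as suggested above.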

Yahoo, for all their claims, does a pretty fair job of only sequestering spam messages, although an awful lot of obvious spam still curiously slips through their filters.

...and a user should be able to selectively prevent blocking of mail that otherwise would get blocked.

But let me state again (and this is part of what made me respond, starting this sub-thread) that it is virtually NEVER a good idea to send a bounce message after SMTP time, because you can't be sure where to send it, and most likely you are just harassing another innocent victim. Far better to just toss the mail. If you are going to alert anybody, it makes more sense to offer the offending mail (tagged accordingly) to the intended recipient so that THEY can make the final decision on what to do with it.

Being able to "slam the phone down" on miscreant IP blocks at the accept() or helo is much, much, less processing than going thru the entire SMTP interaction and whatever it takes to pass processing off to an end-user.

It's true that it costs less, but it's also true that it blocks a lot of innocent and legitimate mail that might be originating from the same IP address (NAT router?). There could be dozens, hundreds, or even thousands of innocent users affected.

IMHO, such innocent users who found their messages blocked might have legal recourse against SOMEONE... it's simply far too blunt an instrument.

Put another way, you can have almost unfiltered access and
near-perfect spam filtering!

Here's how to do it:

Get your own link to the backbone.

Set up your own mail servers etc.

Hire one or more secretaries to pre-screen your email according to
rules you have trained them in.

It might cost a few thousand a month, but surely in the face of all this expressed urgency about the pitfalls of centralized filtering
it's a small price to pay.

As more and more businesses become dependent on the Internet, and timely delivery of communications, such a cavalier attitude is going to lead to business failures at ISPs who don't realize that this isn't "just a hobby for computer geeks" anymore.


[comment #3]

Absolutely, and that's a good reason why blocking by either IP address or domain name is such a bad solution. A fine-grained whitelist which specifies ALLOWED behavior on a per-sender basis, on the other hand, can easily allow or block messages from a given sender ON A MESSAGE-BY-MESSAGE basis. Their legitimate messages get delivered, but the (zombie) messages being sent by their same (infected) machine, using the same mail servers and same permissions/certifications but which do not look the way that sender's messages are expected to look (by the recipient!), are efficiently and accurately identified and blocked.

So "rehabilitation" isn't even an issue.

So the zombie becomes unable to emit spam, but there's no incentive to fix it so it's still available to the botmaster for use as a C&C machine, web/DNS server, and DDoS participant. I'd prefer that it get
uninfected.

Obviously, that is ideal, but the problem is that after (first!) SMTP time, the (intermediary, or final) recipient doesn't really know who they ought to notify...! Notifying the wrong person, or someone who has no control over the situation, probably does more harm than good.

Again, I don't believe it is possible to prevent unwanted mail from being injected into the Internet. What ultimately will stop it is once its likelihood of success is SO small that it's simply not worth attempting it.

People don't write viruses for Coleco ADAM computers simply because there are very few of those connected to the Internet. The chances of the author's creation encountering a vulnerable system are simply too low.

[comment #4]

"spam" is a slang word, which is often used to describe *A SUBSET OF* unwanted email. Some legal jurisdictions have legislation that defines spam very narrowly. If you insist on blocking "spam", you *WILL* end up
spending a lot of time and money in court cases where...

1) the spammer insists that his spam is "not-spam" because of some technicality. Expect to see lots of legal "is not spam; is so; is not; is so; is not" being billed at lawyers' regular rates. And of course, you can rest assured that the politicians who enact legislation will make exemptions for solicitations for campaign contributions. Any "spam-filters" that block any "not-spam" *WILL* get hit with
cease-and-desist orders

That is one further reason why the RECIPIENT should be the person to judge what they are and are not willing to receive, and from whom. Senders basically have no legal recourse if somebody chooses to delete that sender's mail from their Inbox, whether they have read it or not.

2) saying that Joe Blow sends spam is equivalent to calling him a spammer. Watch the defamation (libel/slander) lawsuits fly.

There have already been such suits against blacklist management organizations.
However, if you block "unwanted email" rather than "spam"...

1) spammer says "wahhh, wahhh, wahhh, my 'valuable information' is 'not-spam'" and you can enthusiastically agree. The customer still doesn't want it. "Because I said so" should be sufficient reason.

Right. And the recipient can reasonably set (even completely arbitrary!) rules to determine what they do and don't want delivered to their Inbox.

[snip]

Similarly, don't try to define "the S-word" in technical terms. A bunch of geeks sitting at their keyboards are no match for a nit-picking lawyer who was the captain of his class debating team. It's effectively a pro se defense against high-powered lawyers, and the results are very predictable. Don't engage in a battle you can't win. Go with...

 - our customer says he doesn't want your emails. No, we don't know
   why he doesn't want your emails.

 - the customer is always right; end of story.

Don't give the spammers' lawyers anything to attack.

Bingo.

 - I am a customer of clss.net (Aurora Internet)

- they have a modified Qmail that generates 550 SMTP-stage rejects (i.e. *NOT* a DSN) based on a customer-configurable control file in the customer's home directory. There are separate rule files for sub-accounts. E.g. I point my domain MX at their server. abuse and postmaster are basically unfiltered compared to this address.

- step 1 is to declare a whitelist of emails that I accept
   unconditionally
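To illustrate the difference between an SMTP-stage reject and an after-the-fact bounce, here is a small Python sketch of a per-recipient rule file being consulted while the SMTP session is still open. The rule-file syntax (`allow`/`deny` lines) and the function names are invented for this example; the post does not describe the actual control-file format used by that modified Qmail.

```python
def parse_rules(text: str):
    """Parse a made-up rule format: one 'allow <addr-or-domain>' or
    'deny <addr-or-domain>' per line; '#' starts a comment."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank, comment-only, or malformed line
        rules.append((parts[0].lower(), parts[1].strip().lower()))
    return rules

def smtp_reply(rules, mail_from: str) -> str:
    """Return the reply to give at RCPT time; the first matching rule wins.

    A 550 here is seen by the sending server during the session, so no
    separate bounce message (DSN) is generated by the receiver.
    """
    sender = mail_from.lower()
    for action, pattern in rules:
        if pattern == "*" or sender == pattern or sender.endswith("@" + pattern):
            if action == "deny":
                return "550 5.7.1 Recipient does not accept mail from this sender"
            return "250 2.1.5 OK"
    return "250 2.1.5 OK"  # assumed default when no rule matches
```

Putting the whitelist entries first in the file gives the "accept unconditionally" behaviour, since the first match wins.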

That's good, but I basically want finer control than that... I want to be able to open up the window (like the keyway on a lock) to allow the messages in that I expect from each sender. Even a sender that I would accept an executable attachment from, I might refuse a message containing ActiveX or JavaScript.
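The "keyway" idea can be sketched in a few lines of Python: each whitelisted sender gets a set of message features they are allowed to use, and a message passes only if everything it contains falls inside that set. The feature names and the `WHITELIST` table are hypothetical, and how the features would be detected in a real message is left out here.

```python
# Unknown senders get the narrowest keyway: no risky features at all.
DEFAULT_ALLOWED = frozenset()

# Hypothetical per-recipient whitelist: sender -> features that sender may use.
WHITELIST = {
    "colleague@example.org": frozenset({"executable_attachment"}),
    "newsletter@example.net": frozenset({"html"}),
}

def accept(sender: str, features) -> bool:
    """Accept only if every feature found in the message is allowed
    for this particular sender."""
    allowed = WHITELIST.get(sender.lower(), DEFAULT_ALLOWED)
    return set(features) <= allowed
```

With this table, an executable attachment from `colleague@example.org` is accepted, but the same sender's message is refused if it also contains JavaScript.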

- I don't want email from residential machines on dynamic IP addresses sending direct-to-MX. So I block based on dynamic IP DNSbls, regexp filter against rDNS, and obviously block email from machines with no
   rDNS whatsoever.

Obviously you can (and should) set the rules however you want, as recipient. I wouldn't want, for example, my ISP(s) forcing those same rules on me.
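For readers unfamiliar with rDNS-based rules, a rough Python sketch of the two checks described (no rDNS at all, and a hostname that looks like a dynamic pool) follows. The patterns are illustrative guesses, not anyone's production list, and the DNSBL lookup is not shown because it needs a live DNS query.

```python
import re
from typing import Optional

# Illustrative patterns only; real dynamic-pool naming varies by provider.
DYNAMIC_RDNS = re.compile(
    r"(?:^|[.\-_\d])(?:dyn|dynamic|dhcp|dsl|adsl|cable|pool|ppp|dialup)(?=[.\-_\d]|$)",
    re.IGNORECASE,
)
# An IPv4 address embedded in the hostname, e.g. host-203-0-113-7.
EMBEDDED_IP = re.compile(r"\d{1,3}[-.]\d{1,3}[-.]\d{1,3}[-.]\d{1,3}")

def reject_by_rdns(rdns: Optional[str]) -> Optional[str]:
    """Return a rejection reason, or None if the rDNS name passes."""
    if not rdns:
        return "no rDNS"
    if DYNAMIC_RDNS.search(rdns) or EMBEDDED_IP.search(rdns):
        return "rDNS looks like a dynamic/residential address"
    return None
```

Heuristics like these will misjudge some hosts, which is one reason to keep them a per-recipient choice.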

- I don't talk to myself. I don't want email from people who lie in their email, by including "waltdnes.org" in the HELO or return-path.
   So I block those emails.

Certainly reasonable!
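As a sketch, that rule amounts to a domain comparison on the HELO name and the envelope return-path. The Python below assumes the recipient's own outbound mail never arrives through this inbound path; the function names are made up for the example.

```python
MY_DOMAIN = "waltdnes.org"  # the domain named in the post

def _within(name: str, domain: str) -> bool:
    """True if name is the domain itself or a subdomain of it."""
    return name == domain or name.endswith("." + domain)

def claims_my_domain(helo: str, return_path: str, my_domain: str = MY_DOMAIN) -> bool:
    """True if the remote side presents the recipient's own domain in
    either the HELO name or the return-path address."""
    domain = my_domain.lower()
    helo_host = helo.strip().lower().rstrip(".")
    path = return_path.strip().strip("<>").lower()
    path_domain = path.rsplit("@", 1)[-1].rstrip(".") if "@" in path else ""
    return _within(helo_host, domain) or _within(path_domain, domain)
```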

- I don't want email from certain countries, so I block them, using
   country-codes in rDNS and return-path

Also reasonable enough, as long as you are setting those rules for yourself. Personally, I WILL accept (legitimate) mails from just about any country anywhere (including particularly countries I have visited, and that's a list of almost 50 countries). And on my travels, I have sent E-mails from (say) Beijing. I would be annoyed if those E-mails had been blocked just because I happened to have sent them from China.

Again, your Inbox, your rules.
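A country-code rule of the kind described above could look roughly like this in Python. The blocked set is passed in by the recipient; `xx` in the usage below is a placeholder, not a real country code, and the helper names are hypothetical.

```python
from typing import Optional, Set

def tld_of(name: str) -> str:
    """Last DNS label of a hostname or of the domain part of an address."""
    cleaned = name.strip().strip("<>").lower().rstrip(".")
    host = cleaned.rsplit("@", 1)[-1]
    return host.rsplit(".", 1)[-1] if "." in host else ""

def blocked_by_country(rdns: Optional[str], return_path: Optional[str],
                       blocked: Set[str]) -> bool:
    """True if the rDNS name or the return-path ends in a blocked country code."""
    return any(tld_of(n) in blocked for n in (rdns or "", return_path or ""))
```

For example, `blocked_by_country("mail.example.xx", "<a@example.com>", {"xx"})` is true. Note that this only looks at names, so it says nothing about a traveler sending through a home-country account.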

 Executive summary...

- blocking email, because it meets some technical criteria, is easier
   on the technical side, but introduces legal problems

- blocking email, because the customer said so, may be harder
   technically, but avoids legal problems

- any complications on the anti-spam side are outweighed by equivalent complications on the spammers' end. ISPs will have to enable end users to configure their own rules, and everybody's filters and whitelists will be slightly different. Imagine how spammers will feel knowing that each of several million targets for a spam-run has a slightly different defense, that has to be overcome in order to
   deliver the email.

EXACTLY. But also, knowing that all the classical ruses to avoid spam classification (text as image, embedded links, attachments, scripting, disguised HTML links, etc etc) are a priori denied them.... certainly takes a major bite out of spammers.

And only allowing executable attachments, HTML, and "big" messages from known/trusted senders basically eliminates E-mail as a vector for virus/worm propagation, which takes a big bite out of spambot zombie recruitment. That, all by itself, is a huge improvement in the spam detection/blocking situation.


[comment #5]

All I can say is, you are certainly welcome to block any mail you please, and no cooperation from other MTA operators is required, nor is any meeting of the IETF. The only purpose for the IETF involvement is to coordinate cooperative action. Since the IETF is voluntary, the action needs to be of benefit to all participants, and that greatly restricts the field of actions practical for widespread implementation. But it doesn't in any way restrict what you as an individual can do.

That's certainly true, and one advantage of fine-grained recipient blocking is that it doesn't require any great worldwide consensus, nor any re-engineering of Internet infrastructure.

What WOULD be helpful, though, would be a recognition by the IETF that:

a) such fine-grained per-sender by-recipient blocking (and hopefully augmented by subsequent content scanning) is an effective and desirable approach to the problem, and

b) in the general case, blocking of all non-whitelisted E-mails containing HTML, scripting (probably covered under HTML... is it possible to put in scripting without HTML?), or attachments is a "best practice". (It is probably a good idea to suggest including a maximum message size, too, as a way of preventing "denial of service" attacks by sending big E-mails to someone which would be expected to fill their E-mail inbox to overflowing, blocking subsequent legitimate E-mails).
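A default policy along the lines of (b) can be sketched with Python's standard `email` package. The size limit is an assumed number (the text proposes a maximum without naming one), and script detection is not attempted separately here since any HTML part is already refused.

```python
from email import message_from_bytes

MAX_BYTES = 50_000  # assumed limit for mail from non-whitelisted senders

def default_policy_violations(raw: bytes, max_bytes: int = MAX_BYTES) -> list:
    """Return the reasons a message from an unknown sender would be refused.

    An empty list means the message is small, plain text, and has no
    attachments, so it can go on to content scanning.
    """
    problems = []
    if len(raw) > max_bytes:
        problems.append("too large")
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers are judged by their leaf parts
        if part.get_content_type() == "text/html":
            problems.append("contains HTML")
        elif part.get_filename() or part.get_content_disposition() == "attachment":
            problems.append("contains attachment")
        elif part.get_content_maintype() != "text":
            problems.append("contains non-text part")
    return problems
```

Whitelisted senders would bypass this function and be checked against their own per-sender rules instead.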

That would at least provide a direction forward which would make for a huge improvement, avoid the legal issues of blocking e-mails too crudely, and take a big bite out of spambot zombie recruitment. What's more (as was pointed out in another post), having millions of different target recipients, each with different delivery criteria, is a far more daunting challenge to spammers.

Since your method requires no cooperation from any other MTA operator, it doesn't require any endorsement from this group.

Right, no endorsement is NEEDED, but (like the introduction of the original IBM PC) it would be nice to have it recognized as a useful direction. Spammers are far more likely to be dissuaded from attempting to send HTML-based or attachment-based spam if it is RECOGNIZED that it's unlikely to be delivered, rather than it just disappearing down a black hole somewhere and leaving them believing that it's still a viable technique.

That is fine - it doesn't make your method illegitimate or anything like that. But most users wish for a cooperative anti-spam technique, because they reasonably expect it will work better, and they reasonably expect many other MTA operators to cooperate with them.

And, if that's enough to satisfy them, chances are good that the (cooperative!) "default" case (no HTML, no attachments, messages < some maximum size, and message passed by SpamAssassin or similar) would already constitute a MAJOR improvement over existing spam blocking. The whitelisting capability mostly just gives the recipients the opportunity to tweak things further, opening the keyway to allow more risky mail if they so desire, or to block stuff they don't want that the ISP's default scanning would still let through.

This has been true in the past - consider the many DNSBLs and other activities against spam. When we kept a list of spamming IP addresses sending to our MTA, we found after 2 weeks that only 1% of the IPs had sent more than one message. Our subscription to Spamhaus kills about 65% of incoming messages. That is a victory for cooperation and it makes us think that more cooperation might be better.

Again, the problem is the degree of collateral damage that IP-based blocking produces. I consider that to be unacceptable, and perhaps creating legal liability. Now, if the USER implements IP-based blocking, that's THEIR choice and I don't believe any court would rule against their right to do that. But an ISP is a very different situation.

It is true that cooperative actions attract lawsuits, but that is only because it isn't practical to sue an individual for refusing mail,

Not only is it not practical, but they have the ABSOLUTE right to read or not read anything given to them (certainly at least anything delivered by E-mail!).


[comment #6]

[how users configure their whitelist rules]

The problem being that out of the 60,000 seats here, perhaps less than 10 of them are able to competently configure a set of rules like what you have.

That's a software implementation issue, not an inherent problem in the approach. I envision a button to click on that simply says "allow E-mails like this from the same sender in the future" and where the software will open the keyway JUST enough to allow that type of message if seen again from that sender. How that recognition is accomplished, whether by something crude like simple GREP-type scanning, or something brain-damaged like RegEx pattern matching, or something still more sophisticated like the pattern matching SNOBOL/SPITBOL offers, or even a different sort of statistical ranking/rating approach like content scanners use... will vary from one implementation to another. The final products will probably use a combination of techniques.
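One very simple reading of that button, as a Python sketch: clicking it records the features of the message in hand for that sender, and nothing more. The `Keyway` class and the feature names are hypothetical, and feature extraction (the GREP/RegEx/statistical part) is deliberately left out.

```python
class Keyway:
    """Per-recipient record of which message features each sender may use."""

    def __init__(self):
        self.allowed = {}  # sender -> set of permitted features

    def allow_like_this(self, sender: str, features) -> None:
        """What the button would do: open the keyway just enough to admit
        the features seen in this one message from this sender."""
        self.allowed.setdefault(sender.lower(), set()).update(features)

    def accepts(self, sender: str, features) -> bool:
        """True if every feature in a new message was previously allowed."""
        return set(features) <= self.allowed.get(sender.lower(), set())
```

The user never sees a rule file; approving an HTML message from one sender leaves attachments from that sender, and HTML from everyone else, still blocked.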

Many of them don't even have a clear notion of what "source IP" is, let alone being able to make reasonable choices about, say, why you'd want to block dynamic IPs or IPs in Korea.

Again, I consider IP-based blocking to be inherently flawed, to the point where I consider it a dead-end.

Furthermore, and with complete irony, I'll note that the only reason I read this thread is that my very own, personally trained, UA bayesian
filtering flung it all in the junk folder! ;-)

:-)

Yeah, I admit that I usually at least cast a cursory eyeballing of the Yahoo mail "spam" folder too, rather than just emptying it. Occasionally I -do- find a non-spam message there. (Although that happens seldom, as I almost never give that E-mail address to anybody... It's almost useful as a "personal honeypot" to see what's being spammed out, before going to my more usual E-mail accounts and possibly wondering if that curious E-mail just MIGHT be legitimate).

We're achieving effectiveness rates in excess of 98% with our "one set of rules" server based defences. My personal account, which receives 400-600 emails/day, has 100 or more spams/day filtered out by the central server solution. I usually go a week or so between spams that get past those central filters - I see _many_ more FPs with my bayesian
than I see spam getting through.

There will be FPs, and some spam will get through, probably regardless of what filtering technique you use. The important thing is that the RECIPIENT controls that, so they can decide the rules that determine what gets blocked and what gets through. That way they don't have to wonder what SHOULD have been delivered to them and wasn't.

My personally trained bayesian filtering has an absolutely abysmal track record.

Spammers have gotten good at throwing enough random junk into E-mails to confuse Bayesian filters.

On the spam aimed at the false positive handling address, which by design has _no_ filtering, Bayesian has an effectiveness rate of about 50%. Yuck. No amount of personal twiddling, custom rules, explicit pattern matching in my UA is going to make much difference to that.

Some E-mails are going to get through. But making sure that they are (a) small, and (b) not "dangerous" at least reduces the impact of those.

And meanwhile, giving the recipient the ability to at least not see the SAME kind of stuff over and over again, if they choose to use those features, demonstrates the ISP's trying to give the user the tools to reduce the frustration.


Gordon Peterson
http://personal.terabites.com
1977-2007 Thirty year anniversary of local area networking

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg