Re: [Asrg] Re: bounces, and anti-spam principles

I'm grouping together responses to several individual points on this thread.


[comment #1]

In any case, I still contend that simplistic blocking by IP address or domain name is a very poor approach, and for a whole variety of reasons.

I will contend that there cannot be a content filter that can reliably separate spam from non spam.


It doesn't NEED to be 100.000% accurate.

The bulk of mail most people receive comes from people they are familiar with, and which fits certain patterns. A given sender (mailing list etc) will typically have a signature file, for instance. I know that Aunt Matilda is NOT going to send me an E-mail containing a JavaScript decryption routine, or an ActiveX enclosure. She also is not going to send me an executable attachment. If stuff like that arrives here, it is safe to presume it is NOT from her, no matter what the From: address says (and even if it WAS sent from her computer).

If you know what your Yahoogroups AfricanViolets mail looks like, you could for example specify to look for that common content in mail claiming to come from that mailing list.
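As a rough sketch of that idea (my own illustration, not taken from any existing filter): keep a small table of text you expect to see in mail from a given sender, and treat mail that claims that sender but lacks the text as suspect. The address, the footer text, and the function name below are all made up for the example.

```python
# Hypothetical per-sender markers: text that mail claiming to come from
# a given sender is expected to contain. Both the address and the footer
# text are invented for illustration.
EXPECTED_MARKERS = {
    "africanviolets@yahoogroups.example": ["To unsubscribe from AfricanViolets"],
}


def looks_like_sender(claimed_sender, body):
    """Return True only if the body contains every marker expected for
    the claimed sender. Senders with no recorded markers return False,
    because there is no known pattern to compare against."""
    markers = EXPECTED_MARKERS.get(claimed_sender.lower())
    if not markers:
        return False
    return all(marker in body for marker in markers)
```

A real implementation would need something more forgiving than exact substring matching, but the principle is the same: the From: address alone proves nothing, the expected content has to be there too.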

Any non-spam message received by one person would be spam if it were sent to 10 million harvested addresses.

Sure, and that's why the recipient knowing the sender is one of the key criteria. Stuff that you might accept from someone you know and trust might be spam if someone you'd never heard of sent it to you. It's ABSOLUTELY not enough to test subject/from/IP/domain.

OTOH, taking the definition of spam as Unsolicited Bulk Email makes detecting a spamming IP address almost trivial.

I guess that depends on what you call "bulk", and how you propose to detect it. Again, whatever rule you put into effect (on a global-type basis) is going to be discovered by spammers and they will engineer their sending patterns to avoid violating it. That's why you need a really narrow and twisty 'gauntlet' they must negotiate, with DIFFERENT RULES for different recipients, where they don't know and basically can not figure out what rules they would have to comply with to get a message through to a particular person.

That said, there should be a default set of rules which will get "safe/small" mail through from unknown senders, as long as it doesn't "look like" spam (again, SpamAssassin is not perfect, but it's pretty good once HTML, scripting, and attachment ruses are denied to them for the purpose).
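To make the "safe/small" default concrete, here is a minimal sketch using Python's standard email package. The size limit and the function name are assumptions of mine, and the "doesn't look like spam" step (SpamAssassin or similar) is deliberately left out; this only covers the structural part.

```python
from email import message_from_string

# Assumed limit for mail from unknown senders; pick whatever suits you.
MAX_UNKNOWN_SENDER_BYTES = 20_000


def passes_default_rules(raw_message):
    """Structural checks for mail from an unknown sender:
    small, no HTML parts, no attachments."""
    if len(raw_message.encode("utf-8")) > MAX_UNKNOWN_SENDER_BYTES:
        return False
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return False
        # A filename or an explicit "attachment" disposition both count.
        if part.get_filename() is not None:
            return False
        if part.get_content_disposition() == "attachment":
            return False
    return True
```

Anything that passes here would still go on to content scoring; anything that fails would need the sender to be whitelisted by the recipient before it gets through.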

The trick is to stop accepting mail from that IP address only until it has cleaned up.

Again, when you have a LOT of users (and possibly MANY servers) behind a NAT router, denying mail from that IP address results in simply too much collateral damage. More to the point, it's a very blunt instrument for the job, and it's relatively simple to do very much better.

Once the spam is gone there is no need to block the address unless it has proven to be a repeat offender without an effective process for shutting the spammers down.

What about when the flow of spam is interleaved with all sorts of good/important traffic as well?


[comment #2]

Speaking as an ISP, what's unrealistic in these utopian end-user filtering only arguments is costs. Suffice it to say that we can add fast, capable mail servers and see them flooded in a matter of hours.

I'm not saying that end-user spam filtering is the ONLY approach that should be used. On the other hand, it is likely to be the most accurate and least objectionable from a user standpoint. Plus, it is the most likely to reject spam in a way that corresponds with how a USER would decide it's spam. (I will open mail from a friend with the same subject line that I would discard if it came from someone I didn't know...)

Pushing all the filtering to the end-user would make that much worse.

There is a lot of spam which is obvious. That includes messages which contain links to known-spam-promoted Web sites (at least in the absence of contradicting factors, say being from a list discussing spam senders!)

It also includes, for example, messages which are identical to messages that some number (dozens? hundreds?) of other recipients at the same ISP have already reported as being "spam". One would think that ISPs could locate and perhaps recategorize identical messages (again, perhaps tempered by a specific recipient rule) which are still queued and have not yet been delivered to their remaining customers.
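A sketch of how that recategorizing might work, under my own assumptions: messages are compared by a hash of their whitespace-normalized, lowercased body, the report threshold is arbitrary, and the class and function names are hypothetical. Spammers who randomize each copy would defeat exact matching like this, so treat it as the simplest possible version.

```python
import hashlib
from collections import defaultdict

REPORT_THRESHOLD = 3  # assumed number of distinct reporters


class ReportTracker:
    """Counts distinct recipients who reported the same message body."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self._reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Collapse whitespace and case so trivial differences don't matter.
        normalized = " ".join(body.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, recipient, body):
        self._reporters[self.fingerprint(body)].add(recipient)

    def is_widely_reported(self, body):
        reporters = self._reporters.get(self.fingerprint(body), set())
        return len(reporters) >= self.threshold


def split_queue(tracker, queued, exempt=frozenset()):
    """Split queued (recipient, body) pairs into (deliver, suspect).
    Recipients in `exempt` have a personal rule overriding the ISP."""
    deliver, suspect = [], []
    for recipient, body in queued:
        if recipient not in exempt and tracker.is_widely_reported(body):
            suspect.append((recipient, body))
        else:
            deliver.append((recipient, body))
    return deliver, suspect
```

The `exempt` set stands in for the per-recipient override: someone on a list that discusses spam would keep getting the message even after others have reported it.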

Yahoo, for all their claims, does a pretty fair job of only sequestering spam messages, although an awful lot of obvious spam still curiously slips through their filters.

...and a user should be able to selectively prevent blocking of mail that otherwise would get blocked.

But let me state again (and this is part of what made me respond, starting this sub-thread) that it is virtually NEVER a good idea to send a bounce message after-SMTP-time, because you can't be sure where to send it, and most likely you are just harassing another innocent victim. Far better to just toss the mail. If you are going to alert anybody, it makes more sense to offer the offending mail (tagged accordingly) to the intended recipient so that THEY can make the final decision on what to do with it.

Being able to "slam the phone down" on miscreant IP blocks at the accept() or helo is much, much, less processing than going thru the entire SMTP interaction and whatever it takes to pass processing off to an end-user.

It's true that it costs less, but it's also true that it blocks a lot of innocent and legitimate mail that might be originating from the same IP address (NAT router?). There could be dozens, hundreds, or even thousands of innocent users affected.

IMHO, such innocent users who found their messages blocked might have legal recourse against SOMEONE... it's simply far too blunt an instrument.

Put another way, you can have almost unfiltered access and near-perfect spam filtering!

Here's how to do it:

Get your own link to the backbone.

Set up your own mail servers etc.
Hire one or more secretaries to pre-screen your email according to rules you have trained them in.
It might cost a few thousand a month, but surely in the face of all this expressed urgency about the pitfalls of centralized filtering it's a small price to pay.

As more and more businesses become dependent on the Internet, and timely delivery of communications, such a cavalier attitude is going to lead to business failures at ISPs who don't realize that this isn't "just a hobby for computer geeks" anymore.



[comment #3]

Absolutely, and that's a good reason why blocking by either IP address or domain name is such a bad solution. A fine-grained whitelist which specifies ALLOWED behavior on a per-sender basis, on the other hand, can easily allow or block messages from a given sender ON A MESSAGE-BY-MESSAGE basis, so that their legitimate messages get delivered but the (zombie) messages being sent by their same (infected) machine, using the same mailservers and same permissions/certifications but which do not look the way that sender's messages are expected to look (by the recipient!) are efficiently and accurately identified and blocked.
So "rehabilitation" isn't even an issue.

So the zombie becomes unable to emit spam, but there's no incentive to fix it so it's still available to the botmaster for use as a C&C machine, web/DNS server, and DDoS participant. I'd prefer that it get uninfected.

Obviously, that is ideal, but the problem is that after (first!) SMTP time, the (intermediary, or final) recipient doesn't really know who they ought to notify...! Notifying the wrong person, or someone who has no control over the situation, probably does more harm than good.

Again, I don't believe it is possible to prevent unwanted mail from being injected into the Internet. What ultimately will stop it is once its likelihood of success is SO small that it's simply not worth attempting it.

People don't write viruses for Coleco ADAM computers simply because there are very few of those connected to the Internet. The chances of the author's creation encountering a vulnerable system are simply too low.


[comment #4]

"spam" is a slang word, which is often used to describe *A SUBSET OF* unwanted email. Some legal jurisdictions have legislation that defines spam very narrowly. If you insist on blocking "spam", you *WILL* end up spending a lot of time and money in court cases where...
1) the spammer insists that his spam is "not-spam" because of some technicality. Expect to see lots of legal "is not spam; is so; is not; is so; is not" being billed at lawyers' regular rates. And of course, you can rest assured that the politicians who enact legislation will make exemptions for solicitations for campaign contributions. Any "spam-filters" that block any "not-spam" *WILL* get hit with cease-and-desist orders

That is one further reason why the RECIPIENT should be the person to judge what they are and are not willing to receive, and from whom. Senders basically have no legal recourse if somebody chooses to delete that sender's mail from their Inbox, whether they have read it or not.

2) saying that Joe Blow sends spam is equivalent to calling him a spammer. Watch the defamation (libel/slander) lawsuits fly.

There have already been such suits against blacklist management organizations.

However, if you block "unwanted email" rather than "spam"...
1) spammer says "wahhh, wahhh, wahhh, my 'valuable information' is 'not-spam'" and you can enthusiastically agree. The customer still doesn't want it. "Because I said so" should be sufficient reason.

Right. And the recipient can reasonably set (even completely arbitrary!) rules to determine what they do and don't want delivered to their Inbox.


[snip]

Similarly, don't try to define "the S-word" in technical terms. A bunch of geeks sitting at their keyboards are no match for a nit-picking lawyer who was the captain of his class debating team. It's effectively a pro se defense against high-powered lawyers, and the results are very predictable. Don't engage in a battle you can't win. Go with...
 - our customer says he doesn't want your emails. No, we don't know why he doesn't want your emails.
 - the customer is always right; end of story.
 Don't give the spammers' lawyers anything to attack.


Bingo.

 - I am a customer of clss.net (Aurora Internet)
 - they have a modified Qmail that generates 550 SMTP-stage rejects (i.e. *NOT* a DSN) based on a customer-configurable control file in the customer's home directory. There are separate rule files for sub-accounts. E.g. I point my domain MX at their server. abuse and postmaster are basically unfiltered compared to this address.
 - step 1 is to declare a whitelist of emails that I accept unconditionally

That's good, but I basically want finer control than that... I want to be able to open up the window (like the keyway on a lock) to allow the messages in that I expect from each sender. Even a sender that I would accept an executable attachment from, I might refuse a message containing ActiveX or JavaScript.
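Roughly what I mean by finer control, as a sketch only: each whitelisted sender has a set of risky features they are allowed to use, and a message is accepted only if every risky feature it contains is in that set. The feature names, the sender addresses, the list of executable suffixes, and the crude "<script"/"<object" test are all assumptions for illustration.

```python
from email import message_from_string

# Assumed list of suffixes treated as executable attachments.
EXECUTABLE_SUFFIXES = (".exe", ".com", ".scr", ".pif", ".bat")

# Hypothetical per-sender permissions: which risky features each
# whitelisted sender may use. An empty set means plain text only.
ALLOWED_FEATURES = {
    "aunt.matilda@example.org": set(),
    "colleague@example.org": {"attachment", "executable"},
}


def risky_features(raw_message):
    """Return the set of risky features found in a raw message."""
    msg = message_from_string(raw_message)
    found = set()
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            found.add("attachment")
            if filename.lower().endswith(EXECUTABLE_SUFFIXES):
                found.add("executable")
        if part.get_content_type() == "text/html":
            found.add("html")
            payload = part.get_payload(decode=True) or b""
            text = payload.decode("utf-8", "replace").lower()
            # Very crude stand-in for detecting JavaScript / ActiveX.
            if "<script" in text or "<object" in text:
                found.add("script")
    return found


def accept(sender, raw_message):
    """Accept only if the sender is known and the message uses no
    feature beyond what that sender has been allowed."""
    allowed = ALLOWED_FEATURES.get(sender.lower())
    if allowed is None:
        return False  # unknown senders fall under the default rules
    return risky_features(raw_message) <= allowed
```

So the colleague's executable gets in, but the same colleague's machine sending scripted HTML does not, which is exactly the zombie case.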

 - I don't want email from residential machines on dynamic IP addresses sending direct-to-MX. So I block based on dynamic IP DNSbls, regexp filter against rDNS, and obviously block email from machines with no rDNS whatsoever.

Obviously you can (and should) set the rules however you want, as recipient. I wouldn't want, for example, my ISP(s) forcing those same rules on me.

 - I don't talk to myself. I don't want email from people who lie in their email, by including "waltdnes.org" in the HELO or return-path. So I block those emails.


Certainly reasonable!

 - I don't want email from certain countries, so I block them, using country-codes in rDNS and return-path

Also reasonable enough, as long as you are setting those rules for yourself. Personally, I WILL accept (legitimate) mails from just about any country anywhere (including particularly countries I have visited, and that's a list of almost 50 countries). And on my travels, I have sent E-mails from (say) Beijing. I would be annoyed if those E-mails had been blocked just because I happened to have sent them from China.


Again, your Inbox, your rules.
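For what it's worth, the recipient-set rules quoted above (whitelist first, then no rDNS, dynamic-looking rDNS, forged local identity, country codes) could be sketched like this. This is NOT the modified Qmail described in the quoted text, whose control file format I don't know. The regular expression, the reply texts, and the "zz" country code are placeholders I made up, and the DNSBL lookup is omitted.

```python
import re

OWN_DOMAIN = "waltdnes.org"  # the domain named in the quoted rules

# Assumed pattern for dynamic-looking reverse DNS names.
DYNAMIC_RDNS = re.compile(
    r"(?:^|[.-])(?:dyn|dhcp|dsl|pool|ppp)(?:[.\-\d]|$)", re.IGNORECASE
)

# Placeholder: "zz" is not a real country code.
BLOCKED_COUNTRY_CODES = {"zz"}


def _last_label(hostname):
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def smtp_reject_reason(helo, return_path, rdns, whitelist=frozenset()):
    """Return a 550 reply to give at SMTP time, or None to accept.
    `whitelist` holds lowercase sender addresses accepted unconditionally."""
    sender = return_path.strip("<>").lower()
    if sender in whitelist:
        return None
    if not rdns:
        return "550 no reverse DNS"
    if DYNAMIC_RDNS.search(rdns):
        return "550 dynamic address"
    sender_domain = sender.rpartition("@")[2]
    helo_name = helo.lower()
    claims_to_be_me = (
        helo_name == OWN_DOMAIN
        or helo_name.endswith("." + OWN_DOMAIN)
        or sender_domain == OWN_DOMAIN
    )
    if claims_to_be_me:
        return "550 forged local identity"
    if (_last_label(rdns) in BLOCKED_COUNTRY_CODES
            or _last_label(sender_domain) in BLOCKED_COUNTRY_CODES):
        return "550 country blocked"
    return None
```

The point of returning a 550 during the SMTP conversation is that no bounce ever has to be generated afterwards; the sending server is told directly.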

 Executive summary...
 - blocking email, because it meets some technical criteria, is easier on the technical side, but introduces legal problems
 - blocking email, because the customer said so, may be harder technically, but avoids legal problems
 - any complications on the anti-spam side are outweighed by equivalent complications on the spammers' end. ISPs will have to enable end users to configure their own rules, and everybody's filters and whitelists will be slightly different. Imagine how spammers will feel knowing that each of several million targets for a spam-run has a slightly different defense, that has to be overcome in order to deliver the email.

EXACTLY. But also, knowing that all the classical ruses to avoid spam classification (text as image, embedded links, attachments, scripting, disguised HTML links, etc etc) are a priori denied them.... certainly takes a major bite out of spammers.

And only allowing executable attachments, HTML, and "big" messages from known/trusted senders basically eliminates E-mail as a vector for virus/worm propagation, which takes a big bite out of spambot zombie recruitment. That, all by itself, is a huge improvement in the spam detection/blocking situation.



[comment #5]

All I can say is, you are certainly welcome to block any mail you please, and no cooperation from other MTA operators is required, nor is any meeting of the IETF. The only purpose for the IETF involvement is to coordinate cooperative action. Since the IETF is voluntary, the action needs to be of benefit to all participants, and that greatly restricts the field of actions practical for widespread implementation. But it doesn't in any way restrict what you as an individual can do.

That's certainly true, and one advantage of fine-grained recipient blocking is that it doesn't require any great worldwide consensus, nor any re-engineering of Internet infrastructure.

What WOULD be helpful, though, would be a recognition by the IETF that:

a) such fine-grained per-sender by-recipient blocking (and hopefully augmented by subsequent content scanning) is an effective and desirable approach to the problem, and

b) in the general case, blocking of all non-whitelisted E-mails containing HTML, scripting (probably covered under HTML... is it possible to put in scripting without HTML?), or attachments is a "best practice". (It is probably a good idea to suggest including a maximum message size, too, as a way of preventing "denial of service" attacks by sending big E-mails to someone which would be expected to fill their E-mail inbox to overflowing, blocking subsequent legitimate E-mails).

That would at least provide a direction forward which would make for a huge improvement, avoid the legal issues of blocking e-mails too crudely, and take a big bite out of spambot zombie recruitment. What's more, (as was pointed out by another post), having millions of different target recipients, each with different delivery criteria, is a far more daunting challenge to spammers.

Since your method requires no cooperation from any other MTA operator, it doesn't require any endorsement from this group.

Right, no endorsement is NEEDED, but (like the introduction of the original IBM PC) it would be nice to have it recognized as a useful direction. Spammers are far more likely to be dissuaded from attempting to send HTML-based or attachment-based spam if it is RECOGNIZED that it's unlikely to be delivered, rather than it just disappearing down a black hole somewhere and leaving them believing that it's still a viable technique.

That is fine - it doesn't make your method illegitimate or anything like that. But most users wish for a cooperative anti-spam technique, because they reasonably expect it will work better, and they reasonably expect many other MTA operators to cooperate with them.

And, if that's enough to satisfy them, chances are good that the (cooperative!) "default" case (no HTML, no attachments, messages < some maximum size, and message passed by SpamAssassin or similar) would already constitute a MAJOR improvement over existing spam blocking. The whitelisting capability mostly just gives the recipients the opportunity to tweak things further, opening the keyway to allow more risky mail if they so desire, or to block stuff they don't want that the ISP's default scanning would still let through.

This has been true in the past - consider the many DNSBLs and other activities against spam. When we kept a list of spamming IP addresses sending to our MTA, we found after 2 weeks that only 1% of the IPs had sent more than one message. Our subscription to Spamhaus kills about 65% of incoming messages. That is a victory for cooperation and it makes us think that more cooperation might be better.

Again, the problem is the degree of collateral damage that IP-based blocking produces. I consider that to be unacceptable, and perhaps creating legal liability. Now, if the USER implements IP-based blocking, that's THEIR choice and I don't believe any court would rule against their right to do that. But an ISP is a very different situation.

It is true that cooperative actions attract lawsuits, but that is only because it isn't practical to sue an individual for refusing mail,

Not only is it not practical, but they have the ABSOLUTE right to read or not read anything given to them (certainly at least anything delivered by E-mail!).



[comment #6]

[how users configure their whitelist rules]

The problem being that out of the 60,000 seats here, perhaps less than 10 of them are able to competently configure a set of rules like what you have.

That's a software implementation issue, not an inherent problem in the approach. I envision a button to click on that simply says "allow E-mails like this from the same sender in the future" and where the software will open the keyway JUST enough to allow that type of message if seen again from that sender. How that recognition is accomplished, whether by something crude like simple GREP-type scanning, or something brain-damaged like RegEx pattern matching, or something still more sophisticated like the pattern matching SNOBOL/SPITBOL offers, or even a different sort of statistical ranking/rating approach like content scanners use... will vary from one implementation to another. The final products will probably use a combination of techniques.
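The bookkeeping behind such a button can be very small. In this sketch (hypothetical class and method names, and the feature labels are just strings I chose), the recognition step is assumed to have already reduced a message to a set of feature labels, since how that is done will vary as I said above. Clicking the button widens the sender's allowed set by exactly the features of the message in hand and no more.

```python
from collections import defaultdict


class Keyway:
    """Per-sender record of which message features the recipient has
    chosen to allow. Feature labels come from some separate recognizer."""

    def __init__(self):
        self._allowed = defaultdict(set)

    def allow_like_this(self, sender, features):
        # The "allow E-mails like this from the same sender" button.
        self._allowed[sender.lower()] |= set(features)

    def permits(self, sender, features):
        # A message passes only if it needs nothing beyond what has
        # been allowed; a message with no risky features always passes.
        return set(features) <= self._allowed.get(sender.lower(), set())
```

For example, after `allow_like_this("aunt@example.org", {"html"})`, HTML mail from that address passes, but HTML mail with an executable attached still does not.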

Many of them don't even have a clear notion of what the concept of "source IP" is, let alone being able to make reasonable choices of, say, knowing why you'd want to block dynamic IPs or IPs in Korea.

Again, I consider IP-based blocking to be inherently flawed, to the point where I consider it a dead-end.

Furthermore, and with complete irony, I'll note that the only reason I read this thread is that my very own, personally trained, UA bayesian filtering flung it all in the junk folder! ;-)

:-)

Yeah, I admit that I usually at least cast a cursory eyeballing of the Yahoo mail "spam" folder too, rather than just emptying it. Occasionally I -do- find a non-spam message there. (Although that happens seldom, as I almost never give that E-mail address to anybody... It's almost useful as a "personal honeypot" to see what's being spammed out, before going to my more usual E-mail accounts and possibly wondering if that curious E-mail just MIGHT be legitimate).

We're achieving effectiveness rates in excess of 98% with our "one set of rules" server based defences. My personal account, which receives 400-600 emails/day, has 100 or more spams/day filtered out by the central server solution. I usually go a week or so between spams that get past those central filters - I see _many_ more FPs with my bayesian than I see spam getting through.

There will be FPs, and spams will get through, probably regardless of what filtering technique you use. The important thing is that the RECIPIENT controls that, so they can decide the rule that determines what gets blocked and what gets through. That way they don't have to wonder what SHOULD have been delivered to them and wasn't.

My personally trained bayesian filtering has an absolutely abysmal track record.

Spammers have gotten good at throwing enough random junk into E-mails to confuse Bayesian filters.

On the spam aimed at the false positive handling address, which by design has _no_ filtering, Bayesian has an effectiveness rate of about 50%. Yuck. No amount of personal twiddling, custom rules, explicit pattern matching in my UA is going to make much difference to that.

Some E-mails are going to get through. But making sure that they are (a) small, and (b) not "dangerous" at least reduces the impact of those.

And meanwhile, giving the recipient the ability to at least not see the SAME kind of stuff over and over again, if they choose to use those features, demonstrates the ISP's trying to give the user the tools to reduce the frustration.



Gordon Peterson
http://personal.terabites.com

1977-2007 Thirty year anniversary of local area networking


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg