ietf-asrg

Re: [Asrg] Quarantines and block lists

2007-01-27 20:53:58
[On second thought, I DON'T want to define spam as UBE. Labeling UBE as spam labels the sender as a spammer which carries a negative connotation. I just want to block UBE without calling anyone a spammer and getting sued]

Again, all these problems are eliminated once you allow the RECIPIENT to determine what they do and don't want to receive and read.

I think it's also worth considering a subtle but perhaps important distinction between recommending that the recipient customer delegate the choice(s) to the ISP, as compared to "suggesting" rules that the recipient customer can choose to select (or not).

A litigious spammer could still sue an ISP for implementing rules (especially if those rules change after the customer recipient delegates the authority) which disadvantage the spammer.

Here is my proposal. I'm sure I haven't thought of everything so please tell me where it breaks or how spammers are going to game it. The goal is to stop the majority of the spam while not losing ANY legitimate email (broken MTAs don't count).

To begin with, crude IP blocks will ALWAYS lose legitimate mail, because a single IP address can send both legitimate and zombie-generated mail. I propose that NO ISP-based software will be capable of differentiating one from the other... but the recipient possibly will know how to do that, at least for familiar senders.

This is not intended to replace existing blocklists but to handle the initial onslaught from a spam source until the source can be properly categorized and listed.


Proposal #1 is a quarantine for new sources:

Connections from unknown source addresses would be initially rejected by returning a temporary error code (4xx) for a probation period. The probation period could be as short as 5 minutes depending on how fast the blocklists can respond. The reject would occur after the recipient list is accepted so the recipients can be checked against local spamtraps and valid user lists. After the probation period, global blocklists would be queried to find the current reputation of the source.
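
A minimal sketch of the Proposal #1 decision logic, assuming the MTA can call a hook once the recipient list is known. The `Quarantine` class, its method names, the reply codes chosen (451/550/250), and the `blocklist_lookup` callable are illustrative assumptions, not part of the proposal; the 300-second default is the 5-minute lower bound mentioned above.

```python
import time

PROBATION_SECONDS = 300  # "as short as 5 minutes" in the proposal


class Quarantine:
    """Hypothetical helper: remembers when each source IP was first seen."""

    def __init__(self, blocklist_lookup, probation=PROBATION_SECONDS, clock=time.time):
        self.blocklist_lookup = blocklist_lookup  # assumed callable: ip -> True if listed
        self.probation = probation
        self.clock = clock
        self.first_seen = {}   # ip -> timestamp of first connection
        self.trap_hits = []    # (ip, [trap addresses]) pairs for the local trap handler

    def decide(self, source_ip, recipients, spamtraps, valid_users):
        """Return an SMTP-style reply code after the recipient list is known."""
        now = self.clock()
        # Recipients are checked first, so trap hits are recorded even during probation.
        hits = [r for r in recipients if r in spamtraps]
        if hits:
            self.trap_hits.append((source_ip, hits))
        if not any(r in valid_users for r in recipients):
            return 550  # assumption: no valid recipient at all is a permanent reject
        started = self.first_seen.setdefault(source_ip, now)
        if now - started < self.probation:
            return 451  # temporary error: a well-behaved MTA will retry later
        if self.blocklist_lookup(source_ip):
            return 550  # probation over and the source is now listed
        return 250
```

A source that retries after the probation period is accepted only if the blocklist lookup still comes back clean; what to do with the recorded trap hits is left to the reporting step.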

While spamtraps and valid user lists are helpful sometimes, they will not trigger if the spammer is using "fresh" E-mail address lists which are also devoid of spamtrap E-mail addresses.

Proposals #2 & #3 are for a fast global block advisory and distribution network

Any email addressed to a spamtrap would generate a report to a local trap handler. The local trap handler would forward appropriate reports to the regional or global blocklist managers. The blocklist managers would publish advisories based on the reputations and counts of the traps that were hit by the sending IP.
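
One way the "reputations and counts of the traps" idea could be read is as a weighted sum per sending IP. The sketch below assumes that reading; the function name, the 0-to-1 reputation scale, and the threshold are hypothetical and do not come from the proposal.

```python
from collections import defaultdict


def build_advisories(reports, trap_reputation, threshold=1.0):
    """Return the set of IPs whose weighted trap-hit score reaches the threshold.

    reports: iterable of (sending_ip, trap_address) pairs from trap handlers.
    trap_reputation: dict trap_address -> weight (hypothetical 0..1 scale).
    """
    scores = defaultdict(float)
    for ip, trap in reports:
        # Unknown traps contribute nothing rather than raising an error.
        scores[ip] += trap_reputation.get(trap, 0.0)
    return {ip for ip, score in scores.items() if score >= threshold}
```

With this reading, one hit on a highly trusted trap can list a source, while several hits on low-reputation traps are needed to do the same.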

IP address blocking, again, is simply too crude. A further and more serious problem is that you have basically no way to be 100% sure of who the "sending IP" actually is.

For just one example, Yahoogroups send millions of E-mail messages per day, from (doubtless) a large server farm. Different Yahoogroups vary dramatically in terms of how attentively they are moderated. Some are moderated with great care; other groups have been abandoned or stranded by their owners and have been operating on autopilot, some for many years, and many such groups have been 'discovered' and are heavily abused by spammers. (Indeed, it seems that many Yahoogroups send very little OTHER than spam). So what IP address(es) do you plan to block? All of Yahoogroups? That's obviously a non-starter.

Or let's say that Aunt Gertrude's machine gets infected by a worm, which starts sending out E-mails using Aunt Gertrude's account at the ISP, via the ISP's mail server. Are you going to block everything coming from that mail server? What if it is a VERY large ISP? Obviously, it would have been nice if that ISP's outgoing mail rules had caught the messages, but in this case let's recognize that they didn't.

If the source is emitting a sufficient volume of spam this happens before the initial probation period expires.

Again, that's part of what I said about spammers engineering their mails so they squeak under whatever threshold is established to trigger spam alarms. That can be done on content, sending rate, or any other generally/widely used criteria.

DNS is probably a poor choice for distribution of the global fast response block advisories.

Obviously, since a change in status doesn't propagate quickly enough.

I think a form of flood distribution through a mesh of peer nodes would be more robust. The distribution would feed local databases that could be queried directly or through a DNS front end. The mesh can handle data from multiple sources and needs no central authority.
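
A toy, in-memory sketch of flood distribution with duplicate suppression, assuming each advisory carries a unique id. The `Node` class and its fields are hypothetical; a real mesh would need networking, authentication of advisories, and rate limiting, none of which is shown here.

```python
class Node:
    """Hypothetical mesh peer holding a local advisory database."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = set()     # advisory ids already processed
        self.database = {}    # local store: ip -> advisory id

    def connect(self, other):
        # Links are bidirectional; there is no central authority.
        self.peers.append(other)
        other.peers.append(self)

    def receive(self, advisory_id, ip):
        if advisory_id in self.seen:
            return  # duplicate: stop the flood here
        self.seen.add(advisory_id)
        self.database[ip] = advisory_id
        for peer in self.peers:
            peer.receive(advisory_id, ip)
```

An advisory injected at any node reaches every connected node exactly once per node, and the local `database` is what a direct query or a DNS front end would read from.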

And how do you propose to prevent worms from targeting and flooding or doing other Denial of Service attacks on that mesh?

Proposal #4 is back reporting unsolicited email:

If email addressed to certain spamtraps is received from a sender that has registered to accept back reports,

Original sender? A forwarding sender? An aggregating sender? How does the objecting MTA make sure?

accept the mail but return a specific code in the response that the sending MTA can interpret to make a determination if the sender is spamming. Alternatively, the site may register a different protocol to use for the back report. The spamtraps used for back reporting should be different from those reported to global blocklists to prevent disclosing the traps to rogue ISPs that would forward them to spammers. Only some of the traps would be used for back reporting to any given network or host.

I'm not convinced that returning ANY "spamming!" status to any imagined source is really worthwhile. To begin with, I think that most such reports will be discarded or archived, and just create additional worthless traffic on the Net. Secondly, if the "back reporting" can be misdirected (by counterfeiting E-mail or ISP addresses) then the creative spammer can induce the bounce message MTA to effectively be the "proxy" "sending source" of the spam, tricking it into sending the bounce message to the third party. (I used to get a fair amount of such apparently-intentional bogus bounces... and indeed, I still get a lot of antispam bounces into my "catch-all" E-mail account at a vanity domain, generally resulting from spammers who are counterfeiting randomly generated "From" addresses supposedly within my domain).

Proposal #5 is a quarantine while a site is listed in a block advisory.

Established sources that get listed in local or global block advisories should have their mail quarantined while the mail is sorted out. The best place to quarantine the mail is on the sending server. It's already queued there and if it is eventually determined that it should be rejected,

And on what basis or set of rules do you propose that such determination be made? I will suggest that (at least for some non-obvious cases) that determination could be difficult or even impossible to do accurately.

Apart from (as previously noted) the problem of determining after the fact with any real accuracy just who the "sending server" actually is, particularly for multi-hop mail deliveries. It's easy enough to forget that not all E-mail deliveries are simple user->ISP->ISP->user transfers.

Vanity domains that provide mail forwarding are just one example of a way that intermediaries enter into a mail delivery chain. Aggregators/forwarders like Yahoogroups are another.

that can be done with a 5xx on the next delivery attempt instead of sending a bounce.

And what if the next E-mail coming from that IP address is legitimate, non-spam E-mail?

If the sending ISP is on their toes an offending user or host will be shut down quickly and the remaining mail will be delivered when the block advisory is lifted or expires.

And what happens to their legitimate outgoing mail in the meantime?

Local whitelists can be used to allow mail to bypass the quarantine.

That's fine, but (again) I believe that traditional, simple "whitelists" are not nearly adequate. I believe that a viable whitelist MUST comprise a recipient/sender pair of addresses, AND additional criteria regarding what the recipient expects to receive from the sender in question.
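
To make the "recipient/sender pair plus additional criteria" idea concrete, here is a minimal sketch. The `Permission` fields, the 50,000-byte default, and all names are assumptions chosen for illustration; the thread does not specify a data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """What a recipient has agreed to accept from one sender (hypothetical fields)."""
    allow_html: bool = False
    allowed_attachment_types: frozenset = frozenset()
    max_size_bytes: int = 50_000  # assumed default limit


DEFAULT = Permission()  # applied to senders who are not whitelisted


class Whitelist:
    def __init__(self):
        self._rules = {}  # (recipient, sender) -> Permission

    def grant(self, recipient, sender, permission):
        self._rules[(recipient.lower(), sender.lower())] = permission

    def revoke(self, recipient, sender):
        self._rules.pop((recipient.lower(), sender.lower()), None)

    def lookup(self, recipient, sender):
        return self._rules.get((recipient.lower(), sender.lower()), DEFAULT)


def permits(permission, has_html, attachment_types, size_bytes):
    """True if a message with these properties fits within the permission."""
    return ((permission.allow_html or not has_html)
            and set(attachment_types) <= permission.allowed_attachment_types
            and size_bytes <= permission.max_size_bytes)
```

Because the key is the pair, one recipient granting HTML to a sender has no effect on what any other recipient accepts from that same sender.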

More to the point, once THAT is done, I believe that IP address blocking becomes essentially superfluous (and indeed, COUNTER productive).

-- Dan Oetting


[comment #2]

Subject: HTML-burdened email (Re: [Asrg] Re: bounces, and anti-spam principles)

} I don't consider that ANYBODY has the right to blindly
} haul off and presume that I'm willing to accept
} HTML-burdened E-mails from them.

You're welcome to take that position, but it doesn't seem to fit with the rest of your argument that what gets accepted should be controlled on a per-user basis.

Yes, it does.

Certainly a user COULD configure their default behavior to allow HTML (perhaps a limited 'subset' of HTML) from non-whitelisted senders. I wouldn't consider that anything remotely like a "best practice", and one would hope that the great majority of users would use the recommended defaults if they wanted effective spam (and virus/worm) blocking. Educating the users toward that end would be to the ISP's advantage, obviously.

Elsewhere you wrote:

} Even 2% legitimate traffic is too much to block, if that
} 2% is critically important stuff.

How is blocking legitimate email because it contains HTML less wrong
than blocking it based on point of origin?

1) You in the general case will have difficulty determining the REAL "point of origin". Whether a message contains HTML or not is far less subjective.

2) If the sender has not previously been in contact with the recipient, they can not be sure that the recipient is capable of handling HTML-burdened E-mail (they might have, for example, a text-only E-mail client). Sending HTML-burdened E-mail BEFORE establishing that the recipient is willing and able to handle it violates every "best practice" for that area. (I have gotten HTML-only mail from senders with "clickable links" to say I want plain-text mail.... which obviously is laughable for the great majority of less-tech-savvy users).

3) Content-based antispam determination can be FAR more efficient and FAR more accurate if all the various HTML-based ruses are simply banned from the get-go. Users who suddenly see a great reduction in spam, phishing, viruses and worms (and with the explanation of why) are likely to quickly become believers. :-)

} It doesn't matter. You simply block (by default) ALL
} images (whether embedded or attached) coming from
} unfamiliar/first time senders. If they want to send
} images, they first negotiate that permission with the
} intended recipient. (And that permission, of course, can
} be revoked by the recipient if the privilege is abused).

[...]

} There is NEVER the NEED to send HTML/attachments/scripting
} to someone in an INTRODUCTORY E-mail. Not until you've
} established that they are willing and able to receive that
} kind of content from you.

This presumes a whole lot of effort on the part of the recipient in a
variety of circumstances.

No. If the software is well-written, the effort required by the recipient is relatively trivial.

It could be as simple as NO effort, if (for example) the introductory E-mail from a not-previously-seen sender is NOT marked as spam by the recipient upon reading it (which might enable, for example, a slightly more liberal filtering on subsequent mail from that same sender). Again, that's a detail of the implementation of the client software, where presumably clever programmers can do good things. :-)

(1) I'm shopping at a major retailer's website and decide to sign up for their newsletter. What happens if the confirmation message uses HTML content (not uncommon)?

Obviously, it SHOULD NOT use HTML content, for the simple reason that said retailer CAN NOT be certain that the person asking to be on their newsletter list is able to handle (or WANTS to receive) HTML-burdened mail messages.

The first such "welcome to our mailing list" message ought to be sent in plain text, asking for permission to send HTML (or, perhaps better, at the time the user requests to be placed on the mailing list, they should be able to specify HTML or plain text, and with the explicit mention that if the HTML option is chosen, they might need to enable that option from the sender in question).

But let's presume everyone has bought into your scheme, so a plain text confirmation arrives. I now have to take the additional step of confirming (where? has my email client been updated to include an interface for this?)

Ideally, yes, the client software ought to include awareness of and support for this capability. I've been trying to talk Microsoft into adding such a capability into both Outlook and Outlook Express, for example. (Who knows whether I've caught the ear of anybody worthwhile with the concept yet, or not).

that I'll accept HTML from the same source in the future.

Hopefully that's capable. One approach would be for such an "aware" client to pop up a box with a "this is the first time you've gotten E-mail from this sender, what do you want to do with mail from them in the future?" dialog box. There could be a variety of options (discard, forward, redirect, archive, I'll decide later/ask me again for future nonconforming mail from this sender, whatever). There could be different choices for "conforming" versus "nonconforming" messages, perhaps. I don't think we have to design the user interface to that level HERE, in any case.

Assuming that the confirmation
and the newsletter come from the same source, that is, which also is
not always the case.

True, although if that is recognized as important to legitimate mailers, they can certainly correct such questionable behavior.

(2) I'm working with an HTML design firm. They're doing a rush job for me, which has to be confirmed on Saturday so it can be turned over to my IT staff to be installed on the company website before Monday morning; but I'm going to be out of town, so I want a copy of the work sent to another address such as a freemail service so I can easily review it.

1) You can certainly simply have them post their work to a Web site (theirs?) and send you an e-mail with the URL.

2) If you're working with an HTML design firm, obviously you can whitelist their E-mail address to send you HTML.

The HTML contains images and scripts, but the design firm has never sent mail to that freemail service before, and I don't know which of their team of designers might be sending the final product.

Either ask, or have them post the page to a (private) Web site where you can go and review it.

Even if you don't know which specific designer might be sending you the message, nothing in my proposal would prevent (for example) the capability of the software allowing you to whitelist by domain, regardless of what user there sent the mail (you might have a different "catch-all" default permissions list than the per-user permissions list).

How much effort must I go to up front to be able to tell them they can
send me an attachment that I know will reach me?

Again, I don't see a problem here.

(3) My biggest customer is a giant corporation which requires its employees to use Microsoft Outlook and to attach a vCard-like signature to every email, using a background stationery and having an image of the company's latest logo because they're engaged in a major rebranding initiative. Not surprisingly, they have massive turnover, so the people there who send email to a variety of people at my company change every several weeks. Who's responsible for upkeep of all of those individual agreements to exchange HTML with each other?

Again, the recipient could presumably set up a "catch-all" whitelisting which would enable selected E-mail HTML features and selected attachment types from any user at a given company's domain. For example, allow (hell, even REQUIRE) V-card attachments or JPGs for mail from that domain, but (for example) not allowing scripting, executable attachments, or ActiveX.
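
A short sketch of the "catch-all" idea: look for a per-address rule first and fall back to a per-domain rule. The rule format (a dict keyed by `user@domain` or `@domain`, mapping to a set of feature names) and the feature names themselves are assumptions made for this example.

```python
def lookup_permissions(rules, sender):
    """Return the set of allowed features for a sender.

    rules: dict mapping 'user@domain' or '@domain' -> set of feature names
    (hypothetical format). Unknown senders get an empty set.
    """
    sender = sender.lower()
    if sender in rules:
        return rules[sender]          # per-user entry wins
    domain = "@" + sender.rpartition("@")[2]
    return rules.get(domain, set())   # otherwise the domain catch-all, if any


rules = {
    # Any employee of the customer may send HTML, vCards and JPEGs...
    "@bigcorp.example": {"html", "vcard", "jpeg"},
    # ...and one known contact may also send PDFs.
    "alice@bigcorp.example": {"html", "vcard", "jpeg", "pdf"},
}
```

Scripting, executable attachments, and ActiveX stay blocked simply because they never appear in any allowed set, so staff turnover at the sending company requires no upkeep by the recipient.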


[comment #3]

Subject: Re: [Asrg] Re: bounces, and anti-spam principles

Our overall system is designed towards _zero_ false positives. In at least one way, we're theoretically (and near effectively) there. With
DNSBLs and other techniques.

Obviously, I'd have to reserve judgement until I've experienced your "zero false positives" idea. Let's just say that I've got a healthy
degree of skepticism.

I have 10+ years worth of exposure to this model (I built the thing) with a user population of 60-120K and extreme FP aversion all the way up to the CEO, you're welcome to come visit and I'll demonstrate it for you.

It only fails to be "zero false positive" when the legitimate sender fails to read/follow the rejection notice. Which happens a lot more than I'd like, but if the message isn't worth that much to the sender, it's probably not worth doing much about it on our end either.

1) How do you determine who the actual "legitimate sender" is?

2)  How do you make sure they receive it?

Not everyone has that small a set of correspondents to cope with, and
the "new correspondent" issue remains a big problem.

The total set of correspondents doesn't have to be small. It just needs to have a relatively small number of NEW correspondents that NEED to use "advanced"/riskier features and therefore require whitelisting.

The question is how do you even know of the new correspondents, when you're being bombarded with dozens or hundreds of new correspondents per
day.

To begin with, conforming E-mails from new correspondents (no HTML, no attachments, size within limits, and passed by SpamAssassin or similar) will sail right through. No problem.

Presumably each recipient could configure the "what to do with nonconforming e-mails from unknown senders" option as the user wished. Discard mercilessly, allow me to look at the text part, or whatever.
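
The "conforming" test for unknown senders (no HTML, no attachments, size within limits) can be sketched with Python's standard `email` package. The function name and the 50,000-byte default are assumptions; the content-scanner step (SpamAssassin or similar) is deliberately left out and would run afterwards.

```python
from email import message_from_string

MAX_SIZE = 50_000  # hypothetical default limit, in bytes


def is_conforming(raw_message, max_size=MAX_SIZE):
    """True if the message is plain text only, has no attachments, and fits the limit."""
    if len(raw_message.encode("utf-8")) > max_size:
        return False
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers are fine; only leaf parts are inspected
        if part.get_content_type() != "text/plain":
            return False  # HTML, images, executables, etc.
        if part.get_content_disposition() == "attachment":
            return False  # even a plain-text attachment counts as an attachment
    return True
```

Setting `max_size` to zero rejects everything from unknown senders, which gives a whitelist-only policy for users who prefer that.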

It's all well and good, if the spammers aren't forging names, to allow them one bite before you null the sender, but today's reality isn't like that. The "one bite" on sender, or even on url or content, essentially means you have to eat almost all spam. Because spammers mutate their content and senders that much.

Absolutely. They generate random From: addresses and random Subject: information (and random other stuff too, like Received: lines indicating bogus original senders). But of course, we here all ought to know that full well.

That's why my personal recommendation for "best practice" is to simply block -all- HTML messages from unfamiliar senders, on the idea that they are trying to conceal something, or at the very least are being inconsiderate and clueless enough that I don't want to hear from them.

Others might well choose differently, of course.

We're tracking spammers who use in excess of 1000 different domains in urls. Can per-user techniques cope
via whitelisting or blacklisting?  Not a chance.

That's what makes blacklisting not work... there are infinitely many E-mail addresses you need to blacklist.

A fine-grained whitelist (and one which allows mail from reasonable but non-whitelisted users) DOES offer a finite task, that can be relatively easily handled.

The only way your techniques can work effectively is if you assume that the recipient is getting very little spam.

I don't think it's particularly volume-sensitive... other than the fact, of course, that ANY system could conceivably be flooded by a sufficiently determined DoS attack.

Well, 50% of our users are
being sent almost no spam.

Your users are very fortunate. Perhaps they haven't had those E-mail addresses for very long, or don't do much with them.

But that doesn't help with the guy getting
sent 4000/day.  Or even the hundreds getting 50 or more.

I don't see why my approach won't handle such cases with ease.

The only way your technique works is if you're one user with a limited variety and volume of spam. But not everybody is in the same situation
as you.

Again, I don't agree with your conclusion. My proposed default rule and fine-grained whitelisting (combined with SpamAssassin or similar content scanning on the remainder) will have a VERY high efficacy at detecting and dealing with spam.

Ideally, each such correspondent only requires ONE click (one time) for the user to agree to allowing them to use the more advanced features. That might be done following an initial negotiation E-mail where the sender introduces themself and requests the ability to send more
elaborate mails.

You seem to be focussing on only blocking "advanced features". A large percentage of spammers don't use them anyway. If all your spam consists of mutating text-only spam with mutating headers, your technique has no
effect.

That's true, if the SpamAssassin component is unable to make a reasonable determination. On the other hand, mail that has passed the initial fine-grained-whitelist criteria is at least going to be devoid of most of the tricks which make things VERY problematical for the content scanners.

Some users COULD choose to ONLY allow E-mail from the whitelisted E-mail addresses, if they decide that is the approach that makes them happiest. (e.g. set the "default maximum size accepted" to zero bytes).

Introductory E-mails should NOT automatically presume the desire or willingness of the recipient to receive HTML-burdened E-mails.

But if all spam were indistinguishable from "introductory e-mails",
where are you then?

No worse than at present.

And BETTER, since the "no attachment, no HTML" default rule would seriously cripple zombie spambot army recruitment.

And indeed, probably most spam isn't distinguishable from some vague
notion of "introductory e-mails" by your technique.

That at least switches the problem to a relatively simpler problem of improving the antispam content-based detection... i.e. making a better version of SpamAssassin.

At least the future SpamAssassin is relieved of nearly all of the nearly impossible cases it is forced to (not?) deal with today.

The main problem with content filtering is caused by ruses based on HTML and attachments. These techniques serve to obscure the content of the E-mail, and ALL BY THEMSELVES the presence of such content in E-mails (at least in E-mails from unfamiliar senders) can be a priori evidence
of hostile intent, or spamming.

If you can _detect_ those ruses.

Of course. Some of my filtering code I've written, for example, looks for things like "obscured" URLs, or obfuscated IP addresses in URLs.
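
The filtering code mentioned here is not shown in the thread, so the following is only a guess at what such a check might look like: it flags URLs whose host is a bare or integer-encoded IP address, or which hide the real host behind a `user@` prefix. The function name, the regular expressions, and the reason strings are all assumptions.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def suspicious_urls(text):
    """Return (url, reasons) pairs for URLs whose host part looks disguised."""
    flagged = []
    for url in URL_RE.findall(text):
        try:
            parts = urlsplit(url)
            host = parts.hostname or ""
            has_userinfo = parts.username is not None
        except ValueError:
            flagged.append((url, ["unparseable host"]))
            continue
        reasons = []
        if has_userinfo:
            reasons.append("userinfo before host")   # http://bank@evil-host/
        if re.fullmatch(r"\d+|0x[0-9a-f]+", host):
            reasons.append("integer-encoded IP address")
        elif re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
            reasons.append("raw IP address")
        if "%" in parts.netloc:
            reasons.append("percent-encoding in host")
        if reasons:
            flagged.append((url, reasons))
    return flagged
```

Ordinary hostnames pass through untouched, so the check only adds evidence for a content scanner; it does not decide anything by itself.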

You and I may be able to detect it on a per-message basis with visual inspection, but computers generally can't, say, identify sentences generated using random words. Try building a filter for hipcrime some day... Plain text. Random headers.
No "advanced features".  Good luck.

You're also faced with things like foreign language E-mails (Chinese, for example). Even using grammar tests, E-mails might have a large amount of bogus respectable-looking content (say, ten jokes followed by the 'payload' spam). Again, at some point you're going to see some spam slide through. I believe my approach would take a BIG bite out of it, though.

Much of the recent evolution in spam content techniques has been to get away from ruses that are detectable as ruses. It used to be a common technique to insert random invalid html tags in spam. Defeatable by simply looking for invalid html tags or meta (eg: tag)/content ratios. But they don't do that anymore, do they?
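
For reference, the older "invalid tags" and tag/content ratio checks described here could be sketched as below, using the standard `html.parser` module. The list of known tags is a small illustrative subset, and the ratio definition (start tags per character of visible text) is an assumption; the thread does not define one.

```python
from html.parser import HTMLParser

# Small illustrative subset; a real filter would use the full HTML tag list.
KNOWN_TAGS = {"html", "head", "body", "title", "p", "a", "b", "i", "br",
              "div", "span", "img", "table", "tr", "td", "font"}


class TagStats(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = 0
        self.unknown_tags = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1
        if tag not in KNOWN_TAGS:
            self.unknown_tags += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def tag_stats(html):
    """Return (start tags, unknown start tags, tags per visible character)."""
    parser = TagStats()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.unknown_tags, parser.tags / max(parser.text_chars, 1)
```

As the surrounding discussion notes, a high ratio alone is weak evidence, since legitimate mail clients can also produce very tag-heavy HTML.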

I don't know... since the filters I most recently wrote T-canned all that stuff anyhow. But I don't doubt that spammers will eventually mutate to try to avoid schemes that don't work anymore.

One of my goals is to HUGELY increase that set.  :-)

And meta/content ratios can be _extremely_ high in legitimate email (just look at the gunk that outlook
produces for simple emails some day).

Are you talking about HTML?

Look for identical gifs? Well, they stopped emitting identical gifs
months ago.

I'd block ALL mail containing GIFs (or any other kind of attachment) by default, from unknown senders. Such tricks hit a solid, impenetrable brick wall.

I had some luck with looking for gif geometry (scan for gif prefixes containing identical image dimensions). That stopped working too.
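
A sketch of the "gif geometry" check, assuming it means reading the logical screen width and height from the GIF header and counting repeats across messages. The GIF header layout (6-byte signature followed by two little-endian 16-bit dimensions) is standard; the function names and the `min_count` parameter are hypothetical.

```python
import struct
from collections import Counter


def gif_dimensions(data):
    """Return (width, height) from a GIF header, or None if data is not a GIF."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # Bytes 6..9 hold the logical screen width and height, little-endian.
    return struct.unpack("<HH", data[6:10])


def repeated_geometries(images, min_count=2):
    """Count image dimensions seen at least min_count times across a batch."""
    counts = Counter(d for d in map(gif_dimensions, images) if d is not None)
    return {dims: n for dims, n in counts.items() if n >= min_count}
```

This also shows why the trick stopped working: a spammer only has to vary the dimensions by a pixel per message to defeat it.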

Yeah, I don't think such schemes have much potential.

Again, when you have a LOT of users (and possibly MANY servers) behind a NAT router, denying mail from that IP address results in simply too much collateral damage. More to the point, it's a very blunt instrument for the job, and it's relatively simple to do very much better.

So far, there's no indication of the latter being true.

I do note you don't refute my objection on principle. :-)

But in fact, I do.

Let's cut to the chase. Your argument about source IP blocking being a blunt instrument is well-taken and true. It _is_ a blunt instrument.

So's a sledgehammer. You wouldn't use a sledgehammer to install window trim would you? But a carpenter probably still has and uses a sledgehammer for a different part of the same job of building a house.

Your argument assumes that IP-blocking is the ONLY technique being used - - if all you have is IP-blocking, trying to discriminate between good and bad content from a given IP (say an ISP's MTA) doesn't work.

Well, yeah, duh ;-)

Your argument assumes that the administrator hasn't made any effort to make an intelligent choice on which DNSBL to use - and is hence at the
mercy of capricious listings of any old random IP.

Well, yeah, if you've chosen to use BLARS, you get what you deserve.

Yup both true.  But so what?

What's to prevent you from doing the intelligent thing and using multiple complementing techniques simultaneously?

Nothing at all, and in fact that's what I'm proposing.

I just happen to believe that IP address reputation is neither necessary nor sufficient (and perhaps not even particularly desirable) as a component in that.

The XBL, for example, is _extremely_ reliable at detecting compromised end-user machines. All by itself it will block 70-85% of all spam.

Fine. But it also blocks LEGITIMATE mail coming from those machines, right?

By its very nature it doesn't list real MTAs (eg: ISP smarthosts).

What if the spam is sent through this "smarthost"? You might claim that's not being done today, but nothing prevents spammers from doing that.

It has
a false positive rate lower than virtually any content filter. Certainly _way_ lower than any notion of trying to block spam based on
detecting "advanced features".

Fine, if you're a believer in IP address filtering, then use that as a component. But I still believe that it's not as good as a better differentiation technique, with less collateral damage.

Why not let the users choose? How many of your users know what the XBL is? About 5 of mine _might_. I'm not even sure you do. How many of them have spent the manhours needing to learn what knobs do a good job?

I think the nature of 'knobs' given to end users will OF COURSE be different than the type of 'knobs' given to IETF-types.

Less than that. How many of them would do nearly as good a job at it?

Few, IF they were adjusting IETF-type knobs. That's NOT what I'm proposing.

Even I, with over 10 years of experience in this game, could not do an acceptable job on email landing in my _own_ mailbox using your techniques.

I guess a lot depends on what you consider "acceptable", and how good the content scanner is that you have.


[comment #4]

Subject: Re: [Asrg] Re: bounces, and anti-spam principles

[how users configure their whitelist rules]

>The problem being that out of the 60,000 seats here, perhaps less
>than 10 of them are able to competently configure a set of rules like
>what you have.

That's a software implementation issue, not an inherent problem in the approach.
I envision a button to click on
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I guess that's the main reason why your ideas aren't met with wild enthusiasm here. You "envision" that this would be simple to build and to use, but it's not even clear that you have even built a prototype which you use yourself, much less deployed the system for "ordinary users", which aren't quite sure what an "HTML mail" is, and have never
heard about an "ActiveX component".

I've built and have used portions of what I propose. Obviously it will work best if the proposed default rules are widely used.

As for "deploying it for ordinary users", I think I could make the same claim regarding all these DNS-based proposals that y'all are so fixated on. Again, good software implementation doesn't have to depend on users understanding those details (but it COULD offer that level of detail, for advanced users who cared). Since I don't work at an ISP, and I'm not being paid to do this stuff, what I build and use myself is more a function of what the effort/payback looks like for ME as an individual user.

I don't think where things fall on that particular equation is especially relevant to whether the approach would be a good one for more widespread adoption. :-)

that simply says "allow E-mails like this from the same sender in the future" and where the software will open the keyway JUST enough to allow that type of message if seen again from that sender.

So Uncle Bob will see that a mail from Aunt Matilda was blocked because it contained an "executable attachment". Since he wants to get mail from Aunt Matilda (and Aunt Matilda is a nice lady, she wouldn't send him anything bad, would she?) he clicks on "allow E-mails like this from the
same sender in the future". Oops!

Presumably he could tell from the rest of the content in the mail message that it really didn't look like E-mail from Aunt Matilda. And clicking on the option could be followed by a suitably dire dialog box, explaining the risks of executable attachments and asking if he is REALLY REALLY sure that the mail in question is REALLY from who it claims to be. It could even suggest that the user call the sender on the phone to verify the legitimacy of the E-mail message, if they are not 100% sure.

And, presumably if he later found that his faith was misguided, he ought to be able to later revoke that permission...!
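
The "allow E-mails like this" button and its later revocation could be modelled as below. This is a sketch under the assumption that each message is summarised as a set of feature names (for example `"html"` or `"executable"`); the class and method names are hypothetical, and the confirmation dialog is outside its scope.

```python
class SenderPermissions:
    """Hypothetical per-recipient store of what each sender may use."""

    def __init__(self):
        self._allowed = {}  # sender -> set of feature names

    def allow_like_this(self, sender, message_features):
        # Opens the keyway JUST enough: only the features this message used.
        self._allowed.setdefault(sender.lower(), set()).update(message_features)

    def revoke(self, sender):
        self._allowed.pop(sender.lower(), None)

    def accepts(self, sender, message_features):
        return set(message_features) <= self._allowed.get(sender.lower(), set())
```

Allowing a message with a JPEG from Aunt Matilda would not, under this model, also admit an executable from her address; that would need a second, separate click.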

How that recognition is accomplished, whether by something crude like simple GREP-type scanning, or something brain-damaged like RegEx
pattern matching,

You know what the two middle characters in "grep" stand for, do you?

Regular expressions.

Which can be used in more elaborate ways than a simple pass with something like GREP normally allows, thus the distinction I tried to (ineffectively, it seems) make.

Gordon Peterson
http://personal.terabites.com
1977-2007 Thirty year anniversary of local area networking

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg