-----BEGIN PGP SIGNED MESSAGE-----
Our overall system is designed towards _zero_ false positives. In at
least one way, we're theoretically (and near effectively) there. With
DNSBLs and other techniques.
Obviously, I'd have to reserve judgement until I've experienced your
"zero false positives" idea. Let's just say that I've got a healthy
degree of skepticism.
I have 10+ years' worth of exposure to this model (I built the thing)
with a user population of 60-120K and extreme FP aversion all the way up
to the CEO. You're welcome to come visit and I'll demonstrate it for you.
It only fails to be "zero false positive" when the legitimate sender
fails to read/follow the rejection notice. Which happens a lot more
than I'd like, but if the message isn't worth that much to the sender,
it's probably not worth doing much about it on our end either.
Not everyone has that small a set of correspondents to cope with, and
the "new correspondent" issue remains a big problem.
The total set of correspondents doesn't have to be small. It just needs
to have a relatively small number of NEW correspondents that NEED to use
"advanced"/riskier features and therefore require whitelisting.
The question is how do you even know of the new correspondents, when
you're being bombarded with dozens or hundreds of new correspondents per
It's all well and good, if the spammers aren't forging names, to allow
them one bite before you null the sender, but today's reality isn't like
that. The "one bite" on sender, or even on url or content, essentially
means you have to eat almost all spam. Because spammers mutate their
content and senders that much. We're tracking spammers who use in
excess of 1000 different domains in urls. Can per-user techniques cope
via whitelisting or blacklisting? Not a chance.
The only way your techniques can work effectively is if you assume that
the recipient is getting very little spam. Well, 50% of our users are
being sent almost no spam. But that doesn't help with the guy getting
sent 4000/day. Or even the hundreds getting 50 or more.
The only way your technique works is if you're one user with a limited
variety and volume of spam. But not everybody is in the same situation
Ideally, each such correspondent only requires ONE click (one time) for
the user to agree to allowing them to use the more advanced features.
That might be done following an initial negotiation E-mail where the
sender introduces themself and requests the ability to send more
You seem to be focussing on only blocking "advanced features". A large
percentage of spammers don't use them anyway. If all your spam consists
of mutating text-only spam with mutating headers, your technique has no
Introductory E-mails should NOT automatically presume the desire or
willingness of the recipient to receive HTML-burdened E-mails.
But if all spam were indistinguishable from "introductory e-mails",
where are you then?
And indeed, probably most spam isn't distinguishable from some vague
notion of "introductory e-mails" by your technique.
The main problem with content filtering is caused by ruses based on HTML
and attachments. These techniques serve to obscure the content of the
E-mail, and ALL BY THEMSELVES the presence of such content in E-mails
(at least in E-mails from unfamiliar senders) can be a priori evidence
of hostile intent, or spamming.
If you can _detect_ those ruses. You and I may be able to detect them
on a per-message basis with visual inspection, but computers generally
can't, say, identify sentences generated using random words. Try
building a filter for hipcrime some day... Plain text. Random headers.
No "advanced features". Good luck.
Much of the recent evolution in spam content techniques has been to get
away from ruses that are detectable as ruses. It used to be a common
technique to insert random invalid html tags in spam. Defeatable by
simply looking for invalid html tags or meta (eg: tag)/content ratios.
But they don't do that anymore, do they? And meta/content ratios can be
_extremely_ high in legitimate email (just look at the gunk that Outlook
produces for simple emails some day).
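A minimal sketch of the two checks mentioned above (unknown tags and the meta/content ratio), assuming Python's standard html.parser. The KNOWN_TAGS allow-list and the tag_stats helper are hypothetical names made up for illustration, and no threshold is suggested, precisely because legitimate mail can score very high:

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allow-list; a real one would be far longer.
KNOWN_TAGS = {"html", "head", "body", "p", "br", "a", "b", "i", "font",
              "div", "span", "table", "tr", "td", "img"}


class TagStats(HTMLParser):
    """Counts markup characters, text characters and unknown start tags."""

    def __init__(self):
        super().__init__()
        self.markup_chars = 0
        self.text_chars = 0
        self.unknown_tags = 0

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the raw text of the tag just parsed.
        self.markup_chars += len(self.get_starttag_text() or "")
        if tag not in KNOWN_TAGS:
            self.unknown_tags += 1

    def handle_endtag(self, tag):
        self.markup_chars += len(tag) + 3  # "</" + tag + ">"

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def tag_stats(html):
    """Return (unknown_tag_count, markup_to_text_ratio) for an HTML body."""
    parser = TagStats()
    parser.feed(html)
    parser.close()
    ratio = parser.markup_chars / max(parser.text_chars, 1)
    return parser.unknown_tags, ratio
```

Either number is only a signal to feed into other filtering, not a verdict on its own.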
Look for identical gifs? Well, they stopped emitting identical gifs
I had some luck with looking for gif geometry (scan for gif prefixes
containing identical image dimensions). That stopped working too.
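A rough sketch of that geometry scan, assuming the images have already been extracted from the messages as raw bytes. It reads the logical screen width and height that follow the 6-byte GIF signature; the helper names and the threshold are made up for illustration:

```python
import struct
from collections import Counter


def gif_dimensions(data):
    """Return (width, height) from a GIF header, or None if not a GIF."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # Two little-endian unsigned 16-bit values follow the signature.
    return struct.unpack("<HH", data[6:10])


def repeated_geometries(images, threshold=3):
    """Map each (width, height) seen at least `threshold` times to its count."""
    counts = Counter(
        dims for dims in map(gif_dimensions, images) if dims is not None
    )
    return {dims: n for dims, n in counts.items() if n >= threshold}
```

Randomizing the image dimensions per message is enough to defeat this.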
Again, when you have a LOT of users (and possibly MANY servers) behind a
NAT router, denying mail from that IP address results in simply too much
collateral damage. More to the point, it's a very blunt instrument for
the job, and it's relatively simple to do very much better.
So far, there's no indication of the latter being true.
I do note you don't refute my objection on principle. :-)
But in fact, I do.
Let's cut to the chase. Your argument about source IP blocking being a
blunt instrument is well-taken and true. It _is_ a blunt instrument.
So's a sledgehammer. You wouldn't use a sledgehammer to install window
trim would you? But a carpenter probably still has and uses a
sledgehammer for a different part of the same job of building a house.
Your argument assumes that IP-blocking is the ONLY technique being used
- if all you have is IP-blocking, trying to discriminate between good
and bad content from a given IP (say an ISP's MTA) doesn't work.
Well, yeah, duh ;-)
Your argument assumes that the administrator hasn't made any effort to
make an intelligent choice on which DNSBL to use - and is hence at the
mercy of capricious listings of any old random IP.
Well, yeah, if you've chosen to use BLARS, you get what you deserve.
Yup, both true. But so what?
What's to prevent you from doing the intelligent thing and using
multiple complementing techniques simultaneously? What's to prevent you
from making intelligent choices of _which_ techniques to use?
The XBL, for example, is _extremely_ reliable at detecting compromised
end-user machines. All by itself it will block 70-85% of all spam. By
its very nature it doesn't list real MTAs (eg: ISP smarthosts). It has
a false positive rate lower than virtually any content filter.
Certainly _way_ lower than any notion of trying to block spam based on
detecting "advanced features".
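For anyone who hasn't seen how a DNSBL check works, here is a minimal sketch of the usual convention: reverse the IPv4 octets, append the list's zone, and look the name up. The zone name xbl.spamhaus.org and both function names are assumptions for illustration; check the list operator's usage terms before querying for real:

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="xbl.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="xbl.spamhaus.org"):
    """True if the query name resolves, i.e. the address is listed.

    Any lookup failure is treated as "not listed", which fails open.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, dnsbl_query_name("192.0.2.99") gives "99.2.0.192.xbl.spamhaus.org". A production check would also inspect the returned address, since lists use different return codes for different kinds of listing.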
Does that work to stop Nigerian 419s coming from Hotmail's mail server?
Well, short of blocking them entirely, no. Does the XBL "fire" on the
Hotmail mail servers? No. So you add other techniques to the filters
that will do it.
Does that mean that the XBL is useless? No. Does that mean it has
excessive false positives? No. Does that mean that the XBL has
excessive false negatives - well, yeah, entirely by itself, yes - but so
_what_? Just add more filtering techniques. Do those techniques used
for blocking 419s from Hotmail make XBL unnecessary? No, because they're
not perfect either.
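The layering argument can be sketched as a chain of independent checks, each allowed to miss what another catches. Everything here is a hypothetical stand-in: LISTED_IPS replaces a real DNSBL lookup, SCAM_PHRASES replaces a real content filter, and the message is just a dict:

```python
# Hypothetical stand-ins; a real system would query a DNSBL and use far
# better content rules than a phrase list.
LISTED_IPS = {"192.0.2.10"}
SCAM_PHRASES = ("transfer of funds", "next of kin")


def dnsbl_check(msg):
    """Catches compromised end-user machines; blind to real MTAs."""
    if msg["client_ip"] in LISTED_IPS:
        return "client IP is on the blocklist"
    return None


def content_check(msg):
    """Catches scam text sent through a legitimate MTA."""
    body = msg["body"].lower()
    if any(phrase in body for phrase in SCAM_PHRASES):
        return "body matches advance-fee scam phrasing"
    return None


def classify(msg, checks=(dnsbl_check, content_check)):
    """Run the checks in order; the first one that fires rejects."""
    for check in checks:
        reason = check(msg)
        if reason is not None:
            return "reject", reason
    return "accept", None
```

Neither check is sufficient alone: the IP check never fires on a big webmail provider's servers, and the content check misses mutating text. Together they cover more than either does.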
Why not let the users choose? How many of your users know what the XBL
is? About 5 of mine _might_. I'm not even sure you do. How many of
them have spent the man-hours needed to learn what knobs do a good job?
Fewer than that. How many of them would do nearly as good a job at it?
Probably almost none. It simply doesn't scale. Unless you're a spam
geek. My management pays me to be the spam geek, so they don't have to
spend hundreds of times as much turning everybody into a spam geek.
Even I, with over 10 years of experience in this game, could not do an
acceptable job on email landing in my _own_ mailbox using your
techniques. We have to resort to the leverage provided by techniques
only possible/practical through server-based filtering, and discussed in
places like this and other mailing lists. I'm the only one of our users that is on
this list. And most other anti-spam mailing lists. That (mailing
lists) is an important technique too. But it obviously doesn't scale to
everyone in the company ;-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3-nr1 (Windows XP)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
-----END PGP SIGNATURE-----
Asrg mailing list