Re: [Asrg] where the message originated (was: DKIM role?) (SM)

On Wed, Jan 21, 2009 at 10:48:08AM +0000, Ian Eiloart wrote:

I guess that depends on the nature of the RBL. Some of them really are  
reputation systems. IP addresses get listed because someone has seen spam 
coming from them. Spamhaus' SBL is an example. If you don't agree that  
that's a reputation service, please explain.


I don't agree that's a reputation service.  It's a binary flag (on a per-IP
basis) saying "this address has been observed (at one or more observation
points) as sending spam".  And that's all it says -- and it only says it
about a subset of IP addresses (those seen emitting spam) and only about
a subset of those (those seen by Spamhaus-affiliated sensors) and only
about a subset of those (add some notion of "recently" -- an address
that spewed in the past may not be listed now).

That's highly useful information if my goal is to block spam, but it's
only marginally useful if my goal is to do anything else.

(BTW: there's only one RBL, and Spamhaus doesn't run it.  They run DNSBLs.)
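
For anyone following along who hasn't looked at the mechanics: a DNSBL check of the kind described above is just a DNS query, which is why the answer amounts to "listed" or "not listed". Below is a minimal sketch, assuming the usual convention of reversing the IPv4 octets, appending the list's zone, and treating an answer in 127.0.0.0/8 as a listing. The function names and the `dnsbl.example` zone are placeholders of mine, not anything from a particular list operator.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the zone has an A record in 127.0.0.0/8 for this IP.

    `resolve` is injectable so the logic can be tried without network access.
    Note: this sketch treats any lookup error as "not listed"; it does not
    separate NXDOMAIN from a resolver failure.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Only builds the query name; no DNS traffic is generated here.
    print(dnsbl_query_name("123.140.212.144", "dnsbl.example"))
```

Note that the result says nothing about addresses the list has never observed, which is the point made above: absence of a listing is not a positive statement about the sender.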

Maybe I'm just quibbling over the definition of "reputation service",
or maybe my definition isn't broad enough.  But I don't think of DNSBLs
or RHSBLs that way, yet.  I'll mull it over.

However, currently it's hard to know what to whitelist. There's only one  
widespread, easy to use mechanism for managing information about which IP 
addresses an organisation is likely to send messages from - that's SPF. 
OK, so if you wanted to be sure to get mail from me, you could whitelist 
my /24 address block, but are you sure that I'd keep you updated if we 
outsourced our email?


But this is not one of my pressing concerns: oh, it's not totally
off my radar, but it's far down on the priority scale.  I'll try
to explain below.

Yes, the problem of course is when a spammer forges a domain that I'd 
like to trust. If I'm filtering mail from the domain of my chief funders, 
then false positives can be really painful. If I whitelist them, then 
spammers can easily bypass my filters. So what I'm discussing IS all 
about forgery.


If you are efficiently blocking spam, then this may be somewhat
of a non-issue.  (It pretty much is for me.)  Let me illustrate
with an example: traffic was presented a little while ago on
port 25 from 123.140.212.144:

        I could have sanity-checked the HELO.

        I could have run it through SPF or similar, but didn't.

        I could have waited for the data phase and run the content
        through SpamAssassin and/or ClamAV, but didn't.

        I could have looked up rDNS to see if it existed, but didn't.
        (Or, having looked it up and found it to exist, checked
        the domain against various RHSBLs.  And/or for MX sanity.)

        I could have checked the IP address against various DNSBLs.

        Instead, the mail system noted that it's in Korean IP space,
        which for that mail server is a 100% source of spam and a 0%
        source of mail.  So it was immediately rejected.

So maybe it was a phish with forged sender address at Paypal --
and maybe I could have figured that out via one of the methods
that have been discussed here.  But my point is that I didn't
need to, because I knew it was spam before getting that far.
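
The connect-time decision in the example above can be sketched as a simple membership test of the client address against a list of network ranges. This is only an illustration: the ranges below are the reserved documentation networks, standing in for whatever allocations a given server has decided never send it legitimate mail, and `connect_time_verdict` is a name made up for this sketch.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks). A real list would be
# built from the allocations that this particular server never gets mail from.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def connect_time_verdict(client_ip: str, blocked=BLOCKED_NETWORKS) -> str:
    """Return "reject" if the client falls in a blocked range, else "continue"."""
    addr = ipaddress.ip_address(client_ip)
    for network in blocked:
        if addr in network:
            return "reject"  # decided before HELO, SPF, or content checks
    return "continue"  # fall through to whatever checks come next


if __name__ == "__main__":
    print(connect_time_verdict("203.0.113.7"))
    print(connect_time_verdict("192.0.2.10"))
```

Because the test needs nothing but the peer address, it runs before any DNS lookup or data-phase scanning, which is what makes the later checks unnecessary for that connection.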

Repeat this for myriad variations -- use of various DNSBLs, use of
the Spamhaus DROP list, various country allocations, thousands and
thousands of spammer domains, dynamic/generic IP and name space,
and so on.  What I've found is that if I'm sufficiently aggressive
about blocking spam sources (including pre-emptive blocks),
I don't need to worry so much about what's in the spam -- forged
headers, bogus URLs, etc. -- because it never makes it to the point
where I have to concern myself with any of that.
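
One way to picture the layering described above is an ordered list of checks, cheapest first, where the first rejection ends processing and the later (more expensive) checks never run. The sketch below is hypothetical: the `Check` type and `first_rejection` are illustrative names, and the individual checks are left to the caller.

```python
from typing import Callable, Optional, Sequence

# A check takes the client IP and returns a rejection reason, or None to pass.
Check = Callable[[str], Optional[str]]


def first_rejection(client_ip: str, checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and stop at the first one that rejects."""
    for check in checks:
        reason = check(client_ip)
        if reason is not None:
            return reason  # later checks are skipped entirely
    return None  # nothing objected; the message proceeds
```

Ordering the list so that local blocks come before DNSBL queries, and DNSBL queries before content scanning, is what keeps forged headers and bogus URLs from ever needing inspection for most connections.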

So my response to someone who says "I'm getting a lot of forged
traffic claiming to be from Paypal and I need an anti-forgery
method to figure that out" is "No, you need to be a lot more
aggressive about blocking spam.  AFTER you do that, you should
re-assess, see if this is still actually a problem for you, and
then, maybe, you might consider anti-forgery technology of one
sort or another, or even ad hoc local checks for some specific
cases, like maybe the credit union that serves your university."

Does this kinda clarify where I'm coming from?

I think perhaps I have this viewpoint because my focus is on my biggest
(ongoing) problem: what to do about the 99% of incoming mail that needs
to be rejected outright before it can get anywhere near a user.


You mean you want to know how to identify it? Or what to do with it after 
you've identified it?


The former -- because my approach to the latter is "issue a reject,
hang up, move on" in all cases.  (Although I should mention in passing
that DNS lookup failures get a 4XX 'cause maybe their DNS is hosed,
maybe mine is hosed, maybe transport is fubar.)
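
As a rough sketch of that policy: a definite spam verdict maps to a permanent rejection and a closed connection, while a failed DNS lookup maps to a temporary failure so the sender can retry. The specific codes 550 and 451, the reply wording, and the `Outcome` names are choices made for this illustration; the text above only specifies "reject" and "4XX".

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    SPAM_SOURCE = "spam_source"
    DNS_FAILURE = "dns_failure"


def smtp_reply(outcome: Outcome) -> tuple:
    """Map a check outcome to (reply code, reply text, close connection?)."""
    if outcome is Outcome.SPAM_SOURCE:
        return (550, "Rejected", True)  # permanent: reject and hang up
    if outcome is Outcome.DNS_FAILURE:
        # Temporary: the fault may be on either side or in between, so
        # invite a retry rather than refusing outright.
        return (451, "Temporary lookup failure, try again later", True)
    return (250, "OK", False)
```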

The former is tougher, because despite my aggressive approach
to spam, I don't want to deal with a high FP rate.  I've come to
the conclusion that one approach which seems to work boils down to
"know your email": study traffic patterns, inbound and outbound.
Every mail server (that I've ever looked at) has different characteristics,
and if you can figure out what they are, then you can twiddle the
knobs in very server-specific ways that minimize FN and FP at the
same time.  (See example above, which clearly would not work at
all on some other servers -- say for a research university.)
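
A minimal sketch of what "know your email" might look like in practice: tally past traffic per source and flag the sources that, for this server, have sent a meaningful volume of spam and no legitimate mail at all. The record format (a source label plus a spam/not-spam flag), the function names, and the `min_messages` threshold are all assumptions for the sake of the example; real input would come from the server's own logs.

```python
from collections import Counter


def summarize_sources(records):
    """records: iterable of (source_label, was_spam) pairs.

    Returns {source_label: (total_messages, spam_fraction)}.
    """
    spam, ham = Counter(), Counter()
    for source, was_spam in records:
        (spam if was_spam else ham)[source] += 1
    summary = {}
    for source in set(spam) | set(ham):
        total = spam[source] + ham[source]
        summary[source] = (total, spam[source] / total)
    return summary


def block_candidates(summary, min_messages=100):
    """Sources with enough traffic to judge and no legitimate mail at all."""
    return sorted(
        source
        for source, (total, spam_fraction) in summary.items()
        if total >= min_messages and spam_fraction == 1.0
    )
```

The output is only a list of candidates. Whether a source that looks like pure spam on one server is safe to block is exactly the server-specific judgment described above, and the same tally run on a different server could point the other way.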

Of course, this takes time and patience -- but I'll argue that
we should be trying to extract the most from the methods we already have
(like the ones I tossed out above), that we understand fairly well,
and that we know work on a large scale in production environments,
before we try to invent and deploy new methods.

---Rsk
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg