ietf-asrg

Re: [Asrg] where the message originated (was: DKIM role?) (SM)

2009-01-23 05:41:19


--On 22 January 2009 23:47:53 -0500 Rich Kulawiec <rsk(_at_)gsp(_dot_)org> 
wrote:

On Wed, Jan 21, 2009 at 10:48:08AM +0000, Ian Eiloart wrote:
I guess that depends on the nature of the RBL. Some of them really are
reputation systems. IP addresses get listed because someone has seen
spam coming from them. Spamhaus' SBL is an example. If you don't agree
that that's a reputation service, please explain.

I don't agree that's a reputation service.  It's a binary flag (on a
per-IP basis) saying "this address has been observed (at one or more
observation points) as sending spam".  And that's all it says -- and it
only says it about a subset of IP addresses (those seen emitting spam)
and only about a subset of those (those seen by Spamhaus-affiliated
sensors) and only about a subset of those (add some notion of "recently"
-- an address that spewed in the past may not be listed now).

That's highly useful information if my goal is to block spam, but it's
only marginally useful if my goal is to do anything else.

(BTW: there's only one RBL, and Spamhaus doesn't run it.  They run
DNSBLs.)

I stand corrected.

Maybe I'm just quibbling over the definition of "reputation service",
or maybe my definition isn't broad enough.  But I don't think of DNSBLs
or RHSBLs that way, yet.  I'll mull it over.

OK, I see where you're coming from now. I guess we might agree that DNSBLs collectively provide a minimalist reputation service.

I do think that DNSBLs speak to the reputation of some email emitters. In that respect, I'd argue that even a single data point constitutes a comment on reputation, and if it's available to me then it's providing me with a service - even if the service is no good. However, even a single data point could be a useful service if it stopped lots of spam for me.

Collectively, DNSBLs provide more than binary information about a single IP address. For example, you could consult a dozen of them, and compare a weighted sum of their responses against a threshold determined in your local anti-spam policy.
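
As a rough sketch of that weighted-sum idea: the zone names, weights and threshold below are made up for illustration and are not anyone's real policy. The only real convention assumed is the usual DNSBL lookup, where the IPv4 octets are reversed and prepended to the list's zone, and a successful A-record answer means "listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers for this IP (any A record counts)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No answer. This sketch does not distinguish "not listed" from
        # a DNS failure; a real deployment would want to.
        return False


def weighted_score(ip, weights, lookup=is_listed):
    """Sum the local weights of every list that reports the IP."""
    return sum(w for zone, w in weights.items() if lookup(ip, zone))


def should_reject(ip, weights, threshold, lookup=is_listed):
    """Compare the weighted sum against the locally chosen threshold."""
    return weighted_score(ip, weights, lookup) >= threshold


# Hypothetical local policy: three lists, trusted to different degrees.
WEIGHTS = {"bl-a.example": 3.0, "bl-b.example": 2.0, "bl-c.example": 1.0}
THRESHOLD = 4.0
```

The lookup function is passed in as a parameter so that the policy can be tried out against canned answers without touching the network.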

I disagree that they say nothing about addresses that they don't list. Silence is a comment, in this context. Provided that list policies apply to all IP addresses equally, then they are commenting on all IP addresses. However, if a DNSBL had a policy (explicit or implicit) that it would never list a certain address range, then the service ceases to be comprehensive. It doesn't cease to be a service, though.

However, currently it's hard to know what to whitelist. There's only one
widespread, easy to use mechanism for managing information about which
IP addresses an organisation is likely to send messages from - that's
SPF. OK, so if you wanted to be sure to get mail from me, you could
whitelist my /24 address block, but are you sure that I'd keep you
updated if we outsourced our email?

But this is not one of my pressing concerns: oh, it's not totally
off my radar, but it's far down on the priority scale.  I'll try
to explain below.

Yes, the problem of course is when a spammer forges a domain that I'd
like to trust. If I'm filtering mail from the domain of my chief
funders, then false positives can be really painful. If I whitelist
them, then spammers can easily bypass my filters. So what I'm
discussing IS all about forgery.

If you are efficiently blocking spam, then this may be somewhat
of a non-issue.  (It pretty much is for me.)  Let me illustrate
with an example: traffic was presented a little while ago on
port 25 from 123.140.212.144:

        I could have sanity-checked the HELO.

        I could have run it through SPF or similar, but didn't.

        I could have waited for the data phase and run the content
        through SpamAssassin and/or ClamAV, but didn't.

        I could have looked up rDNS to see if it existed, but didn't.
        (Or, having looked it up and found it to exist, checked
        the domain against various RHSBLs.  And/or for MX sanity.)

        I could have checked the IP address against various DNSBLs.

        Instead, the mail system noted that it's in Korean IP space,
        which for that mail server is a 100% source of spam and a 0%
        source of mail.  So it was immediately rejected.
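
The example above amounts to running the cheapest check first and stopping at the first rejection. A minimal sketch of that ordering follows; the network blocks are documentation ranges standing in for whatever allocations a given server has decided to refuse, and the function names are hypothetical rather than any MTA's real hooks.

```python
import ipaddress

# Placeholder blocks (RFC 5737 documentation ranges), standing in for the
# address space this particular server never expects mail from.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def blocked_at_connect(client_ip, networks=BLOCKED_NETWORKS):
    """True if the connecting address falls inside a locally blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


def first_rejection(client_ip, checks):
    """Run (name, check) pairs in order; return the name of the first check
    that rejects, or None. Later, costlier checks never run after a hit."""
    for name, check in checks:
        if check(client_ip):
            return name
    return None
```

With a list such as `[("network block", blocked_at_connect), ("dnsbl", some_dnsbl_check)]`, a connection from a blocked network is refused before any DNS query or content scan is attempted.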

So maybe it was a phish with forged sender address at Paypal --
and maybe I could have figured that out via one of the methods
that have been discussed here.  But my point is that I didn't
need to, because I knew it was spam before getting that far.

Repeat this for myriad variations -- use of various DNSBLs, use of
the Spamhaus DROP list, various country allocations, thousands and
thousands of spammer domains, dynamic/generic IP and name space,
and so on.  What I've found is that if I'm sufficiently aggressive
about blocking spam sources (including pre-emptive blocks),
I don't need to worry so much about what's in the spam -- forged
headers, bogus URLs, etc. -- because it never makes it to the point
where I have to concern myself with any of that.

So my response to someone who says "I'm getting a lot of forged
traffic claiming to be from Paypal and I need an anti-forgery
method to figure that out" is "No, you need to be a lot more
aggressive about blocking spam.  AFTER you do that, you should
re-assess, see if this is still actually a problem for you, and
then, maybe, you might consider anti-forgery technology of one
sort or another, or even ad hoc local checks for some specific
cases, like maybe the credit union that serves your university."

Does this kinda clarify where I'm coming from?

Yes, it does. However, I'm an admin for a University with students from most countries in the world, and academics who work in most countries in the world and study every topic under the sun - including spam! So, it's very difficult for me to be that aggressive. I certainly can't block IP addresses according to country allocation.

That's why I need more information about who the IP addresses belong to. Without that information, and with the prevalence of sender address forgery, the IP address is the only real information that I have about a message before it's too late to reliably apply recipient-specific filtering.

I think perhaps I have this viewpoint because my focus is on my biggest
(ongoing) problem: what to do about the 99% of incoming mail that needs
to be rejected outright before it can get anywhere near a user.

You mean you want to know how to identify it? Or what to do with it
after you've identified it?

The former -- because my approach to the latter is "issue a reject,
hang up, move on" in all cases.  (Although I should mention in passing
that DNS lookup failures get a 4XX 'cause maybe their DNS is hosed,
maybe mine is hosed, maybe transport is fubar.)
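
That policy (permanent reject when a check says "spam", temporary failure when the lookup itself broke) might be sketched like this. The enum and function names are hypothetical; 550 and 451 are standard SMTP permanent and temporary failure codes, but the reply texts are invented.

```python
from enum import Enum


class LookupResult(Enum):
    LISTED = "listed"            # the check positively identified a spam source
    NOT_LISTED = "not listed"    # the check completed and found nothing
    DNS_FAILURE = "dns failure"  # the lookup itself failed (timeout, SERVFAIL)


def smtp_reply(result):
    """Map a lookup outcome to an (SMTP code, text) pair."""
    if result is LookupResult.DNS_FAILURE:
        # Could be their DNS, ours, or the network: ask the sender to retry.
        return (451, "Temporary lookup failure, please try again later")
    if result is LookupResult.LISTED:
        # Permanent reject; the caller then drops the connection.
        return (550, "Rejected by local policy")
    return (250, "OK")
```

The point of the 4XX branch is that a legitimate sender will queue and retry, so a transient DNS problem on either side does not turn into lost mail.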

The former is tougher, because -- despite my aggressive approach
to spam -- I don't want to deal with a high FP rate.  I've come to
the conclusion that one approach which seems to work boils down to
"know your email": study traffic patterns, inbound and outbound.
Every mail server (that I've ever looked at) has different
characteristics, and if you can figure out what they are, then you can
twiddle the knobs in very server-specific ways that minimize FN and FP at
the same time.  (See example above, which clearly would not work at
all on some other servers -- say for a research university.)

Of course, this takes time and patience -- but I'll argue that
we should be trying to extract the most from the methods we already have
(like the ones I tossed out above), that we understand fairly well,
and that we know work on a large scale in production environments,
before we try to invent and deploy new methods.

---Rsk
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg



--
Ian Eiloart
IT Services, University of Sussex
x3148