Re: [Asrg] What is Reputation Service

On Tue, Jan 25, 2011 at 8:31 PM, Steve Atkins <steve(_at_)blighty(_dot_)com> 
wrote:


On Jan 25, 2011, at 5:06 PM, Dotzero wrote:

On Tue, Jan 25, 2011 at 4:16 PM, Paul Ferguson 
<fergdawgster(_at_)gmail(_dot_)com> wrote:

On Tue, Jan 25, 2011 at 1:14 PM, John Leslie <john(_at_)jlc(_dot_)net> 
wrote:


  Reputation (as the name implies) is a prediction of the likelihood of
near-future behavior.


...based on previously observed behavior.

- ferg


So, what exactly does this mean when behavior suddenly changes? If a
domain or IP address was well behaved yesterday but begins spewing
badness today, what will your company do as an arbiter of whether mail
is accepted by your customers? Will you allow that domain or IP
address to spew badness?


"badness" is hard to measure at scale once you've removed the
obvious botnet spew from your mailstream.

I highly doubt it.


Depends. If it's sending a lot of wanted email, a spike of obviously
unwanted email and a middle ground of mail which you can't
decide about, you're always going to want to deliver the wanted
email, and you're always going to want to block the obviously
unwanted email.


Wanting and doing are two different things. In many cases the
evaluator isn't going to get that granular. This is particularly true
where the unwanted mail is somehow malicious, not simply unwanted.

But the mail in the middle is harder to decide what to do about,
and that's where sender reputation helps.


I'll grant you that about the middle. But that's not necessarily "badness".

At some point, once the
spewing has subsided, you may (automatically or manually) again start
allowing traffic through from that domain or IP address. But that
isn't really reputation in the traditional sense of the word.

But that brings me back to my original question. If reputation doesn't
prevent a site from getting throttled or blocked when it goes bad,
what does reputation mean?


Reputation is, loosely, the past history of the sender from a decade
ago through to a second ago. It's not a simple integer (0=bad, 100=good)
however much people want to map it onto that. It includes, at least,
traffic volumes over time and fraction of email that was wanted vs
not wanted over time.

Comparing the past history of the sender (over a period of months)
to the current behaviour of the sender (minutes to hours) can
help you guess what the sender is up to, categorize them into one
of several "bins" automatically and treat mail from them appropriately.


But the case that originally triggered the discussion is false
positives. That is a case of treating those messages inappropriately.

If a sender has a history of not sending any email at all and you suddenly
see a lot of email from it then you can categorize it as "probably a
compromised end-user machine".


A slightly different issue. For emitters of large volume mail streams,
the bad stream may still only be a percentage of the overall mail
stream.

If it has a history of sending 1000 emails a day and it suddenly starts
spewing 100k then the change in behavior lets you categorize it
as "tiny smarthost with compromised system" and maybe block it outright.

But if a site has a history of sending you large volumes of wanted
email over a long period, and you suddenly see a spike of unwanted
email then you're likely to assume that it's a transient problem (bad
customer, perhaps) and that once you've notified them of the problem,
they'll fix it. Meanwhile you'll keep delivering most of the email from
them on the assumption that it's mostly wanted.
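
The three cases above can be sketched as a small rule set. This is only
an illustration: the field names, the bin names and every threshold
(burst_volume, spike_factor, large_volume, good_fraction) are assumptions
made up for the example, not values anyone in this thread has proposed.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    # Long-term view (months): average messages/day and fraction wanted.
    long_term_daily_volume: float
    long_term_wanted_fraction: float
    # Short-term view (minutes to hours), projected to messages/day.
    recent_daily_volume: float
    recent_wanted_fraction: float


def categorize(h, burst_volume=1000.0, spike_factor=50.0,
               large_volume=10000.0, good_fraction=0.9):
    """Place a sender into a hypothetical bin by comparing histories."""
    if h.long_term_daily_volume == 0:
        # No history at all, then suddenly a lot of mail.
        if h.recent_daily_volume >= burst_volume:
            return "compromised_end_user_machine"
        return "unknown"
    if (h.long_term_daily_volume < large_volume
            and h.recent_daily_volume >= spike_factor * h.long_term_daily_volume):
        # Small sender whose volume jumped far beyond its norm.
        return "small_smarthost_compromised"
    if (h.long_term_daily_volume >= large_volume
            and h.long_term_wanted_fraction >= good_fraction
            and h.recent_wanted_fraction < good_fraction):
        # Large sender with a good record and a recent dip in quality.
        return "transient_problem"
    return "normal"


# 1000/day historically, 100k/day now: the second case in the text.
print(categorize(SenderHistory(1000, 0.95, 100_000, 0.2)))
```

What the receiver then does with each bin (deliver, throttle, block) is
a separate policy decision and is deliberately left out of the sketch.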


I think you'll find (at least from what I have seen) that they will
throttle the overall stream (for example Yahoo! - 421 too many
complaints) or implement a temporary block (AOL), etc.

They're all reasonable decisions. And they're things that can be
implemented as a set of business rules driven by, amongst other
things, the short term and long term history of the sender ("reputation")
in an automated way that doesn't require much per-sender configuration.
Plug in your policy, let it run, watch your statistics and tweak.
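
As a rough sketch of what "plug in your policy" could look like, the
business rules can be kept as a table that maps a sender bin to an
action, so that tweaking is a data change rather than a code change.
The bin names, action names and reply texts are hypothetical; only the
general meaning of the SMTP reply classes (4xx temporary, 5xx permanent)
is standard.

```python
# Hypothetical default policy: bin name -> (action, SMTP reply or None).
DEFAULT_POLICY = {
    "normal": ("deliver", None),
    "transient_problem": ("deliver_and_notify", None),
    "small_smarthost_compromised": ("block", "550 rejected due to sender behavior"),
    "compromised_end_user_machine": ("block", "550 rejected due to sender behavior"),
}


def decide(bin_name, policy=DEFAULT_POLICY):
    """Look up the action for a bin; unknown bins fall back to delivery."""
    return policy.get(bin_name, ("deliver", None))


# A receiver that prefers to throttle the whole stream during a spike
# only has to override one row of the table.
throttling_policy = {
    **DEFAULT_POLICY,
    "transient_problem": ("throttle", "421 too many complaints"),
}

print(decide("transient_problem"))
print(decide("transient_problem", throttling_policy))
```

The two rows for "transient_problem" correspond to the two behaviours
discussed in this thread: keep delivering while the sender fixes the
problem, or throttle the overall stream until complaints drop.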


I'm still radiating skepticism. While I agree that what you describe
is the optimal approach, I'm not sure it matches reality.

It doesn't particularly protect the site
from the immediate consequences of going bad. It appears that the
responses are authoritative (this domain or IP is currently emitting
badness) rather than reputational (this site has a good reputation so
I will accept badness from it on the presumption they are going to
address it). I will grant that there may be some small slack cut based
on reputation but does it really extend that far?


Yes. Filtering out "obvious" spam is easy. Recognizing "obvious"
1:1 email between regular correspondents isn't too hard.


Various mailbox providers exhibit varying degrees of dealing with the
issue on a message basis rather than a stream basis. To the extent
that messages are diverted to a spam folder, we can infer the
confidence level that the receiver has in the capability of their
systems. That is, if they felt they could analyze individual messages
perfectly then there wouldn't be a need for a spam folder. Ham would
get delivered and spam would go to /dev/null.

Dealing with the big grey area in the middle is hard, and sender
reputation is about the only thing that gives you anything to base
delivery decisions on there, so for the big middle ground of email
good reputation will take you a long way.


Excellent response, Steve (as usual), and I agree with you about the
middle. Remember the context of the discussion though. The triggering
question was whether one can make decisions directly based on SPF,
DKIM or a combination of the two. Doug stated that one could not. My
position is that you generally can (using combinations) for domains that
have good control over the mailstreams (particularly abused brands
such as financials).
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg