spf-discuss

Re: Authentication, Accreditation, and Reputation

2004-08-14 13:12:50
On Sat, Aug 14, 2004 at 03:21:09PM -0400, John Glube wrote:

* A domain has a good or bad reputation based on past
observed behaviour by receiving MTAs, reports received from
individual recipients and others, including operators of
black lists. 


Well, now you're moving beyond conceptualizing reputation, and into
implementation.  More specifically, you're talking about how a
particular reputation system "closes the loop" via feedback, to adjust 
the reputation it reports.  In general terms, a reputation system needs
/some/ feedback mechanism to ensure its evaluation of observed behavior
is consistent with the evaluation of the community it serves.


Question: Will not the quality of these reports vary
depending, for example, on:

* what is treated as spam versus ham from the recipient's
perspective?

* the criteria used by black list operators in declaring an
activity indicative of spam versus ham?


Yes.  However, this again depends on what a particular reputation system
allows as feedback.  For example, GOSSiP is designed to only accept
automated feedback from a trusted spam filter, and only for a limited
time window after receipt and processing of the particular email for
which feedback is being submitted.  This decision was made in an attempt
to minimize manual gaming of the system, as well as to ensure that 
feedback is timely, uses a consistent metric, and that there is feedback
for most, if not all, messages received.

Other reputation systems may allow human feedback submissions,
submissions from untrusted (and/or anonymous) sources, feedback from
blocklists, etc.  It's an implementation decision based on a more
general reputation system architecture.
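To make that gating concrete, here is a minimal Python sketch of the idea: accept only automated feedback from a trusted local filter, and only within a fixed window after the message was processed. All names and the window length are my own illustration, not taken from the GOSSiP source.

```python
import time

# Hypothetical sketch of GOSSiP-style feedback gating.  The source
# whitelist and the window length are illustrative assumptions.

FEEDBACK_WINDOW = 3600                    # seconds after processing
TRUSTED_SOURCES = {"local-spamfilter"}

def accept_feedback(source, processed_at, now=None):
    """Return True if this feedback submission should be accepted."""
    now = time.time() if now is None else now
    if source not in TRUSTED_SOURCES:
        return False                      # reject human/untrusted submissions
    return (now - processed_at) <= FEEDBACK_WINDOW
```

The point of the design is visible in the two rejection paths: untrusted sources are dropped outright, and even trusted feedback goes stale once the window closes.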


Let me elaborate. 

A recipient may say, this message is spam because it is an
email message which I do not want and did not ask for. 

However, an observer may say (to use an example): but you
subscribed to the mailing list and verified your consent to
receive email about how to build model airplanes and
marketing material concerning model airplanes, building
model airplanes and the like.

Now the particular message in question tells the recipient
how to build a model Sopwith Camel and includes an
advertisement for a kit to build Sopwith Camels.

The observer may ask "How is this spam as opposed to ham?"

Is this a correct concern? If so, how does a 'heuristic
measure of behavior' deal with this concern?

Well, my particular solution is as I described above:  I don't allow
human submissions.  Now, you may still have a spam filter that
programmatically instantiates the circumstance above.  GOSSiP deals with
this using several mechanisms:

 * peer reports are kept separate from local observed behavior
 * peers have a trust metric assigned to them and adjusted according to
   how well they agree with other peers, as well as how well they agree
   with local data
 * peers communicate not only their opinion of an identity's reputation,
   but also how confident they are in that reputation rating.
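The second mechanism, trust adjusted by agreement, can be sketched in a few lines. The function and the update rule below are my own toy illustration, not GOSSiP's actual algorithm.

```python
def update_peer_trust(trust, peer_opinion, local_opinion, rate=0.1):
    """Nudge a peer's trust up when its opinion agrees with local
    data, down when it disagrees.  Opinions and trust are taken to
    lie in [0, 1]; the update rule is an illustrative assumption."""
    agreement = 1.0 - abs(peer_opinion - local_opinion)
    trust += rate * (agreement - 0.5) * 2.0   # agree -> up, disagree -> down
    return min(1.0, max(0.0, trust))
```

Over many exchanges, a peer that consistently contradicts local observation decays toward zero trust, so its reports carry less and less weight.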

Also, it's important to note that a reputation system provides nothing
but opinions.  Some systems expect you to act on those opinions as
though they were facts and/or directives.  GOSSiP is, as its acronym
implies, a system that views such opinions as gossip.  As such, any
externally-provided information is viewed in the context of how much the
listener trusts the opinion, how much the opinion-holder trusts the
opinion, the listener's opinion of the opinion-holder, and the
listener's own opinion of the identity about which the opinion-holder is
communicating an opinion.
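The layered weighting described above (listener's trust in the peer, the peer's own confidence, and the listener's local opinion) might be combined roughly as follows. This is a sketch under my own assumptions, not the formula from the GOSSiP draft.

```python
def blend_opinions(local_opinion, local_confidence, peer_reports):
    """Weight each peer's opinion by (listener's trust in that peer)
    times (the peer's own confidence), alongside the local opinion.
    peer_reports: list of (opinion, peer_confidence, trust_in_peer).
    Returns None when there is no basis for any opinion at all."""
    weights = [local_confidence] + [t * c for _, c, t in peer_reports]
    values = [local_opinion] + [o for o, _, _ in peer_reports]
    total = sum(weights)
    if total == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total
```

Note that a fully trusted, fully confident peer and a fully confident local opinion pull with equal weight, while an untrusted peer contributes nothing no matter how confident it claims to be.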

Finally, it's extremely important to note that reputation systems don't
build or report reputations about a particular message.  The reputations
are for identities.  So, in your example, the recipient may well think
the identity that's sending those messages is sending spam, whereas the
observers may think that identity sends nothing but ham.  This isn't a
problem with reputation systems per se, but rather with the fact that
the recipient and the observers may represent two different communities.
If the recipient is one of many who hold the opinion that the identity
in question is a source of spam, then it can be said that the
recipient's community and the observers' communities are divergent.
In situations like this, you tend to run into problems with things like
blocklists.  In a system like GOSSiP, those communities are allowed to
diverge, and the local community's standards would tend to prevail over
time.  Does this mean that one person's opinion will be forced upon that
person's community, or upon other, divergent communities?  Not at all.
The opposite, in fact:  the prevailing local community's standards
become instantiated in the local GOSSiP node's data, reflected in the
data stored for various entities.  So, the community that thinks that
identity is sending spam is happy, because their local GOSSiP node
understands this and acts accordingly, and the community that thinks
that identity is sending ham is happy, because their local GOSSiP node
understands that as well.

In GOSSiP, part of the strength of the concept is that the nodes form a
social network that evolves over time, with emerging node communities
that reflect groups of nodes that tend to have similar opinions about a
large number of identities.  Of course, this marginalizes those few
users who tend to view certain messages as spam when the rest of the
community thinks they're ham, but for those few users, individual spam
filters and things like procmail still exist.


Is the analysis that, in assessing reports, we treat all
reports as fact (i.e., spam or ham), and then adjust
reported statements based on learned measures of how
recipients make mistakes, either generally or specifically,
concerning the particular sender?

In GOSSiP, feedback is viewed as factual and trusted for the time being,
and is only from local, automated sources, as described above.  Other
reputation systems may do this differently.  As I mentioned, this was a
design decision on my part.  Some people don't like it, others are okay
with it.

Please also note that "feedback" is a separate thing from "peer
reports", which are a different beast entirely.  The latter are received
prior to message queueing, the former after message delivery.  I'm not
entirely sure which you're referring to here.


To do this, don't you need sufficient volume levels to gain
an accurate measure, given the error factor, or is there
another approach?

Yes, but until those volumes are reached, you can simply say "I don't
know enough to make a decision -- just ignore me."

Reputations, good or bad, take time to build.  In GOSSiP, there are a
few parameters that can be set that determine how long this takes, and
how quickly they change with new input, but -- just as in real life --
you can't build a reputation overnight.  Of course, social networks
provide a convenient shortcut to having to directly interact with every
identity you need information about.  Rather than having to have, say,
1,000 interactions with your new neighbor, you can just ask other
neighbors and friends their opinion of the neighbor, and weight those
more heavily until you've built up enough experience to both evaluate
your neighbor and your peer's opinions of your neighbor.
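The neighbor analogy amounts to a weight that shifts from peer consensus toward direct experience as interactions accumulate. A minimal sketch, with a formula and saturation constant that are purely illustrative:

```python
def reputation_estimate(local_avg, n_local, peer_avg, saturation=50):
    """Blend direct experience with peer consensus: the more local
    interactions we have, the less the peers' opinion matters.
    'saturation' is the interaction count at which local data and
    peer consensus carry equal weight (an assumed constant)."""
    w_local = n_local / (n_local + saturation)
    return w_local * local_avg + (1.0 - w_local) * peer_avg
```

With zero direct interactions the estimate is pure peer opinion; after thousands, the peers barely matter, matching the "ask the neighbors until you know them yourself" shortcut.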


* Is not one problem that some domains may not send enough
email to allow for a realistic behaviour assessment based
on existing methods used to establish reputation using
recipients' data?

It's a truism, but not a problem.  Identities who send such a low volume
of email that they never establish a good or poor reputation in a
reputation system such as GOSSiP would simply be handed off to the next
evaluation mechanism (spam filters, SPF checks, etc.) without the added
benefit of action taken based on reputation, because they have none.

If you never leave your house, never interact with anyone, nobody's
going to know who you are, and they won't have any context in which to
evaluate your actions.  True in life, true in GOSSiP.  True with many
reputation systems.


Let's create an example.

Joe Smith who lives in Upper Cove, Nova Scotia has a web
site and his own mail server. 

Over a one year period he has built a verified opt-in
mailing list of 10,000 subscribers from all over the world. 

He sends mailings to his subscribers on a bi-monthly basis.

* Would this type of mailing volume generate the required
data from all potential recipient sources to allow a
reputation service to come to a correct conclusion as to
the sending characteristics of Joe's domain?

Again, this depends on the particular implementation.  In GOSSiP,
probably not for several years.  However, the volume isn't the sole
arbiter.  There's also the opinions of the community which each node
serves.  If they think this mail's ham, then it's ham.  If they think
it's spam, it's spam.  However, it depends on the MTA distribution of
those 10,000 users.  If they're all served by a small handful of MTAs,
it would only take a few months before enough data has been collected.
If each user is served by a separate MTA, it may well take years.
Until then, the reputation is a non-issue as far as accept/reject/filter
decisions go.
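The difference the MTA distribution makes can be shown with back-of-envelope arithmetic. The numbers are illustrative, and I'm reading "bi-monthly" as one mailing every two months:

```python
# Joe's list: the same 10,000 subscribers concentrated on a few MTAs
# give each observing node far more data points per year than
# subscribers spread one-per-MTA.  Even spread is assumed.

subscribers = 10_000
mailings_per_year = 6   # one mailing every two months

def observations_per_mta_per_year(n_mtas):
    """Messages each MTA sees per year under an even spread."""
    return subscribers * mailings_per_year // n_mtas

few_mtas = observations_per_mta_per_year(10)       # thousands per MTA
many_mtas = observations_per_mta_per_year(10_000)  # a handful per MTA
```

Three orders of magnitude in per-observer volume is the gap between "a few months" and "years" of data collection described above.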


(I am presuming a formula which includes mail volume, time
and spam reports from recipient sources, or is this wrong?)


I'd suggest you read the architecture draft available at
http://sufficiently-advanced.net/architecture-draft-02.txt for formula
details.


On the other hand, we have Big Co, with headquarters in New
York City. Big Co does regular mailings on a weekly basis,
sometimes mailing up to 2,000,000 pieces at a time.

* Again the same questions.

Same answers.


Now if the email volume coming from Joe's mail server does
not allow a reputation service to assess whether Joe has a
good reputation or not, what to do?

Proceed as normal.


Is this not one aspect of the problem?

With GOSSiP it's not a problem.  With other reputation systems, it may
or may not be.


As a result, what reputation is applied to Joe's domain?

With GOSSiP, it's pretty much null.  With other reputation systems, who
knows?  Again, you're asking implementation questions, rather than
general reputation system questions.


From the receiving MTA's perspective, the operator may decide:

* Okay, I know the sending MTA was authorized to send this
message for this domain. But this does not tell me whether the
particular message is spam or ham.

To fine tune my query I will ask my favourite reputation service:


Reputation systems don't provide feedback for individual messages.  They
provide metrics for identities.  Those metrics happen to reflect a
history of behavior constructed from individual messages, but a
reputation has nothing to say about a given message, except whether the
identity sending it has a good or poor reputation.

* Does the domain which has authenticated the sending MTA have a
good or bad reputation?

Again, an implementation detail.  In GOSSiP, an identity is a
combination of the right-hand side of the address in the RFC 2821 MAIL
FROM: command and the IP address associated with the connection to the
MTA.
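In code terms, that identity key might be built roughly like this. The helper is hypothetical; the architecture draft defines the real details.

```python
def gossip_identity(mail_from, client_ip):
    """Build a reputation key from the domain part of the RFC 2821
    MAIL FROM address plus the connecting IP address.  A sketch of
    the identity described above, not code from GOSSiP itself."""
    domain = mail_from.rsplit("@", 1)[-1].lower()
    return (domain, client_ip)
```

Keying on the (domain, IP) pair means a domain's reputation can't simply be borrowed by a spammer forging the domain from an unrelated host.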


* If the domain has no known reputation, or perhaps because I
want a more refined statement than merely good or bad, I will
query accreditation services x, y and z to see whether anyone is
prepared to make a statement about this domain's sending
characteristics?

With GOSSiP, you can do whatever you want with the message, perform any
queries you want in addition to a GOSSiP reputation query.  The
reputation value's just one more data point for filtering.  You have the
option of telling the local GOSSiP node to instruct your MTA to reject
the message if certain criteria are met, but otherwise it's just another
bit of data to be used for filtering after the message has been queued.
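That reject-or-queue decision might look like the following sketch. The thresholds and names are illustrative assumptions; GOSSiP's actual criteria are configurable, as described above.

```python
def mta_action(reputation, confidence, reject_below=0.1, min_confidence=0.8):
    """Reject at SMTP time only when a reputation is confidently bad
    (high confidence, score below the cutoff); otherwise queue the
    message and let later filters use the score as one more data
    point.  Reputation may be None when no opinion exists yet."""
    if (reputation is not None
            and confidence >= min_confidence
            and reputation < reject_below):
        return "reject"
    return "queue"
```

A low-confidence or absent reputation never blocks mail; it simply contributes nothing, which is the "proceed as normal" behavior for low-volume senders like Joe.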


-- 
Mark C. Langston            GOSSiP Project          Sr. Unix SysAdmin
mark(_at_)bitshift(_dot_)org   http://sufficiently-advanced.net    
mark(_at_)seti(_dot_)org
Systems & Network Admin      Distributed               SETI Institute
http://bitshift.org       E-mail Reputation       http://www.seti.org