Re: [spf-discuss] Perils of reputation

On Tue, Feb 06, 2007 at 08:06:26PM -0500, Stuart D. Gathman wrote:

Consider the case of this sender:

2007Feb06 02:07:11 [1324] connect from cmn1lsm3.beliefnet.com at 
('129.33.230.137', 43757) EXTERNAL
2007Feb06 02:07:12 [1324] hello from cmn1lsm3.beliefnet.com
2007Feb06 02:07:12 [1324] mail from 
<listadmin4(_at_)partner(_dot_)beliefnet(_dot_)com> ()
2007Feb06 02:07:12 [1324] Received-SPF: pass (smtp.example.com: domain of 
partner.beliefnet.com designates 129.33.230.137 as permitted sender) 
client_ip=129.33.230.137; 
envelope_from="listadmin4(_at_)partner(_dot_)beliefnet(_dot_)com"; 
helo=cmn1lsm3.beliefnet.com; receiver=smtp.example.com; 
mechanism="a:cmn1lsm3.beliefnet.com"; identity=mailfrom
2007Feb06 02:07:12 ham: 0, spam: 25
2007Feb06 02:07:12 ID partner.beliefnet.com:SPF reputation: 
-76.159416,2.014513
2007Feb06 02:07:12 [1324] X-GOSSiP: uqaWJvNVWKgzP7TsOY9.Jg,-76,2
2007Feb06 02:07:12 [1324] rcpt to <jackiel(_at_)example(_dot_)com> ()
2007Feb06 02:07:12 [1324] REJECT: REPUTATION

They are not actually spamming.  They have a very nice SPF record.  Users
at this company actually signed up for their mailings.  Their mailings
*are* laden with advertising.  That is, after all, how their operation
is funded.  This similarity with actual spam causes a message or two
to be quarantined.  The user doesn't actually care that much about reading
the messages, and doesn't bother releasing them from the quarantine.  The user
never sends any email to the domain, so no auto-whitelisting takes place.
The stats snowball until all messages are quarantined.  The reputation
takes a nosedive, and the system starts rejecting all messages.  Quite
reasonable, since they were just sitting in quarantine until deleted anyway.

This is an example of practical spam.  Stuff that users sign up for, but
don't actually have time to read.  Kind of like those magazine subscriptions
that pile up in the bathroom, or those newspapers sitting in your recycling
bin that you never get around to reading.  It is a good thing that 
the system eventually learns to refuse delivery.  However, I feel like
there should be a different kind of demerit for this kind of "spam", because
the company is not actually doing anything wrong.  The reputation should
have a high "lost interest" score that is distinct from a
high "criminal spammer" score.  But I am not sure how to capture that
distinction from end users.

Certainly, the best way to do this is to charge recipients for the
subscription.  That would motivate them to whitelist the sender.
And if they never read it, they don't have to renew.

However, advertising-funded content is very popular.  I suppose that
messages actually reported as spam or sent to a honeypot mailbox should
get a different kind of demerit than messages that are simply left in 
quarantine.  So there would be three counts: ham, spam, cageliner.  The
last two would count together for purposes of quarantine and rejection,
but only the spam stat would determine the "evilness" of the sender.
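
A minimal sketch of the three-count idea, assuming a per-sender record
with hypothetical names (SenderStats, unwanted_ratio, evil_ratio); it is
not taken from any existing milter or GOSSiP code.  The log above shows
"ham: 0, spam: 25"; the split into 2 reported and 23 ignored messages is
invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SenderStats:
    """Per-sender counts. Hypothetical structure, not actual milter or GOSSiP code."""
    ham: int = 0        # delivered and wanted
    spam: int = 0       # reported as spam or sent to a honeypot
    cageliner: int = 0  # left in quarantine until deleted

    def total(self) -> int:
        return self.ham + self.spam + self.cageliner

    def unwanted_ratio(self) -> float:
        """Spam and cageliner together: drives quarantine and rejection."""
        n = self.total()
        return (self.spam + self.cageliner) / n if n else 0.0

    def evil_ratio(self) -> float:
        """Spam only: the "evilness" of the sender."""
        n = self.total()
        return self.spam / n if n else 0.0


# Invented split of 25 unwanted messages: 2 reported, 23 merely ignored.
stats = SenderStats(ham=0, spam=2, cageliner=23)
print(stats.unwanted_ratio())  # 1.0
print(stats.evil_ratio())      # 0.08
```

With these numbers the local system would still refuse delivery, but the
sender's "evilness" stays low.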

This might also affect how the system GOSSiPs about senders.
When responding to a reputation query, the cageliner messages should
count as ham, rather than spam.
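
As a rough sketch of that split, the same three counts could be folded
into a (ham, spam) pair in two different ways, one for local decisions
and one for answering reputation queries.  The function names are
hypothetical, and nothing here reflects the actual GOSSiP protocol.

```python
def local_counts(ham: int, spam: int, cageliner: int) -> tuple[int, int]:
    """(ham, spam) used locally: cageliner counts against the sender."""
    return ham, spam + cageliner


def gossip_counts(ham: int, spam: int, cageliner: int) -> tuple[int, int]:
    """(ham, spam) reported to peers: cageliner counts as ham."""
    return ham + cageliner, spam


# Illustrative numbers only: 2 messages reported as spam, 23 merely ignored.
print(local_counts(0, 2, 23))   # (0, 25)
print(gossip_counts(0, 2, 23))  # (23, 2)
```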

Comments?  Insights?



I see two distinct issues here.

First, if you base your reputation scores on things like textual similarity
or similarity of SpamAssassin (SA) scores, rather than whether the email is
actually wanted by the recipient, then you are going to see anomalies.

Second, the quarantine delays user involvement in feedback: the system feeds
back on itself until it comes to the attention of a human being.  If you delay
human feedback, the mechanical feedback will have had longer to work by the
time human feedback becomes necessary.

Arguably, since quarantines mostly work (sort of), this is a feature of
a quarantine system and a trade-off to be considered when using one.

I really like the term cageliner :-)

An interesting question is the relationship individual users have with 
cageliner material.  One user may go around signing up for stuff and not
really realising they have, and not wanting it, or even reacting to it
as spam (reporting it), while another user might sign up for the same
material and really want it and be annoyed if it is filtered
inappropriately.  

I also question whether there is really a clear
black and white boundary between ham and spam, or whether, when you look
closely, it is more continuous: shades of grey.  I think the
fact that you might treat the exact same circular differently for
different users is the extra dimension you are looking for with
your three counts, and it is an open question whether three
counts will turn out to be a useful model of that.  I'm sorry I can't
offer more help with what might make a good model, but I would be
looking in the direction of statistics, e.g. mean and standard deviation,
so that instead of a count of ham/spam scoring you get a distribution.
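
One way to picture the mean and sd suggestion, as a sketch only: encode
each recipient's reaction to a sender as a number and summarise the
spread.  The encoding (+1 wanted, 0 ignored, -1 reported as spam) and the
function name are assumptions made for illustration, not a worked-out
model.

```python
from statistics import mean, pstdev


def summarize(reactions: list[float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of reaction scores."""
    if not reactions:
        return 0.0, 0.0
    return mean(reactions), pstdev(reactions)


# Assumed encoding: +1 wanted, 0 ignored, -1 reported as spam.
everyone_reports = [-1, -1, -1, -1]  # recipients agree it is spam
users_disagree = [1, -1, 1, -1]      # same circular, opposite reactions

print(summarize(everyone_reports))
print(summarize(users_disagree))
```

A mean near -1 with a small sd would suggest an outright spammer, while a
large sd would suggest material that some recipients want and others do
not.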

Regards,
Paddy
