ietf-mxcomp

Re: So here it is one year later...

2005-01-28 14:07:29

On Fri, 28 Jan 2005, Justin Mason wrote:

Actually, you're wrong there. This is SpamAssassin's "hit-frequencies"
tool output.  Those are percentages, not message counts, so simply summing
SPAM%+HAM% will not add up to OVERALL%.

Here's a quick walk-through of the pertinent parts. (I'm discarding the 1-3
and 3-6 month ranges -- those are old mails, so that data isn't very useful
for network tests -- and concentrating on just the 0-1 month range.)

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME:0-1
  61405    54725     6680    0.891   0.00    0.00  (all messages):0-1

This means that there were 61405 messages mass-checked in total,
with 54725 spams, and 6680 "hams" (non-spam messages).

  5.377   3.7259  18.9072    0.165   0.23   -0.00  SPF_PASS:0-1

Looking at the SPAM% and HAM% columns, 3.7259% of the spams checked had
SPF_PASS, as did 18.9072% of the hams. That means

  ((3.7259 / 100) * 54725) = 2038.998775

I'd suspect the fractional part is just rounding error in the printed
percentage, so 2039 spam messages passed the SPF check.

  ((18.9072 / 100) * 6680) = 1263.00096

and 1263 hams passed SPF.  Total those, and you get 3302 messages
from the overall corpus passing SPF; to express that as a percentage
of the total overall corpus, in other words "OVERALL%", you compute
(3302 / 61405) * 100 = 5.377.
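
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration of the calculation, not code from the hit-frequencies tool
itself; the function name hits_from_percent is made up for this example, and
the figures are the ones from the 0-1 month rows quoted above.

```python
def hits_from_percent(percent, total):
    """Convert a per-class percentage back into a whole message count.

    The printed percentages are rounded, so the product is rounded to
    the nearest integer (hypothetical helper, not part of SpamAssassin).
    """
    return round(percent / 100 * total)


# Totals from the "(all messages):0-1" row.
total_spam = 54725
total_ham = 6680

# SPAM% and HAM% from the "SPF_PASS:0-1" row.
spam_hits = hits_from_percent(3.7259, total_spam)
ham_hits = hits_from_percent(18.9072, total_ham)

# OVERALL% is the combined hit count over the combined corpus size,
# which is why SPAM% + HAM% does not add up to it.
overall_pct = 100 * (spam_hits + ham_hits) / (total_spam + total_ham)

print(spam_hits, ham_hits, round(overall_pct, 3))
```

Put another way, OVERALL% is an average of SPAM% and HAM% weighted by the
size of each class, so with roughly eight times as many spams as hams it
lands much closer to SPAM%.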

If you have any more questions on the hit-frequencies format, I'll
be happy to fill you in -- I wrote the tool in question ;)

Ah. That certainly explains the computation. Thanks.  But these particular
stats are from only 4 people, who, being SpamAssassin contributors,
probably have more SPF-using friends than most people.

FYI, this is the original post:

---------- Forwarded message ----------
Date: Thu, 9 Sep 2004 15:18:42 +0200
From: Markus Stumpf <maex-lists-email-ietf-mxcomp(_at_)Space(_dot_)Net>
To: ietf-mxcomp(_at_)vpnc(_dot_)org
Subject: SPF abused by spammers

Justin Murdock posted this link on the qmail list:
    http://news.bbc.co.uk/1/hi/technology/3631350.stm
    "CipherTrust [...] found that 34% more spam is passing SPF checks than
    legitimate e-mail."

Sure.  But this was guaranteed to change over time, and vary depending on
corpus composition.  It's pretty radically different now, from where I and
the other SpamAssassin corpus contributors are viewing it.

Everything changes over time.

-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 344 9000