-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dean Anderson writes:
On Fri, 28 Jan 2005, Justin Mason wrote:
FWIW, here are the results of a check of 54725 spams and 6680 nonspam
mails, from SpamAssassin's weekly mass-check of network rules (at
http://www.pathname.com/~corpus/NET.age ).
All these messages were received less than 1 month ago, and are taken from
5 people's hand-classified corpora.
SPF records passing HELO strings: 4.98% of spam, 13.29% of ham
SPF records passing the MAIL FROM: 3.72% of spam, 18.90% of ham
So it certainly looks like that statement is untrue.
Err, no:
OVERALL%  SPAM%    HAM%      S/O    RANK  SCORE  NAME:0-1
OVERALL%  SPAM%    HAM%      S/O    RANK  SCORE  NAME:1-3
OVERALL%  SPAM%    HAM%      S/O    RANK  SCORE  NAME:3-6
   5.377  3.7259   18.9072   0.165  0.23  -0.00  SPF_PASS:0-1
   1.361  0.9087    3.3508   0.213  0.25  -0.00  SPF_PASS:1-3
   1.749  0.5116   18.4304   0.027  0.34  -0.00  SPF_PASS:3-6
As you see, spam + ham does not add up to overall. It's not clear what
these statistics mean, nor how they were calculated. But your
interpretation is clearly either wrong or at least not supported by the
page.
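Actually, you're wrong there. This is the output of SpamAssassin's
"hit-frequencies" tool. Those are percentages, not message counts, and
each has a different denominator: SPAM% is relative to the spam corpus,
HAM% to the ham corpus, and OVERALL% to the two combined. So simply
summing SPAM%+HAM% will not add up to OVERALL%.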
Here's a quick walk through the pertinent parts. (I'm discarding the 1-3
and 3-6 month ranges -- those are old mails, so that data isn't very useful
for network tests -- and just concentrating on the 0-1 month range.)
OVERALL%  SPAM%    HAM%      S/O    RANK  SCORE  NAME:0-1
   61405  54725     6680     0.891  0.00   0.00  (all messages):0-1
In this row the first three columns are message counts rather than
percentages. It means that there were 61405 messages mass-checked in
total, with 54725 spams and 6680 "hams" (non-spam messages).
   5.377  3.7259   18.9072   0.165  0.23  -0.00  SPF_PASS:0-1
Looking at the SPAM% and HAM% columns, 3.7259% of the spams checked had
SPF_PASS, as did 18.9072% of the hams. For the spams, that works out to
((3.7259 / 100) * 54725) = 2038.998775
The fractional part is just rounding error in the displayed percentage,
so 2039 spam messages passed the SPF check. Likewise for the hams:
((18.9072 / 100) * 6680) = 1263.00096
and 1263 hams passed SPF. Total those, and you get 3302 messages
from the overall corpus passing SPF; to express that as a percentage
of the total overall corpus, in other words "OVERALL%", you compute
(3302 / 61405) * 100 = 5.377.
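
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration and not the hit-frequencies tool itself: the function
names are made up for the example, the figures are copied from the 0-1
month rows quoted above, and the S/O formula (SPAM% divided by SPAM% plus
HAM%) is an assumption that happens to reproduce the S/O values in those
rows.

```python
# Illustrative sketch only; helper names are hypothetical, not part of
# SpamAssassin. Figures come from the 0-1 month rows quoted above.
TOTAL_SPAM = 54725
TOTAL_HAM = 6680


def hits(percent, total):
    """Turn a per-corpus percentage back into a whole-message count."""
    return round(percent / 100 * total)


def overall_percent(spam_pct, ham_pct, n_spam=TOTAL_SPAM, n_ham=TOTAL_HAM):
    """Percentage of the combined corpus that hit the rule."""
    spam_hits = hits(spam_pct, n_spam)
    ham_hits = hits(ham_pct, n_ham)
    return 100 * (spam_hits + ham_hits) / (n_spam + n_ham)


def spam_over_overall(spam_pct, ham_pct):
    """Assumed S/O definition: SPAM% / (SPAM% + HAM%)."""
    return spam_pct / (spam_pct + ham_pct)


spf_spam_hits = hits(3.7259, TOTAL_SPAM)
spf_ham_hits = hits(18.9072, TOTAL_HAM)
spf_overall = overall_percent(3.7259, 18.9072)
spf_so = spam_over_overall(3.7259, 18.9072)

print(spf_spam_hits, spf_ham_hits, f"{spf_overall:.3f}", f"{spf_so:.3f}")
# prints: 2039 1263 5.377 0.165
```

The point of the sketch is that OVERALL% is a count-weighted combination
of SPAM% and HAM%, not their sum, which is why the three columns do not
add up.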
If you have any more questions on the hit-frequencies format, I'll
be happy to fill you in -- I wrote the tool in question ;)
FYI, this is the original post:
---------- Forwarded message ----------
Date: Thu, 9 Sep 2004 15:18:42 +0200
From: Markus Stumpf <maex-lists-email-ietf-mxcomp(_at_)Space(_dot_)Net>
To: ietf-mxcomp(_at_)vpnc(_dot_)org
Subject: SPF abused by spammers
Justin Murdock posted this link on the qmail list:
http://news.bbc.co.uk/1/hi/technology/3631350.stm
"CipherTrust [...] found that 34% more spam is passing SPF checks than
legitimate e-mail."
Sure. But this was guaranteed to change over time, and vary depending on
corpus composition. It's pretty radically different now, from where I and
the other SpamAssassin corpus contributors are viewing it.
- --j.
\Maex
--
SpaceNet AG            | Joseph-Dollinger-Bogen 14 | Fon: +49 (89) 32356-0
Research & Development |   D-80807 Muenchen        | Fax: +49 (89) 32356-299
"The security, stability and reliability of a computer system is reciprocally
proportional to the amount of vacuity between the ears of the admin"
--Dean
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFB+p3CMJF5cimLx9ARAlLpAKCAOosS1dSm7hjSgzH0dzRTWNsaBwCgpY5Y
lLmvE2U+4KdCyOXLXAdgYFY=
=HChc
-----END PGP SIGNATURE-----