On Fri, 28 Jan 2005, Justin Mason wrote:
Actually, you're wrong there. This is SpamAssassin's "hit-frequencies"
tool output. Those are percentages, not message counts, and each one is
relative to its own corpus (spam or ham), so SPAM% and HAM% will not simply
sum to OVERALL%.
Here's a quick walk through the pertinent parts. (I'm discarding the 1-3
and 3-6 month ranges -- those are old mails, so that data isn't very useful
for network tests -- and just concentrating on the 0-1 month range.)
OVERALL%   SPAM%     HAM%    S/O   RANK  SCORE  NAME:0-1
   61405   54725     6680  0.891   0.00   0.00  (all messages):0-1
This means that there were 61405 messages mass-checked in total,
with 54725 spams, and 6680 "hams" (non-spam messages).
   5.377  3.7259  18.9072  0.165   0.23  -0.00  SPF_PASS:0-1
Looking at the SPAM% and HAM% columns, that means that 3.7259% of the spams
checked had SPF_PASS, and 18.9072% of the hams. That means
((3.7259 / 100) * 54725) = 2038.998775
The fractional part is presumably rounding error in the displayed
percentage, so round to 2039 spam messages that passed the SPF check.
((18.9072 / 100) * 6680) = 1263.00096
and 1263 hams passed SPF. Total those, and you get 3302 messages
from the overall corpus passing SPF; to express that as a percentage
of the total overall corpus, in other words "OVERALL%", you compute
(3302 / 61405) * 100 = 5.377.
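
For anyone who wants to check the arithmetic, the walk-through above can be
sketched in a few lines of Python. This is only an illustration of the
calculation described here, not code from the hit-frequencies tool itself;
the function name overall_percent and its parameter names are made up for
this example.

```python
def overall_percent(total_spam, total_ham, spam_pct, ham_pct):
    """Recover hit counts from SPAM%/HAM% and derive OVERALL%.

    Hypothetical helper: names are illustrative, not taken from
    SpamAssassin's hit-frequencies script.
    """
    # Percentages are relative to each corpus, so convert back to counts.
    # round() absorbs the rounding error in the displayed percentages.
    spam_hits = round(spam_pct / 100 * total_spam)
    ham_hits = round(ham_pct / 100 * total_ham)

    # OVERALL% is the combined hit count over the combined corpus size.
    overall = (spam_hits + ham_hits) / (total_spam + total_ham) * 100
    return spam_hits, ham_hits, overall


# Figures from the "(all messages):0-1" and "SPF_PASS:0-1" rows.
spam_hits, ham_hits, overall = overall_percent(54725, 6680, 3.7259, 18.9072)
print(spam_hits, ham_hits, f"{overall:.3f}")  # 2039 1263 5.377
```

Note that simply adding 3.7259 and 18.9072 gives 22.6331, which is nowhere
near 5.377; the two percentages have different denominators.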
If you have any more questions on the hit-frequencies format, I'll
be happy to fill you in -- I wrote the tool in question ;)
Ah. That certainly explains the computation. Thanks. But the particular
stats are from only 4 people, who, being SpamAssassin contributors,
probably have more SPF-using friends than most people.
FYI, this is the original post:
---------- Forwarded message ----------
Date: Thu, 9 Sep 2004 15:18:42 +0200
From: Markus Stumpf <maex-lists-email-ietf-mxcomp(_at_)Space(_dot_)Net>
To: ietf-mxcomp(_at_)vpnc(_dot_)org
Subject: SPF abused by spammers
Justin Murdock posted this link on the qmail list:
http://news.bbc.co.uk/1/hi/technology/3631350.stm
"CipherTrust [...] found that 34% more spam is passing SPF checks than
legitimate e-mail."
Sure. But this was guaranteed to change over time, and vary depending on
corpus composition. It's pretty radically different now, from where I and
the other SpamAssassin corpus contributors are viewing it.
Everything changes over time.
--
Av8 Internet Prepared to pay a premium for better service?
www.av8.net faster, more reliable, better service
617 344 9000