ietf-mxcomp

Re: So here it is one year later...

2005-01-28 16:14:33

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Dean Anderson writes:
On Fri, 28 Jan 2005, Dean Anderson wrote:

  ((3.7259 / 100) * 54725) = 2038.998775

I'd suspect that rounding error means that 2039 spam messages passed the
SPF check, so round to 2039.

  ((18.9072 / 100) * 6680) = 1263.00096

and 1263 hams passed SPF.  Total those, and you get 3302 messages
from the overall corpus passing SPF; to express that as a percentage
of the total overall corpus, in other words "OVERALL%", you compute
(3302 / 61405) * 100 = 5.377.
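
A minimal sketch of the arithmetic above, in Python. The variable and function names are illustrative only; the inputs are just the corpus sizes and SPF-pass percentages quoted in this message, and rounding to the nearest whole message is the same assumption made in the text.

```python
# Corpus sizes and SPF-pass percentages as quoted in the message above.
SPAM_TOTAL = 54725
HAM_TOTAL = 6680
SPAM_SPF_PCT = 3.7259
HAM_SPF_PCT = 18.9072


def count_from_pct(pct, total):
    """Recover a whole-message count from a rounded percentage (assumes
    the fractional part is only rounding error in the percentage)."""
    return round(pct / 100 * total)


spam_spf = count_from_pct(SPAM_SPF_PCT, SPAM_TOTAL)
ham_spf = count_from_pct(HAM_SPF_PCT, HAM_TOTAL)

# "OVERALL%": SPF-passing messages as a share of the whole corpus.
overall_pct = (spam_spf + ham_spf) / (SPAM_TOTAL + HAM_TOTAL) * 100

print(spam_spf, ham_spf, round(overall_pct, 3))  # 2039 1263 5.377
```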

BTW, I'd say that 38% of the SPF use was ham, and 62% was spam.
    1263 ham / 3302  = ~38%
    2039 spam / 3302 = ~62%

So, 24% versus the 34% in September.  Either slightly better than
September, or perhaps the sample is too small and skewed to be useful. Or
both.  It's still better to block whenever you see SPF.

hmm!  not sure if that's a good assumption -- it's very much dependent on
the comparative ham:spam ratio a domain would see. This set of corpora is
heavily skewed towards receiving more spam than ham; 87.8% of our messages
being spam.

Let's say those corpora were only receiving a third as much spam as they
currently do, possibly because they were younger email addresses or
whatever.  In that case, we'd see 1263 ham, 680 spam (2039/3 ~= 680), in
which case the proportion of SPF-using-spam vs SPF-using-ham would be
turned on its head: 35% being spam, 65% being ham.
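
The what-if above can be sketched in a few lines of Python. This is an illustration only: the helper name and the scaling factor are hypothetical, and it assumes, as the paragraph does, that cutting spam volume to a third cuts SPF-passing spam by the same factor while ham stays put.

```python
HAM_SPF = 1263   # SPF-passing ham, from the counts above
SPAM_SPF = 2039  # SPF-passing spam, from the counts above


def spf_composition(ham_spf, spam_spf):
    """Return (ham %, spam %) among SPF-passing messages."""
    total = ham_spf + spam_spf
    return ham_spf / total * 100, spam_spf / total * 100


# Compare the measured corpus with a hypothetical one receiving
# only a third as much spam.
for label, factor in (("as measured", 1.0), ("one third the spam", 1 / 3)):
    ham_pct, spam_pct = spf_composition(HAM_SPF, SPAM_SPF * factor)
    print(f"{label}: {ham_pct:.0f}% ham, {spam_pct:.0f}% spam")
```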

What I'm trying to illustrate here is that it's important to compare
using figures that compensate for the comparative ham:spam ratio,
because that varies wildly.   Hence, comparing (SPF-bearing-ham / all-ham)
to (SPF-bearing-spam / all-spam) is safer than comparing the message
counts of SPF-bearing-ham and SPF-bearing-spam directly.
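
To make that concrete, here is a rough Python sketch (names are illustrative, not from any tool used to produce these figures). It assumes that a change in spam volume scales SPF-bearing spam and total spam by the same factor. Under that assumption the per-class rates do not move, while the raw-count share of SPF-bearing mail that is spam swings widely.

```python
HAM_TOTAL, HAM_SPF = 6680, 1263
SPAM_TOTAL, SPAM_SPF = 54725, 2039


def compare(spam_scale):
    """Per-class SPF rates vs. raw-count share, for a corpus whose
    spam volume is multiplied by spam_scale (hypothetical knob)."""
    spam_total = SPAM_TOTAL * spam_scale
    spam_spf = SPAM_SPF * spam_scale
    return {
        # (SPF-bearing-ham / all-ham) and (SPF-bearing-spam / all-spam)
        "ham_rate": HAM_SPF / HAM_TOTAL * 100,
        "spam_rate": spam_spf / spam_total * 100,
        # Share of SPF-bearing messages that are spam, by raw count
        "spam_share_of_spf": spam_spf / (spam_spf + HAM_SPF) * 100,
    }


for scale in (1.0, 1 / 3, 1 / 10):
    r = compare(scale)
    print(f"scale={scale:.2f}  ham_rate={r['ham_rate']:.1f}%  "
          f"spam_rate={r['spam_rate']:.1f}%  "
          f"spam_share_of_spf={r['spam_share_of_spf']:.0f}%")
```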

Still, I'd think the 3.7% of total spam using SPF is fairly
significant, and probably reflects the relative proportion of genuine
commercial spam to non-commercial spam.  It would be interesting to know,
of those spams that pass SPF, how many are CAN-SPAM compliant?  How many
of the non-SPF spams were CAN-SPAM compliant?  I'd conjecture a strong
correlation.

now that's something I don't have time to get into, CAN-SPAM compliance
not being something that's easy to automate checking for (more's the
pity).

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFB+sdSMJF5cimLx9ARAhvgAKCdDpROw4/yqVxCwOwMipg46uh/LACbBD/6
HsR9l0+iPdHRG4RDM/zOSmQ=
=lLX4
-----END PGP SIGNATURE-----