Here is my first pass at generating some semi-useful data. I wouldn't
call this data "useful" yet, though a whole day's worth might actually
show some trends. For now "interesting" is probably the better word.
If the data gets used for anything, it will probably be best used for
finding out which areas need a lot more data :)
Snapshot of 10,000 transactions received Jun 28 between 21:10:07 and
21:36:47
Format is Action__SPFresult, where Action is what we actually did with the
message, and SPFresult is what SPF would have told us about the message if
we had consulted it.
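For anyone who wants to re-slice these numbers, here is a minimal Python sketch that splits one tally line into its parts. It assumes every tally line looks like "count (percent%) Action__SPFresult", as in this post. The names TALLY_RE and parse_tally are made up for illustration and are not part of my script.

```python
import re

# Matches a tally line such as "1550 (15%) badrecip__(null sender)".
# The action may itself contain single underscores (e.g. "blocked_spam"),
# so the split happens at the first double underscore.
TALLY_RE = re.compile(r"^\s*(\d+)\s+\((\d+)%\)\s+(\w+?)__(.+?)\s*$")


def parse_tally(line):
    """Return (count, action, spf_result), or None if the line is not a tally."""
    m = TALLY_RE.match(line)
    if m is None:
        return None
    count, _percent, action, spf_result = m.groups()
    return int(count), action, spf_result


print(parse_tally("1550 (15%) badrecip__(null sender)"))
# prints (1550, 'badrecip', '(null sender)')
```

Lines that are not tallies (the commentary in between) come back as None, so the whole post can be fed through line by line.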
Note, however, that a huge amount of our incoming mail is bounces, and I
don't have the HELO name in the log data, so I couldn't test SPF at all for
those. They just say (null sender).
"badrecip" means something was wrong with the recipient: an unknown
username, an unknown domain name, a null or mangled address, or an address
that is blocked or for internal use only. Note that 15% is the largest
number appearing on this list other than RBLs, so this is a fairly sizeable
chunk. Even 1% of our incoming mail means 30,000 per day.
1550 (15%) badrecip__(null sender)
9 (0%) badrecip__error
8 (0%) badrecip__fail
307 (3%) badrecip__neutral
2 (0%) badrecip__pass
172 (1%) badrecip__softfail
2 (0%) badrecip__unknown
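The "1% means 30,000 per day" arithmetic can be sketched like this. It is a rough projection only: it assumes about 3 million transactions per day and that this 10,000-transaction snapshot is representative of the whole day, which it may well not be. The names DAILY_VOLUME, SAMPLE_SIZE and per_day are hypothetical.

```python
DAILY_VOLUME = 3_000_000  # assumption: roughly 3 million transactions per day
SAMPLE_SIZE = 10_000      # size of the snapshot in this post


def per_day(sample_count, daily_volume=DAILY_VOLUME, sample_size=SAMPLE_SIZE):
    """Scale a count from the snapshot up to a projected daily count."""
    return sample_count * daily_volume // sample_size


print(per_day(100))   # 1% of the snapshot -> 30000
print(per_day(1550))  # badrecip with a null sender -> 465000
```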
"badsender" means the MAIL FROM address is mangled/illegal or a nonexistent
domain or something.
40 (0%) badsender__(null sender)
17 (0%) badsender__neutral
"blocked_spam" and "blocked_virus" mean the message was rejected due to
content (usually "known spammer URL in the body").
88 (0%) blocked_spam__(null sender)
1 (0%) blocked_spam__fail
19 (0%) blocked_spam__neutral
10 (0%) blocked_spam__softfail
2 (0%) blocked_virus__(null sender)
1 (0%) blocked_virus__neutral
"delivered" means the message made it through in the clear, though it's
still got a good chance of being spam at this point, sadly.
260 (2%) delivered__(null sender)
3 (0%) delivered__error
3 (0%) delivered__fail
58 (0%) delivered__neutral
5 (0%) delivered__pass
7 (0%) delivered__softfail
"miscerror" covers things like protocol misuse, plus a handful of errors
too rare to get their own category (message rejected).
110 (1%) miscerror__(null sender)
1 (0%) miscerror__error
2 (0%) miscerror__fail
19 (0%) miscerror__neutral
1 (0%) miscerror__pass
4 (0%) miscerror__softfail
Wow, we sure don't quarantine stuff very much: 0.01%... hmm.
1 (0%) quarantined__(null sender)
"ratelimit" means too many connections, causing us to reject before even
checking the RBL (most IPs subject to ratelimit are also on an RBL anyway).
118 (1%) ratelimit__(null sender)
RBL is the lion's share right now. I would probably get more detailed
numbers for everything else if I just skipped this one, but I wanted to
make sure I was comparing apples to apples. RBL means we cut off the
connection before HELO.
6722 (67%) rbl__(null sender)
misc blah
1 (0%) syntaxerr__(null sender)
"tagged" means that the message was allowed through to someone's mailbox
(or to be bounced somewhere down the line) but the Subject is altered to
say it's probably spam.
36 (0%) tagged__(null sender)
1 (0%) tagged__error
1 (0%) tagged__fail
16 (0%) tagged__neutral
"temperror" is a 400-series error due to timed-out DNS lookups, being
unable to verify that the recipient exists, etc.
394 (3%) temperror__(null sender)
4 (0%) temperror__error
3 (0%) temperror__neutral
2 (0%) temperror__softfail
This was all run with trusted-forwarder.org being checked, AND with "best
guess", though I haven't figured out how to tell, using Mail::SPF::Query,
whether a "pass" result was due to a guess or not.
Same data sliced a different way, this time arranged by SPF result.
About 93% of my transactions (9,322 of 10,000) have a null sender: either
MAIL FROM: <> or the transaction doesn't get to the MAIL FROM stage at all.
(Again, I wish I had HELO names, but I don't have access to them.)
1550 (15%) badrecip__(null sender)
40 (0%) badsender__(null sender)
88 (0%) blocked_spam__(null sender)
2 (0%) blocked_virus__(null sender)
260 (2%) delivered__(null sender)
110 (1%) miscerror__(null sender)
1 (0%) quarantined__(null sender)
118 (1%) ratelimit__(null sender)
6722 (67%) rbl__(null sender)
1 (0%) syntaxerr__(null sender)
36 (0%) tagged__(null sender)
394 (3%) temperror__(null sender)
SPF processing returned "error"
9 (0%) badrecip__error
3 (0%) delivered__error
1 (0%) miscerror__error
1 (0%) tagged__error
4 (0%) temperror__error
SPF processing returned "fail" - precious few of these: 0.15%
8 (0%) badrecip__fail
1 (0%) blocked_spam__fail
3 (0%) delivered__fail
2 (0%) miscerror__fail
1 (0%) tagged__fail
SPF returned "neutral" - 4.4%
307 (3%) badrecip__neutral
17 (0%) badsender__neutral
19 (0%) blocked_spam__neutral
1 (0%) blocked_virus__neutral
58 (0%) delivered__neutral
19 (0%) miscerror__neutral
16 (0%) tagged__neutral
3 (0%) temperror__neutral
SPF returned "pass" - 0.08%
2 (0%) badrecip__pass
5 (0%) delivered__pass
1 (0%) miscerror__pass
SPF returned "softfail" - this is kinda cool actually: 1.95%
172 (1%) badrecip__softfail
10 (0%) blocked_spam__softfail
7 (0%) delivered__softfail
4 (0%) miscerror__softfail
2 (0%) temperror__softfail
SPF returned "unknown" - this may mean an unknown mechanism: 0.02%
2 (0%) badrecip__unknown
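The per-result percentages in this second listing are just the first listing summed by SPF result. A minimal Python sketch of that regrouping, with the counts copied by hand from this post into a nested dict (COUNTS and totals_by_spf_result are illustrative names, not taken from my script):

```python
from collections import defaultdict

# Counts copied from the listing: action -> {SPF result: transactions}.
COUNTS = {
    "badrecip": {"(null sender)": 1550, "error": 9, "fail": 8, "neutral": 307,
                 "pass": 2, "softfail": 172, "unknown": 2},
    "badsender": {"(null sender)": 40, "neutral": 17},
    "blocked_spam": {"(null sender)": 88, "fail": 1, "neutral": 19, "softfail": 10},
    "blocked_virus": {"(null sender)": 2, "neutral": 1},
    "delivered": {"(null sender)": 260, "error": 3, "fail": 3, "neutral": 58,
                  "pass": 5, "softfail": 7},
    "miscerror": {"(null sender)": 110, "error": 1, "fail": 2, "neutral": 19,
                  "pass": 1, "softfail": 4},
    "quarantined": {"(null sender)": 1},
    "ratelimit": {"(null sender)": 118},
    "rbl": {"(null sender)": 6722},
    "syntaxerr": {"(null sender)": 1},
    "tagged": {"(null sender)": 36, "error": 1, "fail": 1, "neutral": 16},
    "temperror": {"(null sender)": 394, "error": 4, "neutral": 3, "softfail": 2},
}


def totals_by_spf_result(counts):
    """Sum the per-action counts into one total per SPF result."""
    totals = defaultdict(int)
    for by_result in counts.values():
        for spf_result, n in by_result.items():
            totals[spf_result] += n
    return dict(totals)


totals = totals_by_spf_result(COUNTS)
grand_total = sum(totals.values())
for spf_result, n in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{n:5d} ({100 * n / grand_total:.2f}%) {spf_result}")
```

With these counts the totals come to 15 for "fail" (0.15%), 440 for "neutral" (4.40%), 8 for "pass" (0.08%) and 195 for "softfail" (1.95%), out of 10,000 transactions.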
The bright side for me is that my script (using Mail::SPF::Query) takes 19
min to process 26 min of real-time data, so hopefully I will be able to
keep it running constantly and have a readout in real time of the whole 3
mil, not just 10,000.
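As a back-of-the-envelope check on whether the script can keep up, here is a small Python sketch that compares processing time to the length of the snapshot window. The 19 minutes is my rounded figure, the window comes from the timestamps at the top of this post, and the function name utilization is made up for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def utilization(start, end, processing_minutes):
    """Fraction of the real-time window spent processing its data.

    Anything below 1.0 means the script finishes faster than the data arrives.
    """
    window = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return (processing_minutes * 60) / window.total_seconds()


u = utilization("21:10:07", "21:36:47", processing_minutes=19)
print(f"{u:.0%}")  # prints 71%
```

So on this sample the script is busy roughly 71% of the time, which leaves some headroom but not a lot if the traffic mix changes.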
--
Greg Connor <gconnor(_at_)nekodojo(_dot_)org>