On Jul 20, 2004, at 11:47 AM, wayne wrote:
The thing I don't understand about all this data is - what is the PRA
analysis and particularly the PRA "pass/fail" analysis based on?
Caller-ID records? It's hard to imagine that there is significant
Sender ID volume yet, since the specs are so new. There are not many
Caller-ID records, let alone a representative cross section of domains.
I would like to thank both Andy and Mark for finally providing some
data on the Caller-ID algorithm (aka PRA/PRD). I think it is
especially nice of them, since neither of them is among those who have
been pushing hardest for the PRA.
Evaluating the PRA based on records published for SPF-classic is not
valid. SPF-classic publication is itself a non-representative
subsection of the net, and on top of that SPF-classic records are not
Sender ID records - the semantics are different - so "pass" and "fail"
do not mean anything.
Also, where are the input mail samples coming from? If it is the
inboxes of technical folks then it is not representative. To get a
representative cross section of mail you need to look at a random cross
section of consumer mailboxes on the big ISPs and also, separately,
mail coming into corporate domains. B to B and B to C mail have very
different characteristics. You also have to make sure the inbox owners
agree with your "ham/spam" assignments.
I agree that generating meaningful data here is extremely difficult,
but how are these stats anything other than anecdotes with numbers? I
agree it's very interesting, but I don't see how this data can be used
for anything other than an aid to designing conclusive tests.