ietf-mxcomp

Re: PRA algorithm and use of non-standard header fields

2004-07-20 09:36:40


On Jul 20, 2004, at 11:47 AM, wayne wrote:


I would like to thank both Andy and Mark for finally providing some
data on the Caller-ID algorithm (aka PRA/PRD).  I think it is
especially nice of them since neither of them are the ones that have
been pushing for the PRA the hardest.


The thing I don't understand about all this data is: what is the PRA analysis, and particularly the PRA "pass/fail" analysis, based on? Caller-ID records? It's hard to imagine that there is significant Sender ID volume yet, since the specs are so new. There are not many Caller-ID records, let alone a representative cross section of domains.

Evaluating the PRA based on records published for SPF-classic is not valid. SPF-classic publication is itself a non-representative subsection of the net, and on top of that, SPF-classic records are not Sender ID records. The semantics are different, so "pass" and "fail" do not mean anything.

Also, where are the input mail samples coming from? If they come from the inboxes of technical folks, then they are not representative. To get a representative cross section of mail you need to look at a random cross section of consumer mailboxes on the big ISPs and also, separately, mail coming into corporate domains. Business-to-business and business-to-consumer mail have very different characteristics. You also have to make sure the inbox owners agree with your "ham/spam" assignments.

I agree that generating meaningful data here is extremely difficult, but how are these stats anything other than anecdotes with numbers? I agree it's very interesting, but I don't see how this data can be used for anything other than an aid to designing conclusive tests.