On Jul 20, 2004, at 11:47 AM, wayne wrote:
I would like to thank both Andy and Mark for finally providing some
data on the Caller-ID algorithm (aka PRA/PRD). I think it is
especially nice of them since neither of them are the ones that have
been pushing for the PRA the hardest.
The thing I don't understand about all this data is - what is the PRA
analysis and particularly the PRA "pass/fail" analysis based on?
Caller-ID records? It's hard to imagine that there is significant
Sender ID volume yet, since the specs are so new. There are not many
Caller-ID records, let alone a representative cross section of domains.
Evaluating the PRA based on records published for SPF-classic is not
valid. SPF-classic publication is itself a non-representative
subset of the net, and on top of that SPF-classic records are not
Sender ID records - the semantics are different - so "pass" and "fail"
do not mean anything.
Also, where are the input mail samples coming from? If it is the
inboxes of technical folks then it is not representative. To get a
representative cross section of mail you need to look at a random cross
section of consumer mailboxes on the big ISPs and also, separately,
mail coming into corporate domains. B to B and B to C mail have very
different characteristics. You also have to make sure the inbox owners
agree with your "ham/spam" assignments.
I agree that generating meaningful data here is extremely difficult,
but how are these stats anything other than anecdotes with numbers? I
agree it's very interesting, but I don't see how this data can be used
for anything other than an aid to designing conclusive tests.