Re: [ietf-dkim] list vs contributor signatures, was Wrong Discussion


On May 27, 2010, at 9:03 PM, Dave CROCKER wrote:



On 5/27/2010 2:22 PM, Steve Atkins wrote:

I'll write up the methodology in a little more detail, but out of my sample


eager to see the method description.  not lots of detail, just the gist of
what criteria created each of the 4 values.


Sure. It was a very quick and rough analysis, based just on
my mailboxes. I assumed that all signing, DNS publication,
DKIM checking and ADSP checking was performed perfectly.
I assumed that any modification of the body or subject of the
mail after it was sent would invalidate a DKIM signature.

I did make a few simplifications to speed the analysis, but the
effect of those was solely to reduce the number of phishing
emails not rejected by ADSP that were counted.

My mailboxes are pre-categorised in a bunch of ways, such that
it's easy for me to extract transactional mail, mail from discussion
lists, mail from marketing lists and junk mail (including phishing).

the initial data is:

Legitimate email from paypal:

    72% rejected by ADSP
    28% not rejected


For these two groups I extracted all mail that had an
@paypal.com email address that was categorized as
legitimate email.

I inspected all of them by hand, and they were a mixture
of transactional notifications from paypal in response
to payments, direct 1:1 mail from paypal.com employees
and mail from paypal.com employees via mailing lists.

I didn't actually check signatures, just considered the
1:1 mail and transactional mail as "not rejected" and
considered the mail sent through mailing lists that
I know modify the content as "rejected by ADSP".
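
As a rough sketch of the bookkeeping described above (not the actual
procedure, which was done by hand), the mapping from delivery channel to
assumed ADSP outcome could be written as follows. The channel names and
the outcome_shares helper are hypothetical, introduced only for
illustration.

```python
from collections import Counter

# Assumed outcome per delivery channel, following the description above:
# direct and transactional mail is taken to arrive with its signature
# intact; mail relayed by a list known to modify content is taken to fail.
# The channel names are hypothetical labels.
OUTCOME_BY_CHANNEL = {
    "transactional": "not rejected",
    "direct": "not rejected",
    "modifying_mailing_list": "rejected by ADSP",
}


def outcome_shares(channels):
    """Return the percentage of messages per outcome, rounded to integers."""
    counts = Counter(OUTCOME_BY_CHANNEL[c] for c in channels)
    total = sum(counts.values())
    return {outcome: round(100 * n / total) for outcome, n in counts.items()}
```

No signature is verified here; the outcome is inferred purely from the
channel, as in the hand analysis.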


Phishing emails using "paypal" in the From line:

    39% rejected by ADSP
    61% not rejected.


For this I extracted all the email categorised as "junk mail"
that included the string "paypal" in some case or other in
the RFC2822 From: field.

That includes any use of "paypal" in the local part or domain
part of the email, or in the "friendly from".

It excludes any phish emails that didn't include the term
paypal in the From: field, or which used B or Q-encoding
or which used homoglyphs or misspellings of paypal.
(This will exclude some phishes that would pass ADSP, but
will not exclude any that would be rejected by ADSP).

I checked them quickly by eye - all appeared to be paypal
phishes.

Of those, I classified those where the from address was
@paypal.com as "rejected by ADSP" and those where
it wasn't as "not rejected".
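
The selection and classification rule for the phishing sample can be
sketched roughly as below. This is an illustration under the assumptions
stated earlier (every phish is unsigned, and paypal.com publishes a
rejecting ADSP record), not the script actually used; classify_phish and
its return labels are hypothetical names.

```python
from email.utils import parseaddr


def classify_phish(from_header, protected_domain="paypal.com"):
    """Classify one junk message by its raw From: header value."""
    # Match "paypal" in any case anywhere in the raw header: friendly
    # from, local part or domain. The header is deliberately not decoded,
    # so B/Q-encoded, homoglyph and misspelled variants are excluded.
    if "paypal" not in from_header.lower():
        return "excluded"
    _display_name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    # Assumption: an unsigned message whose author domain is exactly the
    # protected domain fails ADSP; any other domain is outside its scope.
    if domain == protected_domain:
        return "rejected by ADSP"
    return "not rejected"
```

For example, a From: of "PayPal <billing@example.net>" counts as "not
rejected", because ADSP only looks at the author domain and not at the
friendly from.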


Paypal is rather a special case, as they actively register
many, many domains in a lot of TLDs that contain the word
paypal or some misspelling of it, both proactively and in
response to enforcement. I didn't consider those domains
as triggering an ADSP rejection for a number of reasons.
One is that many (most?) of them would have been acquired
by paypal through enforcement action after the phishes were
sent, and the other is that it's a behaviour (registering a
huge number of domains purely to deny them to others)
that's atypical and that doesn't scale.

Having said that, I did spot check quite a lot of the phishes that
I'd tagged as "not rejected" and the vast majority weren't
using domains I'd expect paypal to have proactively reserved
(paypal.net, for instance) - they were mostly using the word
"paypal" in the friendly from, the local part or a subdomain of
the domain part. Of those that weren't of that form many were
things like "@paypal-access.com" and suchlike. So I think those
two numbers are likely accurate to within a few percent or better.


This is pretty interesting data.  It declares both FPs and FNs with ADSP,
which certainly ain't part of any model I ever heard in support of its use.


I expect that it would improve drastically in some respects with
more widespread use of ADSP - I'd expect paypal employees
to migrate to using a non-paypal.com domain for their email,
for example.

Also, my mailbox is more typical of someone in the industry
than a consumer mailbox as, well, I get some mail from paypal.com
that's neither transactional nor bulk. Someone who was a pure
consumer at a major ISP who didn't use mailing lists or forwarding
services and had no interaction with anyone at paypal other than
via their bulk email system would see a much lower FP rate.

It's also based on sender behaviour before there's significant actual
filtering via ADSP. I would expect less mail, both legitimate and
illegitimate, to be rejected by ADSP as time went on.


Given that a standard carries strategic costs in terms of development,
implementation and deployment (real dollars and time) one would think that
its level of benefit should not decay, or at least not quickly.  Since it
takes years to become useful it should take quite a few years before it
becomes useless...


+1

Cheers,
 Steve

_______________________________________________
NOTE WELL: This list operates according to 
http://mipassoc.org/dkim/ietf-list-rules.html