Murray S. Kucherawy wrote:
According to what we have, the biggest users of "relaxed/relaxed" are the
large mailbox providers like Gmail and Yahoo and other legitimate senders,
not spammers. The top 20, for example:
+----------------------------------+----------+
| name                             | count(*) |
+----------------------------------+----------+
| gmail.com                        |   421745 |
| yahoo.com                        |   313109 |
| facebookmail.com                 |   233441 |
| yahoogroups.com                  |   104523 |
| auth.ccsend.com                  |    90195 |
| linkedin.com                     |    74710 |
| google.com                       |    59049 |
| reply.newsmax.com                |    53286 |
| ATT.NET                          |    43602 |
| sbcglobal.net                    |    36534 |
| googlegroups.com                 |    34359 |
| e.groupon.com                    |    30350 |
| paypal.com                       |    24568 |
| f74d39fa044aa309eaea14b9f57fe79c |    21019 |
| emailinfo.bestbuy.com            |    17067 |
| ebay.com                         |    16192 |
| 636ae4d78ec2b46248fc59ac1ad737df |    14580 |
| expediamail.com                  |    13058 |
| bellsouth.net                    |    12431 |
| googlemail.com                   |    12426 |
+----------------------------------+----------+
Total relaxed/relaxed signatures received = 3444978; total above = 1626244
(47%)
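The 47% figure can be re-derived from the table. A minimal sketch, assuming the counts are simply copied into a Python dict (the variable names are illustrative and not part of the original query):

```python
# Counts copied from the top-20 table above.
top20 = {
    "gmail.com": 421745, "yahoo.com": 313109, "facebookmail.com": 233441,
    "yahoogroups.com": 104523, "auth.ccsend.com": 90195, "linkedin.com": 74710,
    "google.com": 59049, "reply.newsmax.com": 53286, "ATT.NET": 43602,
    "sbcglobal.net": 36534, "googlegroups.com": 34359, "e.groupon.com": 30350,
    "paypal.com": 24568, "f74d39fa044aa309eaea14b9f57fe79c": 21019,
    "emailinfo.bestbuy.com": 17067, "ebay.com": 16192,
    "636ae4d78ec2b46248fc59ac1ad737df": 14580, "expediamail.com": 13058,
    "bellsouth.net": 12431, "googlemail.com": 12426,
}

# Total relaxed/relaxed signatures received, as stated in the message.
total_relaxed = 3444978

top20_total = sum(top20.values())
share = top20_total / total_relaxed

print(f"top 20: {top20_total} of {total_relaxed} ({share:.0%})")
```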
In fact, the first domain name that (statistically) looked likely to be a
spammer is way down on the list, around #106 (out of 63314), and everything
before that accounted for 58% of total signatures. So, our data don't agree
with the claim, and certainly not with "by far".
But I don't understand why this is a useful line of analysis. If spammers
are using relaxed/relaxed, they merely have the same concern as a legitimate
sender, namely signature survivability. This shouldn't be a surprise. I
hope we're not talking about the idea of filtering based on which
canonicalization is in use, which is almost certainly a bad idea.
Some good info, Murray.
It is all reflective of what's called a Peer or Personal Network
Community (PCN).
The collection you have is an aggregate of many sites. However, in
reality each site will have a different PCN.
I agree: even for my small site collection, the majority of the
DKIM-signed mail by volume is from:
Gmail
Facebook
and for my PCN, the third is:
mipassoc.org
But when you normalize it, there is a small part (3 to 4) of the
total domains which are, by far, good/bad spammers.
When we started our daily SMTP stats collection in 2003, it started on
a per-site basis and the PCN patterns were obvious. At some point, we
automated the collection in an attempt to show an aggregation across
all sites. Almost immediately, the various measurements were
skewed in one direction or another simply because one or more sites
had a higher measurement for one thing or another.
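As a rough illustration of that skew, here is a small sketch with entirely made-up numbers (the site names and counts are hypothetical, not taken from either data set). It compares a pooled rate across all sites with the rate each site sees on its own:

```python
# HYPOTHETICAL per-site counts, invented purely for illustration:
# (DKIM-signed messages accepted, of which judged to be spam).
sites = {
    "site_a": (100_000, 5_000),   # one very large site
    "site_b": (2_000, 1_500),     # small site
    "site_c": (1_000, 800),       # small site
}

# Rate as seen by each site individually.
per_site = {name: spam / signed for name, (signed, spam) in sites.items()}

# Pooled rate: sum everything first, then divide.
total_signed = sum(signed for signed, _ in sites.values())
total_spam = sum(spam for _, spam in sites.values())
pooled = total_spam / total_signed

for name, rate in per_site.items():
    print(f"{name}: {rate:.1%}")
print(f"pooled: {pooled:.1%}")
```

With these invented counts the pooled figure sits close to the largest site's rate, while the two small sites each see something very different.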
For my PCN, by the time mail is finally accepted, the RFC5322 payload
is indeterminate (i.e. everything that could be done was done), and
the analysis of the DKIM-signed mail is that most of it is from spamming
domains.
While you may be eager to publicly state this input is insignificant
and doesn't matter, my 35+ years of producing software for thousands
of customers and inter-operating with my industry peers says it is
very significant. One cannot always use a total aggregate summary
to reflect what is true or false at the site level. The fact that DKIM
analysis is in a limbo state is reflective of what I am stating.
Why not try redoing your stats for your PCN only and see what it shows?
Keep in mind we got out of the ESP business in 1998, but we still get a
lot of dirty, inactive addresses coming our way. So our PCN will be
very different.
Finally, in my opinion you have two motivations for C14N, and the choice
could be based simply on the degrees of separation of what one deems important:
Private Communications:
Desire to have the most secure integrity
Bulk, Public Communications:
More relaxed, less secure, with a wide range of
receivers; C14N-related issues are minimized with a
relaxed algorithm.
--
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com