-----Original Message-----
From: ietf-dkim-bounces(_at_)mipassoc(_dot_)org
[mailto:ietf-dkim-bounces(_at_)mipassoc(_dot_)org] On Behalf Of Hector Santos
Sent: Wednesday, May 18, 2011 1:49 PM
To: IETF-DKIM
Subject: Re: [ietf-dkim] New canonicalizations
Whatever the actual reason, since it's not the default and the option
nonetheless exists and serves a purpose, there is a reasonable
practical explanation: a certain population of domains is seeking
the path of least resistance, with fewer accidental <cr><lf>
injections and mutations along the path. Those can easily occur
in our heterogeneous networks, given the Unix (LF), Mac (CR) and DOS
(CRLF) differences in transport, gateways and storage I/O.
I think you're asking for a count of domains using various canonicalizations
that produce spam. Here's what we have:
+------------------------+-----------+------------+
| count(distinct domain) | hdr_canon | body_canon |
+------------------------+-----------+------------+
|                    214 |         0 |          0 |
|                      1 |         0 |          1 |
|                     62 |         1 |          0 |
|                   3805 |         1 |          1 |
+------------------------+-----------+------------+
This counts a domain as "spammy" if the mail we've seen signed by that domain
is labeled as spam by SpamAssassin at least 50% of the time, just as a starting
point. But if I instead report on domains below 50% (relatively clean domains),
the ratios are about the same:
+------------------------+-----------+------------+
| count(distinct domain) | hdr_canon | body_canon |
+------------------------+-----------+------------+
|                   2703 |         0 |          0 |
|                      6 |         0 |          1 |
|                   2238 |         1 |          0 |
|                  20573 |         1 |          1 |
+------------------------+-----------+------------+
So I don't think a conclusion's really possible here.
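For anyone who wants to compare the two tables directly, here is a minimal
sketch that turns the distinct-domain counts into per-combination shares. The
numbers are copied from the tables above; the names SPAMMY, CLEAN and shares()
are made up for this illustration and are not part of any reporting tool.

```python
# Distinct-domain counts copied from the two tables above,
# keyed by (hdr_canon, body_canon) with 0 = simple, 1 = relaxed.
SPAMMY = {(0, 0): 214, (0, 1): 1, (1, 0): 62, (1, 1): 3805}
CLEAN = {(0, 0): 2703, (0, 1): 6, (1, 0): 2238, (1, 1): 20573}

NAMES = {0: "simple", 1: "relaxed"}


def shares(counts):
    """Return each combination's fraction of the total domain count."""
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


if __name__ == "__main__":
    spammy, clean = shares(SPAMMY), shares(CLEAN)
    for combo in sorted(SPAMMY):
        label = f"{NAMES[combo[0]]}/{NAMES[combo[1]]}"
        print(f"{label:16s} spammy {spammy[combo]:7.2%}  clean {clean[combo]:7.2%}")
```

This only restates the counts as fractions; it says nothing about why a domain
picked a given canonicalization.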
I don't think there is anything reliable there from what I can see, but
it's not unreasonable to hypothesize that there might be a direct
correlation between the number of hops and the tendency to use
relaxed/relaxed. It might be interesting to see whether that is a
motivation for using relaxed/relaxed:
c= parameter vs. average number of hops (Received lines)
+---------------------+-----------+------------+----------+
| avg(received_count) | hdr_canon | body_canon | count(*) |
+---------------------+-----------+------------+----------+
|              1.0976 |         0 |          0 |     2214 |
|              1.0000 |         0 |          1 |        7 |
|              1.0338 |         1 |          0 |     7569 |
|              2.3349 |         1 |          1 |    14086 |
+---------------------+-----------+------------+----------+
A canonicalization value of "0" means "simple" and "1" means "relaxed". So
there is possibly a correlation between use of relaxed/relaxed and the hop
count for spam, but I have trouble envisioning that as something that's being
actively considered by signers.
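To make the 0/1 encoding concrete, here is a small sketch of how a signature's
"c=" tag value could be mapped onto the hdr_canon and body_canon columns. The
defaults follow RFC 6376 (no tag means simple/simple, and a single name applies
to the header with "simple" for the body). The function name parse_c_tag and
the CANON_CODES table are hypothetical; this is not how the reporting database
is actually populated.

```python
# Column encoding used in the tables: 0 = simple, 1 = relaxed.
CANON_CODES = {"simple": 0, "relaxed": 1}


def parse_c_tag(value=None):
    """Map a DKIM c= tag value to (hdr_canon, body_canon) codes.

    Assumes RFC 6376 defaults: a missing tag is simple/simple, and when only
    one algorithm is named it is the header algorithm and the body is simple.
    Raises KeyError for an unknown algorithm name.
    """
    if value is None or not value.strip():
        return (0, 0)
    parts = value.strip().split("/", 1)
    header = parts[0]
    body = parts[1] if len(parts) == 2 else "simple"
    return (CANON_CODES[header], CANON_CODES[body])


if __name__ == "__main__":
    for tag in (None, "relaxed", "simple/relaxed", "relaxed/relaxed"):
        print(repr(tag), "->", parse_c_tag(tag))
```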
The same report for non-spam, however, shows that there's probably not much of
a statistically significant difference:
+---------------------+-----------+------------+----------+
| avg(received_count) | hdr_canon | body_canon | count(*) |
+---------------------+-----------+------------+----------+
|              1.2570 |         0 |          0 |   220497 |
|              1.0971 |         0 |          1 |      412 |
|              1.4505 |         1 |          0 |   172136 |
|              2.0206 |         1 |          1 |   980337 |
+---------------------+-----------+------------+----------+
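As a rough way to set relaxed/relaxed against everything else, the sketch below
pools the three other rows into one message-weighted mean hop count for each
table. The figures are copied from the tables above; pooled_mean() and the
constant names are invented for this example. Only the means and counts are
available here, with no variance, so this cannot say anything about
statistical significance.

```python
# (avg_received_count, message_count) copied from the two hop-count tables,
# keyed by (hdr_canon, body_canon) with 0 = simple, 1 = relaxed.
SPAM = {
    (0, 0): (1.0976, 2214),
    (0, 1): (1.0000, 7),
    (1, 0): (1.0338, 7569),
    (1, 1): (2.3349, 14086),
}
NON_SPAM = {
    (0, 0): (1.2570, 220497),
    (0, 1): (1.0971, 412),
    (1, 0): (1.4505, 172136),
    (1, 1): (2.0206, 980337),
}

# Every combination except relaxed/relaxed.
OTHERS = [(0, 0), (0, 1), (1, 0)]


def pooled_mean(rows, keys):
    """Message-weighted mean of avg(received_count) over the given rows."""
    total_msgs = sum(rows[k][1] for k in keys)
    total_hops = sum(rows[k][0] * rows[k][1] for k in keys)
    return total_hops / total_msgs


if __name__ == "__main__":
    for name, rows in (("spam", SPAM), ("non-spam", NON_SPAM)):
        print(f"{name:9s} relaxed/relaxed {rows[(1, 1)][0]:.4f}  "
              f"all others {pooled_mean(rows, OTHERS):.4f}")
```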
I don't know where all this is leading, but there you go.
-MSK
_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html