Murray S. Kucherawy wrote:
I don't think there is anything reliable there from what I can see,
but it's not unreasonable for one to hypothesize that there might be a
direct correlation between the number of hops and the tendency to use
relaxed/relaxed. It might be interesting to see whether that is a
motivation for using relaxed/relaxed:
c= parameter vs. average number of hops (Received lines)
+---------------------+-----------+------------+----------+
| avg(received_count) | hdr_canon | body_canon | count(*) |
+---------------------+-----------+------------+----------+
|              1.0976 |         0 |          0 |     2214 |
|              1.0000 |         0 |          1 |        7 |
|              1.0338 |         1 |          0 |     7569 |
|              2.3349 |         1 |          1 |    14086 |
+---------------------+-----------+------------+----------+
A canonicalization of "0" means "simple" and "1" means "relaxed". So there
is possibly a correlation between use of relaxed/relaxed and the
hop count for spam,
I just finished doing this test and got the following. I stored
records (hops, hash, sdid) in a SQL table and ran the following queries:
select hash, count(*) from c14n
group by hash;
+--------------------------------+
| hash              count(*)     |
|--------------------------------|
| relaxed/relaxed   5420         |
| relaxed/simple    1115         |
| simple/relaxed    2            |
| simple/simple     1314         |
+--------------------------------+
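For anyone wanting to repeat the test, here is a minimal sketch of how
(hops, hash, sdid) records like these might be pulled from a raw message,
assuming Python's standard email parser. The function name c14n_record
and the choice to look only at the first DKIM-Signature header are
illustrative assumptions, not the tooling used for the numbers above.
The "hash" field here is the c= value, matching the column name in the
table.

```python
from email import message_from_string


def c14n_record(raw_message):
    """Return (hops, c14n, sdid) for the first DKIM-Signature, or None."""
    msg = message_from_string(raw_message)
    sig = msg.get("DKIM-Signature")
    if sig is None:
        return None
    # Split the tag=value list; folding whitespace inside values is dropped.
    tags = {}
    for part in sig.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = "".join(value.split())
    # c= defaults to simple/simple; a lone header algorithm implies a
    # simple body (RFC 6376, section 3.5).
    header, _, body = tags.get("c", "simple/simple").partition("/")
    # One Received header per hop.
    hops = len(msg.get_all("Received", []))
    return hops, f"{header}/{body or 'simple'}", tags.get("d", "").lower()


# Made-up message, for illustration only.
sample = (
    "Received: from a.example by b.example; Mon, 4 Jan 2010 00:00:00 +0000\n"
    "Received: from c.example by a.example; Mon, 4 Jan 2010 00:00:00 +0000\n"
    "DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=Example.com; s=sel;\n"
    " h=from:to; bh=abc=; b=def=\n"
    "From: x@example.com\n"
    "\n"
    "body\n"
)
print(c14n_record(sample))
```

Messages with several signatures would need a loop over
msg.get_all("DKIM-Signature") instead.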
select hash, hops, sdid, count(*) from c14n
group by hops
order by hops desc, hash;
+--------------------------------------------------------------+
| hash              hops  sdid                    count(*)     |
|--------------------------------------------------------------|
| relaxed/relaxed   8     gmail.com               8            |
| relaxed/relaxed   7     talamasca.ocis.net      6            |
| relaxed/simple    6     mrochek.com             49           |
| relaxed/relaxed   5     yahoo.com               474          |
| relaxed/relaxed   4     gmail.com               184          |
| simple/simple     3     maimonides.edu          84           |
| relaxed/relaxed   2     coldwatercreek.com      1483         |
| relaxed/relaxed   1     facebookmail.com        5563         |
+--------------------------------------------------------------+
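To line this up with the avg(received_count) table quoted earlier, the
same c14n table could also be summarized as average hops per
canonicalization. The following is only a sketch, assuming SQLite via
Python's sqlite3 module in place of whatever SQL server is at hand; the
helper name avg_hops_by_c14n and the sample rows are made up for
illustration and are not my data.

```python
import sqlite3


def avg_hops_by_c14n(rows):
    """rows: iterable of (hops, hash, sdid); returns [(hash, avg, count)]."""
    db = sqlite3.connect(":memory:")
    db.execute("create table c14n (hops integer, hash text, sdid text)")
    db.executemany("insert into c14n values (?, ?, ?)", rows)
    # Group by the canonicalization so every selected column is either
    # the grouping key or an aggregate.
    result = db.execute(
        "select hash, avg(hops), count(*) from c14n "
        "group by hash order by hash"
    ).fetchall()
    db.close()
    return result


# Hypothetical rows, for illustration only.
sample_rows = [
    (1, "relaxed/relaxed", "a.example"),
    (5, "relaxed/relaxed", "b.example"),
    (1, "simple/simple", "c.example"),
]
for row in avg_hops_by_c14n(sample_rows):
    print(row)
```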
I had noticed that gmail.com messages spanned a wide range of hop
counts, so I ran a query just for that domain:
select hash, hops, sdid, count(*) from c14n
where sdid="gmail.com"
group by hops
order by hops desc, hash;
+--------------------------------------------------------------+
| hash              hops  sdid                    count(*)     |
|--------------------------------------------------------------|
| relaxed/relaxed   8     gmail.com               8            |
| relaxed/relaxed   7     gmail.com               4            |
| relaxed/relaxed   6     gmail.com               14           |
| relaxed/relaxed   5     gmail.com               14           |
| relaxed/relaxed   4     gmail.com               107          |
| relaxed/relaxed   2     gmail.com               130          |
+--------------------------------------------------------------+
Looking at these messages:
hops=2 direct private emails to users
hops=4 xml-dev list messages
hops=5 pop3ext, ietf-smtp list messages
hops=6 spf-help, ietf discuss list messages
hops=7 spf-discuss list messages
hops=8 spf-discuss list messages
but I have trouble envisioning that as
something that's being actively considered by signers.
The reason we needed relaxed in the first place is that there are
many long-running systems that are still active, evolved from UUCP
(like ours), and still have those backend internal I/O designs,
including UI, report writers, text interfaces, etc., in place. The
first change was just swapping the transport method from UUCP to SMTP,
and the only interoperability requirement was to make sure the edge
had the proper LF/CRLF interface translations in place.
That was never an issue until DKIM came along. So, for example, if the
system's backend storage uses <LF>, you can imagine that a standalone
DKIM signer or verifier utility needs to take this I/O into account
when reading the file. It can't assume that all mail storage is
x822/5322 with CRLF delimiters. We can state it, but it is really none
of anyone's business how the backend data is stored as long as the end
result is the same.
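To make the storage point concrete, here is a rough sketch of what such
a standalone utility would have to do before hashing a body read from
<LF> storage. It assumes SHA-256 and the "simple" body canonicalization
from RFC 6376; the function names to_crlf and simple_body_hash are
hypothetical, and this is not code from any particular signer.

```python
import base64
import hashlib


def to_crlf(stored: bytes) -> bytes:
    """Convert bare-LF (or mixed) storage to CRLF wire format."""
    return stored.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def simple_body_hash(body: bytes) -> str:
    """Base64 SHA-256 of the body after 'simple' body canonicalization."""
    # Trailing empty lines collapse to a single CRLF.
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # An empty body, or one lacking a final CRLF, gets one added.
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


# The same message body as seen on the wire and as stored with <LF>.
wire = b"Hello\r\nWorld\r\n\r\n"
stored = b"Hello\nWorld\n\n"

print(simple_body_hash(to_crlf(stored)) == simple_body_hash(wire))
print(simple_body_hash(stored) == simple_body_hash(wire))
```

The first comparison holds because the translation restores the wire
form; the second does not, since hashing the <LF> file as stored feeds
different bytes to the hash.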
So are signers/operators aware that mail mutations can happen? I think
so. Do signers blasting one-to-many messages "believe" they need a more
relaxed integrity check to maximize DKIM verification across the many
receivers? I think so (although your stats show similar pass rates
for simple and relaxed).
I also think that if DKIM had a C14N option (i.e. STRIP) available to
resolve legacy throughput issues for particular streams, they would use
it too, maybe on a per-target basis only. :)
Anyway, thanks.
--
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com