ietf-dkim

Re: [ietf-dkim] New canonicalizations

2011-05-18 19:46:52
Murray S. Kucherawy wrote:
> I don't think there is anything reliable there from what I can see, but it's
> not unreasonable for one to hypothesize that there might be a direct
> correlation between the number of hops and the tendency to use
> relaxed/relaxed. It might be interesting to see if that may be a
> motivation for using relaxed/relaxed:
>
>       c= param vs. avg # of hops (Received: lines)
>
> +---------------------+-----------+------------+----------+
> | avg(received_count) | hdr_canon | body_canon | count(*) |
> +---------------------+-----------+------------+----------+
> |              1.0976 |         0 |          0 |     2214 |
> |              1.0000 |         0 |          1 |        7 |
> |              1.0338 |         1 |          0 |     7569 |
> |              2.3349 |         1 |          1 |    14086 |
> +---------------------+-----------+------------+----------+
>
> Canonicalizations of "0" mean "simple", "1" is "relaxed".  So there 
> is possibly a correlation between use of relaxed/relaxed and the 
> hop count for spam, 

I just finished doing this test and got the following. I stored 
records (hops, hash, sdid) in a SQL table, where hash holds the c= 
canonicalization pair, and ran the following queries:

select hash, count(*) from c14n
      group by hash;

+--------------------------------+
| hash                  count(*) |
|--------------------------------|
| relaxed/relaxed       5420     |
| relaxed/simple        1115     |
| simple/relaxed        2        |
| simple/simple         1314     |
+--------------------------------+
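For anyone wanting to reproduce this, one (hops, hash, sdid) record per message could be pulled out roughly as below. This is a minimal sketch, assuming hops is the number of Received: header fields, hash is the c= tag of the first DKIM-Signature and sdid is its d= tag; the function names are hypothetical and not part of any DKIM library.

```python
import email
from email import policy


def dkim_tags(value: str) -> dict:
    """Parse a DKIM-Signature tag list ("tag=value; ...") into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            name, _, val = part.partition("=")
            # Values may be folded across lines; drop all whitespace in them.
            tags[name.strip()] = "".join(val.split())
    return tags


def c14n_record(raw: bytes):
    """Return (hops, hash, sdid) for the first DKIM-Signature, or None."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    sig = msg.get("DKIM-Signature")
    if sig is None:
        return None
    tags = dkim_tags(str(sig))
    c = tags.get("c", "simple/simple")  # RFC 6376 default when c= is absent
    if "/" not in c:
        c += "/simple"  # only header c14n named: body defaults to simple
    hops = len(msg.get_all("Received", []))  # hypothetical hop measure
    return hops, c, tags.get("d", "").lower()
```

Each returned tuple maps directly onto one row of the c14n table queried here.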

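-- hash and sdid are not in GROUP BY, so engines that accept this query
-- (e.g. MySQL without ONLY_FULL_GROUP_BY, or SQLite) show one arbitrary
-- row's values for them; count(*) is the total per hop count.
select hash, hops, sdid, count(*) from c14n
      group by hops
      order by hops desc, hash;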

+--------------------------------------------------------------+
| hash                hops    sdid                    count(*) |
|--------------------------------------------------------------|
| relaxed/relaxed     8       gmail.com               8        |
| relaxed/relaxed     7       talamasca.ocis.net      6        |
| relaxed/simple      6       mrochek.com             49       |
| relaxed/relaxed     5       yahoo.com               474      |
| relaxed/relaxed     4       gmail.com               184      |
| simple/simple       3       maimonides.edu          84       |
| relaxed/relaxed     2       coldwatercreek.com      1483     |
| relaxed/relaxed     1       facebookmail.com        5563     |
+--------------------------------------------------------------+
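Because hash and sdid are not listed in the GROUP BY clause of the query above, each row shows only one representative hash and sdid per hop count, while count(*) covers every message with that hop count (the counts sum to the same 7851 as the first table). A full breakdown needs every non-aggregated column in GROUP BY. The sketch below shows that with Python's built-in sqlite3 module and a few made-up rows (not the data above); the table layout simply mirrors the (hops, hash, sdid) records described earlier.

```python
import sqlite3


def hop_breakdown(rows):
    """Count messages per (hops, hash, sdid) using a standard GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE c14n (hops INTEGER, hash TEXT, sdid TEXT)")
    con.executemany("INSERT INTO c14n VALUES (?, ?, ?)", rows)
    # Every non-aggregated column in SELECT also appears in GROUP BY,
    # so each output row describes exactly one (hops, hash, sdid) combination.
    cur = con.execute(
        "SELECT hops, hash, sdid, COUNT(*) FROM c14n "
        "GROUP BY hops, hash, sdid "
        "ORDER BY hops DESC, hash, sdid")
    result = cur.fetchall()
    con.close()
    return result


sample = [  # made-up rows, not the data from the tables above
    (2, "relaxed/relaxed", "gmail.com"),
    (2, "relaxed/relaxed", "gmail.com"),
    (2, "simple/simple", "example.org"),
    (4, "relaxed/relaxed", "gmail.com"),
]
for row in hop_breakdown(sample):
    print(row)
# (4, 'relaxed/relaxed', 'gmail.com', 1)
# (2, 'relaxed/relaxed', 'gmail.com', 2)
# (2, 'simple/simple', 'example.org', 1)
```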

I had noticed that gmail.com messages had a wide range of hop counts, 
so I ran a query just for that domain:

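-- sdid is fixed by the WHERE clause; hash is still ungrouped, so it
-- shows one row's value per hop count.
select hash, hops, sdid, count(*) from c14n
    where sdid='gmail.com'
    group by hops
    order by hops desc, hash;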

+--------------------------------------------------------------+
| hash                hops     sdid                   count(*) |
|--------------------------------------------------------------|
| relaxed/relaxed     8        gmail.com              8        |
| relaxed/relaxed     7        gmail.com              4        |
| relaxed/relaxed     6        gmail.com              14       |
| relaxed/relaxed     5        gmail.com              14       |
| relaxed/relaxed     4        gmail.com              107      |
| relaxed/relaxed     2        gmail.com              130      |
+--------------------------------------------------------------+

Looking at these messages:

    hops=2   direct private emails to users
    hops=4   xml-dev list messages
    hops=5   pop3ext, ietf-smtp list messages
    hops=6   spf-help, ietf discuss list messages
    hops=7   spf-discuss list messages
    hops=8   spf-discuss list messages

> but I have trouble envisioning that as 
> something that's being actively considered by signers.

The reason we needed relaxed in the first place is that many 
long-lived systems that are still active evolved from UUCP (like 
ours) and still have their original back-end internal I/O designs in 
place, including UIs, report writers, text interfaces, etc. The first 
change was simply swapping the transport from UUCP to SMTP, and the 
only interoperability requirement was to make sure the edge had the 
proper LF/CRLF translations in place.

This was never an issue until DKIM came along. For example, if the 
system's back-end storage uses <LF> line endings, a standalone DKIM 
signer or verifier utility needs to take that I/O into account when 
reading the file. It can't assume that all mail storage is x822/5322 
with CRLF delimiters. We can state such a requirement, but it is 
really none of anyone's business how the back-end data is stored, as 
long as the end result is the same.
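The edge translation described above can be sketched as follows: a standalone signer or verifier reads the stored file in binary mode and converts its line endings to CRLF before doing anything DKIM-related. This is a minimal illustration only; `to_crlf` and `read_for_dkim` are hypothetical helpers, and bare CR characters are deliberately left alone.

```python
import hashlib


def to_crlf(data: bytes) -> bytes:
    """Return `data` with every LF or CRLF line ending normalized to CRLF.

    Existing CRLF pairs are first collapsed to LF so they are not doubled;
    bare CR characters are left untouched.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def read_for_dkim(path: str) -> bytes:
    """Read a stored message and hand back CRLF-delimited bytes."""
    with open(path, "rb") as f:  # "rb": no platform newline translation
        return to_crlf(f.read())


stored = b"Subject: test\n\nHello\n"  # as kept in an LF-based store
wire = to_crlf(stored)                # the form that went over SMTP
print(wire)  # b'Subject: test\r\n\r\nHello\r\n'
# A digest over the LF bytes differs from one over the CRLF bytes.
print(hashlib.sha256(stored).digest() == hashlib.sha256(wire).digest())  # False
```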

So are signers/operators aware mail mutations can happen?  I think so.

Do signers blasting one-to-many messages "believe" they need a more 
relaxed integrity check to maximize DKIM verification across the many 
receivers?  I think so (although your stats show similar pass rates 
for simple and relaxed).
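What "more relaxed integrity" buys a signer can be sketched with the canonicalization rules of RFC 6376, section 3.4: simple header canonicalization uses the field unchanged, while relaxed lowercases the name, unfolds, and collapses whitespace; simple body canonicalization only drops trailing empty lines, while relaxed also normalizes whitespace within lines. This is a minimal sketch of the canonicalization step only (no hashing or signing), assuming the input already uses CRLF line endings; the function names are hypothetical.

```python
import re


def relaxed_header(field: str) -> str:
    """'relaxed' header canonicalization (RFC 6376, 3.4.2) of one raw field.

    `field` is the field as received, e.g. "Subject:  Hi\r\n\tthere\r\n".
    """
    name, _, value = field.partition(":")
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)        # runs of WSP -> one SP
    value = value.rstrip("\r\n").strip(" ")      # drop WSP after colon and at end
    return name.strip(" \t").lower() + ":" + value + "\r\n"


def simple_body(body: str) -> str:
    """'simple' body canonicalization (3.4.3): only trailing empty lines go."""
    while body.endswith("\r\n\r\n"):
        body = body[:-2]
    if not body.endswith("\r\n"):
        body += "\r\n"  # an empty body becomes a single CRLF
    return body


def relaxed_body(body: str) -> str:
    """'relaxed' body canonicalization (3.4.4)."""
    lines = [re.sub(r"[ \t]+", " ", ln).rstrip(" ") for ln in body.split("\r\n")]
    while lines and lines[-1] == "":
        lines.pop()  # ignore empty lines at the end of the body
    return "".join(ln + "\r\n" for ln in lines)


# A relay that doubles a space and pads the end of a body line:
sent, relayed = "Hi there\r\n", "Hi  there \r\n"
print(simple_body(sent) == simple_body(relayed))    # False
print(relaxed_body(sent) == relaxed_body(relayed))  # True
print(repr(relaxed_header("Subject:  Hi\r\n\tthere\r\n")))  # 'subject:Hi there\r\n'
```

Under simple/simple the relayed copy would fail verification, while relaxed/relaxed tolerates this kind of whitespace mutation.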

I also think that if DKIM had a C14N option (such as STRIP) available 
to resolve legacy throughput problems for particular streams, signers 
would use it too, maybe only on a per-target basis. :)

Anyway, thanks.

-- 
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com


_______________________________________________
NOTE WELL: This list operates according to 
http://mipassoc.org/dkim/ietf-list-rules.html