Re: [ietf-dkim] New canonicalizations

Alessandro Vesely wrote:

Hector wrote:
  4 - Lines over 998 (1000 with CRLF). This is invalid under RFC5322,
      but it's possible some verifiers are designed to do a buffered
      C14N and don't check RFC5322 line lengths between two memory
      points in the buffer, as opposed to a line-by-line feed into the
      C14N function.  Why buffer vs line?  Speed.


I imagined the C14N function reads characters one by one.  On finding
CRLF it can go back a few bytes to remove end-of-line punctuation.
However you code c14n(), it will be sparklingly faster than sha256().

However, distinguishing the middle of a line from its beginning/end is
possibly inconsistent, since line breaks may be altered because of
invalidly long lines or RFC3676 rewrapping.
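
To make the character-by-character idea concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions:
it handles just the trailing-whitespace step (SP/HTAB before CRLF), not
the full relaxed body algorithm, and the names TrailingWspStripper and
body_hash are hypothetical, not taken from any DKIM implementation.

```python
import hashlib


class TrailingWspStripper:
    """Byte-at-a-time sketch: drops SP/HTAB that immediately precede CRLF.

    State is kept between feed() calls, so the input may arrive in
    arbitrary buffer-sized chunks rather than whole lines.
    """

    def __init__(self):
        self.pending_wsp = bytearray()  # whitespace not yet known to be trailing
        self.saw_cr = False             # previous byte was CR

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for b in chunk:
            if self.saw_cr:
                self.saw_cr = False
                if b == 0x0A:
                    # CRLF found: the held-back whitespace was trailing, drop it
                    self.pending_wsp.clear()
                    out += b"\r\n"
                    continue
                # bare CR: treat it as ordinary data
                out += self.pending_wsp + b"\r"
                self.pending_wsp.clear()
            if b == 0x0D:
                self.saw_cr = True
            elif b in (0x20, 0x09):
                self.pending_wsp.append(b)
            else:
                out += self.pending_wsp
                self.pending_wsp.clear()
                out.append(b)
        return bytes(out)

    def finish(self) -> bytes:
        # Assumption of this sketch: leftovers at end of data pass through as-is
        tail = bytes(self.pending_wsp) + (b"\r" if self.saw_cr else b"")
        self.pending_wsp.clear()
        self.saw_cr = False
        return tail


def body_hash(chunks) -> str:
    """Feed canonicalized output straight into SHA-256."""
    c14n, digest = TrailingWspStripper(), hashlib.sha256()
    for chunk in chunks:
        digest.update(c14n.feed(chunk))
    digest.update(c14n.finish())
    return digest.hexdigest()
```

Because the state survives across feed() calls, the same code works
whether the caller hands it one line at a time or a large read buffer.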

      I found 98 such buffer hash errors from various domains due to
      having at least 1 super long line.


Some MUAs consistently keep paragraphs on a single line.


Or rather, on the WRITER side, if so configured, they can:

   - use QP to send it (not necessarily save it as QP on the local
     user side).

   - use automatic word wrapping based on the width set (usually
     < 70 to stay consistent with console terminal types).
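
As a rough illustration of those two writer-side options, the sketch
below uses Python's standard quopri and textwrap modules. The helper
names and the sample paragraph are made up for the example, and the
width of 68 is just one value under 70.

```python
import quopri
import textwrap

# One paragraph kept on a single line, well over the 998-octet limit
paragraph = " ".join(["word"] * 300)


def as_quoted_printable(text: str) -> bytes:
    """Option 1: QP on the wire; soft line breaks keep every line short."""
    return quopri.encodestring(text.encode("utf-8"))


def as_wrapped(text: str, width: int = 68) -> str:
    """Option 2: hard word wrap at a fixed width before sending."""
    return textwrap.fill(text, width=width)


qp = as_quoted_printable(paragraph)
wrapped = as_wrapped(paragraph)
```

QP keeps the paragraph intact for the reader (decoding removes the soft
breaks), while hard wrapping changes the text that is actually sent.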

But when not, on the READER side, as you know, reading can be
difficult on a display device that doesn't word wrap it for you.
This is often a design problem for WEB-based viewing because it may
depend on the HTML tag used to display it.

    <pre> preformatted - viewer will display as is.
    <p>   paragraph - viewer will word wrap

When I read this list's mail via the archive on the web, for example,
some participants' mail is not word wrapped. So I often just copy and
paste it into my editor and hit ALT-B to word wrap it just so I can
read it.  This is an old problem where people assumed (or didn't) that
everyone is using the same reading devices.

In our WEB mail viewer, I forget the logic but it has been adjusted 
over time, and "security" is part of it.   The best way (under web) to 
do it is to use multipart/alternative with text/plain and text/html to 
allow reading devices to be smart.
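
For illustration only, a multipart/alternative message of that kind can
be built with Python's standard email package roughly as follows. The
function name and the subject text are placeholders, and this is not a
description of how our web mail viewer does it.

```python
from email.message import EmailMessage


def build_alternative(text_body: str, html_body: str) -> EmailMessage:
    """Build a message carrying the same content as text/plain and text/html."""
    msg = EmailMessage()
    msg["Subject"] = "example"  # placeholder subject
    msg.set_content(text_body)  # becomes the text/plain part
    # Adding an alternative converts the message to multipart/alternative
    msg.add_alternative(html_body, subtype="html")
    return msg


msg = build_alternative(
    "A paragraph the reader may wrap as it likes.\n",
    "<p>A paragraph the reader may wrap as it likes.</p>\n",
)
```

The reading device then picks whichever part it can display best.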

When you made your suggestion, I thought you and I were thinking the
same thing: that the C14N/HASH should be computed on what the user
"wrote" and what is "read," i.e. the actual content, with all the
"color" and formatting removed.  I think the idea has theoretical
merit, but it is definitely extra processing on both ends (signer and
verifier), so it may not be feasible.

I just wish to make a few other points regarding the large-buffer readers.

     Messages with illegal (length) text lines
+------------------------------------------------+
| signer                    total  illegal  hops |
|------------------------------------------------|
| coldwatercreek.com        78     78       2    |
| livingsocial.com          41     2        1    |
| jcprewards.com            23     4        2    |
| news.redlobster.com       11     4        1    |
| xanthianoutswagger.net    2      2        1    |
| resultsmail.com           2      2        1    |
| trl3.net                  2      2        1    |
| numbersoft.info           1      1        1    |
+------------------------------------------------+

All spam.  Obviously there are a lot of spammers and eMarketers who
believe that illegal lines are not checked for, and for practical
reasons they aren't, or not until a later point.

Most SMTP servers use larger buffered data blocks to read the DATA
stream.  Reading it line by line is extremely inefficient and makes
for poor TCP and networking performance.  It can be the difference
between a 2 second upload and a 1 minute upload.  It wouldn't be hard
to do a line length check between two <CRLF> points, but for us, these
illegal lines weren't detected until DKIM checking was added.
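
A line length check between two <CRLF> points that still works on
buffered reads could look something like the sketch below. It is an
assumption-laden example, not our server code: the class name
LineLengthChecker and its fields are hypothetical, and only CRLF is
treated as a line terminator.

```python
MAX_LINE = 998  # RFC5322 limit in octets, excluding the CRLF


class LineLengthChecker:
    """Counts over-long lines in a stream delivered as arbitrary chunks."""

    def __init__(self, limit: int = MAX_LINE):
        self.limit = limit
        self.current = 0      # octets seen so far on the unfinished line
        self.prev_cr = False  # last chunk ended with a CR
        self.lines = 0
        self.illegal = 0      # lines longer than the limit
        self.longest = 0

    def _end_line(self, length: int) -> None:
        self.lines += 1
        self.longest = max(self.longest, length)
        if length > self.limit:
            self.illegal += 1
        self.current = 0

    def feed(self, chunk: bytes) -> None:
        if not chunk:
            return
        start = 0
        if self.prev_cr and chunk[:1] == b"\n":
            # CRLF split across two reads: the CR was counted, take it back
            self._end_line(self.current - 1)
            start = 1
        while True:
            idx = chunk.find(b"\r\n", start)
            if idx < 0:
                break
            self._end_line(self.current + (idx - start))
            start = idx + 2
        # Whatever follows the last CRLF belongs to an unfinished line
        self.current += len(chunk) - start
        self.prev_cr = chunk.endswith(b"\r")

    def finish(self) -> None:
        # Count a final line that was not terminated by CRLF
        if self.current > 0:
            self._end_line(self.current)
        self.prev_cr = False
```

The scan uses a buffer search rather than a per-line read, so it fits
a reader that pulls large blocks from the socket.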

So if there is anything positive about DKIM, it is that it's helping
systems do things they probably never bothered to (didn't need to or
want to) check before.  The multiple From: header was among them.  The
main point is that there are many systems with a transport mechanism
that really doesn't care what is in the payload.  But RFC5322
compliance is becoming a more important aspect to take into account,
and line lengths should be among the things checked.

-- 
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com

