Re: [ietf-dkim] New canonicalizations

On 25/May/11 14:27, Hector Santos wrote:

Alessandro Vesely wrote:

On 25/May/11 10:03, Hector Santos wrote:

How would 7/8 bit be considered?

Personally, the STRIP C14N idea would work just fine by removing all
trailing whitespace (CR, LF, SP) and, for QP text, decoding it first.
I'm considering updating my 2006 I-D to include the QP decoding logic.


I propose a much more radical approach, something that will likely
land on the too-loose side.  This kind of approach is justified by the
"most breakage is innocent" theory, and by already having two
canonicalizations on the too-tight side.

For example, consider these criteria for feeding the body hash:

1) For multipart MIME messages, completely remove the preamble, the
epilogue, and all boundaries and entity headers.

2) For MIME encoded parts, get back to the binary content.

3) For text parts, completely remove /any/ whitespace.  Additionally,
remove most punctuation, especially from the beginning and end of lines.
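
The three criteria above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not a proposed algorithm: the
function name loose_body_hash is hypothetical, "most punctuation" is
approximated by ASCII punctuation, and Python's standard email parser
stands in for whatever MIME parser a verifier would really use.

```python
import hashlib
import string
from email import message_from_bytes

# Assumption: "most punctuation" is approximated by ASCII punctuation.
_DROP = (string.whitespace + string.punctuation).encode("ascii")


def loose_body_hash(raw_message: bytes) -> str:
    """Hash only decoded leaf content, ignoring MIME framing (hypothetical)."""
    msg = message_from_bytes(raw_message)
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.is_multipart():
            # Criterion 1: containers contribute nothing, so preamble,
            # epilogue, boundaries and entity headers are never hashed.
            continue
        # Criterion 2: undo base64 / quoted-printable to get the content.
        content = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "text":
            # Criterion 3: drop all whitespace and ASCII punctuation.
            content = content.translate(None, _DROP)
        digest.update(content)
    return digest.hexdigest()
```

With this sketch, a plain 7bit message, a quoted-printable copy of it,
and a multipart wrapping with any boundary string all feed the same
bytes to the hash, which is the "too-loose" behaviour being discussed.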


Do we really need this?  Do you know of cases related to this?


The idea is to anticipate any unknown signature breaker.  My three
points above are rather generic.  They are meant to be expanded so as
to include your five points below, and more.

I think it is quite obvious that MIME rewriting generates new
boundaries, and may alter an entity's header.

Non-text binary content that arrives corrupted deserves breaking a
signature.  However, a rewriter may decode a base64 entity for local
storage, and then re-encode it with a different line length.
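
To illustrate the base64 case, here is a small sketch (the helper name
reencode_base64 is made up for this example). It decodes an entity and
re-encodes it with a different line length: the encoded bytes differ,
so a hash over the encoded body breaks, while the decoded content is
unchanged.

```python
import base64


def reencode_base64(encoded: bytes, width: int) -> bytes:
    """Decode a base64 body and re-wrap it at `width` characters per line."""
    raw = base64.b64decode(encoded)  # line breaks are discarded on decode
    flat = base64.b64encode(raw)
    lines = [flat[i:i + width] for i in range(0, len(flat), width)]
    return b"\r\n".join(lines) + b"\r\n"


payload = bytes(range(256))
as_sent = reencode_base64(base64.b64encode(payload), 76)
as_stored = reencode_base64(as_sent, 60)  # a rewriter using shorter lines
```

Hashing base64.b64decode(as_sent) and base64.b64decode(as_stored) gives
the same digest, whereas hashing as_sent and as_stored directly does not.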

Text undergoes all kinds of massaging: a trailing "=" may be left over,
CRLFs may be doubled, and "From " may be turned into ">From ", besides
the leading dots you mention in point five.

We should identify and list the sort of transformation issues we are 
seeing.


Erring on the too-loose side implies some generalization.

I have identified four so far:

  1 - We forgot a possible top CRLF. We dealt with the bottom <CRLF>
      but not the top,

  2 - Top level QP decoding,

  3 - Top Level reformatting to C-T-E: base64 (not MIME multi-part)

  4 - Lines over 998 (1000 with CRLF). These are invalid under RFC5322,
      but it's possible some verifiers are designed to do a buffered
      C14N and don't check for RFC5322 line lengths between two memory
      points in the buffer, as opposed to a line-by-line feed into the
      C14N function.  Why buffer vs line?  Speed.
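
For point 4, a buffered implementation can still detect overlong lines
cheaply. The following is a minimal sketch, assuming the whole body is
available as bytes with CRLF line endings; overlong_lines is a
hypothetical helper, and 998 is the RFC5322 limit excluding the CRLF.

```python
RFC5322_MAX_LINE = 998  # octets per line, excluding the terminating CRLF


def overlong_lines(body: bytes, limit: int = RFC5322_MAX_LINE) -> list[int]:
    """Return the 1-based numbers of lines longer than `limit` octets."""
    return [
        number
        for number, line in enumerate(body.split(b"\r\n"), start=1)
        if len(line) > limit
    ]
```

A verifier could log such lines separately, to tell this class of
failure apart from other body hash mismatches.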


I imagined the C14N function reads characters one by one.  On finding
CRLF it can go back a few bytes to remove end-of-line punctuation.
However you code c14n(), it will be sparklingly faster than sha256().

However, distinguishing the middle of a line from its beginning or end
is possibly inconsistent, since line breaks may be altered because of
invalidly long lines or RFC3676 rewrapping.
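
The inconsistency can be shown with a short sketch. Both functions are
hypothetical and treat "punctuation" as ASCII punctuation (an
assumption): one strips only at line edges, the other strips
everywhere.

```python
import string

_EDGE = (string.punctuation + " \t").encode("ascii")
_ALL = (string.punctuation + string.whitespace).encode("ascii")


def strip_line_edges(body: bytes) -> bytes:
    """Strip WSP and punctuation at both ends of each line, drop CRLFs."""
    return b"".join(line.strip(_EDGE) for line in body.split(b"\r\n"))


def strip_everywhere(body: bytes) -> bytes:
    """Remove every whitespace and punctuation octet, wherever it is."""
    return body.translate(None, _ALL)


original = b"Hello, world. Bye.\r\n"
rewrapped = b"Hello,\r\nworld. Bye.\r\n"  # same text, one extra line break
```

Here strip_line_edges() yields different bytes for original and
rewrapped, because the comma moves from mid-line to end-of-line, while
strip_everywhere() yields the same bytes for both.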

      I found 98 such buffer hash errors from various domains due to
      having at least 1 super long line.


Some MUAs consistently keep paragraphs on a single line.

and I just found yet another problem, which I am currently
investigating to see where this "mite" is occurring:

  5 - Incorrect handling of lines beginning with dots, for example
      I sent a message containing a line beginning with:

      ... blah blah blah.  blah.

      and it was received by my SMTP server as:

      .... blah blah blah.  blah.

      and when it's sent to this list, it comes back as:

      ..... blah blah blah.  blah.

I checked my SMTP server and it's correct. I have now pretty much
confirmed that Thunderbird is causing this initial dot escaping error,
by sending mail to my gmail and hotmail accounts. The "...."
showed up in my new mail bin. But it also appears there could be an
intermediary with a bug as well, adding yet another dot.
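
For reference, SMTP transparency (RFC5321 section 4.5.2) has the
sender add one dot to any line starting with a dot, and the receiver
remove one. A minimal sketch, with made-up helper names, shows how a
stuffing step without its matching unstuffing step leaves one extra dot
per hop:

```python
def dot_stuff(body: bytes) -> bytes:
    """Sender side: prepend a dot to every line that starts with a dot."""
    lines = body.split(b"\r\n")
    return b"\r\n".join(b"." + ln if ln.startswith(b".") else ln for ln in lines)


def dot_unstuff(wire: bytes) -> bytes:
    """Receiver side: remove one leading dot from lines that start with one."""
    lines = wire.split(b"\r\n")
    return b"\r\n".join(ln[1:] if ln.startswith(b".") else ln for ln in lines)


body = b"... blah blah blah.  blah.\r\n"
on_wire = dot_stuff(body)           # 4 leading dots, correct on the wire
stuffed_twice = dot_stuff(on_wire)  # 5 leading dots, if a hop never unstuffs
```

A correct round trip, dot_unstuff(dot_stuff(body)), returns the
original three dots.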

While we like to shift blame to the buggy software, IMV, since
transport DOT escaping is an SMTP requirement and there could be these
small issues, I'm thinking maybe we should consider logic similar to
the "relaxed" reduction of multiple <WSP>, reducing multiple <DOT> as
well, but only when they begin a new line (or, for block transfers,
are preceded by <CR><LF>).  So a line like

      <DOT><DOT><DOT><SP>blah<SP>blah<DOT><SP><SP>blah<DOT>

becomes:

      <DOT><SP>blah<SP>blah<DOT><SP>blah<DOT>
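
A rough sketch of that rule, applied per line on top of the existing
"relaxed" WSP handling. The function name relax_line is hypothetical.
Under the rule as stated, only the leading run of dots is collapsed, so
a mid-line dot survives.

```python
import re


def relax_line(line: bytes) -> bytes:
    """Relaxed-style line reduction plus the suggested leading-dot rule."""
    line = re.sub(rb"[ \t]+", b" ", line)  # reduce each WSP run to one SP
    line = line.rstrip(b" ")               # ignore WSP at end of line
    line = re.sub(rb"^\.+", b".", line)    # collapse leading dots to one
    return line
```

With this, "... blah", ".... blah" and "..... blah" all canonicalize to
". blah", so an extra dot added in transit would no longer change the
body hash.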


Yes, dot is one of the punctuation characters that should be removed.