ietf-mailsig

Re: DKIM: Canonicalization

2005-07-18 14:54:29

On July 18, 2005 at 11:36, Jim Fenton wrote:

Members of the list differ on what canonicalization is trying to 
accomplish.  We need to get consensus on that before we make much 
progress on what the algorithm(s) themselves are.

I think it will help further if there is agreement on what the
goals of DKIM are.  From the Introduction:

  DomainKeys Identified Mail (DKIM) defines a simple, low cost,
  and effective mechanism by which cryptographic signatures can be
  applied to email messages, ...

This first sentence alone implies that users of DKIM can protect
the integrity of email with digital signatures just as digital
signatures are used in other systems.

1. Minimal changes to the message.
2. Only allow modifications explicitly permitted by RFC2822.
3. Do not alter the semantics of the message.
4. Do not provide a reasonable opportunity for abuse.


How about something like the following to replace the relevant parts
of the DKIM spec (it includes variations on simple and nowsp, along
with a new algorithm, minwsp):

3.4.  Canonicalization

  In a typical cryptographic system, any form of data modification
  of digitally signed data is not tolerated, leading to signature
  validation failures.  Such systems normally have a data transport
  system (or assume one exists) that will not alter the contents of
  the data.  Unfortunately, with email, the transport system cannot
  guarantee such reliability.  Empirical evidence demonstrates that
  some mail servers and relay systems modify email in transit.

  To address this problem, canonicalization algorithms are defined
  to tolerate the common mutations that occur in email while not
  compromising the integrity of the data signed.  If a DKIM digital
  signature is deemed valid, the user must have a reasonable level of
  trust that the data signed represents what the signing agent signed.

  Since signers may have different tolerances for which mutations are
  acceptable, DKIM defines more than one canonicalization algorithm
  to address different levels of tolerance: simple, minwsp, and
  nowsp.  When no canonicalization algorithm is specified, simple
  MUST be used.

  Canonicalization simply prepares the email for presentation to the
  signing or verification algorithm. It MUST NOT change the transmitted
  data in any way.

  Data passed into the canonicalization algorithm MUST have lines
  terminated in CRLF format, as specified in RFC-2822.

  In all cases, the header field of the message is presented to the
  signing algorithm first in the order indicated by the signature
  header field.  Only header fields listed as signed in the signature
  header field are included. The CRLF separating the header field
  from the body is then presented. Canonicalization of header fields
  and body are described below.
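
A minimal Python sketch of the ordering described above, for illustration only. The names header_input, fields, and signed_names are hypothetical, canonicalization of each field is left out, and taking the first occurrence of a repeated field name is an assumption that the text does not address.

```python
def header_input(fields, signed_names):
    """Build the header portion of the data presented for signing.

    fields       -- raw header fields (bytes, each ending in CRLF), in message order
    signed_names -- field names listed in the signature header field, in that order
    """
    by_name = {}
    for field in fields:
        name = field.partition(b":")[0].strip().lower()
        # Assumption: when a field name repeats, the first occurrence is used.
        by_name.setdefault(name, field)

    out = b""
    for name in signed_names:
        key = name.strip().lower()
        # Only fields listed as signed are included; the rest are skipped.
        if key in by_name:
            out += by_name[key]

    # The CRLF separating the header from the body comes next.
    return out + b"\r\n"
```

Here the output follows the order of signed_names, not the order in which the fields appear in the message.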


3.4.1.  The "simple" canonicalization algorithm

  The simple algorithm is designed to be the least tolerant.  However,
  this does not imply that no mutation is allowed.  For header data,
  RFC-2822 explicitly defines semantics that allow for variation
  in data.  For example, header field names are case-insensitive,
  "SUBJECT" is equal to "Subject" which is equal to "SuBjEcT".

  For "simple", the following header field canonicalization rules are
  applied before signing or verifying:

    1. Each field is unfolded as specified in RFC-2822.
    2. Each field name is converted to lowercase.

  For the body, the following canonicalization rules are applied:

    1. LWSP at the end of the body is removed.
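
The "simple" rules above could be sketched in Python roughly as follows. The function names are hypothetical, and "LWSP at the end of the body" is read here as any trailing run of SP, HTAB, CR, and LF, which is an assumption about the intended meaning.

```python
import re

def simple_header(field: bytes) -> bytes:
    # 1. Unfold: drop each CRLF that is immediately followed by WSP.
    unfolded = re.sub(rb"\r\n(?=[ \t])", b"", field)
    # 2. Lowercase the field name only; the value is left untouched.
    name, colon, value = unfolded.partition(b":")
    return name.lower() + colon + value

def simple_body(body: bytes) -> bytes:
    # Assumption: trailing LWSP means any trailing SP, HTAB, CR, or LF.
    return body.rstrip(b" \t\r\n")
```

Both functions return new byte strings, so the transmitted message itself is never changed.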


3.4.2.  The "minwsp" canonicalization algorithm

  The minwsp algorithm is designed to deal with common whitespace
  modification that may happen during transit.

    1. Strip all WSP characters at the end of each line of a header field,
       before any unfolding is done.
    2. Unfold any fields that are folded.
    3. Convert field names to lowercase.

  For the body,

    1. LWSP at the beginning of the body is removed.
    2. All trailing WSP at the end of each line is removed.
    3. Any lone CR or LF is converted to CRLF.
    4. LWSP at the end of the body is removed.
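
A rough Python sketch of the "minwsp" rules above. The function names are hypothetical. Two readings are assumed: LWSP means any run of SP, HTAB, CR, and LF, and in the body, WSP directly before a lone CR or LF also counts as trailing.

```python
import re

def minwsp_header(field: bytes) -> bytes:
    # 1. Strip WSP at the end of each physical line, before unfolding.
    field = re.sub(rb"[ \t]+(\r\n|\Z)", rb"\1", field)
    # 2. Unfold: drop each CRLF that is immediately followed by WSP.
    field = re.sub(rb"\r\n(?=[ \t])", b"", field)
    # 3. Lowercase the field name only.
    name, colon, value = field.partition(b":")
    return name.lower() + colon + value

def minwsp_body(body: bytes) -> bytes:
    # 1. Remove LWSP at the beginning of the body (assumed: SP, HTAB, CR, LF).
    body = body.lstrip(b" \t\r\n")
    # 2. Remove trailing WSP before any line ending or the end of the data.
    body = re.sub(rb"[ \t]+(?=\r|\n|\Z)", b"", body)
    # 3. Convert any lone CR or LF to CRLF; existing CRLF pairs are kept.
    body = re.sub(rb"\r\n|\r|\n", b"\r\n", body)
    # 4. Remove LWSP at the end of the body.
    return body.rstrip(b" \t\r\n")
```

Whitespace inside a line is preserved, so only line edges and line terminators are normalized.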


3.4.3.  The "nowsp" canonicalization algorithm

  The nowsp algorithm is the most liberal of the canonicalization
  algorithms:

    1. Strip all WSP characters at the end of each line of a header field,
       before any unfolding is done.
    2. Unfold any fields that are folded.
    3. Convert multiple WSP characters into a single SP character.
    4. Convert field names to lowercase.

  For the body,

    1. Remove all CR, LF, and WSP.
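
A rough Python sketch of the "nowsp" rules above. The function names are hypothetical, and "multiple WSP characters" is taken literally as a run of two or more, so a single HTAB on its own would be left as-is under this reading.

```python
import re

def nowsp_header(field: bytes) -> bytes:
    # 1. Strip WSP at the end of each physical line, before unfolding.
    field = re.sub(rb"[ \t]+(\r\n|\Z)", rb"\1", field)
    # 2. Unfold: drop each CRLF that is immediately followed by WSP.
    field = re.sub(rb"\r\n(?=[ \t])", b"", field)
    # 3. Collapse each run of two or more WSP characters into one SP.
    field = re.sub(rb"[ \t]{2,}", b" ", field)
    # 4. Lowercase the field name only.
    name, colon, value = field.partition(b":")
    return name.lower() + colon + value

def nowsp_body(body: bytes) -> bytes:
    # Remove every CR, LF, SP, and HTAB from the body.
    return re.sub(rb"[ \t\r\n]+", b"", body)
```

With the body rule, "Hello world" and "Helloworld" canonicalize to the same bytes, which shows how liberal this algorithm is.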


--ewh

