On July 18, 2005 at 11:36, Jim Fenton wrote:
Members of the list differ on what canonicalization is trying to
accomplish. We need to get consensus on that before we make much
progress on what the algorithm(s) themselves are.
I think it will help further if there is agreement on
what the goals of DKIM are. From the Introduction:
DomainKeys Identified Mail (DKIM) defines a simple, low cost,
and effective mechanism by which cryptographic signatures can be
applied to email messages, ...
This first sentence alone implies that users of DKIM can protect
the integrity of email with digital signatures just as digital
signatures are used in other systems.
1. Minimal changes to the message.
2. Only allow modifications explicitly permitted by RFC2822.
3. Do not alter the semantics of the message.
4. Do not provide a reasonable opportunity for abuse.
How about something like the following to replace the relevant parts of
the DKIM spec (it includes variations of simple and nowsp, along
with a new algorithm, minwsp):
3.4. Canonicalization
In a typical cryptographic system, any form of modification
of digitally signed data is not tolerated and leads to signature
validation failures. Such systems normally have a data transport
system (or assume one exists) that will not alter the contents of
the data. Unfortunately, with email, the transport system cannot
guarantee such reliability. Empirical evidence demonstrates that
some mail servers and relay systems modify email in transit.
To address this problem, canonicalization algorithms are defined
to handle the common mutations that occur in email while not
compromising the integrity of the data signed. If a DKIM digital
signature is deemed valid, the user must have a reasonable level of
trust that the data signed represents what the signing agent signed.
Since signers may have different tolerances for which mutations are
acceptable, DKIM defines more than one canonicalization algorithm
to address different levels of tolerance: simple, minwsp, and
nowsp. When no canonicalization algorithm is specified, simple
MUST be used.
Canonicalization simply prepares the email for presentation to the
signing or verification algorithm. It MUST NOT change the transmitted
data in any way.
Data passed into the canonicalization algorithm MUST have lines
terminated in CRLF format, as specified in RFC-2822.
In all cases, the header field of the message is presented to the
signing algorithm first in the order indicated by the signature
header field. Only header fields listed as signed in the signature
header field are included. The CRLF separating the header field
from the body is then presented. Canonicalization of header fields
and body are described below.
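To make the ordering concrete, here is a rough Python sketch of how the data might be assembled for the signing algorithm. The function name signing_input and the parameter signed_names (the list of field names taken from the signature header field) are made up for illustration, and the choice to use the first occurrence of a repeated field is my assumption; the text above does not say. Canonicalization is left out here so that only the ordering is shown.

```python
def signing_input(fields, signed_names, body):
    """Assemble the data presented to the signing algorithm.

    fields       -- raw header field strings, each terminated by CRLF
    signed_names -- field names in the order given by the signature
                    header field (hypothetical parameter)
    body         -- message body as a string
    """
    by_name = {}
    for field in fields:
        name = field.partition(":")[0].strip().lower()
        # Assumption: if a field name repeats, keep the first occurrence.
        by_name.setdefault(name, field)

    # Only fields listed as signed are included, in the listed order.
    parts = [by_name[n.lower()] for n in signed_names if n.lower() in by_name]

    # Header fields first, then the CRLF separator, then the body.
    return "".join(parts) + "\r\n" + body
```

Note that the To field in a message would simply be skipped if it is not listed in signed_names.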
3.4.1. The "simple" canonicalization algorithm
The simple algorithm is designed to be the least tolerant. However,
this does not imply that no mutation is allowed. For header data,
RFC-2822 explicitly defines semantics that allow for variation
in data. For example, header field names are case-insensitive:
"SUBJECT" is equal to "Subject", which is equal to "SuBjEcT".
For "simple", the following header field canonicalization rules are
applied before signing or verifying:
1. Each field is unfolded as specified in RFC-2822.
2. Each field name is converted to lowercase.
For the body, the following canonicalization rules are applied:
1. LWSP at the end of the body is removed.
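As a sanity check, the simple rules could be sketched in Python roughly like this. The function names are invented for the example. I am also assuming that "LWSP at the end of the body" means any trailing run of SP, HTAB, CR, and LF, which is a looser reading than the strict ABNF definition of LWSP.

```python
import re


def unfold(field: str) -> str:
    # RFC 2822 unfolding: remove any CRLF immediately followed by WSP.
    return re.sub(r"\r\n(?=[ \t])", "", field)


def simple_header(field: str) -> str:
    # Rule 1: unfold. Rule 2: lowercase the field name only.
    name, sep, value = unfold(field).partition(":")
    return name.lower() + sep + value


def simple_body(body: str) -> str:
    # Rule 1: remove trailing whitespace at the end of the body
    # (assumption: SP, HTAB, CR, and LF all count here).
    return body.rstrip(" \t\r\n")
```

The field value is left untouched apart from unfolding, so any whitespace change inside a header line would still break a simple signature.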
3.4.2. The "minwsp" canonicalization algorithm
The minwsp algorithm is designed to deal with common whitespace
modifications that may happen during transit. For header fields,
1. Strip all WSP characters at the end of each line of a header field,
before any unfolding is done.
2. Unfold any fields that are folded.
3. Convert field names to lowercase.
For the body,
1. LWSP at the beginning of the body is removed.
2. All trailing WSP at the end of each line is removed.
3. Any lone CR or LF is converted to CRLF.
4. LWSP at the end of the body is removed.
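A possible Python rendering of minwsp, applying the steps in the order listed. The function names are hypothetical. Two readings of mine are baked in: LWSP at either end of the body is taken to mean any run of SP, HTAB, CR, and LF, and in body step 2 a "line" may end in a lone CR or LF as well as CRLF, since the lone terminators are only normalized in step 3.

```python
import re


def minwsp_header(field: str) -> str:
    # Step 1: strip WSP at the end of each physical line, before unfolding.
    field = re.sub(r"[ \t]+(?=\r\n|\Z)", "", field)
    # Step 2: unfold (remove CRLF immediately followed by WSP).
    field = re.sub(r"\r\n(?=[ \t])", "", field)
    # Step 3: lowercase the field name.
    name, sep, value = field.partition(":")
    return name.lower() + sep + value


def minwsp_body(body: str) -> str:
    # Step 1: remove leading whitespace (assumed: SP, HTAB, CR, LF).
    body = body.lstrip(" \t\r\n")
    # Step 2: remove WSP before any line terminator or end of body.
    body = re.sub(r"[ \t]+(?=\r|\n|\Z)", "", body)
    # Step 3: convert any lone CR or lone LF to CRLF.
    body = re.sub(r"\r(?!\n)|(?<!\r)\n", "\r\n", body)
    # Step 4: remove trailing whitespace at the end of the body.
    return body.rstrip(" \t\r\n")
```

Because header step 1 runs before unfolding, the WSP that begins a continuation line survives and ends up as the separator in the unfolded field.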
3.4.3. The "nowsp" canonicalization algorithm
The nowsp algorithm is the most liberal of the canonicalization
algorithms. For header fields,
1. Strip all WSP characters at the end of each line of a header field,
before any unfolding is done.
2. Unfold any fields that are folded.
3. Convert multiple WSP characters into a single SP character.
4. Convert field names to lowercase.
For the body,
1. Remove all CR, LF, and WSP.
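And a matching Python sketch for nowsp, again with made-up function names. I read header step 3 literally, so only runs of two or more WSP characters are collapsed and a single HTAB is left as it is; if the intent is that every WSP run becomes one SP, the pattern would need to change.

```python
import re


def nowsp_header(field: str) -> str:
    # Step 1: strip WSP at the end of each physical line, before unfolding.
    field = re.sub(r"[ \t]+(?=\r\n|\Z)", "", field)
    # Step 2: unfold (remove CRLF immediately followed by WSP).
    field = re.sub(r"\r\n(?=[ \t])", "", field)
    # Step 3: collapse runs of two or more WSP into a single SP
    # (literal reading of "multiple"; a lone HTAB is not changed).
    field = re.sub(r"[ \t]{2,}", " ", field)
    # Step 4: lowercase the field name.
    name, sep, value = field.partition(":")
    return name.lower() + sep + value


def nowsp_body(body: str) -> str:
    # Remove every CR, LF, SP, and HTAB from the body.
    return re.sub(r"[ \t\r\n]+", "", body)
```

With the body reduced this far, "wor ld" and "world" canonicalize to the same data, which is the kind of trade-off against goal 4 that is worth discussing.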
--ewh