On Sat, 04 Nov 2006 02:03:38 -0000, John Levine <johnl(_at_)iecc(_dot_)com> 
wrote:
It's still an open question how Unicode is going to show up in mail
headers, with 8 bit UTF8 being only one of multiple possibilities.
More likely there will be some kludge to smoosh it into 7 bits so it
can transit through old MTAs.  I don't think anyone is opposed to DKIM
handling whatever happens, but I also don't think it's productive to
try to guess at this point which way it'll turn out.
Paul Hoffman has answered this well enough. All that is needed is that
DKIM should not fail to work just because it finds some octet with bit 8
set, because it is clear that whatever happens regarding unicode, such
octets are surely going to appear.
The present draft contains some advice:
       INFORMATIVE IMPLEMENTATION NOTE:  Although the "plain text"
       defined below (as "tag-value") only includes 7-bit characters, an
       implementation that wished to anticipate future standards would be
       advised to not preclude the use of UTF8-encoded text in tag=value
       lists.
Presumably whoever wrote that was satisfied that allowing such
UTF8-encoded text would do no harm. In that case, you may as well make
allowing it mandatory (or at least allow all 8-bit octets, since the
actual character encoding does not matter at the moment, so long as ASCII
is a subset of it).
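
To illustrate how little is needed, here is a rough sketch of a tag=value
splitter that works on raw octets and so never trips over bit 8. The
function name parse_tag_list and the sample tag list are made up for
illustration; this is not the grammar from the draft, only the idea that
values can be carried as opaque bytes.

```python
def parse_tag_list(raw: bytes) -> dict:
    """Split a DKIM-style tag=value list without rejecting 8-bit octets.

    Hypothetical helper: tag names are required to be ASCII, values are
    kept as opaque bytes whatever encoding they turn out to use.
    """
    tags = {}
    for item in raw.split(b";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";"
        name, sep, value = item.partition(b"=")
        if not sep:
            raise ValueError("missing '=' in %r" % item)
        tags[name.strip().decode("ascii")] = value.strip()
    return tags


# Sample input (invented): the z= value contains UTF-8 for "café".
example = b"v=1; d=example.com; z=Subject:caf\xc3\xa9;"
parsed = parse_tag_list(example)
```

Nothing in the splitting depends on the high bit being clear, which is
all that is being asked for.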
   l=  Body length count
This has been very contentious.  Personally, I will never put an l=
into a signature, but there are some vocal people who insist that it's
important for signatures to survive (some) mailing list software, so
it's there if they want it.
But the people who want it won't take kindly to having it deleted by
overly "helpful" verifier policy modules. Currently the draft suggests
that is a reasonable practice. It isn't, and the draft should not be
saying such things. By all means warn the user, and even provide the user
with tools to delete it. But don't chop bits out of his mail without his
approval.
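
As a rough sketch of the alternative, a verifier can compute the hash over
the first l= octets and hand back the unsigned remainder for the user to
be warned about, leaving the message itself untouched. The names
split_at_length and body_digest are hypothetical, and a real verifier
would hash the canonicalized body rather than the raw one.

```python
import hashlib


def split_at_length(body: bytes, l_value=None):
    """Return (covered, unsigned_tail) without modifying the message."""
    if l_value is None:
        return body, b""
    if l_value > len(body):
        raise ValueError("l= exceeds the body length")
    return body[:l_value], body[l_value:]


def body_digest(body: bytes, l_value=None) -> str:
    """SHA-256 over the part of the body covered by the signature."""
    covered, _ = split_at_length(body, l_value)
    return hashlib.sha256(covered).hexdigest()


# Invented example: a list appends a footer after signing.
signed_body = b"Original text\r\n"
relayed_body = signed_body + b"-- \r\nList footer\r\n"
covered, tail = split_at_length(relayed_body, len(signed_body))
```

Here tail is what the user can be warned about; whether to hide or delete
it stays the user's decision.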
The MUST is quite deliberate so that DKIM implementations will
interoperate.  You're welcome to do whatever you want to exchange
messages with your friends, but for mail to everyone else, you have
to use SHA-256 and dns/ext because that's what you know they'll be
able to handle.
But that is not what the draft says. It currently says, in effect, that
signers MAY use either SHA-1 or SHA-256 (and in consequence verifiers MUST
accept both - that bit is not in dispute). But you cannot say, at the same
time, that signers MAY use SHA-1 and MUST use SHA-256 (or MUST implement
SHA-256 even if they have no intention of generating it). RFC 2119 just
does not allow you to use MUST in those sorts of ways.
You say "signers SHOULD convert the message to a suitable MIME
content transfer encoding such as quoted-printable or base64". That
sounds to me like a pretty strong discouragement to continue using
CTE 8bit.
I think that what the wording should say is that messages must be valid
RFC2822 (or maybe RFC822) messages.  ...
Messages haven't been valid RFC 2822 for a long time (ever since 8BITMIME
and now BINARYMIME).
You can sign whatever you want, but if the message is 7bit, your
signature is more likely to survive transit to the verifier.
DKIM doesn't understand MIME.  If DKIM signers and verifiers had to
unpack MIME parts they would be orders of magnitude more complicated.
In practice, I think that nearly everyone uses the simple body canon
anyway.
Not at all. Going through the MIME structure of a message body and undoing
all Q-P or base64 encodings is fairly straightforward, and if you hash and
sign the result of doing that, then it is guaranteed to pass straight
through all those systems which (quite legitimately under RFC 1652)
re-encode stuff en route, without breaking the signature. I shall try to
write a demonstration implementation in the next day or so, and it
certainly won't be "orders of magnitude more complicated".
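
As a rough indication of the size of the job (a sketch only, not the
promised demonstration), the following uses Python's standard email
package to undo the transfer encoding of each leaf part before hashing.
The function name decoded_body_digest, the choice to feed the content
type into the hash, and the sample messages are all assumptions made for
illustration; line-ending normalization and header handling are left out.

```python
import base64
import hashlib
import quopri
from email import message_from_bytes


def decoded_body_digest(raw_message: bytes) -> str:
    """Hash the body after undoing Q-P/base64, part by part."""
    msg = message_from_bytes(raw_message)
    h = hashlib.sha256()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        h.update(part.get_content_type().encode("ascii"))
        h.update(b"\0")
        h.update(payload)
    return h.hexdigest()


# The same text sent with three different transfer encodings.
# LF line endings are used only to keep the sample short.
TEXT = b"caf\xc3\xa9 au lait\n"
HEAD = b"MIME-Version: 1.0\nContent-Type: text/plain; charset=utf-8\n"


def build(cte: bytes, body: bytes) -> bytes:
    return HEAD + b"Content-Transfer-Encoding: " + cte + b"\n\n" + body


as_8bit = build(b"8bit", TEXT)
as_qp = build(b"quoted-printable", quopri.encodestring(TEXT))
as_b64 = build(b"base64", base64.encodebytes(TEXT))
```

All three messages differ on the wire, yet decoded_body_digest gives the
same value for each, which is the property wanted from a canonicalization
that survives re-encoding en route.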
You said in a later message that the "relaxed" body canonicalization
should be ditched because the things it protected against rarely happen.
Surely it would be better to augment it with something that would make it
proof against things that regularly _do_ happen, and happen with the full
blessing of IETF standards.
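
For reference, the two body canonicalizations under discussion can be
sketched roughly as below. This follows my reading of the draft's rules
(simple only trims trailing empty lines; relaxed also collapses runs of
whitespace and drops whitespace at line ends); the function names are
hypothetical and edge cases such as an entirely empty body deserve a
check against the draft text.

```python
import re


def simple_body(body: bytes) -> bytes:
    """Trim trailing empty lines, leaving one terminating CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def relaxed_body(body: bytes) -> bytes:
    """Collapse whitespace runs, strip line-end whitespace and
    trailing empty lines."""
    lines = body.split(b"\r\n")
    lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in lines]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(ln + b"\r\n" for ln in lines)


# Invented example: a relay squeezes whitespace in transit.
as_signed = b"Hello  world \r\n\r\n"
as_received = b"Hello world\r\n"
```

With these inputs the simple forms differ while the relaxed forms are
identical, so only whitespace changes are being forgiven; neither form
does anything about a changed transfer encoding.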
And, moreover, I do not see why the 'simple' canonicalization is the
default (or why it exists at all, for both headers and bodies).
Can anybody suggest a scam or threat that would be facilitated if
"relaxed" rather than "simple" was used?
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131      Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5