Re: Last Call: <draft-ietf-dane-openpgpkey-07.txt>

On 02/15/2016 05:46 PM, John C Klensin wrote:


--On Monday, February 15, 2016 4:33 PM +0100 Harald Alvestrand
<harald(_at_)alvestrand(_dot_)no> wrote:

Note that the user understandability of "only lowercase if
it's all ASCII" is zero.

If ARNE matches arne, but BLÅBÆR doesn't match blåbær, any
user from an extended-ASCII country (which is *all* Latin
script using countries, even though the non-ASCII variants in
English are rarely used) will be mighty confused.

Indeed.

However, that is exactly the decision we made with IDNA (both
the "2003" and "2008" versions) and, as there, there may be
justification for really strong advice for treating email
addresses (both local and domain parts) as lower-case only.

Harald, I am confident you know all of this, but others may
not...  The idea of requiring that mailbox names be treated as
all lower case was discussed during the work leading up to RFC
1123 and again in DRUMS (pre-2821).  The community reached what
appeared to me as fairly strong consensus that we just couldn't
do it.  Part of the problem was that, at the time 821 was
written (and maybe as late as the time of DRUMS) there were
still systems around that operated upper-case-only and had only
the vaguest idea what lower case was.  Another part was that
Unix (and Multics) and some of their successors were very
case-sensitive in general: "foo" and "Foo" and "foO" were
unambiguously three different names.

Because of that history and consensus, the strong suggestions in
5321 are about as far as one is going to get as far as
restrictions/recommendations on the mailbox names themselves
and the "don't try to guess" rule probably isn't going anywhere.

In retrospect, we dodged a bullet because, for mailbox local
parts, ARNE does not, in terms of anything a sender is allowed
to predict, match arne.  That BLÅBÆR doesn't match blåbær
may still be a surprise to some, but it is not more of a
surprise.

From that perspective, the problem facing DANE is that either
the basic "if they are not identical, they don't match" rule is
applied or there is a need to invent rules different from the
email rules and that de facto modify the email rules by
restricting the syntax of a mailbox if there is any possibility
a DANE DNS record will be used with it.  Nothing I'm aware of
(other than probably the WG Charter) prohibits DANE from
proposing an update to 5321 and 6530ff, but the history (and
probable side-effects that no one has tried to analyze) predicts
that the idea won't easily get community consensus.


Yep. I'm sympathetic to the quandary of DANE.

Our strong advice was basically "if you (the recipient's mailbox
manager) depend on case differences to tell mailboxes apart, you are a
fool; if you (the sender) depend on case not mattering, you are a bigger
fool."

DANE is an algorithm for the *sender* to look up information about the
*recipient*'s mailbox in the DNS, which means that the whole experiment
depends on the sender (who has no idea of what or where the recipient
is) being able to construct exactly the same hash that is generated by
the recipient - incompatible with the two pieces of advice I have
abstracted out above.
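
To make the mismatch concrete, here is a minimal Python sketch (an
illustration only, not code from the draft). A digest of the local-part
leaves no room for "close enough": a sender who guesses the wrong case
ends up with an unrelated lookup key. The helper name digest_of is
hypothetical.

```python
import hashlib

def digest_of(local_part: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded local-part (hypothetical helper)."""
    return hashlib.sha256(local_part.encode("utf-8")).hexdigest()

# The recipient publishes under one spelling; the sender types another.
published = digest_of("arne")
guessed = digest_of("ARNE")

# Any difference in the input, even case only, yields a different digest.
print(published == guessed)                        # False
print(digest_of("blåbær") == digest_of("BLÅBÆR"))  # False
```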

A possible way out (strawman!!!!) would be to say:

- All recipient participants in the experiment MUST agree to ignore case
differences in mailbox names. This has no effect on non-participants, so
we can possibly get consensus for that.

- All code in the experiment MUST use a particular algorithm to generate
the LHS lookup key
(I would suggest toLowerCase(NFC(string) in the C locale) off the top of
my head - but one could also argue for caseFold(NFC(string)) or
NFC(caseFold(string)) - and the people choosing had better know the
difference)

The case operations referenced are in Unicode 8.0.0 section 5.18 - I
*strongly* recommend actually reading that chapter, and not making the
(invalid) assumption that calling toLower() in some random library will
actually do something compatible with this.

I don't think anything less precise has a chance of being interoperable.
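
As a rough illustration of why the choice matters, the three candidates
can be sketched in Python. This rests on assumptions: it uses Python's
built-in str.lower() and str.casefold(), which are locale-independent and
follow whatever Unicode version the interpreter ships, so it is no
substitute for pinning the algorithm to a specific Unicode version. The
function names are made up for the example.

```python
import unicodedata

def lower_nfc(s: str) -> str:
    """toLowerCase(NFC(string))"""
    return unicodedata.normalize("NFC", s).lower()

def fold_nfc(s: str) -> str:
    """caseFold(NFC(string))"""
    return unicodedata.normalize("NFC", s).casefold()

def nfc_fold(s: str) -> str:
    """NFC(caseFold(string))"""
    return unicodedata.normalize("NFC", s.casefold())

# All three agree on the simple cases.
for name in ("ARNE", "BLÅBÆR"):
    print(lower_nfc(name), fold_nfc(name), nfc_fold(name))

# Lowercasing and case folding disagree on the German sharp s.
print(lower_nfc("Straße"), fold_nfc("Straße"))

# Order matters too: J + COMBINING CARON has no precomposed capital,
# but its folded form composes to U+01F0.
s = "J\u030C"
print(fold_nfc(s) == nfc_fold(s))
```

Two implementations that each pick a different one of these would publish
and look up different keys for some names while agreeing on most test
inputs, which is the kind of failure that goes unnoticed for a long time.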

BTW, this text from the draft is obviously not saying what it intended
to say:

   o  The user name (the "left-hand side" of the email address, called
      the "local-part" in the mail message format definition [RFC5322]
      and the local-part in the specification for internationalized
      email [RFC6530]) should already be encoded in UTF-8 (or its subset
      ASCII).  If it is written in another encoding it should be
      converted to UTF-8 and then hashed using the SHA2-256 [RFC5754]
      algorithm, with the hash truncated to 28 octets and represented in
      its hexadecimal representation, to become the left-most label in
      the prepared domain name.  Truncation comes from the right-most
      octets.  This does not include the at symbol ("@") that separates
      the left and right sides of the email address.

As written, it states that hashing is only applied to strings that are
not originally in UTF-8 - but the "for example" text below makes it
clear that this is not intended.

Replacing "and then" with ". The string is then" would fix the problem.
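
Read that way, the construction of the left-most label can be sketched as
follows. This is a minimal Python sketch, not the draft's reference code:
the function name and the source_encoding parameter are hypothetical, and
keeping the first 28 octets is an assumed reading of "truncation comes
from the right-most octets".

```python
import hashlib

def openpgpkey_label(local_part: bytes, source_encoding: str = "utf-8") -> str:
    """Left-most label for a local-part, per the quoted draft text.

    Hypothetical helper: the name and the source_encoding parameter are
    made up for this sketch.
    """
    # Convert to UTF-8 if needed; input already in UTF-8 passes through unchanged.
    utf8 = local_part.decode(source_encoding).encode("utf-8")
    # Hash in every case, not only after a conversion.
    digest = hashlib.sha256(utf8).digest()
    # Keep the first 28 octets (assumed reading of the truncation rule),
    # then render them as hexadecimal.
    return digest[:28].hex()

# The same name arriving in two encodings yields the same label.
from_utf8 = openpgpkey_label("blåbær".encode("utf-8"))
from_latin1 = openpgpkey_label("blåbær".encode("latin-1"), "latin-1")
print(from_utf8 == from_latin1, len(from_utf8))  # True 56
```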