ietf

Re: Last Call: <draft-ietf-dane-openpgpkey-07.txt>

2016-02-15 02:37:02
Hello,

Thank you, John, for your detailed comments on the i18n aspect of this
draft, which I admit I hadn't fully considered.  I think you're right
that, whatever approach is taken, it would make sense to add a short
"Internationalization Considerations" section to state what the expected
interaction is between this specification and non-ASCII addresses.

More comments inline below:

Temporarily and for purposes of discussion, assume I agree with
the above as far as it goes (see below).   Given that, what do
you, and the systems you have tested, propose to do about
addresses that contain non-ASCII characters in the local-part
(explicitly allowed by the present spec)?  Note that lowercasing
[1] and case folding are different and produce different results
and that both are language-sensitive in a number of cases, what
specifically do you think the spec should recommend?  

I have not seen any specific examples of software which unintentionally
converts characters to uppercase (although I can readily imagine such
bugs/features), so I'm prepared to assume that the lowercasing logic can
be safely limited to just the input strings which include only ASCII
characters.  My idea was for the client to make a reasonable effort to
correct for a plausible (but rare) problem, so for the purposes of an
experiment I think it is acceptable if this correction does not try
anything more clever, like converting MUSTAFA.AKINCI@EXAMPLE.COM to
mustafa.akıncı@example.com (although mustafa.akinci@example.com should
be tried).
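
As a rough illustration only (this is not proposed text for the draft), the
ASCII-only rule described above could be sketched in Python as follows.
The function name candidate_addresses is made up for this example, and the
choice to leave the domain untouched is an assumption, made because DNS
names already compare case-insensitively.

```python
def candidate_addresses(address):
    """Return the addresses a client might try, in order.

    Hypothetical helper: the original address always comes first, and a
    lower-cased local-part is added only when the local-part is pure ASCII
    and actually contains an upper-case character.
    """
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: %r" % (address,))

    candidates = [address]

    # Non-ASCII local-parts are deliberately left alone: lower-casing and
    # case folding are language-sensitive there, so no guess is made.
    if local_part.isascii():
        lowered = local_part.lower()
        if lowered != local_part:
            candidates.append(lowered + "@" + domain)

    return candidates
```

With this sketch, MUSTAFA.AKINCI@EXAMPLE.COM yields two candidates (the
original and one with the local-part mustafa.akinci), while an address
whose local-part contains "ı" yields only itself.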

Also, do you think it is acceptable to publish this document
with _any_ suggestions about lower-casing or "try this, then try
something else" search without at least an "Internationalization
Considerations" section that would discuss the issues [1] and/or
some more specific recommendation than "try lowercase" (more on
that, with a different problem case, below).

You are right that adding such a section could be of great benefit to at
least some implementers, even if the discussion in that section is
simply "Only try lower-casing when the input is all ASCII".  If someone
can come up with something more helpful than that brief statement, then
I'd be very supportive of it.

Dropping that assumption of agreement for discussion, I
personally believe that this document could be acceptable _as an
Experimental spec_ with any of the following three models, but
not without any of them:

 (i) The present "MUST not try to guess" text.

 (ii) A recommendation about lowercasing along the lines
      you have outlined but with a clear discussion of i18n
      issues and how to handle them [2].

 (iii) A clear statement that the experiment is just an
      experiment and that, for the purposes of the experiment,
      addresses that contain non-ASCII characters in the local
      part are not acceptable (note that would also require
      pulling the UTF-8 discussion out of Section 3 and
      dropping the references to RFC 6530 and friends).

Perhaps you would settle for an option (ii.v) which is my lowercasing
recommendation + a discussion of the i18n issues + that discussion being
based on the experimental restriction of only applying the lowercasing
logic to ASCII-only local parts.  I hope that would be in keeping with
your sensible suggestions above.

...
e.g., 
   U+0066 U+006F U+0308 U+006F   and
   U+0066 U+00F6 U+006F
are perfectly good (and SMTPUTF8-valid) representations of the
string "föo"    

Using the same theory as your lower case approach, would you
recommend trying first one of those and then the other [3]?

That is tempting, but I accept that it may be too much unnecessary
complexity to suggest or recommend it at this stage of the experiment. 
I know that various ideas have been proposed for handling normalisation
of local-parts more generally, and I think we should allow that work to
progress separately, uncoupling it from the document at hand.
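
For anyone who wants to see the two representations side by side, here is
a small Python sketch using the standard unicodedata module.  It only
demonstrates that the two code point sequences quoted above are different
strings (and different UTF-8 octet sequences) which become equal after
normalisation; it is not a suggestion that the draft should normalise.
The variable and function names are invented for the example.

```python
import unicodedata

# The two sequences quoted above for the string "föo".
decomposed = "\u0066\u006F\u0308\u006F"   # f, o, combining diaeresis, o
precomposed = "\u0066\u00F6\u006F"        # f, o-with-diaeresis, o


def same_after_nfc(a, b):
    """Compare two strings after applying Unicode NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Compared code point by code point (or octet by octet) they differ,
# so anything derived from the raw local-part would differ as well.
differ_as_strings = decomposed != precomposed
differ_as_utf8 = decomposed.encode("utf-8") != precomposed.encode("utf-8")

# After normalisation they are the same string.
equal_when_normalised = same_after_nfc(decomposed, precomposed)
```

All three of the final variables evaluate to True, which is the crux of
the problem: a sender who types one form will not match a record
published under the other unless somebody normalises.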

The more I think about it, the more I'm convinced that the
specification and allowance for UTF-8 [4] in the first bullet of
Section 3 is unacceptable without either text there that much
more carefully describes (and specifies what to do about) these
cases or an "Internationalization Considerations" section that
provides the same information.  I suggest that anyone
contemplating writing such text carefully study (not just
reference) Section 10.1 of RFC 6530.   Of course, simply
excluding non-ASCII local-parts from the experiment, as
suggested in (iii) above, would be an alternative.  I have mixed
feelings about whether it would be an acceptable one for an
experiment.  I am quite sure it would not be acceptable for a
standards-track document when the EAI work and/or the IETF
commitment to diversity are considered.

I think that excluding non-ASCII local-parts from just the extra
lower-casing logic, and pointing out the complexity of case handling in
non-ASCII contexts in a separate section as you have suggested, might
address the outstanding concerns, without hindering diversity.

...
[2] I note that, historically, the DNS community has been very
reluctant to accept techniques that depend on or imply multiple
lookups for a single perceived object and, separately, for
"guess at this, try it, and, if that does not work, guess at
something else" approaches.  Unless those concerns have
disappeared, the potential for combinatorial explosion when
lower-casing characters that may lie outside the ASCII
repertoire is truly impressive.

That's another reasonable point, thank you.  Hopefully it is mitigated,
at least for the most part, by settling for only lower-casing characters
for all-ASCII local-parts, avoiding the combinatorial explosion you
mention.  Also, this extra lower-casing step will only happen in the
relatively rare situations where the input local-part contains at least
one upper-case character (although I don't know in practice how many
extra lookups that will lead to, on average).
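
To give a feel for the difference in lookup counts, here is a hedged
Python sketch comparing two strategies.  naive_variants is a purely
hypothetical "try everything" approach that nobody in this thread has
proposed; it combines lower-casing, case folding and two normalisation
forms.  ascii_only_variants is my reading of the restricted rule
discussed above.  Both function names are made up for the illustration.

```python
import unicodedata


def naive_variants(local_part):
    """Hypothetical 'guess everything' strategy, shown only for contrast."""
    cased = {local_part, local_part.lower(), local_part.casefold()}
    return {
        unicodedata.normalize(form, s)
        for s in cased
        for form in ("NFC", "NFD")
    }


def ascii_only_variants(local_part):
    """Restricted strategy: lower-case only all-ASCII local-parts."""
    if local_part.isascii():
        # A set collapses the two entries when nothing was upper-case.
        return {local_part, local_part.lower()}
    return {local_part}
```

Under these assumptions the restricted rule never produces more than two
candidates, and produces only one when the local-part is already
lower-case or contains any non-ASCII character.  The naive approach
already needs four candidates for a three-character local-part such as
"FÖO", and the count grows as more case and normalisation choices are
added.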

Best regards,
Edwin