On Tue, Sep 15, 2009 at 12:16:44PM -0400, John C Klensin wrote:
--On Tuesday, September 15, 2009 15:28 +0100 Kurt Zeilenga
I strongly oppose such an 'or' as SASLprep and Net-UTF-8 use
different Unicode normalization algorithms.
Well, not really.
RFC 5198 says 'all character sequences SHOULD be normalized
according to Unicode normalization form "NFC" (see Section 3).'
RFC 4013 says 'This profile specifies using Unicode
normalization form KC, as described in Section 4 of
Now, NFKC processing is a proper superset of NFC processing.
An implementor that stores NFC strings will not reliably interoperate
with a peer that sends query strings in NFKC: such a peer could send a
query string that matches no storage string unless the storage strings
are additionally normalized to NFKC!
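
To make the mismatch concrete, here is a minimal Python sketch using the
standard unicodedata module. The sample string (containing U+FB01, LATIN
SMALL LIGATURE FI) and all variable names are mine, chosen only for
illustration; nothing here comes from either RFC.

```python
import unicodedata

# U+FB01 has only a compatibility decomposition (to "fi"), so NFC leaves
# it alone while NFKC folds it.
original = "\ufb01le"

stored = unicodedata.normalize("NFC", original)   # what an NFC-storing node keeps
query = unicodedata.normalize("NFKC", original)   # what an NFKC-sending peer sends

# A plain comparison of the NFKC query against the NFC storage string fails.
naive_match = (stored == query)

# It only succeeds after the storage string is additionally normalized.
renormalized_match = (unicodedata.normalize("NFKC", stored) == query)

# For this sample, the NFKC result is unchanged by NFC normalization.
query_is_also_nfc = (unicodedata.normalize("NFC", query) == query)

print(naive_match, renormalized_match, query_is_also_nfc)
```

The last check covers only this one sample; it is not a proof of the
general relationship between the two forms.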
I think the right answer is to leave _query_ strings unnormalized and
require that _storage_ strings be normalized (see my separate reply on
that general topic, with a different Subject:, just now).
(If query strings are required to be normalized, nodes that store
strings need enough normalization code to validate them anyway, so
expecting those implementors to normalize query strings themselves is
not that big a deal. Peers that send query strings will typically need
to be able to normalize too, for local reasons, but there's no obvious
reason why such local reasons must exist.)
Then the choice of normalization form for storage strings only affects
peers that read them back -- which is enough to justify requiring the
use of a normalization form for storage strings. The choice of K or not
K can then conceivably be left to the implementor (provided peers are
required to support non-K when reading back storage strings).
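
A rough sketch of what I mean, in Python. The StringStore class and its
method names are hypothetical, not taken from any spec or library. It
assumes the storing node normalizes storage strings on the way in and
normalizes unnormalized query strings itself before comparing, with the
form (K or not K) picked by the implementor.

```python
import unicodedata


class StringStore:
    """Hypothetical storing node; `form` is the implementor's choice."""

    def __init__(self, form="NFC"):
        if form not in ("NFC", "NFKC"):
            raise ValueError("form must be NFC or NFKC")
        self.form = form
        self._items = {}

    def put(self, name, value):
        # Storage strings are always normalized before being kept.
        self._items[unicodedata.normalize(self.form, name)] = value

    def lookup(self, query):
        # Query strings arrive unnormalized; the storing node normalizes them.
        return self._items.get(unicodedata.normalize(self.form, query))

    def names(self):
        # What a peer sees when it reads storage strings back.
        return sorted(self._items)


decomposed = "re\u0301sume\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "r\u00e9sum\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE

for form in ("NFC", "NFKC"):
    store = StringStore(form)
    store.put(decomposed, 1)
    # The peer never normalized anything, yet the lookup still succeeds.
    assert store.lookup(precomposed) == 1
    assert store.names() == [precomposed]
```

The sending peer needs no normalization code at all in this sketch; only
a peer that reads names back has to cope with whichever form was stored.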