Keith Moore wrote:
First, it's naive to assume that UTF-8 will be the native
representation on everybody's platform
Clarification. I did not say anything about UTF-8 on platforms, but
instead cited native representation in protocol messages. However,
if UTF-8 is the encoding of choice for a particular protocol's data or
message formats, then we know that UTF-8 is also going to be incorporated
into the necessary supporting functions at the participating end-points.
This doesn't mean that the whole box has to use UTF-8. Frankly, I don't
even think that's relevant. Instead, it is a question of whether or not
the related components (searching, as in the previous example) will be
likely to deal with UTF-8, rather than having to graft an
extraneous encoding into select portions of that service in order to
provide simple functionality (as with 2047 and searching, again).
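To make the 2047-and-searching point concrete, here is a minimal sketch in Python. The subject text, the needle, and the helper names search_raw and search_2047 are illustrative assumptions, not taken from any real mail store. It only shows that a search over UTF-8 data is a plain byte comparison, while a search over an RFC 2047 encoded-word needs a decode step grafted in first.

```python
from email.header import Header, decode_header

# Hypothetical sample data, chosen only for illustration.
subject = "Grüße aus München"
needle = "München"

# RFC 2047 form: the text travels as an ASCII "encoded-word".
encoded_2047 = Header(subject, charset="utf-8").encode()

# Native form: the same text carried directly as UTF-8 bytes.
raw_utf8 = subject.encode("utf-8")


def search_raw(haystack: bytes, needle: str) -> bool:
    """Search UTF-8 data: a plain byte-substring test is enough."""
    return needle.encode("utf-8") in haystack


def search_2047(haystack: str, needle: str) -> bool:
    """Search 2047 data: every encoded-word must be decoded first."""
    text = "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(haystack)
    )
    return needle in text


# A naive substring search against the 2047 form misses the match,
# because the encoded-word is pure ASCII.
naive_hit = needle in encoded_2047
```

The decode step is small here, but it has to be present in every component that searches, sorts, or compares the field.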
But this is also entirely irrelevant. By your argument that the transfer
encoding is irrelevant, I would like to hear your arguments as to how,
say, using EBCDIC to pass ASCII data around could possibly be seen as
reasonable design. Of course the native encodings are always best. The
fact that most of the apps are heading towards UTF-8 should tell us that
we should be designing for a long-term support infrastructure that
provides the data in the format it is going to be used in. Furthermore,
whenever the remaining services get upgraded or replaced, they should be
able to use something a little better than the best technology that 1968
money can buy.
Second, the portion of IDNA that does ASCII encoding is such a trivial
bit of code that the number of failures introduced by that code will
pale in comparison to those introduced by the other code needed to
handle 10646 (normalization, etc) which would be needed no matter what
encoding were used.
Getting new problems in addition to shared problems is hardly an argument
in your favor. You've already conceded that 2047 has some problems with
transliteration goofiness, and that restricting it to unstructured data
limits the real damage that is caused. Are we to believe that extending
structured data with mandatory transliteration will not cause the problems
you thankfully avoided?
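For reference, the two steps being argued over can be sketched with Python's standard library. This assumes that the built-in "idna" codec, which implements the later RFC 3490 form of IDNA, is a fair stand-in for the scheme under discussion. The domain name is a made-up example. Normalization is needed whichever wire encoding is chosen. The ASCII-compatible encoding is the additional step.

```python
import unicodedata

# The same visible name, typed two ways (hypothetical example name).
composed = "b\u00fccher.example"      # u-umlaut as one code point
decomposed = "bu\u0308cher.example"   # 'u' followed by a combining diaeresis

# Shared problem: without normalization the two spellings do not match,
# whether they are later carried as UTF-8 or as an ASCII encoding.
differ_before = composed != decomposed
same_after = unicodedata.normalize("NFKC", decomposed) == composed

# Additional step: the ASCII-compatible encoding. Python's "idna" codec
# applies nameprep (which includes NFKC) and then Punycode per label.
ace_composed = composed.encode("idna")
ace_decomposed = decomposed.encode("idna")

# Any component that wants to show or compare the real name has to
# reverse the encoding again.
round_trip = ace_composed.decode("idna")
```

Both spellings end up as the same ASCII string, and each consumer that needs the readable name must call the decoder to get it back.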
Numerous examples demonstrate that transition issues are often
paramount in determining whether a new technology can succeed.
I agree that transitional services are important. I also think that the
evidence shows that end-station layering works well when existing formats
are used as carriers for *subsets*, and when it is targeted to a specific
usage. That isn't what's being done here, though. Instead, well-known and
commonly used data-types will get *extended* into broader and
incompatible forms by default, and it will happen purposefully and
accidentally. This is not transitional probing, it is knowing that stuff
will break and doing it anyway.
Cripes, why do we have to do it all in one big bang? Can't we start with the
transfer encoding (no required upgrades for anything), incrementally add
transliteration where we know it will be safe and robust (some upgrades),
and then add UTF-8 for those newer services that can make use of it (some
more upgrades)? What is the problem with this?
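The staged rollout I have in mind could be sketched roughly as follows. The Stage names and the present() helper are hypothetical, invented purely for illustration, and the sketch again assumes Python's "idna" codec as a stand-in for the ASCII transfer encoding.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical capability levels for a service handling a name."""
    CARRY_ASCII = 1   # legacy: pass the ASCII transfer encoding through untouched
    CONVERT = 2       # upgraded: convert where that is known to be safe
    NATIVE_UTF8 = 3   # newer services: carry the name as UTF-8


def present(name_ace: bytes, stage: Stage):
    """Return the form of the name a service at the given stage would use."""
    if stage is Stage.CARRY_ASCII:
        # No upgrade required: the bytes are plain ASCII.
        return name_ace
    text = name_ace.decode("idna")
    if stage is Stage.CONVERT:
        return text
    return text.encode("utf-8")


wire_name = b"xn--bcher-kva.example"  # hypothetical example name
legacy_view = present(wire_name, Stage.CARRY_ASCII)
converted_view = present(wire_name, Stage.CONVERT)
native_view = present(wire_name, Stage.NATIVE_UTF8)
```

Each stage only requires upgrades from the services that opt into it. Everything still at the first stage keeps working unchanged.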
Simplicity is often a virtue, but IDN is inherently complex - it
reflects the tremendous variety in the world's languages and
writing systems. And blind faith in some vague notion of
cleanliness is a poor substitute for engineering analysis.
That's almost a fair shot. I do put a bunch of faith into transparent
data-types and structures. Dunno about "blind". ASCII is always best when
it's encoded as ASCII, after all.
reliability. But the need to allow incremental upgrade of
legacy application components strongly compels IDNA, and the
incremental benefit of a native UTF-8 query interface beyond
that of IDNA does not appear to justify the additional complexity.
The complexity required for a direct UTF-8 name-resolution service in
conjunction with simple passthru-everywhere is minor in comparison to the
complexity of transliterate-everywhere.
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/