Re: Unicode is not an IETF character code

But note that my comment said that I thought we needed to have a clear
basis for believing that we could (and WOULD) deliver a better
solution.  To date, I haven't seen that, but it's no big deal if I'm
wrong.  My claim, however, is that a distinct effort to formulate
an Internet character set spec needs to be made and it needs to be
clear about the ways in which it will (must) be better than the
variety of existing or emerging specs.


Dave,
   Let me take another cut at what I think Ned is trying to say.
   ISO (pretending for a moment that it is a monolithic entity) has left
us (and other users of character set standards) in its usual messy
state, one that we both loathe.  There is a vast plenitude of "ISO
Character Set Standards", some of which don't even come out of ISO/IEC
JTC1 on Information Technology.  As a result, "adopt ISO's solution" is
a non-statement, no decision at all.
   In particular, none of these things says "this is the latest and
greatest, please forget about all of the others and transition away from
them".  Certainly, several of them have True Believers who would make
that assertion on their behalf, but "ISO" has made no such decision.
It hasn't even offered standardized guidance about the tradeoffs.
   If one is going to have what we would normally consider a
"standard"--sufficient definition that interoperability is highly
probable--we have to make choices: choices among standards and, in at
least one or two cases, choices about how to profile standards.
   None of this requires "a clear basis for believing that we could (and
WOULD) deliver a better solution", only the painful realization that,
for our purposes, ISO has done only half the job and we can't use any of
it without making decisions about what to use and how to use it.  That
we could do better if we started over is a rational hypothesis, but it
isn't necessary to any of the discussions I've seen on this.

   For example,
 (i) the discussions about character unification are really discussions
about whether IS 10646* is an appropriate "character set" for MIME
registration purposes absent additional profiling.  Note that we did
endorse and document a specific profile as part of the
registration/definition process for ISO-2022-JP.

 (ii) the discussions about the appropriateness of sending IS 10646* as
32-bit base64 encoded, 16-bit base64 encoded, UTF-2 with some encoding,
or inventing an encoding of our own are, again, issues of profiling the
thing for our purposes: as far as I know*, ISO has provided us with no
guidance on the subject and indeed may still be semi-recommending UTF-1
for at least some purposes.  Note, in this context, that UTF-2 is not
part of IS 10646, but would be a decision on our part that we can
deliver a better solution by looking elsewhere.
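
To make the profiling question in (ii) concrete, here is a minimal
sketch of what the same short text looks like under three of the
candidate forms.  It rests on assumptions that are not from IS 10646 or
from any agreed profile: Python's "utf-32-be" and "utf-16-be" codecs
stand in for the 32-bit and 16-bit forms, the "utf-8" codec stands in
for UTF-2, and the function name and dictionary keys are hypothetical.

```python
import base64


def candidate_encodings(text: str) -> dict:
    """Return the same text under three candidate transfer forms.

    The codec choices are illustrative stand-ins, not a profile that
    anyone has agreed on.
    """
    return {
        # 32-bit code units, big-endian, then base64 for mail transport
        # (assumption: "utf-32-be" approximates the 32-bit form).
        "ucs4-base64": base64.b64encode(text.encode("utf-32-be")),
        # 16-bit code units, big-endian, then base64
        # (assumption: "utf-16-be" approximates the 16-bit form).
        "ucs2-base64": base64.b64encode(text.encode("utf-16-be")),
        # Variable-length byte form, left without a transfer encoding
        # (assumption: "utf-8" approximates UTF-2).
        "utf2-like": text.encode("utf-8"),
    }


if __name__ == "__main__":
    # One ASCII letter followed by one ideograph (U+65E5).
    for name, data in candidate_encodings("A\u65e5").items():
        print(name, len(data), data)
```

Each form carries the same two characters but puts different octets on
the wire, and a receiver has no way to tell which was meant unless the
profile says so.  That is the choice we would have to make ourselves.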

As in your notes and Ned's, I am avoiding comment on the technical
merits of any of these options.

   --john

* I continue to find one aspect of this discussion interesting.  I
haven't seen IS 10646.  You haven't seen IS 10646.  Actually, probably
no one associated with the IETF has.  ISO hasn't published it.  If ITTF
(the last pre-publication review in the ISO/IEC JTC1 process) has seen
it and signed off, I haven't seen the notice.  What we have seen is the
2nd DIS.  A few of us have also seen all or parts of the fairly
extensive notes and minutes of SC2 that document agreement to
post-DIS-ballot changes.  But the text, no.  So there are interesting
possibilities about embracing vapor.