ISO? DIS? 10646 and RFC-XXXX

  I've been away for a few weeks, and just wanted to pick up the one
issue I know something about before I disappear for another week.

On ISO DIS 10646....
  On June 5, Keld wrote...

Oh, why is 10646 a mistake? 10646 is the international standard (if
everything works OK it will be approved tomorrow) with all characters
in the world (some are still missing due to validation of the right
way to do them).

  Well, it didn't "work OK".  There were negative votes; ISO/IEC
JTC1/SC2 now needs to respond to those negatives.  But, as has been
mentioned elsewhere on the list, the WG within SC2 that is actually
responsible for the development of 10646 has started banging out a
compromise with some of the negative comments and some of the UNICODE
advocates/issues.  I think it is reasonable to assume from this that
the WG no longer believes in the draft of 10646 that was circulated for
DIS vote and that it will not try to push it forward in its present
form.  SC2 could, of course, ignore the WG, but that would be a very
rare and unusual event given the way ISO SCs usually work.
  Assuming that SC2 does not repudiate the WG and the embryonic
compromises, the draft is nearly certain to be revised sufficiently to
require another round of DIS balloting.  That puts "approval" into 1992
at least, more likely a year from now at the earliest.
  What is more important is that some things are under discussion for
the revision that challenge some of the assumptions that have floated
around these lists.  For example, compaction method 5 level 2 may
disappear.  And there is a proposal to populate most of the C0 and C1
positions with graphics (glyphs).  If those things actually happen--and
I think it will take some months before we can even make a good
estimate--the assumption that ISO8859-1 "is" 10646 may be a casualty.

  That is important because the assumption, most recently expressed by
Philippe-Andre Prindeville on the 4th, has been:

Further, because 8859-1 is a subset of ISO 10646 (or will be when
it is approved), it provides an easy transition path to a broader
encoding scheme.  If one simply defines G/P/R as 32/32/32, and
Compaction Mode 1 as the default for our profile, then we are
inherently ISO 10646 compatible.

  To paraphrase an old popular song, "it ain't necessarily so".

  Moreover, as several people have pointed out, notably Randall Atkinson on
the "other" list around the 28th of May, some variations on 10646,
especially with the C0 and C1 spaces inhabited by graphics, may cause
transport problems with the recognition of CR, LF, and ".".  There are
three ways to avoid those problems that I know of:
  -- require 10646 to be transported only in base64 format.
  -- create special escapes and require that they be used whenever an
octet of 10646 might be mistaken for a CR or LF (this would, basically,
be a new transport encoding).
  -- modify the transport protocol to explicitly specify the number of
octets per "character" or the character set itself so that transport-
delimiter characters can be identified in an unambiguous way.
   Note that "quoted-readable" is not mentioned here because it isn't
10646 but a different set of character conventions that are isomorphic
with 10646.  Putting it in the above list would be equivalent to the
alternative "don't transport 10646 at all, transport only character
sequences that can be reverse-translated into 10646".
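
  To make the octet-collision problem concrete, here is a minimal
sketch in Python.  It is an illustration only: Python's "utf-32-be"
codec stands in for a fixed-width four-octet form, which is an
assumption and does not model the DIS 10646 compaction methods or any
C0/C1 repopulation.  The helper names (suspect_octets, true_delimiters)
are hypothetical.  It shows a transport that scans octets seeing CR,
LF, and "." where none were sent, and how the first and third
approaches above avoid that.

```python
import base64

# Octet values a line-oriented transport treats as significant.
TRANSPORT_OCTETS = {0x0D: "CR", 0x0A: "LF", 0x2E: "."}


def suspect_octets(data: bytes):
    """List (offset, name) for every octet that looks like CR, LF or '.'
    to a transport that examines the stream one octet at a time."""
    return [(i, TRANSPORT_OCTETS[b]) for i, b in enumerate(data)
            if b in TRANSPORT_OCTETS]


def true_delimiters(data: bytes, width: int):
    """List (character index, name) for real delimiters, given that the
    transport has been told each character occupies `width` octets."""
    found = []
    for i in range(0, len(data), width):
        code = int.from_bytes(data[i:i + width], "big")
        if code in TRANSPORT_OCTETS:
            found.append((i // width, TRANSPORT_OCTETS[code]))
    return found


# Three characters, none of which is CR, LF or '.', chosen so that
# their low-order octets are 0x0D, 0x0A and 0x2E respectively.
sample = "\u010d\u0a0a\u012e".encode("utf-32-be")

# An octet-scanning transport is misled by the raw stream ...
print(suspect_octets(sample))

# ... but not once it knows the character width (third approach) ...
print(true_delimiters(sample, 4))

# ... and not if the stream is carried in base64 (first approach),
# whose alphabet contains none of the three octets.
print(suspect_octets(base64.b64encode(sample)))
```

  The second approach (special escapes) is not sketched because its
escape syntax would have to be invented here.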

One should now repeat the principle, articulated nearly six months ago,
that anticipating Standards is not a good idea.  Given the present
situation of uncertainty and the transport-affecting issues above that
cannot, I believe, be properly resolved until we find out what is going
to be in the Standard (or even in the next DIS draft), it is probably
appropriate that the reference to 10646 in RFC-XXXX be changed to an
explicit placeholder, to be the subject of amendment or new RFCs when
the thing arrives.


On the other hand, there are a few things about the late DIS 10646
version that are probably stable and may be of use to us.  In
particular, while more characters may be added (and almost certainly
will be), and some changes might occur as a result of Han unification
(if that goes through), the character-naming model is likely to be
pretty stable.  So it is probably sensible to use those names if
universal character names are needed for something.  There is no "more
universal" list around, and, as far as I know, the Unicode character
name list is completely compatible with it.  I suppose we should all be
thankful for small favors.

  ---john