Re: Character set Detail Considered Harmful

Dave writes, in part...

The status of 8859 is not clear from the References section of 
RFC XXXX.

   If this is so, then the References section needs to be improved.  The 
"status of 8859" is as follows.  "ISO 8859" describes a family of roughly 
a dozen International Standards which use a common structuring model and 
construction rules.  The specific Standards are known as, e.g., ISO 
8859-1, ISO 8859-2, etc.
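For anyone who has not looked closely at the family, here is a minimal sketch of what the common structuring model means in practice. It uses the codec names that Python ships (iso8859_1, iso8859_2, and so on, which are Python's identifiers, not ISO's), and the helper names are hypothetical. The point it shows is that every part keeps ASCII in positions 0x00-0x7F and reuses the upper half (0xA0-0xFF) for its own repertoire. It does not attempt to capture the full construction rules of the Standards.

```python
# Sketch (assumption: Python's built-in codecs stand in for the ISO tables).
# Every ISO 8859 part shares one layout: bytes 0x00-0x7F are identical to
# ASCII, and only the upper graphic half (0xA0-0xFF) differs between parts.
PARTS = {
    "iso8859_1": "Latin alphabet 1",
    "iso8859_2": "Latin alphabet 2",
    "iso8859_6": "Latin/Arabic",
    "iso8859_7": "Latin/Greek",
}


def lower_half_is_ascii(codec: str) -> bool:
    """Return True if every byte 0x00-0x7F decodes exactly as it does in ASCII."""
    lower = bytes(range(0x80))
    return lower.decode(codec) == lower.decode("ascii")


def upper_half(codec: str) -> str:
    """Decode the upper graphic half, silently skipping unassigned positions."""
    return bytes(range(0xA0, 0x100)).decode(codec, errors="ignore")


if __name__ == "__main__":
    for codec, name in PARTS.items():
        print(f"{codec} ({name}): ASCII lower half = {lower_half_is_ascii(codec)}")
    # The same upper-half byte stands for different letters in different parts.
    print(bytes([0xE1]).decode("iso8859_1"), bytes([0xE1]).decode("iso8859_7"))
```

Some parts (8859-6, for example) leave upper-half positions unassigned, which is why the sketch decodes that half with errors="ignore" rather than failing.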
   8859-1 (Latin alphabet 1), 8859-2 (Latin alphabet 2), 8859-6 (Latin/
Arabic), and 8859-7 (Latin/Greek) became ISO International Standards in 
1987.  Other members of the family came later, with the most recent one 
being published (that is, after the Standard became final) within the 
last six months.  There is, as my earlier note suggested, extensive 
experience with 8859-1, including hardware implementations in very 
large-volume products.  To pretend otherwise is, IMHO, not a sign of
care and conservatism about standardizing the untried.  It is evidence
either of an unwillingness to review existing experience or of an odd
variation of the "not invented here" attitude for which we so often
criticize ISO and CCITT. 

    Yes, there are other alternatives within the ISO arena, some of 
which have never been mentioned in these discussions.  And I've been one 
of the strongest advocates of avoiding tying ourselves to untried and 
incomplete proposals, and will continue to be.  But, unless people are 
willing to take the position that we don't need non-ASCII character sets 
until 10646 is finally approved, it seems to me to be totally 
inappropriate to ignore the 8859 experience and usage.

And I think similar arguments could be made for what we have come to 
refer to as 2022-jp.  Nothing experimental about that either.  Our 
problems with it are due only to the fact that, if there are official 
definitions, they are probably written in Japanese and in Kanji.  We 
have not figured out how to publish RFCs in that language and character 
set, or how to effectively reference documents that the majority of IETF
participants cannot read.  That is a problem, but it doesn't seem to me 
to be an RFC-XXXX problem, or a problem of a set of character set 
conventions that are experimental or not well-defined.

Appendix F, in that regard, is an attempt at an interpretative 
translation of the real definition.  Let me suggest a different way to 
handle it, not as a serious proposal at the moment but as something
people should think about as a means of clarifying the issue here 
(especially in the context of beliefs and desires about a truly 
international Internet Society and IETF).  Assuming that copyright 
regulations, etc., permit, an informational document containing the 
"real" specification of what we describe as 2022-jp should be submitted
immediately to the RFC editor for publication.  If I understand things 
correctly, that specification is actually a Japanese (JIS) National 
Standard.  Presumably that document is in Kanji, and presumably the RFC 
editor can figure out some way to cope with that.  As part of the coping 
process, Appendix F should be removed from RFC-XXXX and attached to the 
proposed informational RFC as an informal guide to the specification for 
those who find reading technical Japanese excessively challenging.

Now, I suggest that this type of approach would make things procedurally 
cleaner (quite apart from causing Jon a lot of aggravation), but that it 
really doesn't change anything of substance: what we call 2022-jp is in 
use, has been tested for interoperability on a variety of platforms, etc.

We really can separate the well-established and proven from the 
speculations, and we can do so without much trouble.  Saying "character 
sets are a mess, let's stick to ASCII" is unnecessary and really not 
much better than saying "communication is much easier if everyone uses 
English".
    john