closure & character sets


  I've read the preceding character set discussion with some interest.
The ability to use international character sets is my primary
interest in this working group: I have done work in multilingual
computing for several years now (including working to create European
and Japanese versions of software whilst I was with GE), and I
normally use several languages myself.

  I consider the inclusion of label values specifying each of the
ISO-8859-X character sets (for each value of X defined by ISO) to be
essential for interoperability of multilingual messages.  These
standards are widely implemented (DEC & HP have been shipping ISO-8859
terminals for years now and other vendors also are supporting many of
the ISO-8859 family).  If we don't formally specify ISO-8859,
interoperability will suffer needlessly.  This is much less
experimental than most of the other material in RFC-XXXX and should be
included.  Note that I would be very unhappy if we included formal
specifications for the ISO-646-N family of character sets.  They have
been superseded by ISO-8859, and including ISO-646-N labels would
encourage their use in lieu of ISO-8859-X and so reduce
interoperability.
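
  To illustrate why the labels matter, here is a minimal Python sketch.
The codec names are Python's own spellings, not proposed label values,
and the sample octet is an arbitrary choice of mine.  The same 8-bit
octet maps to a different character in each part of ISO-8859, so a
receiver that is not told which part was used cannot display the text
correctly.

```python
# One octet, four ISO-8859 parts, four different characters.
# Codec names are Python's spellings, not proposed label values.
SAMPLE = bytes([0xE6])  # arbitrary octet from the upper half

PARTS = ["iso8859-1", "iso8859-2", "iso8859-5", "iso8859-7"]


def decode_under_each_part(data, parts=PARTS):
    """Return {codec name: decoded text} for the same raw octets."""
    return {name: data.decode(name) for name in parts}


if __name__ == "__main__":
    for name, text in decode_under_each_part(SAMPLE).items():
        # Print only the code point so the output is plain ASCII.
        print(f"{name}: U+{ord(text):04X}")
```

Without a label on the message, nothing in the octets themselves says
which of these readings the sender intended.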

  Mark & Erik have made it quite clear that there is in fact a lot of
implementation experience with the 'iso-2022-jp' scheme.  It is clear
to me and others who are interested in CJK computing that some form of
CJK support is essential, so I support the notion of defining the
'iso-2022-jp' label and providing a reference to the JUNET document
that Erik cited in an earlier message and indicating that a future RFC
will try to provide the essential implementation details in English.
This is to say that the label should be defined, but that
implementation of 'iso-2022-jp' (i.e. being able to display the glyphs
or transliterate them into some alphabetic representation if
necessary) should NOT be made mandatory for conformance to RFC-XXXX.

  I agree that it is premature to standardise ISO-10646 and its
representation.  It is, however, worth continuing to include
"rationale" text in the RFC indicating that future support for a
universal character set such as ISO-10646 is highly desirable given
the multilingual and multinational nature of the current Internet.  It
is also wise to continue to include "rationale" text strongly
discouraging the adoption of other private character sets.  The
proliferation of many character sets in Internet mail would impede
interoperability.  Specification of the use of ISO-10646 in Internet
Email should be deferred to another RFC and should not be undertaken
until after ISO-10646 is finally approved by the ISO.

  While Keld and I have had our disagreements, I respect his basic
command of the facts and think that RFC-CHAR (as it is called) is
reasonably complete.  It appears, for example, to fully support
Vietnamese (unlike ISO 1st DIS 10646).  There are problems
in the handling of CJ ideograms (e.g.: differing pronunciations of the
same character and the very high incidence of homonyms make phonetic
representation difficult or impossible; a huge lookup table would be
required to implement RFC-CHAR on a system using anything other than
ISO 2DIS 10646 if the CJK ideograms are supported), but those appear
to be inherent in any alphabetic encoding of such ideograms.  It isn't
clear to me that RFC-CHAR should be made an Internet Standard, but it
clearly should be published, as it is very useful for at least
European languages and perhaps for all alphabetic languages.


SUMMARY:

  Omission of ISO-8859-N support would be a ship-stopper.
  Prohibition of ISO-2022-JP support would be a ship-stopper.
  RFC-CHAR should be published at least informationally.
  ISO-10646 should be mentioned in rationale text with 
    specification and non-experimental use deferred to an RFC
    written after it becomes final.
  Use of all other character sets should be explicitly discouraged.
  All label names other than those beginning with "X-" should be
    reserved for future use by Internet Standards & RFCs.
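
  As a rough sketch of how a receiver might apply the summary above,
the following Python function sorts a label into one of four classes.
The function name, the class names, and the case-insensitive matching
are my own assumptions for illustration.  It covers only the labels
discussed in this message and does not check that a given ISO-8859
part number is one ISO has actually defined.

```python
import re

# Hypothetical pattern for the ISO-8859-N family of labels.
_ISO_8859 = re.compile(r"^iso-8859-[1-9][0-9]*$")


def classify_charset_label(label):
    """Classify a charset label under the policy summarised above.

    Returns "standard", "optional", "private", or "reserved".
    Matching is case-insensitive (an assumption of this sketch).
    """
    name = label.strip().lower()
    if _ISO_8859.match(name):
        return "standard"   # label defined; support expected
    if name == "iso-2022-jp":
        return "optional"   # label defined; implementation not mandatory
    if name.startswith("x-") and len(name) > 2:
        return "private"    # private namespace; use discouraged
    return "reserved"       # held for future Internet Standards & RFCs


if __name__ == "__main__":
    for lbl in ["ISO-8859-1", "iso-2022-jp", "X-foo", "iso-10646"]:
        print(lbl, "->", classify_charset_label(lbl))
```

A message carrying an "optional" or "private" label would still be
accepted; the receiver is simply not obliged to render it.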

Ran
atkinson(_at_)itd(_dot_)nrl(_dot_)navy(_dot_)mil