Reply to: RE>>Response
(Asmus Freytag writes:)
Let's get this discussion back to where useful information is exchanged and
participants and observers learn something from the exchange of facts.
Maybe someone else can summarize the state of the discussion for all of
us, so we can see what issues are still open, and for which the discussion
has yielded some answers.
I will attempt to do so. A couple of people have stated that as defined,
ISO/IEC 10646 cannot be used as a MIME character set in the absence of
external profiling information due to issues revolving around directionality
of text and use of combining marks.
The directionality issue relates to Hebrew and Arabic, and specific reference was made
to RFC 1555 and RFC 1556. In those RFCs, variants of ISO-8859-6 and ISO-8859-8
are defined as multiple MIME charsets. Specifically, there are -i and -e
variants, -i signifying implicit ordering and -e signifying explicit ordering.
The claim was made that (at least) the same needs to be done for 10646.
This is incorrect. My reading of RFCs 1555 and 1556 is that the variants were
defined because control sequences for specifying directionality were added. A
text stream with such sequences added cannot be claimed to be ISO-8859-x,
since that standard exists and does not define those sequences. Since control
codes for specifying implicit and explicit directionality already exist in
ISO/IEC 10646, an amended character set is not necessary, nor is a separate
designation. It is possible, within the context of 10646, to accommodate all
three of the options discussed in RFC 1556: "visual mode" (what Mark Davis
called "logical order"), implicit mode, and explicit mode. Therefore the
single character set designation suffices.
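As an illustration of the point above (not taken from the draft documents), the
following Python sketch shows that explicit directionality can be carried in-band
as ordinary characters, so a receiver can tell an explicitly marked stream from an
implicitly ordered one without any separate charset label. The helper names are
hypothetical, and Python's unicodedata module reflects a current Unicode database
rather than the 1993 edition of 10646, so treat the output as indicative only.

```python
import unicodedata

# Explicit directional formatting characters, keyed by code point.
# The short names are the bidirectional class names used by Unicode.
EXPLICIT_CONTROLS = {
    "\u202A": "LRE",  # left-to-right embedding
    "\u202B": "RLE",  # right-to-left embedding
    "\u202C": "PDF",  # pop directional formatting
    "\u202D": "LRO",  # left-to-right override
    "\u202E": "RLO",  # right-to-left override
}


def has_explicit_direction(text):
    """Return True if the text carries any in-band explicit direction control."""
    return any(ch in EXPLICIT_CONTROLS for ch in text)


def implicit_classes(text):
    """Return the bidirectional class of each character (used for implicit ordering)."""
    return [unicodedata.bidirectional(ch) for ch in text]


hebrew = "\u05D0\u05D1"  # HEBREW LETTER ALEF, HEBREW LETTER BET

# Same charset label in both cases; only the character sequence differs.
implicit_msg = "abc " + hebrew
explicit_msg = "abc \u202B" + hebrew + "\u202C"

print(has_explicit_direction(implicit_msg))
print(has_explicit_direction(explicit_msg))
print(implicit_classes(implicit_msg))
```

Both messages would travel under one charset designation; whether a given
message uses explicit controls is discoverable from the text itself.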
As for combining marks, the issue was the requirement that MIME charsets
unambiguously specify the translation from a byte stream to "glyphs" (the MIME
standard is silent as to what a glyph is). Assuming that 10646 is to be held
to the same standard as every other existing character set standard, including
all those already specified and accepted for use with MIME (ASCII and
ISO-8859-x), then there is no issue, as ISO/IEC 10646 specifies exactly as much
about the translation of character codes to screen display (i.e., almost
nothing) as all the other character set standards. No external profiling
information (other than the usual things like having the appropriate fonts
installed) is necessary to display a 10646 message. By external profiling
information, MIME means additional out-of-band information that must accompany
the text in a message. While the display of messages in scripts that use
combining marks is certainly complex, it is algorithmic and does not require
any information to be transmitted with the message beyond the 10646 character
sequence. Therefore I believe that the version of 10646 specified in our draft
document ("Encoding of ISO/IEC 10646-1/Unicode in MIME") meets the
requirements for a MIME charset.
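To make the "algorithmic" claim concrete, here is a minimal Python sketch that
groups each base character with the combining marks that follow it, using only
the character sequence and the character properties already known to the
receiver. It is a deliberate simplification: the function name is hypothetical,
it relies on a nonzero canonical combining class (so it misses marks whose class
is 0), and real rendering of such scripts involves considerably more than this.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: a character is treated as a combining mark only when
    its canonical combining class is nonzero.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new display unit
    return clusters


# "e" followed by COMBINING ACUTE ACCENT (U+0301), then a plain "a".
decomposed = "e\u0301a"

print(len(decomposed))             # code points in the stream
print(len(split_clusters(decomposed)))  # display units derived from it
```

Nothing beyond the character sequence is consulted here; the grouping is
derived entirely from the text and from properties fixed by the standard.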
I must say that the purpose of my distributing the drafts of my documents was
to solicit feedback on the best way technically to encode 10646/Unicode within
MIME, not a debate on the merits of 10646. I am pursuing this because of a
practical need by people making commercial use of 10646 and Unicode to have a
means of transmitting it via electronic mail. The 10646 standard exists, is
finished, and is in commercial use. At this point we should be discussing how
to use it, not whether it should exist or is perfect. I have received a few
useful comments, for which I thank the persons involved.
The only pending change in the documents right now is the elimination of some
features of the UTF-7 encoding of 10646 which duplicated aspects of the
quoted-printable content transfer encoding of MIME. In retrospect, we decided these
were redundant. I plan to update the documents and take them to the next level
of the standards process [as soon as I figure out what that is :-)]. If anyone
wishes to make comments during this informal review, there is not much time
left. Anyone who does not have copies of the documents in question and wants
to review them should contact me immediately via e-mail, and I can send
plain text or PostScript versions.
David Goldsmith
david_goldsmith(_at_)taligent(_dot_)com