

1994-01-17 13:13:49
Subject:   Response                             Time: 11:53 AM   Date:
This is in response to an issue raised on BIDI a couple of weeks ago that was
sent to David.

Mark Davis

Date: 01/04/94 3:57 PM

I need to slip on my Area Director hat and inject a few policy
statements into this evolving discussion. I want to stress that they are
policy statements, and not technical ones. Discussion needs to continue
on the technical issues until something reasonably approximating
consensus is reached.  I don't think anything here contradicts what
Nathaniel and Ned told you.

(1) Please be very clear about whether you are discussing UNICODE or IS
10646-1:1993 ("BMP", henceforth in this note "10646").  While the code
point mappings are the same, the introductory and conformance texts
appear to be different.  It will be much easier for you to pick one and
use it than to prove that the two are the same and therefore, the two
can be taken as equivalent.  There is a deliberate bias in MIME toward
use of the ISO standard, so take that as a preference unless there are
strong reasons for using UNICODE.

The goal in producing the MIME proposal was to identify 10646 as the charset.
In the process of merging Unicode 1.0 and 10646, Unicode 1.1 has been brought
into full conformance with 10646. Unicode also adds additional information
that more exactly specifies the behavior of characters. As a practical matter,
I fully expect that the vast majority of 10646 implementations will be Unicode
implementations as well.

My personal preference (speaking without any official hats on) would be to
allow the specification of both 10646 and 10646/Unicode, where the latter
provides receivers with more information about the incoming text. In such a
case, a receiver that wants to distinguish between certain aspects of unmarked
10646 and Unicode can; others can ignore the additional information supplied.
In the absence of this information, receivers will probably assume that the
text is Unicode (but have no assurance of that fact).

(2) 10646 does not specify the presentation order of characters on,
e.g., a screen, relative to characters in the data stream.  For
languages whose characters are read from right to left, this implies a
profiling issue, since there are several methods in use of writing the
characters of those languages into the data stream.  This issue was
addressed in a special review at the Houston IETF and the conclusion was
that the character sets used with Hebrew and Arabic should be registered
three times each, to correspond to the presentation orders defined in
the relevant ECMA/ISO standard.  A "charset" definition for 10646 must
address this issue or it doesn't meet the profile-free MIME requirement.

The default internal ordering of characters within 10646 is logical order; it
also provides format codes for controlling the implicit and explicit
presentation order (0x200E, 0x200F, and 0x202A through 0x202C), or can be used
with other standards such as ISO/IEC 6429.
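
As a rough illustration of the format codes listed above, the following Python
sketch names each code point and strips them from a string. The helper name
strip_bidi_controls and the sample input are assumptions made for this example,
and the character names come from Python's unicodedata tables, which reflect a
later Unicode version than the one under discussion here.

```python
import unicodedata

# The five format codes named above: 0x200E, 0x200F, and 0x202A through 0x202C.
BIDI_FORMAT_CODES = [0x200E, 0x200F, 0x202A, 0x202B, 0x202C]


def describe_format_codes():
    """Return (code point label, character name) pairs for the format codes."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in BIDI_FORMAT_CODES]


def strip_bidi_controls(text):
    """Hypothetical helper: drop the format codes, keeping the logical-order text."""
    controls = {chr(cp) for cp in BIDI_FORMAT_CODES}
    return "".join(ch for ch in text if ch not in controls)


if __name__ == "__main__":
    for label, name in describe_format_codes():
        print(label, name)
```

A receiver that cannot honor the format codes could, under this sketch, remove
them and still hold the text in its logical order.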

(Unicode provides a detailed algorithm for determining presentation order of
10646 characters within a line or paragraph--even in the absence of
presentation format codes. In the presence of presentation format codes, it
also specifies use of those codes, and their interaction with the implicit

However, since the default internal ordering of the characters is specified,
the semantics of the text is preserved by 10646, and can be used to transmit
information correctly even in the absence of any further presentation
information. What is not specified by 10646 is the precise details of the
graphical expression of the text.
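
To make the point about logical order concrete, here is a minimal Python
sketch. It stores a mixed Latin and Hebrew string in logical (reading) order
and reports each character's directional class using Python's unicodedata
module. The function name bidi_classes and the sample string are assumptions
for illustration; the sketch only inspects per-character properties and does
not implement any presentation-ordering algorithm.

```python
import unicodedata


def bidi_classes(text):
    """Return (character name, bidirectional class) for each character."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]


# "ab" followed by Hebrew alef and bet, stored in the order they are read.
sample = "ab\u05D0\u05D1"

if __name__ == "__main__":
    for name, cls in bidi_classes(sample):
        print(f"{cls:>2}  {name}")
```

The stored sequence carries the reading order on its own; how a display lays
the right-to-left characters out on a line is a separate, later step.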

[A point that often seems to be lost in these discussions is that *no*
character set specifies the precise details of the graphical expression of the
text; neither JIS, nor ASCII, nor 8859/x etc. None of them specify the exact
bits used to draw a character, the exact width of a character, the exact
placement of successive characters, nor the exact shape of a character.]

(3) Your text includes the statements:
    The United States bodies X3L2 and X3V1 have recently developed a
    character/glyph model whose main purpose is to clarify the use of these
    terms and provide examples of their usage.  This character/glyph model
    was developed at the request of the relevant ISO bodies and has been
    submitted both to SC2 and SC18 for formal approval.

As a general rule, IETF is very nervous about basing our work on
something that is halfway through the ISO process.  That rule has been
strongly reinforced, probably to "showstopper" level in this particular
area, by the history of 10646.  When MIME was first being designed, the
working group was told, essentially, that 10646 was a done deal and
should be IETF-standardized on the basis of the DIS.  Unfortunately,
that was DIS-1, and the turmoil of DIS-2 and the current Standard had yet
to happen and was, at that time, unexpected.

Consequently, if the definition you are proposing depends on an emerging
piece of work in JTC1, and the quality or utility of that definition
would be changed significantly if JTC1 decides to do "something else",
your efforts are likely to go into IETF-hold until JTC1 does something

The work being done in this area is to help further refine the distinctions
between characters and glyphs, and in no way implies that 10646 cannot
currently be implemented as it stands. The main goal of the character/glyph
model is to help the relevant subcommittees to categorize future proposed
additions to 10646 to ensure that they are not better coded by SC18. This will
have no effect on the characters currently in 10646, and will have no impact on
consideration of 10646 within MIME.

(4) As Nathaniel mentioned, the use of 10646 was specifically intended
as soon as it settled down enough to be adequately defined.  It was not
included in-line in RFC 1521 and its predecessor along with US-ASCII and
the 8859 group only because those definitions were not in place.  
Precisely because it is considered very important, you should assume
that we will want to place any definitional, external profiling, or
restrictive information that goes with MIME use of 10646 on the IETF
standards track.

