Re: Response

1994-01-26 14:27:03
Okay, I'll bite on this one.

Again, I disagree that you must know it is Japanese, and, therefore, 
that you need profiling information to tell you this fact.  I shall 
prove this below.

Given 10646/Unicode plain text without any profiling, display the
text as follows:

[ clever algorithm deleted ]

     -) short sequences containing neither Kana nor Hangul characters may
        be resolved by determining whether every character in the sequence
        maps to some character in a particular national standard; in the
        case that every character in the sequence maps to more than one
        national standard, then choose the first standard given some
        prioritization based on locale (e.g., choose Japanese standard if
        Japanese locale)
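For illustration only, the quoted rule might be sketched in Python as follows.  The sketch assumes that a codec's repertoire is a fair stand-in for a national standard; the STANDARD_CODECS table and the function names are hypothetical, not part of anyone's proposal, and the match is approximate (Python's euc_jp codec, for instance, also covers JIS X 0212 and JIS X 0201).  Kana and Hangul handling belonged to the deleted part of the algorithm and is not shown.

```python
from __future__ import annotations

# Hypothetical table: each "national standard" is approximated by a Python
# codec whose repertoire roughly matches it (EUC-JP, for example, also
# includes JIS X 0212 and JIS X 0201, so the match is not exact).
STANDARD_CODECS = {
    "ja": "euc_jp",      # JIS X 0208
    "zh-CN": "gb2312",   # GB 2312
    "ko": "euc_kr",      # KS C 5601
    "zh-TW": "big5",     # Big5
}


def in_standard(ch: str, codec: str) -> bool:
    """True if ch can be encoded with the given codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def priority_for(locale_std: str) -> list[str]:
    """Put the locale's own standard first, then the rest in table order."""
    return [locale_std] + [s for s in STANDARD_CODECS if s != locale_std]


def choose_standard(seq: str, priority: list[str]) -> str | None:
    """Return the first standard (in priority order) that contains every
    character of seq, or None if no single standard covers them all.
    Assumes seq contains no Kana or Hangul; those cases are handled by
    other rules not shown here."""
    for std in priority:
        codec = STANDARD_CODECS[std]
        if all(in_standard(ch, codec) for ch in seq):
            return std
    return None


print(choose_standard("一人", priority_for("ja")))   # ja
print(choose_standard("一人", priority_for("ko")))   # ko
```

Since both ideographs exist in every one of these standards, the locale alone decides the outcome, which is exactly the point at which the locale starts to look like external profiling.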

Hmmm.  Use of a locale sounds pretty much like "external profiling" to me.

However, I *do* appreciate your suggestions, because they form the first
concrete proposal I've seen for dealing with Ohta-san's complaints (as I
understand them) in a way that's compatible with MIME charset labelling.

The crux of the matter is whether or not such display is deemed to
be acceptable in the case that a wrong font (or glyph) is chosen to
display a given character (e.g., choosing a Japanese font to display
a unified CJK ideograph contained in a Chinese text).

As far as I know, MIME does not specify any criteria for typographic
acceptability.  In the absence of such criteria, it is not possible
to make a negative judgement about the correctness or acceptability of
the above algorithm.  The purpose of the algorithm was to display
each character with some glyph, and it plainly does so.  Therefore,
in the absence of a criterion for typographic quality, this algorithm
*is* correct and satisfies the requirement for a MIME client to
display a 10646/Unicode text.

My recollection is that the "unique mapping of characters to glyphs" prose
was invented to disallow ISO 646 variants where a given code point can mean
one character in one national variant and a completely different character in
another.  In retrospect, the word "glyphs" was too specific (and not quite
what's intended), but "characters" would not have been precise enough.  The
"no external profiling" prose was intended to disallow lumping all of the ISO
2022 switching sequences into a single MIME charset.  It's impractical to
support the entirety of ISO 2022 (new charsets can always be added), so a
client needs to know which of the charsets are being used before it can
decide whether to display the body part.
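To make the ISO 646 problem concrete, here is a small Python sketch.  The tables are partial and illustrative (only the positions used in the example, taken from the British BS 4730 and German DIN 66003 variants), and decode_646 is a hypothetical helper.  The same byte string decodes to different characters depending on a variant that the bytes themselves do not identify.

```python
# Partial, illustrative tables: only the code positions that differ from
# US-ASCII and that the example below needs.
ISO646_OVERRIDES = {
    "US": {},                                  # ASCII
    "GB": {0x23: "£"},                         # BS 4730
    "DE": {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü"},   # DIN 66003
}


def decode_646(data: bytes, variant: str) -> str:
    """Decode 7-bit bytes as the named ISO 646 national variant."""
    overrides = ISO646_OVERRIDES[variant]
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is not 7-bit")
        # Positions not overridden by the variant keep their ASCII meaning.
        out.append(overrides.get(b, chr(b)))
    return "".join(out)


raw = b"#1 [x]"
for v in ("US", "GB", "DE"):
    print(v, decode_646(raw, v))
# US #1 [x]
# GB £1 [x]
# DE #1 ÄxÜ
```

A single label covering "any ISO 646 variant" would leave the reader unable to choose among these rows, which is the ambiguity the "unique mapping" prose was meant to rule out.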

It's true that MIME doesn't specify criteria for typographic acceptability,
and for most purposes that's not a problem.  I don't care whether an upper
case A is rendered as a stick figure, or in Courier or Times Roman; my brain
sees that as an "A".  But substituting a Greek alpha would confuse me, even
though the two characters are similar in appearance and have a common
origin.

If the precise forms of the characters are important to those who use the
language, the unified ideographs may well be sufficiently different from the
character desired to violate the intent of the "unique mapping" MIME charset
requirement.  In short, I think Ohta-san has a valid point which should not
be dismissed out of hand or by claiming that it doesn't exist.

Should MIME decide to establish criteria of typographic
acceptability for displaying character text, then it would have to
describe how a multilingual European text encoded with ISO 8859-1
could, without profiling, "acceptably" display distinct language
sequences with distinct fonts; or how a multilingual Arabic and
Turkish text encoded with ISO 8859-6 could, without profiling,
"acceptably" display distinct language sequences using, say, Naskh
versus Ruq`ah styles of Arabic as appropriate to Arabic vs. Turkish
written language customs.

This is also a good point; the problem is not specific to 10646.

(But if we're being pedantic, does the *definition* of 8859-6 give multiple
possible appearances for some characters?)


I suggest that the question of whether 10646 violates the MIME spec
in minor ways is of secondary importance.

The more important question is: 

     | Is the Internet really better off without 10646 in MIME? |


It appears that although 10646 is imperfect (some would say sorely lacking),
it's the best technical solution yet devised.  An improvement may be
forthcoming, but it's probably years away, and it would help to have some
real experience with using 10646 to guide the next version.

Registering 10646 as a MIME charset doesn't mean that everyone is going to
use it, or that the other charsets will go away.  Ultimately, people will use
what works best for them, if they have the opportunity to choose.

Registering 10646 in its current state doesn't mean that it won't be improved
later.  If the ideograph unification problem is annoying enough, someone will
devise a solution.

Regardless of what we say here, 10646 is not going to go away.  We don't have
that much clout.  NOT providing for 10646 might be really damaging to MIME,
in comparison to other email systems that allow 10646.  Do we want to take
this risk?

Finally, registration of 10646 as a MIME charset is NOT an endorsement of
10646, or any particular use of 10646.  It just gives a way to label it for
those who do want it.

If we decide that MIME really does want to be able to have 10646, then we may
want to change the "unique mapping" prose for the Full Standard to very
clearly allow it.


It seems to me that the best thing we can do is to make 10646 as good as
possible for MIME, without making it incompatible with other anticipated 
uses of 10646.  Glenn Adams's suggestions as to how 10646 might be
displayed seem to have the right intent, though others may have better
ideas.

But I'm very tired of seeing the same arguments over and over.  So at this
point I'll strongly suggest that those who insist that we should prevent all
use of 10646 in MIME be silent for the time being.

Meanwhile, those who favor 10646 in MIME should continue making their
proposal as good as they can, keeping in mind the expressed concerns of those
who have problems with it.  

When the proposal is finalized those who don't like it can address their
concerns to the specifics of that proposal.

Now, as to rules:

Simple registration of a MIME charset doesn't require the consensus of any
working group.  For better or worse, the rules allow a non-standards-track
proposal to be submitted for publication as an RFC EVEN IF SOME PEOPLE HAVE
OBJECTIONS TO IT.

If the final 10646-in-MIME proposal as submitted for publication still isn't
acceptable to some people, they are better off taking their complaints to the
RFC Editor than trying to get agreement here that 10646 is a bad idea.


Keith Moore
