Hmmm. Use of a locale sounds pretty much like "external profiling" to me.
I thought "external profiling" meant information that had to accompany the
message, but perhaps I was misinterpreting. Even assuming a locale is
"external profiling", this still only affects the latter stages of Glenn's
algorithm, which are aimed at achieving optimal as opposed to basic
legibility.
It's true that MIME doesn't specify criteria for typographic acceptability,
and for most purposes that's not a problem. I don't care whether an upper
case A is rendered as a stick figure, or in Courier or Times Roman; my brain
sees that as an "A". But substituting a Greek alpha would confuse me, even
though the two characters are similar in appearance and have a common
ancestry.
If the precise forms of the characters are important to those who use the
language, the unified ideographs may well be sufficiently different from the
character desired to violate the intent of the "unique mapping" MIME charset
requirement. In short, I think Ohta-san has a valid point which should not
be dismissed out of hand or by claiming that it doesn't exist.
This is indeed an important point. However, the Han unification in
10646/Unicode, accomplished under the aegis of the CJK-JRG
(China/Japan/Korea Joint Research Group, which was composed of nationals
from China, Japan, and Korea, all of whom were members of national
standards bodies of their countries), followed the principle that
characters were unified only under certain conditions, laid out in more
detail in "The Unicode Standard, Version 1.0, Volume 2", which itself took
the discussion from CJK-JRG Document 3-28, "Explanatory Notes for the
Unified Ideographic CJK Characters Repertoire and Ordering, Version 1.0".
There are many requirements that had to be met before two ideograms would
be unified, but one was definitely that the two characters have the same
"abstract shape": the component structure, number of components, relative
position of components in each complete character, structure of a
corresponding component, treatment in the source character set, and radical
contained in the component all had to be identical, or the characters were
not unified. Other rules prevented the unification of characters having
similar shapes that were unrelated by historical derivation, and of
characters that are distinct within one of the source national character
sets.
The upshot of this is that legibility of such text does not require that a
font be used which corresponds to the language the text is written in, as
Glenn points out in the first step of his algorithm. Chinese text displayed
in a Japanese font, or vice versa, should be legible and comprehensible to
readers of those languages, even if the typographic quality is not optimal.
This is in keeping with the Unicode principle of minimal legibility. If
someone claims that there is a unified Han character which would not be
comprehensible and legible to a reader if displayed using the wrong font
(assuming the font does in fact have a glyph for that character, of
course), or that there is a string of text in Chinese, Japanese, or Korean
which would not be legible (given that the font contains glyphs for all the
characters in the text), would they please provide a concrete example
rather than speaking in generalities.
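
To make the distinction concrete, here is a rough sketch in Python (my choice
for illustration; nothing in the proposals depends on it). It assumes a
Unicode-aware runtime with a character database, and the helper name
`describe` is made up for this example. The sample ideograph U+76F4 is one
that is commonly drawn somewhat differently in Chinese and Japanese fonts.
The sketch only shows that the coded character is the same whichever font
is later chosen, while Latin A and Greek Alpha remain separate characters:

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# One unified ideograph: the same code point whether the surrounding text
# is Chinese or Japanese. Which glyph is drawn is a font decision.
han = "\u76f4"

# Latin capital A and Greek capital Alpha look alike but were never unified.
latin_a, greek_alpha = "A", "\u0391"

print(describe(han))
print(describe(latin_a + greek_alpha))
```

Glyph selection does not appear anywhere in the sketch because it is not
part of the character data at all; that is the stage where a locale or
other hint could be used to improve typographic quality.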
As an aside, I am told by people who are thoroughly familiar with and
literate in both Chinese and Japanese that the variation one finds in
characters among Japanese fonts is often greater than the typographic
variation between Japanese and Chinese fonts (for example).
(lots of good advice deleted)
Meanwhile, those who favor 10646 in MIME should continue making their
proposal as good as they can, keeping in mind the expressed concerns of those
who have problems with it.
When the proposal is finalized those who don't like it can address their
concerns to the specifics of that proposal.
That is why I posted the proposals to these mailing lists in the first
place. Unfortunately, I've received a lot of criticism of 10646 and
Unicode, but almost no comment on the proposals. I must therefore assume
they were perfect as distributed :-) :-).
Seriously, I would very much like to receive comments on the specifics of
the proposals that I posted. If anyone missed the documents when they were
originally posted, please e-mail me and I will be happy to send you
PostScript and/or plain text versions. The only technical (as opposed to
editorial) change we plan to make is the elimination of
"content-transfer-encoding-like" features of the UTF-7 mail-safe variant,
specifically the rules lifted from quoted-printable about line breaks,
white space, and so on.
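
For anyone who wants a feel for what "mail-safe" means here, the following
is a minimal sketch in Python. It uses the "utf-7" codec that ships with
Python, which is an assumption on my part as a stand-in: it implements
UTF-7 as that codec defines it and may not match the posted draft in every
detail. The helper names `to_utf7` and `is_seven_bit` are made up for the
example. It checks only that the encoded form stays within 7-bit ASCII and
decodes back to the original text:

```python
def to_utf7(text):
    """Encode text with Python's built-in UTF-7 codec."""
    return text.encode("utf-7")

def is_seven_bit(data):
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)

# Mixed ASCII and Japanese sample text.
sample = "Hi \u65e5\u672c\u8a9e"

encoded = to_utf7(sample)
print(encoded)
print(is_seven_bit(encoded))               # safe for 7-bit transports
print(encoded.decode("utf-7") == sample)   # round-trips to the original
```

Line folding and trailing white space are deliberately not handled here; a
separate content-transfer-encoding can take care of them.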
----------------------------
David Goldsmith
david_goldsmith(_at_)taligent(_dot_)com
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA 95014-2233