Re: Proposals for 10646/Uni

I'm afraid that one of his proposals, ISO-10646-UTF7, is effectively a new
CTE rather than a new charset.


UTF-7 is somewhat like a content transfer encoding. We briefly considered
proposing it as such, but 1) in MIME, new content transfer encodings are
explicitly and strongly discouraged, and 2) it seems like it would only be
useful for 16 bit character sets which have 7 bit ASCII as a subset, of which
there are not many. So, we made it a transformation format of 10646/Unicode
instead.
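
As an illustration of the point above, here is a minimal sketch using
Python's built-in "utf-7" codec. That codec is assumed here to behave like
the proposed ISO-10646-UTF7, and the sample string is made up. ASCII text
passes through unchanged, while other characters are wrapped in a base64
run introduced by "+", so the output is 7-bit clean without a separate
content transfer encoding.

```python
# Sketch only: Python's "utf-7" codec stands in for the proposed charset.
text = "Hi \u65e5\u672c"  # hypothetical sample: ASCII followed by two Han characters

encoded = text.encode("utf-7")

# Every byte of the encoded form fits in 7 bits.
seven_bit_only = all(b < 0x80 for b in encoded)

# The ASCII prefix survives as-is; "+" opens the base64-coded run.
ascii_prefix_kept = encoded.startswith(b"Hi +")

# Decoding restores the original 16-bit characters.
round_trip = encoded.decode("utf-7")

print(encoded, seven_bit_only, round_trip == text)
```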


Well, it might be a wise strategy. At least, MTAs do not have to take
care of UTF-7.

To your surprise, I rather welcome your additional proposal of
ISO-10646-UTF7 (not the first one: ISO-10646-UNICODE), because
it is a Sign of Babel.

Contrary to the common belief that UNICODE will be the future single
encoding, you have successfully demonstrated that it won't be.

BTW, is your ISO-10646-UNICODE big-endian, little-endian or bi-endian
with 0xff00?
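
To make the byte order question concrete, here is a rough sketch. It
assumes the marker in question is the byte order mark U+FEFF, and it uses
Python's UTF-16 codecs as a stand-in for the proposed 16-bit charset; the
helper name decode_with_bom is hypothetical.

```python
import codecs

ch = "\u00e9"  # sample character U+00E9

# The same character yields different byte sequences in each order.
big = ch.encode("utf-16-be")
little = ch.encode("utf-16-le")

def decode_with_bom(data: bytes) -> str:
    """Pick the byte order from a leading byte order mark (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # Without a mark, the receiver cannot tell which order was used.
    raise ValueError("no byte order mark; byte order is ambiguous")

marked_be = codecs.BOM_UTF16_BE + big
marked_le = codecs.BOM_UTF16_LE + little

print(big, little, decode_with_bom(marked_be) == decode_with_bom(marked_le))
```

Without the mark, a receiver has to be told the byte order by some other
means, which is the kind of external information the charset name alone
does not carry.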

Of course, because of Han unification, UNICODE is not a charset of MIME.


I don't quite understand this. According to the original MIME document (RFC
1341), the reason 10646/Unicode was not listed as an initial charset for MIME
was the controversy between ISO 10646 and Unicode. No mention was made of Han
unification as a reason for excluding 10646/Unicode. Now that the
10646/Unicode unification is complete, there is no reason not to proceed.


RFC 1522 says:

:   This RFC specifies the definition of the charset parameter for the
:   purposes of MIME to be a unique mapping of a byte stream to glyphs, a
:   mapping which does not require external profiling information.

As you can see in section 26 of ISO 10646, 436 pages of the volume
are dedicated to showing the differences of glyphs in G/T/J/K.

So, UNICODE, at least, needs G/T/J/K profiling information such as:

        charset=ISO-10646-UNICODE-K

But, I'm afraid you can't understand the unification problem.

A hopefully more obvious point on why UNICODE as is cannot be a MIME
charset is in Section 23.3 of ISO 10646:

        The rules for forming the combined graphics symbol are
        beyond the scope of ISO/IEC 10646.

Again, the mapping rules from codes to glyphs are not given.

So, can you understand that, as a MIME charset, you must drop all
the combining characters of ISO 10646, unless you give all the
rules to combine them?

That is, you must drop Arabic, Thai, Devanagari and so on.
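
To show how many scripts this touches, here is a small sketch that flags
text containing combining marks. It relies on the general categories in
Python's unicodedata module (Mn, Mc, Me), which follow the current Unicode
database and are only assumed to approximate the combining characters of
ISO 10646; the sample strings and the helper name are made up.

```python
import unicodedata

def has_combining_mark(text: str) -> bool:
    """True if any character is a mark (category Mn, Mc or Me)."""
    return any(unicodedata.category(c).startswith("M") for c in text)

# Hypothetical samples: a base letter followed by one mark in each script.
samples = {
    "Arabic": "\u0628\u064e",      # BEH + FATHA
    "Thai": "\u0e01\u0e34",        # KO KAI + SARA I
    "Devanagari": "\u0915\u093f",  # KA + VOWEL SIGN I
    "ASCII": "abc",
}

for name, sample in samples.items():
    print(name, has_combining_mark(sample))
```

Even the shortest ordinary syllables in these scripts contain a mark, so a
charset without combining rules cannot render them from the codes alone.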

                                                Masataka Ohta

PS

For more information on why ISO 10646/UNICODE is no good and how
it can be improved, see:

        "Character Encoding Method for Internationalized Plain
        Text Processing", Proceedings of 8th International Joint
        Workshop on Computer Communications, Masataka OHTA,
        Dec. 1993.

An electronic copy is available from me.