
1993-12-21 16:14:50
Character set issues for MIME/10646
From: Masataka Ohta <mohta(_at_)necom830(_dot_)cc(_dot_)titech(_dot_)ac(_dot_)jp>
BTW, is your ISO-10646-UNICODE big-endian, little-endian, or bi-endian
with 0xff00?


As specified by ISO 10646 and in the documents I distributed, it is big
endian.
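
As a small illustration of what big-endian byte order means for 16-bit code values, here is a minimal sketch. It assumes every character fits in 16 bits; the helper name `encode_ucs2_be` is hypothetical and is not taken from the proposal under review.

```python
import struct

def encode_ucs2_be(text: str) -> bytes:
    """Serialize each 16-bit code value high-order byte first (big-endian).

    Hypothetical helper for illustration only; it rejects anything
    that does not fit in a single 16-bit unit.
    """
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:X} does not fit in 16 bits")
        # ">H" packs an unsigned 16-bit integer in big-endian order.
        out += struct.pack(">H", code)
    return bytes(out)

encoded = encode_ucs2_be("A\u65e5")
print(encoded.hex())  # 004165e5
```

Here U+0041 is written as the bytes 00 41 and U+65E5 as 65 E5; a little-endian stream would carry 41 00 and E5 65 instead.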

RFC 1522 says:

:   This RFC specifies the definition of the charset parameter for the
:   purposes of MIME to be a unique mapping of a byte stream to glyphs, a
:   mapping which does not require external profiling information.

As you can see in Section 26 of ISO 10646, 436 pages are dedicated
to showing the differences between the glyphs in G/T/J/K.

So, UNICODE, at least, needs G/T/J/K profiling information such as:

       charset=ISO-10646-UNICODE-K

But, I'm afraid you can't understand the unification problem.

A hopefully more obvious reason why UNICODE as is cannot be a MIME
charset is in Section 23.3 of ISO 10646:

       The rules for forming the combined graphics symbol are
       beyond the scope of ISO/IEC 10646.

Again, the mapping rules from codes to glyphs are not given.

So, can you understand that, as a MIME charset, you must drop all
the combining characters of ISO 10646, unless you give all the
rules to combine them?

That is, you must drop Arabic, Thai, Devanagari, and so on.
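
To make the combining-character point concrete, the following minimal sketch lists the codes in a base-plus-combining sequence. It relies on Python's standard `unicodedata` module, which reflects a later version of the character database than the 1993 standard, and the helper name `describe` is hypothetical. The code sequence says nothing about how the combined symbol is drawn, which is the gap described above.

```python
import unicodedata

def describe(text: str):
    """Return (code point, character name, general category) per code.

    Hypothetical helper for illustration; it inspects the code sequence
    only and does not attempt any rendering.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

# A base letter followed by a combining mark: two codes, one intended symbol.
sequence = "e\u0301"
for row in describe(sequence):
    print(row)
# ('U+0065', 'LATIN SMALL LETTER E', 'Ll')
# ('U+0301', 'COMBINING ACUTE ACCENT', 'Mn')
```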


I now understand the issue you are raising, and we are preparing a detailed
response. However, I will point out that ISO 10646 does not use the word
"glyph" anywhere, and if you interpret "glyph" to mean what the 10646 passages
you quote are referring to (namely, the visual form of a character), then none
of the ISO standards qualify as MIME character sets (including 8859-1 and JIS)
because they contain similar language. All of this will be covered in detail
in our reply. This was a good point to bring up because the terminology in
character set standards in general has been a little fuzzy; I hope our
forthcoming response will help to clarify things a little.


For more information on why ISO 10646/UNICODE is no good and how
it can be improved, see:

       "Character Encoding Method for Internationalized Plain
       Text Processing", Proceedings of 8th International Joint
       Workshop on Computer Communications, Masataka OHTA,
       Dec. 1993.

An electronic copy is available from me.


I would like to point out that the merits of 10646 or Unicode for any
particular purpose are not the subject of this document review; there are
other forums in which to have that discussion. The purpose of this document
review is to discuss a proposal for encoding ISO 10646/Unicode within MIME.
ISO 10646 is an international standard, and has been or is in the process of
being adopted as national standards in many countries; it is also starting to
see commercial adoption and use. Therefore, it needs to be dealt with
regardless of other issues.

Having said that, I would very much like to receive an electronic copy of your
document on encoding for internationalized plain text processing; please send
it to the address below.

David Goldsmith
david_goldsmith(_at_)taligent(_dot_)com



