ietf-822

Re: restrictions when defining charsets

1993-02-04 20:03:41
"charset" should provide all the profiling information to uniquely
map a byte stream to glyphs.

Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
can't be a "charset".

Actually, this is true of a wide variety of things that you might think
are charsets.

One of them is ASCII!

I know that functions were once overridden in character encodings.
So, if there is any confusion, it might be necessary to define an
instance of ASCII for the Internet.

The result is a standard which is vague enough
about the appearance of the glyphs that it is legitimate to print code 41
(exclamation point) as or-bar and code 94 (circumflex) as not-sign.

I'm afraid that RFC 1345 defines ASCII code point 41 (octal) as an
exclamation mark and code point 94 (decimal) as a circumflex accent.
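
To keep the two number bases straight, here is a minimal sketch in Python. The names EXCLAMATION, CIRCUMFLEX and describe are made up for illustration, and Python's "ascii" codec is assumed to reflect the common present-day reading of these code points. It says nothing about which glyph a given printer actually draws.

```python
# Illustrative only: the two ASCII code points under discussion, written in
# the bases used in this message (41 octal, 94 decimal).
EXCLAMATION = 0o41   # octal 41 == decimal 33 == hex 21
CIRCUMFLEX = 94      # decimal 94 == octal 136 == hex 5E


def describe(code: int) -> str:
    """Return the code point in octal, decimal and hex with its ASCII character."""
    # Decoding a single byte with the "ascii" codec gives the character that
    # Python associates with that code point (an assumption about the mapping).
    char = bytes([code]).decode("ascii")
    return f"octal {code:o}, decimal {code}, hex {code:X}: {char!r}"


print(describe(EXCLAMATION))
print(describe(CIRCUMFLEX))
```

Note that "code 41" read as decimal would be a right parenthesis, so the base has to be stated whenever a code point is quoted.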

It is madness to interpret the definition of "charset" so narrowly that
the well-understood ASCII character set would not qualify.

It is well understood by most people, though it might once have been
misunderstood, that code point 41 (octal) of ASCII is an exclamation
mark and definitely not an or-bar.

I'm not up on the subtleties of the Devanagari/Han
problems with Unicode, but I strongly suspect that they qualify as bugs,
which we can legitimately overlook, rather than gross differences of
intent, which we can't. 

Why do you think Japan voted NO on DIS 10646-1.2?

The "BUG" was pointed out repeatedly and still was not fixed, because
the committee did not consider it a bug.

To the committee, it is a specification.

Unicode is *meant* to be a unique mapping, and
comes very close to being one.

I'm afraid your statement is based on observations of European characters
only.

I must repeat my warning. Don't be confused by the fact that Unicode
comes very close to being a unique mapping between code points and
European characters.

If you can't understand the Devanagari/Han problem, see the Unicode
documents.

As I quoted in an earlier posting, it is explicitly stated that it is not
meant to be so.

                                                        Masataka Ohta