ietf-822

Re: 10646, and all that

1993-03-13 03:02:02
The DIS does not say that the corresponding CJK characters are
the same single character. Instead, it says that the same code point
is assigned to the different "graphic symbols".

Can you cite some text from the DIS on which you base this claim?

Sure.

In section 25 of the DIS, it is written that:

        Any entry in any of the G, T, J, or K columns includes
        a sample graphic symbol from the source character set
        standard, together with its coded representation in
        that standard.

As long as you must give a visual representation of characters, you
must give "graphic symbols". Isn't text in MIME visible?

Note that, while JIS allows font variation, that's all JIS allows.
JIS does not unify Latin/Greek/Cyrillic 'A'. Japanese society does
not allow incorrect "graphic symbols". In school, correct "graphic
symbols" are taught. The Japanese government does not accept forms
written in incorrect "graphic symbols". Japanese printers do not
use Chinese Han for Japanese.

After you said something similar, earlier this week, I asked on
the Unicode and ISO10646 mailing lists, and was assured that
ISO-10646 retains Unicode's notion that one code point is exactly
equivalent to one (possibly unified) character, and that
nationalized ideographs are treated as glyph variants.

Strange. Why can't you cite some text, then?

Because the separation problem is ISO 10646 specific and not a general
issue of charsets, it is absurd to introduce a new concept.

The issue is not ISO-10646-specific.  Examples have been
presented for which language information would be useful
regardless of the character set.

So far, the only proposed use of separate language information,
other than displaying ISO 10646, is spell checking, for which I
have already shown that language information is useless.

So, do you want to introduce another confusion?

...we are never going to be able to eliminate all possibility
for confusion.  ("If a truly idiot-proof system is ever devised,
Nature will spontaneously evolve a higher grade of idiot which is
able to subvert it.")

So, keep it simple, stupid.

A body-scope language tag may introduce
some potential for confusion, but it replaces the more confusing
and less workable notion of trying to encode language matrices in
the character set name.

How can you say it is more confusing and less workable?

We already have a definition of "charset" which is completely workable
and not confusing.

According to the definition, ISO 10646 needs, to be a "charset", some
profiling.

I have shown a concrete, non-confusing and workable example of such
profiling.

That ISO 10646 might need some more profiling for Greek/Coptic is an
ISO 10646 specific problem, not MIME's problem.

On the other hand,

I have never seen any workable example of the use of separate
language information.

I have never seen a workable definition of separate language information.

I have never seen a workable purpose for separate language information.

I have never seen a workable definition of how the "T" (for Taiwan)
variation is named as a distinct language.

So far, people have their own ideas of what language information is,
without giving a precise description of it.

                                        Masataka Ohta
