ietf-822

Re: 10646, and all that

1993-03-08 02:06:09
    I believe that many of us recognize the problems you are pointing out
    and are trying to work with you in solving them.  At least some of us
    believe that the problems are complex and that there is not a single
    simple solution that meets all needs.

I don't think the problem is complex. My solution is simple enough,
isn't it?

I'm afraid that some of you are making the simple problem complex by
overgeneralization.

   The cited definition is very old.

But ISO 646 is disambiguated by it, even now.

    No strong preference.  What I was really arguing for was separating the
    language info from the char set name.

Why do you think the separation is necessary?

    Because I'm seeking a position that is technically reasonable, symmetric
    across languages, and that people can deal with.

What?

        symmetric across languages?

What is that? Isn't it a totally new concept?

    And it is interesting that the ability to designate
    language even when it is not needed to clarify a character set may
    leverage other useful things.

That's overgeneralization.

    As I have said before, we would not need to do any of this if 10646 were
    really adequate to the role to which we would like to assign it.  It
    isn't.  If you don't like that, take it up with ISO.

It is an intended design goal of Unicode/ISO 10646 that language
information be provided outside of Unicode/ISO 10646. So, why don't we
do so?

    And, as we have discussed in private, while I understand that Japan
    voted against 10646 DIS-2 at the JTC1 level, I also understand that,
    had Japan felt very strongly about this and been able to find a single
    additional JTC1 P-member to agree, 10646 could easily have been buried
    in ISO procedures, probably into the next century.  It is consequently
    rational to deduce the absence of a strong majority in the Japanese
    standards community that the unification issue is *that* important,
    all of the time.

You mean, there is no strong majority in Japan because Japan failed
to force another non-Japanese P-member to agree?

While your reasoning might be convincing in the political world of
ISO, it has nothing to do with the current issue of profiling 10646.
So, could you make that argument outside of the IETF?

    Conversely, there are clearly situations in which unified Han are
    interpretable from context, however un-aesthetic or un-linguistic
    that might be.

Just as unified variants of ISO 646 are interpretable from context,
unified Han are interpretable.

    The reality is that the issue isn't a binary construction like "need",
    but a scale from "harmless but probably not worth the trouble" to
    "required for proper interpretation by many users".

The issue is "correctness".

No Japanese script is written in Chinese Han. Thus, it is incorrect to
write Japanese script in Chinese Han.

In the case of ISO 646, we have assigned different charset names to
each national variant.
    And deprecated their use.   But these are national variants
    recognized by ISO, and national variants in which the character
    descriptions and names drawn from the repertoire are different.

CJK variation in 10646 is also recognized by ISO. See DIS 10646-1.2:
you will find that each variant of the CJK Han characters is listed,
which is why 10646 became so voluminous.

ISO has recognized and documented that CJK characters are different.

    As others have pointed out, one often benefits from language
    information even if there are no structural ambiguities about the
    character encoding.

That is an entirely separate issue.

Moreover, a truly multilingual character encoding won't need the
    Content-language:
header at all. So, I object to introducing the to-be-obsoleted header.

    Tuples as complex as {country, language, character set, character
    encoding} are common in linguistic and textual analysis work.

That's not common for plain text.

    You have convinced me (not hard, I was convinced by mid-1991) and
    much of the rest of the WG that 10646 isn't a "truly multilingual
    character encoding".   But the choices are to provide sufficient
    supplemental information, or to just decide to not use 10646 because it
    is inadequate and wait for something better to come along.

So, let's use it with enough profiling information. But please don't
try to make the issue unnecessarily complex.

                                                Masataka Ohta
