ietf-822

10646, and all that

1993-03-01 19:47:01
Ohta san, 

Assuming that an arbitrary assignment of font could result in an unintended and
improper reading, I agree with your concern that 10646 is theoretically
inadequate to completely specify the intended semantics of a sender's message, in
that there is no internal mechanism for disambiguating the unified Han
characters.

Although this inadequacy can be largely compensated for by contextual
information provided by nearby hangul/katakana/hiragana, this solution is
dependent upon some degree of homogeneity of the character stream.  I foresee
particular difficulty with lists of names of persons from various countries.
Unfortunately, I am not literate enough in any of the Asian languages to research
the difficulties in implementing this approach, so I must leave that defense of
10646 to others.
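
A minimal sketch of the contextual approach described above, offered only as an
illustration.  The function names, the window size, and the "ja"/"ko"/"unknown"
labels are assumptions made for this example, and the code point ranges are
those of present-day Unicode blocks, not anything 10646 itself prescribes.

```python
# Hypothetical sketch: guess which national font a unified Han character
# wants by looking for kana or hangul nearby. Nothing here is defined by
# 10646; the ranges below are present-day Unicode blocks.

def script_of(ch):
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:      # hiragana and katakana
        return "kana"
    if 0xAC00 <= cp <= 0xD7A3:      # hangul syllables
        return "hangul"
    if 0x4E00 <= cp <= 0x9FFF:      # unified Han ideographs
        return "han"
    return "other"

def guess_han_language(text, window=8):
    """Return (index, char, guess) for every Han character in text."""
    guesses = []
    for i, ch in enumerate(text):
        if script_of(ch) != "han":
            continue
        # Collect the scripts seen within `window` characters on either side.
        nearby = text[max(0, i - window): i + window + 1]
        scripts = {script_of(c) for c in nearby}
        if "kana" in scripts and "hangul" not in scripts:
            guess = "ja"
        elif "hangul" in scripts and "kana" not in scripts:
            guess = "ko"
        else:
            guess = "unknown"       # no context, or conflicting context
        guesses.append((i, ch, guess))
    return guesses
```

A Japanese sentence containing kana resolves every Han character to "ja", but
a bare list of names such as "山田, 王, 李" carries no kana or hangul at all, so
every character comes back "unknown".  That is the name-list difficulty in
miniature.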

One would hope that the composing agent could be expected to pass along the
disambiguation information, perhaps as an in-line declaration.  This would
suggest a charset=cjk-tagged_10646, assuming we can't get the 10646 people to
address the problem.
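
To make the in-line declaration idea concrete, here is a rough sketch of how a
reading agent might split a tagged stream into language runs.  The "{@ja}" tag
syntax, the "und" default, and the function name are invented for this example;
no such syntax is defined for 10646 or for any cjk-tagged_10646 charset.

```python
import re

# Hypothetical in-line declaration: "{@xx}" switches the language of the
# text that follows, where xx is a two-letter language code.
TAG = re.compile(r"\{@([a-z]{2})\}")

def parse_tagged(text, default="und"):
    """Split text into (language, run) pairs at each in-line declaration."""
    runs = []
    lang = default          # language in effect before any declaration
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            runs.append((lang, text[pos:m.start()]))
        lang = m.group(1)   # the declaration applies from here onward
        pos = m.end()
    if pos < len(text):
        runs.append((lang, text[pos:]))
    return runs
```

With this scheme a mixed list of names could be sent as "{@ja}山田 {@zh}王", and
the reader would get back one run per declared language, with no guessing from
context required.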

I will say that if the 16-bit 10646 is not going to do the job, then what are we
to do?  Would you advocate a 32-bit character set?  Perhaps we could use the
extra capacity to support extraterrestrial linguistics :-o

Please answer the following:

        -Is it unreasonable to expect a user to recognize that the 
        software has mis-rendered some characters as Han, which should 
        have been Kanji?  

        -Has anyone attempted to study just how often this sort of error 
        would occur in typical mail?

I would hope that you would agree with me that it is not unreasonable to expect
a user to see the error, especially since it is possible to have all usage of the
Chinese font rendered in a distinctive form (chosen by the user, perhaps a red
coloration).
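
As one possible illustration of the distinctive rendering, the sketch below
wraps every run assigned to the Chinese font in ANSI color escapes so that it
shows up red on a terminal that honors them.  The (language, text) run format,
the "zh" label, and the function name are assumptions made for this example.

```python
# Hypothetical sketch: mark runs drawn in the Chinese font in red, using
# standard ANSI SGR escape sequences (31 = red foreground, 0 = reset).
RED = "\x1b[31m"
RESET = "\x1b[0m"

def highlight(runs, flagged="zh"):
    """Join (language, text) runs, coloring those in the flagged language."""
    out = []
    for lang, text in runs:
        if lang == flagged:
            out.append(RED + text + RESET)
        else:
            out.append(text)
    return "".join(out)

if __name__ == "__main__":
    print(highlight([("ja", "山田 "), ("zh", "王")]))
```

The color is just a stand-in for whatever distinctive form the user chooses;
the point is only that a doubtful font choice can be made visible at a glance.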

I would also hope that some serious studies have been done.  As a user, I would
not like to have to be constantly fussing with this problem just to read my
mail, but if a little bit of fussing allowed me to read mixed-language mail
where I couldn't before, then I might be willing to put up with that
inconvenience.  I suppose, to be truthful, I would have to add that I would be
wondering what would be necessary to fix it for good.

Still, I don't see 10646 as an impossible-to-live-with thing, but maybe it will
need some additional mechanism for complete disambiguation of the folded
characters.

As to those who would argue that the folding of the various forms of 'a' and '4'
is analogous to the folding of the variant Han characters, I am not ready to
accept that.  The difference is that the European viewer is accustomed to seeing
all of the variations, and habitually perceives them as similar.  The typical
Asian reader probably does not have the same level of experience with Han/Kanji,
especially when one considers the distortions already imposed by small fonts on
a monitor.  Those of us who have the ability to alter the font we read our mail
in probably choose a font for ease of reading; when the machine picks one for
you arbitrarily, it is irritating.  When that happens at arbitrary points in a
lengthy document it could get downright aggravating.

--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>

