ietf-822

Re: 10646, and all that

1993-03-04 20:59:24
  Now, are we, Japanese, being forced to use Unified Han
  because it is convenient for Europeans?

No, and it is not convenience for Europeans that we are discussing, but
convenience for all readers of mail which is polylingual in nature. That
includes Chinese, Japanese, Korean, American, European, Australian, Indochinese,
Israeli, Arabic, South American and African.  If we are to have any hope of
supporting the world's languages it must be through some superset of the
existing charsets (as 10646 attempts to be).  No other approach allows the
creation of a generic UA.

=======

Masataka Ohta,

Please stop being paranoid; nobody here wants you or your country to lose its
culture, least of all me.

   I enjoy watching SUMO, I have even tried it as an amateur. 
   I play IGO (4 kyu).  
   I very much enjoy Samurai movies.
   I use Japanese wood-working tools.
   I like Japanese traditional music (especially the koto).
   I like Japanese food.
   I enjoy Japanese comic books, TV shows and TV ads.
   I build models of the Japanese navy ca 1933.
   I drive a Japanese car.

I also admit that I like the foods of many other countries, other foreign films,
and music from most cultures, including my own.

=======

What I was discussing, and what Erik was replying to, were the problems relating
to an alternative solution which would dynamically choose *appropriate* display
fonts by means of some heuristic-to-be-determined so that a UA could cope with
raw 10646 streams.

It is my belief that the only other alternative is some mark-up system, which
should have been part of the 10646 spec, but isn't, so we would have to design
it and somehow convince the world to support it.

For the heuristic-based solution I am presuming that the user has established
appropriate fonts for display of specific 10646 subgroups (e.g., Helvetica for
ASCII, Kyoto for S-JIS...), and that some means of mapping exists to/from 10646
and the user-selected fonts.
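
As a rough illustration of the user-established font table presumed above, here
is a minimal sketch.  The code point ranges, the fallback name and the function
name font_for are my assumptions for illustration only; the font names are the
two examples given above, and Han is deliberately left out of the table because
its assignment is the open question.

```python
# Hypothetical user font table: (first code point, last code point, font).
# The ranges are illustrative assumptions, not taken from the 10646 spec.
FONT_TABLE = [
    (0x0000, 0x007F, "Helvetica"),  # ASCII
    (0x3040, 0x30FF, "Kyoto"),      # Japanese kana
]
DEFAULT_FONT = "fallback"           # placeholder for anything unmapped


def font_for(ch):
    """Return the user-selected font for one character."""
    cp = ord(ch)
    for first, last, font in FONT_TABLE:
        if first <= cp <= last:
            return font
    return DEFAULT_FONT
```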

This leaves us with the need for determining a C/J/K association for 10646 Han
characters.  I think that a reasonable start would be to partition the stream
for unambiguous font assignments, and to initially assign all Han to C. This
pass correctly marks J/K phonetic alphabet usage, which then brings us to pass 2,
which would change C Han to J/K when adjacent to a phonetic alphabet stream of
that country.  A more sophisticated heuristic would take further clues from
punctuation, and would try to establish L/R associativity for the rare
occasions when J/K phonetics straddle a C Han portion.
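
To make the two passes concrete, here is a minimal sketch of how they might
look.  It is untested against real mail and rests on assumptions of mine: the
code point ranges are those of present-day Unicode (kana for J, Hangul for K,
the main CJK Unified Ideographs block for Han), "adjacent" means the
immediately neighbouring run, and the names classify, pass1 and pass2 are
hypothetical.  It does not attempt the punctuation or L/R associativity
refinements; a Han run straddled by both J and K phonetics is simply left as C.

```python
from itertools import groupby


def classify(ch):
    """Tag one character; all Han starts out as C (assumed ranges)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:                      # hiragana, katakana
        return "J"
    if (0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F
            or 0xAC00 <= cp <= 0xD7A3):             # Hangul jamo, syllables
        return "K"
    if 0x4E00 <= cp <= 0x9FFF:                      # unified Han
        return "C"
    return "other"


def pass1(text):
    """Partition the stream into runs of (tag, text)."""
    return [(tag, "".join(chars)) for tag, chars in groupby(text, key=classify)]


def pass2(runs):
    """Change C Han to J or K when adjacent to that country's phonetics."""
    out = []
    for i, (tag, chunk) in enumerate(runs):
        if tag == "C":
            neighbours = set()
            if i > 0:
                neighbours.add(runs[i - 1][0])
            if i + 1 < len(runs):
                neighbours.add(runs[i + 1][0])
            phonetic = neighbours & {"J", "K"}
            if len(phonetic) == 1:                  # unambiguous neighbour
                tag = phonetic.pop()
        out.append((tag, chunk))
    return out
```

Note that with this simple notion of adjacency, a space or punctuation mark
between a Han run and the phonetics is enough to leave the Han marked C, which
is one reason the further clues mentioned above would be wanted.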

The result would then be displayed to the user, who would be expected to edit it
so that it made sense.  Typical mail should resolve to one language.  Untypical
mail will tend to resolve to either C/J or C/K, and then the C might be best
displayed in red, or otherwise flagged as suspicious, as an aid to the user.
Rare mail will have both J and K, and may prove difficult to deal with.
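
The three cases above could be told apart with something like the following
sketch.  It assumes the message has already been reduced to a list of
(tag, text) runs with tags "C", "J", "K" or anything else for non-CJK text;
the function name triage and the returned labels are hypothetical.

```python
def triage(runs):
    """Classify a tagged message as typical, untypical or rare."""
    langs = {tag for tag, _ in runs if tag in {"C", "J", "K"}}
    if {"J", "K"} <= langs:
        return "rare"        # both J and K present
    if "C" in langs and len(langs) == 2:
        return "untypical"   # C/J or C/K: flag the C runs as suspicious
    return "typical"         # resolves to one language (or has no CJK)
```

A UA could then display the C runs of an "untypical" message in red, and ask
for interactive editing only in the "untypical" and "rare" cases.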

The interactive editing required must be small/nil for the average mail message,
or the typical user will react negatively.

Which brings me back to my statement that this approach will need testing.  In
fact, I think it needs to be proven empirically before it can be recommended.  I
also think that it needs to be considered before we look at other more intrusive
solutions.

pro - 10646.1 becomes feasible.
    - no mark-up effort required of the message composer.

con - feasibility is unclear.
    - user acceptance is unclear.
    - lack of 10646 support structures makes testing a pain.
    - no mark-up to take advantage of for other possible needs.
    - possible need for C/J/K grammar checker.

--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>

