Re: 10646, and all that

  32 bit set of ISO 10646 is identical to 16 bit one.


Well, not identical: it does contain the 16 bit version, but it is also twice
as large, and it will be harder to get manufacturers and users to accept.
Applications which use 16 bit characters are already perceived by users as being
slower, and the commercial implications of this cannot be ignored.  Also, a
quadrupling (8->32) of the typical file size is much more noticeable than a
simple doubling (8->16), which has similar, though lesser, ramifications.
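
To make the arithmetic concrete, here is a minimal Python sketch. It assumes
plain fixed-width storage with no compression; the 10,000 character file and
the function names are made up for illustration.

```python
# Storage for the same text under fixed-width encodings of 8, 16 and 32 bits.
# The 10,000 character count is an invented figure, not a measurement.

def encoded_size(char_count, bits_per_char):
    """Bytes needed to store char_count characters at a fixed width."""
    return char_count * bits_per_char // 8

def growth_factors(char_count, widths=(8, 16, 32)):
    """Size of each encoding relative to the narrowest width given."""
    base = encoded_size(char_count, widths[0])
    return {w: encoded_size(char_count, w) // base for w in widths}

factors = growth_factors(10_000)
for width, factor in factors.items():
    print(f"{width:2d} bit: {encoded_size(10_000, width):6d} bytes ({factor}x)")
```

The doubling and the quadrupling both fall directly out of the character
width, whatever the text contains.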

  > Perhaps we could use the
  > extra capacity to support extraterrestrial linguistics
  :-o
  
  If we can use the extra capacity arbitrary, we can. I
  have even designed a one. But it is not ISO, of course.
  You can't call it ISO 10646.


Sorry, I meant the above as a joke, and also as a mild complaint: one would
have to look *very* far into the future to find uses for the excess capacity of
a 32 bit wide character set.

Please answer the following:

  -Is it unreasonable to expect a user to recognize that the 
  software has mis-rendered some characters as Han, which 
  should have been Kanji?

  
  It depends.


Please elaborate at enough length so that we can understand your position more
clearly.  The psychology of this point is important.

  -Has anyone attempted to study just how often
  this sort of error would occur in typical mail?

  
  Now? ISO 10646 is not used yet. So, how can anyone study
  it?


The same way one studies anything that is new: implement it to the degree
necessary for the study, and then expose some people to trial data while
observing their reactions.  On the Macintosh, one would provide a 10646 mapping
for CJK and Roman within a simple program which presents text files as viewed
through 10646, on a machine using System 7.1 set up with the Chinese, Japanese,
and Korean scripts, then allow subjects to view multilingual mail so that their
reactions could be assessed.

  What? "expect a user to see the error"? Haven't you
  misordered words?
  
      A user expect to see no error.
  
  That's the reasonable behaviour of users.


I can't tell from your words if 
  - you are complaining that my English confuses you, 

or if 
  - you are trying to say that you think all users expect perfect 
    performance from all software.

Please remember that I am suggesting a solution based on imperfect information.
We are therefore discussing the users' reaction to necessarily inaccurate
software, in order to assess whether better information is needed so that the
necessity for imperfect software is removed.

Many computer systems rely on imperfect data to make predictions.  Optical
character recognition, weather forecasting, and game playing programs are all
examples where knowledgeable users expect errors.

I admit that mail is not normally error prone, but modem transmission of mail
certainly is, and gateway-altered mail is another source of problems which I
would expect to see in mail.  I can't think of any reason why Japanese mail
users wouldn't experience such problems.  I think your observations are of
interest to all of us, so I really would like a serious answer to my question.

  Could you please, please, understand that that's exactly
  what I have proposed with "charset"?


I agree with you that there are technical problems with implementing 10646.  I
apologize for not having taken the time to read all the voluminous traffic that
the debate over the precise meaning of "charset" has provoked, so I can neither
agree nor disagree with you on that point.  Instead I will state what I
understand the term to mean.

Charset - an arbitrarily ordered numerical mapping of a collection of
conventionalized symbols useful in written communication.

Please note that by "conventionalized" I am referring to such folding as allows
us to represent the various forms of '4' as having a common meaning.  While a
standard might limit itself to an illustration of just one of the varieties, all
should be considered equally valid, absent context.

Where an unfolding becomes necessary, additional information, potentially
beyond the specification of the charset (as standardized), would be required.
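
The folding and unfolding described above might be sketched in Python roughly
as follows.  This is only an illustration of my reading of the term: the code
numbers, symbol names, glyph names, and context tags are all invented, and
nothing here is taken from 10646 itself.

```python
# Hypothetical folded charset: one code number per conventionalized symbol.
# All code values and names below are invented for illustration only.
CHARSET = {
    0x34: "DIGIT FOUR",
    0x1001: "UNIFIED IDEOGRAPH X",
}

# Glyph varieties that the charset folds together, keyed by (code, context).
# The context is the "additional information" that lives outside the charset.
GLYPH_VARIANTS = {
    (0x34, "open-top"): "four-open",
    (0x34, "closed-top"): "four-closed",
    (0x1001, "ja"): "x-kanji-form",
    (0x1001, "zh"): "x-hanzi-form",
}

# Variety shown when no context is available; the choice is arbitrary,
# since absent context every variety is equally valid.
DEFAULT_VARIANT = {0x34: "four-closed", 0x1001: "x-hanzi-form"}

def unfold(code, context=None):
    """Pick a glyph variety for a code, using context when it is known."""
    if code not in CHARSET:
        raise KeyError(f"code {code:#x} is not in the charset")
    return GLYPH_VARIANTS.get((code, context), DEFAULT_VARIANT[code])

print(unfold(0x1001, "ja"))  # context supplied from outside the charset
print(unfold(0x1001))        # no context: falls back to the default variety
```

The charset table alone cannot tell the two ideograph forms apart; only the
context argument, supplied from elsewhere, selects between them.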

I have been led to believe that Japanese are taught the shapes of their
characters in an intensive and rigorous manner, more so than is required of
Occidentals.  This is natural: given the complexity of the typical Kanji,
precision would aid in recognition.

--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>