data errors

I don't agree with Ohta on everything he says, but he is very right on
one thing: Han unification is a data error.  So-called ``identical''


I don't like the way you write ``identical''.  On my screen, the first
quotation mark looks like *two* characters, with each one looking more
like a *reversed* (i.e. mirror image) comma, as opposed to a *turned*
comma.  The opening quotation mark is supposed to be a *single*
character, with two *turned* high commas.

Yet, a character set issue that is of major importance to nearly a 1/3
of the world's population is treated lightly.


I don't like the way you write 1/3.  On my screen, it looks like a 1
with a diagonal line on its right, and a 3 further to the right.  The
proper way to write this is to have a small 1 with a horizontal line
underneath it, and a 3 underneath the line.

I'm very sorry, but if you are unable or unwilling to make things look
right on my screen, then I'm afraid I have to say that you have made a
data error.  Please refrain from making such mistakes in the future.

  1/2 :-)

   ^
   |
   +--------------- :-)

The point is that it is quite possible to *communicate* in email even
if all the bells and whistles of perfect rendering are not included.
You can say the same of Han unification.  I have seen a message that
was encoded entirely in ISO-2022-JP, and yet contained some Chinese
text.  The author was quoting part of the Taiwanese character set
standard, CNS 11643.  He admitted that some of the glyphs were not
exactly the same as those he saw on paper, but it was clear from this
exercise that he could *communicate* even though he was forced to use
rudimentary measures.  We have been forced to use rudimentary measures
in email for quite a while, and this situation will probably continue
for a while too.
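
To make the ISO-2022-JP point concrete, here is a minimal sketch, assuming
Python and its standard iso2022_jp codec.  The sample string and the helper
name fits_iso2022_jp are hypothetical and are not taken from the message
described above.  It only shows that Han characters shared by Chinese and
Japanese can travel in a 7-bit ISO-2022-JP body and come back unchanged; it
says nothing about which glyph shapes the reader's font will draw.

```python
def fits_iso2022_jp(text: str) -> bool:
    """Return True if every character of text can be encoded as ISO-2022-JP."""
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample: Han characters used in both Chinese and Japanese text.
sample = "中文標準"

# Encode to the 7-bit form that would be sent in the mail body.
encoded = sample.encode("iso2022_jp")

# Decoding recovers the same characters, whatever font later renders them.
decoded = encoded.decode("iso2022_jp")

print(fits_iso2022_jp(sample), decoded == sample)
```

A character outside the repertoire covered by the codec makes the helper
return False, which is the case where the sender has to fall back on the
kind of rudimentary measures mentioned above.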

That is not to say that we shouldn't *try* to improve the rendering of
email.  Far from it.  As everybody has noticed, people are actively
trying to get "rich text" and SGML into email.

I sincerely hope that it will be possible to get perfect rendering of
Han, ASCII, etc. in email in the future.  In the meantime, we can make
incremental improvements (the *only* way to do it in networking) by
taking the relatively small step from plain text ASCII to plain text
10646 subsets.

Also, 10646/Unicode is like a bulldozer.  Instead of getting run over
by the bulldozer, we should jump on it and try to tweak some of the
controls on its dashboard.  And that is exactly what Masataka is
doing.  He is adding language tags to 10646.  Glenn Adams has done
that too.  There is no doubt that 10646 and Unicode need to be used in
conjunction with such extra information in order to be rendered truly
satisfactorily.

So let's quit fighting and work *together* for a change.  (But let's
do it on a different mailing list, OK Greg?)


Regards,

Erik