perl-unicode

Re: Japanese text search problem

2001-08-07 08:45:59
on 01.8.8 1:54 AM, Andreas Marcel Riechert at riechert(_at_)pobox(_dot_)com 
wrote:
Why should Unicode be the "de facto standard for internal
representation"? ...or "internal standard" to whom, or what? In perl
that could happen, but as a general statement I cannot agree, but
anyway I would like to hear your reasoning.

  I am not a big fan of Unicode, but even I cannot ignore the fact that the
two most prevalent OSes, Windows and MacOS, use Unicode internally. I think
it was fair to say so (I was even careful enough to say "de facto").

E.g. if I were going to write one of the bigger Kanwa-Jiten
(Chinese/Japanese Character Dictionary) databases I would rather
use TAD (TRON encoding) than its competitor Unicode.
For much other stuff I am quite happy with euc-jp.

  We all know only too well that the best does not always prevail.  TAD wins
the engineering beauty contest, but too bad it didn't get enough support from
those who code to make ends meet.
  As for EUC-JP, yes, EUC-JP is the internal code of Jcode (because Jcode
was born in the pre-5.6 days and it needs to continue to work on 5.0.x).  I
would even say EUC-JP is the best so long as your piece of code is 'merely'
bilingual.

Maybe I am old-fashioned, but I still use euc-jp or sjis for
most of the processing/output I do. And I am quite happy with
them.

  I confess; so do I, most of the time.  The biggest problem is that
there are still too few tools to edit utf8 files.

Perl 5.0.x and below can handle EUC fairly well but regex may fail.  If you
don't use regex, just replace utf8 with EUC in the recipe above.

Ken Lunde's PDFs and book will help with the regex problem.

  I know.  A similar technique is also mentioned in "Perl Cookbook" and
"Mastering Regular Expressions" but these are too counter-intuitive.  Let
the camel chew whatever it can digest.
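
  For readers who have not met the regex problem: a byte-oriented match on
EUC-JP can succeed across the boundary between two adjacent characters. The
sketch below is only an illustration under stated assumptions, not code from
Jcode or from the books cited. It is written in Python for brevity, and the
names EUC_CHAR and euc_search are hypothetical. It shows the false hit, and
then one anchored pattern of the general kind referred to above: consume
whole EUC-JP characters from the start of the string so that a match can
begin only on a character boundary.

```python
import re

# Two hiragana; each occupies two bytes in EUC-JP.
text = "すし"
raw = text.encode("euc_jp")

# Second byte of the first character plus first byte of the second character.
# These two bytes happen to form a valid kanji code of their own.
straddle = raw[1:3]
kanji = straddle.decode("euc_jp")

# Naive byte-level search: reports a hit although the kanji is not in the text.
naive_hit = re.search(re.escape(straddle), raw) is not None

# Character-aware check on the decoded text: correctly finds nothing.
decoded_hit = kanji in text

# One EUC-JP character: ASCII, half-width katakana (0x8E prefix),
# JIS X 0212 (0x8F prefix plus two bytes), or a two-byte JIS X 0208 code.
EUC_CHAR = rb"(?:[\x00-\x7F]|\x8E[\xA1-\xDF]|\x8F[\xA1-\xFE]{2}|[\xA1-\xFE]{2})"


def euc_search(needle: bytes, haystack: bytes) -> bool:
    """Byte-level search that can only match on a character boundary."""
    pattern = rb"\A" + EUC_CHAR + rb"*?" + re.escape(needle)
    return re.match(pattern, haystack) is not None


print(naive_hit, decoded_hit)                    # naive search vs. decoded text
print(euc_search(straddle, raw))                 # anchored search, straddling bytes
print(euc_search("し".encode("euc_jp"), raw))    # anchored search, real character
```

  The anchored pattern is exactly the sort of thing that works but is hard to
read, which is why leaving character handling to the interpreter is more
attractive where it is available.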

Dan the Developer of Jcode
Andreas Marcel the happy and thankful user of Jcode

  Thank you for using my humble code.

Dan the Man with Too Many Charsets to Deal with
