On Thu, 10 Jan 2002 19:50:10 +0900
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> wrote:
Bad news. It's gotten worse on the latest DEVEL14150. It completely
ignores 2byte chars. Here is the detailed research.
I used MacOS 10.1.2 for 5.7.2 and FreeBSD 4.5-stable for DEVEL14150
(5.7.2 just didn't compile on FreeBSD; I think it's a known fact).
# first let's see if conventional method works
perl -MJcode -ple '$_=jcode($_,"euc")->utf8' table.euc > table.utf8
# table.euc is a euc-jp encoded text that contains all ascii, JISX0201
# (aka Hankaku Kana) and JISX0208
Now comes the Encode module of 5.7.2.
# see the previous mail for classic.pl
../classic.pl -d table.euc camel572.utf8
../classic.pl -e table.utf8 camel572.euc
Voila! diff -u table.utf8 camel572.utf8 gives me an empty string! They
are completely identical. The bad news is that encoding back to euc
produces garbage. So it worked only halfway.
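For reference, the round trip described above can be sketched with the
decode/encode interface that Encode later settled on. This rests on two
assumptions: classic.pl is not shown in this message, and the
development snapshots discussed here may expose a different API. The
three sample characters are hypothetical stand-ins for table.euc.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample instead of table.euc: one ASCII letter, one
# JIS X 0208 character (HIRAGANA LETTER A) and one JIS X 0201
# half-width katakana (HALFWIDTH KATAKANA LETTER A), as euc-jp octets.
my $euc_octets = "A" . "\xA4\xA2" . "\x8E\xB1";

# Decode: euc-jp octets -> Perl character string.
my $chars = decode("euc-jp", $euc_octets);

# Encode: character string -> UTF-8 octets (the table.utf8 step).
my $utf8_octets = encode("UTF-8", $chars);

# And back again: UTF-8 octets -> characters -> euc-jp octets.
my $back = encode("euc-jp", decode("UTF-8", $utf8_octets));

print $back eq $euc_octets ? "round trip ok\n" : "round trip differs\n";
```

If both directions work, the final comparison plays the role of the
diff -u check, applied to the euc-jp side as well.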
Now DEVEL14150. Decoding worked fine, as in 5.7.2, but when you try to
encode from utf8 to euc-jp, perl croaks with:
euc-jp '[non-printable garbage]' does not map to UTF-8 at
/home/dankogai/perl/lib/5.7.2/i386-freebsd-multi-64int/Encode/Tcl.pm
line 228
I guess SVf_UTF8 is off in that string.
This is probably because the UTF-8 layer is not being used.
(But the "euc-jp .. does not map to UTF-8" error message
should only appear when decoding to Unicode.)
Please refer to the PerlIO manpage for details;
we should declare that the stream carries a Unicode sequence,
like this: binmode(FILEHANDLE, ":utf8");
or through the open() function.
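A minimal sketch of the two ways of declaring the layer mentioned
above, assuming a perl with PerlIO layers and utf8::is_utf8 (5.8 or
later); the temporary file and the sample text are hypothetical. It
writes three characters through a :utf8 handle, then reads the file
back once as raw octets and once through open() with the layer, and
reports whether the UTF8 flag (SVf_UTF8) is set on each result.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: HIRAGANA LETTER A, I, U.
my $text = "\x{3042}\x{3044}\x{3046}";

# Way 1: declare the layer on an already-open handle with binmode().
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ":utf8");
print {$out} $text;
close($out) or die "close: $!";

# No layer: we get the raw octets and the UTF8 flag stays off.
open(my $raw, "<", $path) or die "open: $!";
binmode($raw);
my $octets = do { local $/; <$raw> };
close($raw);

# Way 2: declare the layer in the open() call itself.
open(my $in, "<:utf8", $path) or die "open: $!";
my $chars = do { local $/; <$in> };
close($in);

printf "raw:   length=%d, UTF8 flag %s\n",
    length($octets), utf8::is_utf8($octets) ? "on" : "off";
printf "layer: length=%d, UTF8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? "on" : "off";
```

The :utf8 layer does not validate its input; where that matters,
:encoding(UTF-8) is the stricter choice on later perls.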
Bleadperl has *many, many* docs on Unicode...
perluniintro, perlunicode, lib/utf8, etc.
I'd be glad if this helps you:
http://homepage1.nifty.com/nomenclator/perl/unicode.htm
(in Japanese)
It gives a brief overview of Perl's Unicode support, including
a short comparison of the differences
between Perl 5.7 and 5.6.
Now I am tempted to implement toplevel Encode myself....
Also, 5.7.2 and its variants appear pretty unstable. Let me see if
Encode itself can work on 5.6.1 as well (it should; it's under the ext/
directory after all. A little tweak to the compile script would be needed,
however).
Dan the Man with Too Many Charsets to Handle
Encode::Tcl should work on Perl 5.6 since it is pure Perl;
however, it is very slow, as you pointed out,
and therefore not very practical to use.
There is much room for improvement.
Regards,
SADAHIRO Tomoyuki
URL: http://homepage1.nifty.com/nomenclator/