perl-unicode

Re: converting Japanese chars into their Unicode values using 5.8's Encode

2002-09-19 06:30:05
Robert Allerstorfer <roal(_at_)anet(_dot_)at> writes:
> Hello,
>
> I want to convert characters in source code written in the Japanese
> shift_jis character set into their Unicode numbers. For instance, "検"
> (the Shift_JIS bytes 0x8C 0x9F) should result in "U+691C" (which is
> 26908 in decimal). I tried using the Encode module of Perl 5.8 with
> something like this:
>
>        use Encode::JP;
>        my $string = "\x8C\x9F";   # Shift_JIS bytes of 検
>        Encode::from_to($string, "shiftjis", "utf8");
>        my $ord = join("\n", unpack('U*', $string));
>        print "$string\n$ord";
>
> But this gives a 3-character string (with the decimal values 230, 164
> and 156). Could anyone please point me in the right direction on how
> to get the decimal number 26908 instead?
>
> Thanks in advance.

from_to does what it says. It decoded the shiftjis bytes to Unicode and
then re-encoded the result as UTF-8 octets. The three values 230, 164
and 156 (0xE6 0xA4 0x9C) are exactly the UTF-8 encoding of U+691C.

What you probably want is the decoded Unicode string rather than the
re-encoded octets:

        use Encode qw(decode);
        use Encode::JP;
        my $string = "\x8C\x9F";                # Shift_JIS bytes of 検
        $string = decode("shiftjis", $string);  # now one character, U+691C
        binmode STDOUT, ':utf8';
        print length($string), " chars '$string'\n";
        my $ord = join("\n", map { ord($_) } split(//, $string));
        print "$ord\n";                         # prints 26908
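Below is a slightly fuller sketch (not from the original thread). It assumes the input holds raw Shift_JIS bytes and that Encode's "shiftjis" table is available.

It does three things:

* It decodes the bytes to characters.
* It prints each code point as U+XXXX and in decimal.
* It shows the UTF-8 octets that from_to(..., "utf8") left behind.

The helper utf8_3bytes is a hypothetical illustration of the bit arithmetic behind 230, 164 and 156. It only handles code points U+0800 to U+FFFF.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);
use Encode::JP;

# Hypothetical helper: hand-rolled 3-byte UTF-8 encoding.
# Layout is 1110xxxx 10xxxxxx 10xxxxxx; valid for U+0800..U+FFFF only.
sub utf8_3bytes {
    my ($cp) = @_;
    return (0xE0 | ($cp >> 12),           # top 4 bits
            0x80 | (($cp >> 6) & 0x3F),   # middle 6 bits
            0x80 | ($cp & 0x3F));         # low 6 bits
}

# Assumed input: the two Shift_JIS bytes of U+691C.
my $sjis  = "\x8C\x9F";

# Bytes -> Perl character string (one element per code point).
my $chars = decode("shiftjis", $sjis);

# Report each character as U+XXXX and in decimal.
for my $ch (split //, $chars) {
    my $cp = ord $ch;
    printf "U+%04X (%d)\n", $cp, $cp;
}

# For contrast: the UTF-8 octets, via Encode and by hand.
my $octets = encode_utf8($chars);
print join(" ", map { ord } split //, $octets), "\n";
print join(" ", utf8_3bytes(ord $chars)), "\n";
```

The script should print U+691C (26908), followed by the line 230 164 156 twice. The first copy comes from encode_utf8 and the second from the hand-rolled helper.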
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/