Re: converting Japanese chars into their Unicode values using 5.8's Encode

Robert Allerstorfer <roal(_at_)anet(_dot_)at> writes:

Hello,

I want to convert source code written in the Japanese shift_jis
character set into the Unicode numbers of its characters. For instance,
"検" should result in "U+691C" (which is 26908 in decimal). I tried using
the Encode module of Perl 5.8 with something like this:

       use Encode::JP;
       my $string = "\x8C\x9F";    # shift_jis octets for 検 (U+691C)
       Encode::from_to($string, "shiftjis", "utf8");
       my $ord = join("\n", unpack('U*', $string));
       print "$string\n$ord";


from_to does what it says. In this case it took your shiftjis octets,
decoded them to Unicode, then re-encoded the result as UTF-8 octets.

What you might have meant was to get Unicode rather than the re-encoded form: 

        use Encode;
        my $string = "\x8C\x9F";    # shift_jis octets for 検 (U+691C)
        # decode() returns Perl characters; from_to() always leaves octets
        $string = decode("shiftjis", $string);
        binmode STDOUT,':utf8';
        print length($string)," chars '$string'\n";
        my $ord = join("\n", map( ord($_),split(//,$string)));
        print "$ord";


But this gives a 3-character string "æ€œ" (with the decimal values
230, 164 and 156). Could anyone please point me in the right direction
on how to get the decimal number 26908 instead?

Thanks in advance.
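
As a rough sketch of the difference between the two results discussed in this thread: the values 230, 164 and 156 are the UTF-8 octets of U+691C, while 26908 is the code point itself. The example below assumes the input is the two shift_jis octets 0x8C 0x9F and uses Encode's decode() and encode(); the variable names are illustrative only and do not come from the original messages.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: the shift_jis octets for the single character U+691C.
my $sjis_octets = "\x8C\x9F";

# decode() turns octets into a Perl character string.
my $chars = decode("shiftjis", $sjis_octets);

# ord() on each character yields its Unicode code point.
my @code_points = map { ord } split //, $chars;

# Format the code points in the conventional U+XXXX notation.
my @labels = map { sprintf "U+%04X", $_ } @code_points;

# encode() goes back to octets; this is the form a conversion to "utf8" leaves.
my $utf8_octets = encode("UTF-8", $chars);
my @byte_values = map { ord } split //, $utf8_octets;

print "code points: @code_points (@labels)\n";
print "UTF-8 bytes: @byte_values\n";
```

Under these assumptions the first line reports 26908 (U+691C) and the second reports 230 164 156, so taking ord() of the decoded characters, rather than of the re-encoded octets, is what produces the number asked for.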

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
