perl-unicode

Re: unicode on windows

2003-11-21 18:30:07
I am using Windows XP and Perl 5.8.1.

Thanks,
Neelima 
--- John Delacour <JD(_at_)BD8(_dot_)COM> wrote:
At 12:01 pm -0800 21/11/03, Neelima Bandla wrote:
 Yes, I got the same thing as well.

 徉房徉房

That's not what I got; I got the Kanji. You may have seen that in my
email because your mailer (especially if you are using Web mail)
cannot interpret and display UTF-8. That's why I stressed at the
beginning of the message that the charset in the headers was UTF-8.
In other words, the resulting file name here was four Kanji.

If I had time, I'd try what you're doing on Windows, but I only have
NT4 and I'm not sure how it copes with Unicode file names.


 use Encode;
 my $utf8_str1  = pack("U*", @array);
 my $utf16_str1 = encode("UTF-16", $utf8_str1);
 # "\\" gives a literal backslash; "\$" would stop $utf16_str1 interpolating
 open(FD, ">$filepath\\$utf16_str1") or die("$!");
 close FD;

 Now this is what I have done: since the Windows encoding is UTF-16,
 I used the encode function to convert into UTF-16. Now it refuses
 to create the file, failing with "Invalid argument".
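Separately from the encoding question: in a double-quoted Perl string, \$ is an escaped dollar sign, not a path separator followed by a variable, so hand-built paths are easy to get wrong. A minimal sketch using the core File::Spec module's catfile, which joins the parts with the separator of the platform Perl is running on; the directory and file name below are hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directory and file name, for illustration only
my $filepath = 'C:\\temp';
my $name     = 'report.txt';

# In double quotes, \$ is an escaped dollar sign, so $name is NOT interpolated
my $wrong = "$filepath\$name";
# catfile() joins the parts with the separator of the platform Perl runs on
my $right = File::Spec->catfile($filepath, $name);

print "$wrong\n";   # C:\temp$name
print "$right\n";   # C:\temp\report.txt on Windows
```

This only fixes how the path is assembled; it does not by itself make a UTF-16 byte string acceptable as a file name.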

Here again I'm not sure what is required, or whether you need to
start with the Byte Order Mark U+FEFF (bytes FE FF) or its
byte-swapped form (bytes FF FE, which read as U+FFFE). Someone who
uses Windows 2000 or later will need to advise you here. You did not
mention which system you're using or what version of Perl.
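To see what the BOM question amounts to in bytes, here is a small sketch comparing Encode's "UTF-16" (BOM FE FF, then big-endian) with "UTF-16LE" (no BOM, low byte first). Windows' own wide-character APIs use little-endian UTF-16 without a BOM; whether Perl 5.8.1's built-in open can hand a name to those APIs at all is a separate question this sketch does not answer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The code points used in this thread
my $str = join '', map { chr(hex($_)) } qw(5f89 623f 5f89 20 623f);

my $be = encode('UTF-16',   $str);   # BOM FE FF, then big-endian
my $le = encode('UTF-16LE', $str);   # no BOM, low byte first

# Show each byte string as space-separated hex
for my $pair (['UTF-16', $be], ['UTF-16LE', $le]) {
    my ($label, $bytes) = @$pair;
    printf "%-9s %s\n", "$label:", join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}
```

Run as is, the first line lists FE FF 5F 89 62 3F 5F 89 00 20 62 3F and the second 89 5F 3F 62 89 5F 20 00 3F 62.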

If I run this script, which I find a more convenient way of doing the
same thing if you're starting with raw hex codes, I get the string as
UTF-16 with the BOM U+FEFF, and that's fine in my environment:

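use strict ;
use Encode ;
# Build the text '\x{5f89}\x{623f}...' and let eval interpolate it
my ($l, $r, $text) = ( "\\x{",  "}") ;
my @x = qw( 5f89 623f 5f89 20 623f ) ;
for (@x) {
      $text .= "$l$_$r"
}
my $UTF8 =  eval qq~"$text"~ ;
# Encode's "UTF-16" prepends the BOM FE FF and uses big-endian order
my $UTF16 = encode("UTF-16" , $UTF8) ;
binmode STDOUT ;   # write the bytes unchanged (no CRLF translation)
print $UTF16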
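For comparison, the same string can be built without a string eval by converting each hex code with hex() and chr(). This is an alternative sketch, not the method used above; it yields the same characters as pack("U*", ...) in the quoted code.

```perl
use strict;
use warnings;
use Encode qw(encode);

my @x = qw(5f89 623f 5f89 20 623f);

# hex() turns "5f89" into the number 0x5F89; chr() turns that into U+5F89.
# pack('U*', map hex, @x) would build the same string.
my $UTF8  = join '', map { chr(hex($_)) } @x;
my $UTF16 = encode('UTF-16', $UTF8);   # BOM + big-endian bytes

binmode STDOUT;   # print the encoded bytes unchanged
print $UTF16;
```

Avoiding eval means a malformed hex code can no longer be executed as Perl code; hex() simply returns a number.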

On the other hand, if you're starting with raw hex codes you don't
need to use Encode at all: just print the BOM followed by the two
bytes of each character, one byte at a time. If you run the two
methods in the script below you will get your four Kanji printed
twice, but the second method is far simpler and does not need the
Encode library, so it should also be faster. As to how your operating
system deals with the data, I'm not qualified to comment, but I'm
sure most people on the list are.
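A possible variant of the short method, offered as an assumption rather than something tested on Windows: pack's 'n' template writes each number as two big-endian bytes, so the BOM and the bytes of every character (including the 00 that precedes 20 for the space) come out without listing them by hand. It only works for characters up to U+FFFF, since higher code points need surrogate pairs. The sketch compares the result with Encode's output:

```perl
use strict;
use warnings;
use Encode qw(encode);

my @x = qw(5f89 623f 5f89 20 623f);

# 'n' packs an unsigned 16-bit big-endian number, so each code point
# becomes two bytes, high byte first, after the BOM U+FEFF.
# Valid only for code points up to U+FFFF (no surrogate pairs).
my $by_hand = pack('n*', 0xFEFF, map { hex($_) } @x);

# Encode's "UTF-16" also writes a BOM followed by big-endian bytes
my $via_encode = encode('UTF-16', join '', map { chr(hex($_)) } @x);

print $by_hand eq $via_encode ? "identical\n" : "different\n";
```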

What's wrong seems not to be your output but the interaction between
that output and the environment you're working in.
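One way to take the console and the mailer out of the picture is to write the encoded bytes to a file and open that file in an editor that understands UTF-16. A minimal sketch; the file name kanji-utf16.txt is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = join '', map { chr(hex($_)) } qw(5f89 623f 5f89 20 623f);

# Hypothetical output file; open it afterwards in an editor that reads UTF-16
my $file = 'kanji-utf16.txt';

# :raw stops Perl from translating "\n" to CRLF on Windows,
# so the encoded bytes reach the file unchanged
open(my $fh, '>:raw', $file) or die "Cannot open $file: $!";
print {$fh} encode('UTF-16', $text);
close($fh) or die "Cannot close $file: $!";

printf "%s: %d bytes\n", $file, -s $file;
```

The file should be 12 bytes long: the two-byte BOM plus two bytes for each of the five characters.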

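use strict ;
use warnings ;
use Encode ;
binmode STDOUT ;   # write raw bytes; no CRLF translation on Windows

#### LONG METHOD ####
{
    # Build '\x{5f89}\x{623f}...' and let eval interpolate it
    my ($l, $r, $text) = ( "\\x{",  "}") ;
    my @x = qw( 5f89 623f 5f89 20 623f ) ;
    for (@x) {$text .= "$l$_$r"}
    my $UTF8  = eval qq~"$text"~ ;
    my $UTF16 = encode("UTF-16" , $UTF8) ;   # BOM FE FF + big-endian
    print $UTF16 . "$/$/" ;
}

###SHORT METHOD####
{
    # The same bytes written out by hand: BOM, then two bytes per
    # character. The space U+0020 needs both of its bytes, 00 20.
    my ($l, $r, $text) = ( "\\x{",  "}") ;
    my @x = qw( FE FF 5f 89 62 3f 5f 89 00 20 62 3f ) ;
    for (@x) {$text .= "$l$_$r"}
    # Parentheses keep "$/$/" outside the eval (eval parses like a unary operator)
    print eval(qq~"$text"~) . "$/$/" ;
}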
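Another check that does not depend on how a terminal or mailer displays the result is to decode the bytes back and compare them with the original string. A sketch using Encode's decode, which, for "UTF-16", reads the BOM to decide the byte order:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $original = join '', map { chr(hex($_)) } qw(5f89 623f 5f89 20 623f);

# Big-endian bytes with BOM FE FF, written out one byte at a time
my $be_bytes = pack('C*', map { hex($_) } qw(FE FF 5f 89 62 3f 5f 89 00 20 62 3f));
# The same text little-endian, preceded by the byte-swapped BOM FF FE
my $le_bytes = "\xFF\xFE" . encode('UTF-16LE', $original);

# Decoding as "UTF-16" reads the BOM to decide the byte order
for my $bytes ($be_bytes, $le_bytes) {
    my $decoded = decode('UTF-16', $bytes);
    print $decoded eq $original ? "round-trip ok\n" : "mismatch\n";
}
```

Without a BOM, decoding as "UTF-16" fails; in that case name the byte order explicitly with "UTF-16BE" or "UTF-16LE".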

JD