Re: About HTML unicode

At 12:31 am +0800 3/12/04, He Zhiqiang wrote:

Now I have encountered another problem: a few files contain not just one charset but two or more. For example, file1 contains Japanese and Chinese. If I use open() to load the data into memory, ord, length, etc. don't work correctly! Perhaps I am missing something to encode or decode the data?
code:
#!/usr/bin/perl -w
use utf8;
open(FD, "< file1");
while(<FD>) {
    chomp;
    print "length = ".length($_);
}
close FD;
----------
length() does not count non-ASCII characters correctly. :(

If the file is in UTF-8, then it may be in any number of _languages_ but it uses only one character set -- Unicode. So far as I know, "use utf8" only declares that the script source itself is written in UTF-8; it has no effect on data read from a file handle, so it does not help here. You will get the correct character count (6 characters rather than 18 bytes) by opening the file handle as UTF-8, as below.


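use strict;
use warnings;
binmode STDOUT, ":encoding(UTF-8)";  # print characters to the terminal as UTF-8

my $f = "/tmp/cjk.txt";
my $text = "\x{56d8}\x{56d9}\x{56da}\x{56db}\x{56dc}\x{56dd}\n";

# Write $text to $f as UTF-8
open my $out, ">:encoding(UTF-8)", $f or die "Cannot write $f: $!";
print $out $text;
close $out;

# Read it back, decoding the UTF-8 bytes into characters
open my $in, "<:encoding(UTF-8)", $f or die "Cannot read $f: $!";
while (<$in>) {
  chomp;
  print "$_  -  Length = " . length() . "\n";
}
close $in;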
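
If the data has already been read as raw bytes (for example through a plain open() with no layer), the same result can probably be had by decoding it afterwards. The following is only a minimal sketch, assuming the bytes really are valid UTF-8; the variable names are my own, and the string is the same six characters as in the example above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Six CJK characters, the same ones used in the example above.
my $chars = "\x{56d8}\x{56d9}\x{56da}\x{56db}\x{56dc}\x{56dd}";

# Simulate what a raw read would return: the UTF-8 encoded bytes.
my $bytes = encode('UTF-8', $chars);

# Decode the bytes back into a string of characters.
my $decoded = decode('UTF-8', $bytes);

# length() counts bytes on the raw string and characters on the decoded one.
printf "bytes: %d, characters: %d\n", length($bytes), length($decoded);
```

This should print "bytes: 18, characters: 6", since each of these characters takes three bytes in UTF-8.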

JD
