perl-unicode

Re: About HTML unicode

2004-12-02 03:30:05
He Zhiqiang wrote:
can easily convert via the JavaScript function escape(), but I wonder
whether there is some method, function, or module that can do the same
job. If I can do it, then in one HTML page I can display not only
Chinese but also Japanese, Korean, etc. This is something like HTML
unicode, am I right?
The ord() function can't do the job because it returns an incorrect
decimal value. perhaps i do not describe

Calling it "HTML unicode" seems wrong; I think the proper term is
"numeric character references".
<http://www.w3.org/TR/html4/charset.html#entities>

Numeric character references are not the only way to display
multilingual text in an HTML document. In fact, I use 'raw' UTF-8
characters in some HTML documents. To do that, I edit the HTML source
file with a text editor that can handle the UTF-8 encoding.
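
As a minimal sketch of the raw UTF-8 approach (assuming Perl 5.8 or
later), the script below prints a small page whose text is written out
as UTF-8 and which declares that encoding in a meta element. The sub
name build_page and the page title are hypothetical, chosen only for
illustration; they do not come from the attached samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this source file are UTF-8

# Build a small HTML page that declares its own encoding.
# build_page is a hypothetical helper name used for this sketch.
sub build_page {
    my @items = @_;
    my $list = join '', map { "<li>$_</li>\n" } @items;
    return <<"HTML";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>UTF-8 sample</title>
</head>
<body>
<ul>
$list</ul>
</body>
</html>
HTML
}

# Encode Perl characters to UTF-8 bytes on output.
binmode STDOUT, ':encoding(UTF-8)';
print build_page('Actualités in French', '新闻 in Chinese', '뉴스 in Hangul');
```

Redirect the output to a file and open it in a browser; the list items
should display without any numeric character references in the source.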

Please look at the sample.html attached to this mail, both as rendered
in a browser and as source text.

You may want to read more about Unicode and HTML.
About Unicode:
<http://www.unicode.org/standard/WhatIsUnicode.html>
About HTML:
<http://www.w3.org/MarkUp/>

BTW, when you use numeric character references, there is no need to
look for any modules. The built-in call "unpack('U*', $string)" is
enough: it returns the code point of every character in $string
(assuming $string holds characters, not raw bytes).
Please look over my sample code, which is attached as sample.pl.
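
One possible reason for ord() seeming to return the wrong value is
that the string still holds undecoded UTF-8 bytes, so ord() sees only
the first byte. This is only a guess about the original problem. A
minimal sketch, assuming the input arrives as UTF-8 bytes; the sub
name bytes_to_ncr is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn a string of UTF-8 *bytes* into numeric character references.
# bytes_to_ncr is a hypothetical name used for this sketch.
sub bytes_to_ncr {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> Perl characters
    return join '', map { "&#$_;" } unpack('U*', $chars);
}

# "\xE6\x96\xB0" is the UTF-8 byte sequence for U+65B0.
my $bytes = "\xE6\x96\xB0";
print ord($bytes), "\n";             # 230: only the first byte
print bytes_to_ncr($bytes), "\n";    # &#26032;
```

Once the bytes are decoded, ord() on a single character and
unpack('U*', ...) on the whole string give the same code points.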

-- 
Masanori HATA
<lovewing(_at_)dream(_dot_)big(_dot_)or(_dot_)jp>
He's always with us!

View this HTML with the Unicode (UTF-8) encoding.

Using raw UTF-8 data:

  • News in English
  • Actualités in French
  • 新闻 in Simplified Chinese
  • 新聞 in Japanese
  • 뉴스 in Korean Hangul

Using numeric character references (the data itself is plain ASCII):

  • News in English
  • Actualités in French
  • 新闻 in Chinese
  • 新聞 in Japanese
  • 뉴스 in Hangul
#!/usr/local/bin/perl -w
use 5.008;
use strict;
use warnings;
use utf8;

# Raw UTF-8 literals; "use utf8" makes Perl read them as characters.
my %raw = (
    'English'  => 'News',
    'French'   => 'Actualités',
    'Chinese'  => '新闻',
    'Japanese' => '新聞',
    'Hangul'   => '뉴스',
);

# Convert each string to decimal numeric character references.
# unpack('U*', ...) returns one code point per character.
my %numeric_ref;
foreach my $lang (keys %raw) {
    my @numbers = unpack('U*', $raw{$lang});
    $numeric_ref{$lang} = join '', map { "&#$_;" } @numbers;
}

# Sort the keys so the list comes out in the same order on every run.
print "<ul>\n";
foreach my $lang (sort keys %numeric_ref) {
    print "<li>$numeric_ref{$lang} in $lang</li>\n";
}
print "</ul>\n";
__END__
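
A variant of the same idea, as a sketch only: sample.pl turns every
character, including ASCII, into a reference. If you would rather keep
ASCII readable in the source, a substitution can convert only the
non-ASCII characters. The sub name ncr_non_ascii is hypothetical, and
the input is assumed to be a character string, not raw bytes.

```perl
use strict;
use warnings;

# Replace each non-ASCII character with a decimal numeric character
# reference; ASCII characters pass through untouched.
# ncr_non_ascii is a hypothetical name used for this sketch.
sub ncr_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print ncr_non_ascii("Actualit\x{E9}s \x{65B0}\x{805E}"), "\n";
# prints: Actualit&#233;s &#26032;&#32862;
```

Note that this does not escape "&", "<" or ">"; those still need the
usual HTML escaping if they can occur in the text.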