perl-unicode
Re: About HTML unicode
2004-12-02 03:30:05

He Zhiqiang wrote:
> can easily convert via Javascript function escape(), but i wonder that is there some method or function or modules can do the same job?? If i can do it, then in one html page, i can display not only chinese, but also japanese, korea etc... This is something like HTML unicode, am i right? The ord() function can't do the job because it return the incorrect decimal value. perhaps i do not describe

"HTML unicode" is not quite the right name for this. The proper term is "numeric character references":
<http://www.w3.org/TR/html4/charset.html#entities>

Numeric character references are not the only way to display multilingual text in an HTML document. I actually use raw UTF-8 characters in some of my HTML documents. To do that, I edit the HTML source with a text editor that can handle the UTF-8 encoding. Please look at the sample.html attached to this mail, both in a browser and as source.

You may want to learn more about Unicode and HTML.
About Unicode: <http://www.unicode.org/standard/WhatIsUnicode.html>
About HTML: <http://www.w3.org/MarkUp/>

BTW, if you use numeric character references, there is no need to look for a module. The built-in unpack('U*', $string) is enough: it returns the code point of each character in the string. Please look over and evaluate my sample code, attached as sample.pl.

--
Masanori HATA <lovewing(_at_)dream(_dot_)big(_dot_)or(_dot_)jp>
He's always with us!

Browse this html with Unicode (UTF-8) encoding.
Using raw UTF-8 data:
Using numeric character references (each item is encoded in plain ASCII):
#!/usr/local/bin/perl -w
use 5.008;
use strict;
use warnings;
use utf8;    # the string literals below are written in UTF-8

my %raw = (
    'English'  => 'News',
    'French'   => 'Actualités',
    'Chinese'  => '新闻',
    'Japanese' => '新聞',
    'Hangul'   => '뉴스',
);

my %numeric_ref;
foreach my $lang (keys %raw) {
    # unpack('U*', ...) returns the Unicode code point of each character
    my @numbers = unpack('U*', $raw{$lang});
    # join the code points into a run of references: &#N;&#N;...
    $numeric_ref{$lang} = join ';&#', @numbers;
    $numeric_ref{$lang} = '&#' . $numeric_ref{$lang} . ';';
}

print "<ul>\n";
foreach my $lang (keys %numeric_ref) {
    print "<li>$numeric_ref{$lang} in $lang</li>\n";
}
print "</ul>\n";

__END__
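The complaint that ord() returns the wrong value probably comes from calling it on undecoded UTF-8 bytes. That is an assumption, since the original script was not shown. As a minimal sketch, the variant below decodes the bytes with the core Encode module first and then replaces only the non-ASCII characters with numeric character references. The helper name to_ncr is hypothetical and is not part of sample.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from sample.pl): keep ASCII characters as they
# are and replace every other character with a decimal reference &#N;.
sub to_ncr {
    my ($text) = @_;
    return join '', map {
        my $n = ord;                  # code point of this character
        $n > 127 ? "&#$n;" : $_;
    } split //, $text;
}

# Assumed input: raw UTF-8 bytes, e.g. read from a file or a form.
# Without decode(), ord() would see single bytes, not characters.
my $bytes = "\xE6\x96\xB0\xE8\x81\x9E";    # UTF-8 bytes for U+65B0 U+805E
my $chars = decode('UTF-8', $bytes);

print to_ncr($chars), "\n";    # prints &#26032;&#32862;
```

This sketch does not escape &, < or >, so markup-significant ASCII characters would still need separate handling before the result goes into a page.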