On 24 Apr 2003 15:25:03 -0000
dine005(_at_)yahoo(_dot_)co(_dot_)in (JD) wrote:
Hi,
I need to convert encoded characters to entities using Perl (in an XML
file).
I have the following Unicode code points and entity names (937 entities,
in a text file).
\x{0177} &ycirc;
\x{0176} &Ycirc;
\x{00bd} &frac12;
Here is a sample script.
I hope it helps.
SADAHIRO Tomoyuki
#!perl
use 5.008; # requires Perl 5.8.0 or later
use strict;
use warnings;
# build the lookup tables in both directions
our (%ent2chr, %chr2ent);
while (<DATA>) {
    next if /^\s*$/; # skip empty or whitespace-only lines
    my ($hex, $ent) = split;
    # strip stray characters such as commas, parentheses, and braces
    $hex =~ tr/0-9A-Fa-f//cd;
    $ent =~ tr/&;0-9A-Za-z_//cd;
    my $chr = pack 'U', hex $hex;
    $ent2chr{$ent} = $chr;
    $chr2ent{$chr} = $ent;
}
# decodes entities; entities missing from the table are removed
sub ent2str {
    my $str = shift;
    $str =~ s/(&[A-Za-z0-9_]+;)/exists $ent2chr{$1} ? $ent2chr{$1} : ''/egs;
    return $str;
}
# encodes characters as entities; characters missing from the table are kept
sub str2ent {
    my $str = shift;
    $str =~ s/(.)/exists $chr2ent{$1} ? $chr2ent{$1} : $1/egs;
    return $str;
}
# tiny tests
my $string = "Perl \x{177} \x{bd} \x{176}\x{bd}";
my $entity = "Perl &ycirc; &frac12; &Ycirc;&frac12;";
print $entity eq str2ent($string) ? "ok\n" : "not ok\n";
print $string eq ent2str($entity) ? "ok\n" : "not ok\n";
printf "%s\n", str2ent("\x{177}\x{176}\x{177}");
# entity definitions; add as many as you want.
# format: hexadecimal code point, then entity
__DATA__
0177 &ycirc;
0176 &Ycirc;
00bd &frac12;
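
If the 937 entries live in a separate text file in the "\x{0177} &ycirc;" format from the question, the table could also be loaded from a filehandle instead of __DATA__. The following is only a rough sketch under that assumption. The names load_map and encode_entities_with are hypothetical, and the fallback to a numeric character reference for unmapped non-ASCII characters is an addition that the sample script does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse mapping lines of the form "\x{0177} &ycirc;" into a
# character => entity hash. Lines that do not match are skipped.
sub load_map {
    my ($fh) = @_;
    my %chr2ent;
    while (my $line = <$fh>) {
        # capture the hexadecimal code point and the entity
        next unless $line =~ /\\x\{([0-9A-Fa-f]+)\}\s+(&\w+;)/;
        $chr2ent{ chr hex $1 } = $2;
    }
    return \%chr2ent;
}

# Replace every mapped non-ASCII character with its entity.
# Assumption: unmapped non-ASCII characters become numeric
# character references such as &#x20AC;.
sub encode_entities_with {
    my ($map, $str) = @_;
    $str =~ s/([^\x00-\x7F])/
        exists $map->{$1} ? $map->{$1} : sprintf('&#x%04X;', ord $1)
    /gex;
    return $str;
}

# Demo using an in-memory "file" with hypothetical content;
# a real run would open the mapping file instead.
my $mapping = "\\x{0177} &ycirc;\n\\x{0176} &Ycirc;\n\\x{00bd} &frac12;\n";
open my $fh, '<', \$mapping or die "cannot open in-memory mapping: $!";
my $map = load_map($fh);
close $fh;

print encode_entities_with($map, "Perl \x{177} \x{bd} \x{20AC}"), "\n";
```

For a real XML file, the input would presumably be opened with a decoding layer such as '<:encoding(UTF-8)' so that the substitution sees characters rather than bytes. Whether UTF-8 is the right encoding depends on the file.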