perl-unicode

Re: Encoding to entity

2003-04-25 06:30:05

On 24 Apr 2003 15:25:03 -0000
dine005(_at_)yahoo(_dot_)co(_dot_)in (JD) wrote:

Hi,

I need to convert encoded characters in an XML file to
entities using Perl.

I have the following Unicode code points and entity names
(937 entities, in a text file).
\x{0177}   &ycirc;
\x{0176}   &Ycirc;
\x{00bd}   &half;


Here is a sample script.
I hope it helps.

SADAHIRO Tomoyuki



#!perl
use 5.008;  # requires Perl 5.8.0 or later

use strict;
use warnings;

# builds lookup hashes in both directions from the DATA section.
our (%ent2chr, %chr2ent);
while (<DATA>) {
   next if /^\s*$/; # skip empty or whitespace-only lines

   my($hex, $ent) = split;

   # removes stray characters such as commas, parentheses, braces, etc.
   $hex =~ tr/0-9A-Fa-f//cd;
   $ent =~ tr/&;0-9A-Za-z_//cd;

   my $chr = pack 'U', hex $hex;
   $ent2chr{$ent} = $chr;
   $chr2ent{$chr} = $ent;
}

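# decodes entities
# (entities missing from the table are deleted;
#  use $1 in place of '' below to leave them untouched)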
sub ent2str {
    my $str = shift;
    $str =~ s/(&[A-Za-z0-9_]+;)/exists $ent2chr{$1} ? $ent2chr{$1} : ''/egs;
    return $str;
}

# encodes entities
# (the /s modifier lets . match newlines, so every character is checked)
sub str2ent {
    my $str = shift;
    $str =~ s/(.)/exists $chr2ent{$1} ? $chr2ent{$1} : $1/egs;
    return $str;
}

# tiny tests
my $string = "Perl \x{177} \x{bd} \x{176}\x{bd}";
my $entity = "Perl &ycirc; &half; &Ycirc;&half;";

print $entity eq str2ent($string) ? "ok\n" : "not ok\n";
print $string eq ent2str($entity) ? "ok\n" : "not ok\n";
printf "%s\n", str2ent("\x{177}\x{176}\x{177}");

# entity definitions. add as many as you need.
# hexadecimal code point, then entity
__DATA__
0177   &ycirc;
0176   &Ycirc;
00bd   &half;
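
As a possible variation on the script above, here is a minimal sketch that
parses mapping lines in the form quoted in the question (assumed to be
"\x{0177}   &ycirc;", one pair per line) and falls back to a numeric
character reference for any non-ASCII character that has no named entity.
The names load_table and encode_with_fallback and the inline sample lines
are hypothetical, and the numeric fallback is an addition that the original
script does not have.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample of the mapping file; in practice these lines
# would be read from the poster's text file instead.
my @lines = (
    '\x{0177}   &ycirc;',
    '\x{0176}   &Ycirc;',
    '\x{00bd}   &half;',
);

# Builds the character-to-entity table from "\x{HHHH}  &name;" lines.
# Lines that do not match that shape are skipped.
sub load_table {
    my %chr2ent;
    for my $line (@_) {
        next unless $line =~ /\\x\{([0-9A-Fa-f]+)\}\s+(&\w+;)/;
        $chr2ent{ chr hex $1 } = $2;
    }
    return \%chr2ent;
}

# Encodes a string: named entity when the character is in the table,
# numeric reference for any other non-ASCII character,
# ASCII characters unchanged.
sub encode_with_fallback {
    my ($table, $str) = @_;
    $str =~ s/([^\x00-\x7F])/
        exists $table->{$1} ? $table->{$1} : sprintf('&#x%X;', ord $1)/egx;
    return $str;
}

my $table = load_table(@lines);
print encode_with_fallback($table, "Perl \x{177} \x{bd} \x{20AC}"), "\n";
# prints: Perl &ycirc; &half; &#x20AC;
```

Because the output is pure ASCII, it can be printed without setting an
encoding layer on the output handle. Whether a numeric reference is
acceptable for unmapped characters depends on the target XML, so treat the
fallback as one option rather than a requirement.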
