perl-unicode

Re: Question about converting utf8 to ascii and char refs

2004-10-27 02:30:09

aaron(_at_)starwoodhotels(_dot_)com said:
I have a UTF-8 string which I want to output as ascii and have the UTF8
characters converted to numeric character references.

I tried using Encode with the FB_HTMLCREF fallback option enabled,
but for the 2 byte UTF8 characters, 2 incorrect char refs were printed
out instead of the correct one.

Try something like this (assuming that $_ contains the string, and has its 
utf8 flag set):

    s/([^[:ascii:]])/sprintf("&#%d;",ord($1))/eg;

This replaces each character outside the ASCII range with a decimal 
numeric character reference built from its Unicode code point.
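If the string is still raw UTF-8 bytes (no utf8 flag), the same substitution sees each byte as a separate character and emits one reference per byte, which may be what produced the two wrong char refs in the first place. Here is a minimal sketch that decodes first and then substitutes. The helper name to_ascii_ncr and the sample string are made up for illustration, and it assumes the input really is valid UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the original post): takes raw UTF-8 bytes,
# returns a pure-ASCII string with numeric character references.
sub to_ascii_ncr {
    my ($bytes) = @_;

    # Turn the UTF-8 byte sequence into Perl characters (sets the utf8 flag).
    my $str = decode('UTF-8', $bytes);

    # Replace each non-ASCII character with &#<decimal code point>;
    $str =~ s/([^[:ascii:]])/sprintf("&#%d;", ord($1))/eg;

    return $str;
}

# "\xC3\xA9" is the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
my $bytes = "caf\xC3\xA9";

print to_ascii_ncr($bytes), "\n";    # prints: caf&#233;

# For contrast: the same substitution applied to the undecoded bytes
# yields one reference per byte instead of one per character.
(my $wrong = $bytes) =~ s/([^[:ascii:]])/sprintf("&#%d;", ord($1))/eg;
print $wrong, "\n";                  # prints: caf&#195;&#169;
```

If the data comes from a file, opening the handle with an ":encoding(UTF-8)" layer should make the explicit decode() call unnecessary.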

        Dave Graff