perl-unicode

Re: \uXXXX && &#xxxx

2004-12-08 08:30:19

smkelly(_at_)filenet(_dot_)com said:
- I need to convert the strings to 2 Unicode formats,
one like this \ua9e0 for each character
one like this &#a9e0 for each character

I think in the latter case, you might really want "&#43488;" (decimal 
number, terminated with a semicolon), if your intention is to produce HTML 
numeric entities for Unicode characters.

One basic approach (assuming $_ contains a utf8 string, i.e. already 
decoded into Perl characters rather than raw bytes) is:

  # convert non-ascii to "\uHHHH":
  s/([^[:ascii:]])/sprintf("\\u%04x",ord($1))/eg;

  # convert non-ascii to "&#nnnn;" (decimal):
  s/([^[:ascii:]])/sprintf("&#%d;",ord($1))/eg;

and similarly for other variants.  Look at the section on "POSIX character 
class syntax" in the perlre man page regarding the "[:ascii:]" expression.
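
For what it's worth, here is a minimal self-contained sketch of the two 
substitutions above wrapped in subs. The sub names (to_u_escapes, 
to_html_entities) and the sample input are made up for illustration, and it 
assumes the input arrives as UTF-8 encoded bytes that have to be decoded 
first:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace each non-ASCII character with "\uhhhh".
sub to_u_escapes {
    my ($s) = @_;
    $s =~ s/([^[:ascii:]])/sprintf("\\u%04x", ord($1))/eg;
    return $s;
}

# Hypothetical helper: replace each non-ASCII character with "&#nnnn;".
sub to_html_entities {
    my ($s) = @_;
    $s =~ s/([^[:ascii:]])/sprintf("&#%d;", ord($1))/eg;
    return $s;
}

# Sample input (an assumption): the UTF-8 bytes for "cafe" with an
# e-acute, a space, then U+A9E0.
my $bytes = "caf\xc3\xa9 \xea\xa7\xa0";

# Decode the bytes into Perl characters so that ord() sees code points.
my $text = decode('UTF-8', $bytes);

print to_u_escapes($text), "\n";
print to_html_entities($text), "\n";
```

For the hexadecimal entity form, a format of "&#x%x;" should work the same 
way. Note that "%04x" produces more than four hex digits for code points 
above U+FFFF, so a consumer that expects exactly four digits after "\u" 
would need surrogate pairs instead, which this sketch does not attempt.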

        David Graff

