On Thu May 16 22:58:18 2002, Dan Jacobson wrote:
> How do I turn unicode into &# notation? At present I do
> echo 'some unicode'|perl -wlne \
> 's/(...)/"&#".unpack("U",$1).";"/ge;print'
> But I bet the pros do something smarter.

Well, actually that's quite nice as it is, although a perl 5.8 user
would have the joy of using this:
echo 'some unicode' | piconv -f utf8 -t latin1 -C 512
For that, you may wish to grab perl 5.8-RC1 when it's released
next week or so; it contains, among other nifty things, full
Unicode and legacy encoding conversion support.
You could also use
rsync -avz rsync://ftp.linux.activestate.com/perl-current/ perl58
to check out the latest source tree. See the file INSTALL for how to
compile/install it from scratch.
With perl 5.8, you can do things like:
use Encode;
my $big5 = '這是一些 Big5 文字';
my $unicode = decode('big5', $big5);
print encode('latin1', $unicode, Encode::FB_HTMLCREF);
which prints:
&#36889;&#26159;&#19968;&#20123; Big5 &#25991;&#23383;
Hence, your example could be turned into
perl -MEncode -wlne 'print encode("latin1", decode("utf8", $_),
Encode::FB_HTMLCREF);'
or as a simple filter script:
#!/usr/bin/env perl
# ':std' applies the input layer to STDIN too, so piped input is decoded
use open ':std', IN => ':utf8'; use Encode;
print encode('latin1', $_, Encode::FB_HTMLCREF) for <>;
which is semantically equivalent to:
echo 'some unicode' | piconv -f utf8 -t latin1 -C 512
Hope that helps,
/Autrijus/
(CC'ed to perl-unicode. Personally, I wonder what piconv invocation I
should use if I wanted *all* characters to fall back to CHECK. Can we
have a '-t null'? :-))