perl-unicode

Re: unicode -> &# notation

2002-05-18 13:03:37
On Thu May 16 22:58:18 2002, Dan Jacobson wrote:
> How do I turn unicode into &# notation?  At present I do
> echo 'some unicode'|perl -wlne \
> 's/(...)/"&#".unpack("U",$1).";"/ge;print'
> But I bet the pros do something smarter.

Well, actually that's quite nice as it is, although a perl 5.8 user
would have the joy of using this:

    echo 'some unicode' | piconv -f utf8 -t latin1 -C 512

For that, you may wish to grab perl 5.8-RC1 when it's released
next week or so; it contains, among other nifty things, full
Unicode and legacy-encoding conversion support.

You could also use

    rsync -avz rsync://ftp.linux.activestate.com/perl-current/ perl58

to check out the latest source tree. See the file INSTALL for how to
compile/install it from scratch.

With perl 5.8, you can do things like:

    use Encode;
    my $big5    = '這是一些 Big5 文字';
    my $unicode = decode('big5', $big5);
    print encode('latin1', $unicode, Encode::FB_HTMLCREF);

which prints:

    &#36889;&#26159;&#19968;&#20123; Big5 &#25991;&#23383;
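
To make the fallback behaviour concrete, here is a minimal sketch using the same Encode calls on a UTF-8 byte string that mixes a character Latin-1 can represent with two it cannot. The sample string and variable names are chosen for illustration only, and it assumes the Encode module as shipped with perl 5.8 or later.

```perl
use strict;
use warnings;
use Encode;

# Sample input (an assumption for illustration): the UTF-8 bytes for
# "caf" + e-acute (U+00E9), a space, then U+6587 and U+5B57.
my $bytes   = "caf\xC3\xA9 \xE6\x96\x87\xE5\xAD\x97";
my $unicode = decode('utf8', $bytes);

# U+00E9 exists in Latin-1, so it is emitted as the single byte 0xE9.
# U+6587 and U+5B57 do not, so FB_HTMLCREF substitutes &#NNNN; for each.
my $out = encode('latin1', $unicode, Encode::FB_HTMLCREF);
print $out, "\n";
```

Only the characters that the target encoding cannot hold are rewritten; everything else passes through as ordinary Latin-1 bytes.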

Hence, your example could be turned into

    perl -MEncode -wlne 'print encode("latin1", decode("utf8", $_),
                         Encode::FB_HTMLCREF);'

or as a simple filter script:

    #!/usr/bin/env perl
    use open IN => 'utf8'; use Encode;                              
    print encode('latin1', $_, Encode::FB_HTMLCREF) for <>;

which is semantically equivalent to:

    echo 'some unicode' | piconv -f utf8 -t latin1 -C 512
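
The 512 passed to -C is, as far as I can tell, the numeric value of the HTMLCREF check flag, and FB_HTMLCREF is that flag combined with LEAVE_SRC. The sketch below prints both values, so the correspondence can be checked against the installed Encode rather than taken on trust. It assumes the flag constants are callable as Encode::HTMLCREF and Encode::FB_HTMLCREF, as they are in the Encode releases I know of.

```perl
use strict;
use warnings;
use Encode;

# HTMLCREF is the bare "substitute &#NNNN;" flag; FB_HTMLCREF adds
# LEAVE_SRC, which asks encode() not to modify its input string.
my $htmlcref    = Encode::HTMLCREF();
my $fb_htmlcref = Encode::FB_HTMLCREF();

printf "HTMLCREF    = %d (0x%04x)\n", $htmlcref,    $htmlcref;
printf "FB_HTMLCREF = %d (0x%04x)\n", $fb_htmlcref, $fb_htmlcref;
printf "FB_HTMLCREF includes HTMLCREF: %s\n",
    ($fb_htmlcref & $htmlcref) == $htmlcref ? "yes" : "no";
```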

Hope that helps,

/Autrijus/

(CC'ed to perl-unicode. Personally, I wonder what piconv invocation I
should use if I wanted *all* characters to fall back to CHECK. Can we have
a '-t null'? :-))
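
Outside piconv, one way to get the "everything falls back" effect is to skip the CHECK machinery and format each code point directly. This is only a sketch: the helper name to_ncr is made up for illustration and is not part of Encode or piconv.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical helper: turn *every* character of a decoded string,
# ASCII included, into a decimal numeric character reference.
sub to_ncr {
    my ($string) = @_;
    return join '', map { sprintf '&#%d;', ord } split //, $string;
}

my $unicode = decode('utf8', "A\xC3\xA9");   # "A" followed by e-acute
print to_ncr($unicode), "\n";                # prints &#65;&#233;
```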

