perl-unicode

Re: tr// for conversion

1999-06-17 14:29:25
Martin_Hosken(_at_)sil(_dot_)org writes:
: Is there any way that we could use tr/// to do 8-bit to Unicode conversions
: simply? I am envisaging something like:
: 
: tr/[\x80-\x9f]/\x20ac\x..../U;
: 
: or the like whereby the lhs of the tr is considered in binary and the rhs in
: UTF-8.

Already been done, though what you want looks more like

    tr/\x80-\x9f/\x{20ac}\x..../CU;

: Likewise for reverse conversion you could have UTF8 on the lhs and 8-bit
: clean on the rhs.

Just use UC instead of CU.
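
For illustration, here is a minimal sketch of both directions for one mapped character, assuming a current Perl, which does not accept the C and U flags discussed here (so they are left out and tr/// works on code points directly). The helper names to_unicode and to_8bit are hypothetical, and only the \x80 to \x{20ac} pair from the example above is mapped; the rest of the range is left elided as in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: map the 8-bit code 0x80 to U+20AC (EURO SIGN).
# Only this one pair is shown; other characters pass through unchanged.
sub to_unicode {
    my ($s) = @_;
    $s =~ tr/\x80/\x{20ac}/;
    return $s;
}

# Hypothetical helper: the reverse mapping, U+20AC back to 0x80.
sub to_8bit {
    my ($s) = @_;
    $s =~ tr/\x{20ac}/\x80/;
    return $s;
}

my $wide = to_unicode("price: \x80");
printf "U+%04X\n", ord(substr($wide, -1));            # prints "U+20AC"
printf "0x%02X\n", ord(substr(to_8bit($wide), -1));   # prints "0x80"
```

The two helpers differ only in which side of the tr/// holds the 8-bit code, which is the same symmetry as swapping CU for UC.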

: The only difficulty here is that you would want an extra code on the rhs
: to be used for the 'out of range' code (what happens when a code >256
: isn't matched and converted, you want a default character inserted rather
: than the thing deleted).

The tr/// operator already has a mechanism for defaults, in that it
replicates the last character of the rhs if it's too short.  Also,
the rule is that if a given character is specified more than once, the
first translation is used.  So

    tr/a\0-\x{10ffff}/bX/UC;

should translate a to b and everything else to X.  It should even do
it fairly efficiently, since chunks of table aren't allocated unless
needed.
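
As a hedged sketch of the two rules above (the last rhs character is replicated, and the first translation wins for a repeated character), the following assumes a current Perl and omits the U and C flags, which current releases do not accept. The helper names a_to_b_else_x and to_8bit_with_default are hypothetical, and using '?' as the out-of-range default is an assumption made only for this example.

```perl
use strict;
use warnings;

# 'a' is listed first, so its translation to 'b' wins even though 'a'
# also falls inside the \0-\x{10ffff} range; every other character gets
# 'X', the replicated last character of the rhs.
sub a_to_b_else_x {
    my ($s) = @_;
    $s =~ tr/a\0-\x{10ffff}/bX/;
    return $s;
}

# Same idea applied to the reverse conversion: U+20AC maps to 0x80, and
# any other code above 0xFF maps to the default '?' (an assumed choice).
# Codes 0x00-0xFF are not in the lhs, so they pass through unchanged.
sub to_8bit_with_default {
    my ($s) = @_;
    $s =~ tr/\x{20ac}\x{100}-\x{10ffff}/\x80?/;
    return $s;
}

print a_to_b_else_x("banana"), "\n";   # prints "XbXbXb"
```

Because the default comes from the replication rule, no extra syntax is needed for the out-of-range case; the fallback character is simply the last one on the rhs.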

Larry
