perl-unicode

converting between utf8 and bytes

2000-07-01 07:48:35
I have found out how to create a utf8 string: insert a character with a code
> 255 (a BOM should do it) and then strip it off later. Hacky, but it works.
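
For illustration, the trick might look roughly like the sketch below. The
helper name force_utf8 is made up for this example, and the check with
utf8::is_utf8 assumes Perl 5.8.1 or later (it is not available in 5.6.0).

```perl
use strict;
use warnings;

# Hypothetical helper: prepend a character above 255 so Perl switches the
# string to its internal UTF-8 representation, then strip it off again.
sub force_utf8 {
    my ($str) = @_;
    $str = "\x{FEFF}" . $str;   # a BOM (U+FEFF) forces character storage
    substr($str, 0, 1, "");     # remove the BOM; the UTF8 flag stays set
    return $str;
}

my $s = force_utf8("caf\xe9");
print utf8::is_utf8($s) ? "UTF8 flag set\n" : "UTF8 flag not set\n";
printf "length %d, last char U+%04X\n", length($s), ord(substr($s, -1));
```

The string keeps the same characters; only its internal storage changes.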

But how do I change the way a string is interpreted?

use utf8;

# other code

sub pretty
{
    my ($str) = @_;

#    $str =~ tr///CC;    # This crashes Perl 5.6.0 (ActivePerl)
#    use bytes;          # This does nothing
    $str =~ s/([\xc0-\xff][\x80-\xbf]+)/'\x{'.sprintf("%04x", unpack("U", $1)).'}'/oge;
    $str;
}

$str is interpreted as UTF8 (SvUTF8 is set).

Any suggestions?

And a follow-up question:

How do I make a UTF8 string containing codes in the range 127 < x < 256 without
having to insert a BOM at the front and then strip it off?

Martin Hosken

PS. Apologies for the vague previous question.
