perl-unicode

Re: Translating a Latin-1 string to a UTF8 string in Perl 5.6.1

2001-12-11 18:45:08
On Tue, Dec 11, 2001 at 07:00:20PM -0600, Michael A. Grady wrote:
> Now that Perl 5.6.1 has removed support for tr///CU, is there still
> an easy way to take a latin-1 character string and convert it to
> a UTF8 string? I need to do that for generating LDIF files to load
> into an LDAP server.

No need for fancyisms.  I think the below might work even in perl4...

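#!/usr/bin/perl -sp
# Filter: Latin-1 to UTF-8 by default; pass -r to go from UTF-8 to Latin-1.
# (-s turns the -r switch into $r; -p loops over input and prints each line.)

if ($r) {
    # UTF-8 to Latin-1: only lead bytes 0xC2 and 0xC3 map into Latin-1
    s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
} else {
    # Latin-1 to UTF-8: each high byte becomes a two-byte sequence
    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
}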
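
For anyone who wants to check the bit arithmetic by hand, here is a rough sketch of the same two substitutions wrapped in subs and run on 'Áine'. The sub names latin1_to_utf8 and utf8_to_latin1 are made up for illustration, and the sketch assumes the strings are plain byte strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper names; both expect and return plain byte strings.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    # A byte 0x80-0xFF becomes two bytes: 110000xx 10xxxxxx
    $bytes =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/eg;
    return $bytes;
}

sub utf8_to_latin1 {
    my ($bytes) = @_;
    # Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF
    $bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) & 0x03) << 6) | (ord($2) & 0x3F))/eg;
    return $bytes;
}

my $latin1 = "\xC1ine";                  # 'Áine' in Latin-1: 0xC1 'i' 'n' 'e'
my $utf8   = latin1_to_utf8($latin1);

# Dump the converted bytes in hex
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $utf8)), "\n";
# prints: C3 81 69 6E 65
```

The 0xC1 byte splits into its top two bits (0xC0 | 3 = 0xC3) and its low six bits (0x80 | 0x01 = 0x81); plain ASCII bytes pass through untouched.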
 
> I saw mention of using pack('U0',...), but I can't figure out how that
> actually works. E.g. Given a variable $string with a value of 'Áine', I'd
> like to get the corresponding string in utf8.

pack("U0U*", unpack("C*", $latin1here))
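
Spelled out step by step, the one-liner might look like the sketch below. It assumes a later Perl (5.8 or newer) where the utf8::encode function is available, which is not something 5.6.1 has; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $latin1 = "\xC1ine";               # 'Áine' as four Latin-1 bytes

my @codes  = unpack("C*", $latin1);   # byte values: 0xC1, 0x69, 0x6E, 0x65
my $chars  = pack("U0U*", @codes);    # same code points, held internally as UTF-8

# Assumption: Perl 5.8+; encode a copy in place to get the raw UTF-8 octets
my $octets = $chars;
utf8::encode($octets);

printf "%d characters, %d octets\n", length($chars), length($octets);
```

The unpack("C*") step yields one number per Latin-1 byte, and since Latin-1 byte values coincide with Unicode code points, pack with U* can take them as they are.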

> --
> Michael A. Grady                             m-grady(_at_)uiuc(_dot_)edu
> Senior Research Programmer                   http://ljordal.cso.uiuc.edu
> Computing & Communications Services Office   (217) 244-1253  phone
> University of Illinois at Urbana-Champaign   (217) 265-5635  fax
> Rm. 103, MC 680, 2212 Fox Drive, Suite C     Champaign, IL 61820

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
