On Tue, Dec 11, 2001 at 07:00:20PM -0600, Michael A. Grady wrote:
> Now that Perl 5.6.1 has removed support for tr///CU, is there still
> an easy way to take a Latin-1 character string and convert it to
> a UTF-8 string? I need to do that for generating LDIF files to load
> into an LDAP server.
No need for fancyisms. I think the below might work even in perl4...
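#!/usr/bin/perl -sp
# Filter: Latin-1 to UTF-8 by default, UTF-8 to Latin-1 when run with -r
# (the -s switch on the #! line turns -r into $r).
if ($r) {
    # UTF-8 to Latin-1: only lead bytes C2 and C3 encode U+0080..U+00FF
    s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
} else {
    # Latin-1 to UTF-8: each high byte becomes a two-byte sequence
    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
}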
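To see what the two substitutions do to a concrete string, here is a minimal sketch that wraps them in subs and round-trips 'Áine'. The sub names (latin1_to_utf8, utf8_to_latin1, hex_bytes) are made up for illustration, and the sketch assumes the input is a plain byte string. It also assumes the UTF-8 lead byte is restricted to C2 or C3, since those are the only lead bytes whose sequences fit into Latin-1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper names; both subs expect and return plain byte strings.

# Latin-1 to UTF-8: split each high byte into a two-byte sequence.
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/eg;
    return $s;
}

# UTF-8 to Latin-1: only lead bytes C2 and C3 map into U+0080..U+00FF.
sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) << 6) & 0xC0) | (ord($2) & 0x3F))/eg;
    return $s;
}

# Render a byte string as space-separated hex for inspection.
sub hex_bytes {
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $_[0];
}

my $latin1 = "\xC1ine";                  # A-acute followed by "ine", as Latin-1 bytes
my $utf8   = latin1_to_utf8($latin1);

print hex_bytes($latin1), "\n";          # C1 69 6E 65
print hex_bytes($utf8), "\n";            # C3 81 69 6E 65
print "round trip ok\n" if utf8_to_latin1($utf8) eq $latin1;
```

The single byte C1 becomes the pair C3 81, and the ASCII bytes pass through untouched in both directions.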
> I saw mention of using pack('U0',...), but I can't figure out how that
> actually works. E.g., given a variable $string with a value of 'Áine', I'd
> like to get the corresponding string in UTF-8.
pack("U0U*", unpack("C*", $latin1here))
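Spelled out for 'Áine', that one-liner might look like the sketch below. It assumes a Perl new enough to have utf8::encode (5.8.1 or later, so not the 5.6.1 in question), which is used here only to make the UTF-8 bytes visible. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $latin1 = "\xC1ine";    # A-acute followed by "ine", as Latin-1 bytes

# unpack "C*" yields one number per Latin-1 byte; "U*" then packs each
# number as the Unicode character with that code point.
my $chars = pack("U0U*", unpack("C*", $latin1));

printf "%d characters, first is U+%04X\n", length($chars), ord($chars);

# Assumption: Perl 5.8.1+. Encode a copy in place to get the raw UTF-8 bytes.
my $bytes = $chars;
utf8::encode($bytes);
print join(' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes), "\n";
```

This works because Latin-1 byte values are the same numbers as the first 256 Unicode code points, so the unpack/pack pair only changes how the string is stored, not which characters it holds. On later Perls the encoding to bytes generally has to be requested explicitly when writing the string out.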
--
Michael A. Grady m-grady(_at_)uiuc(_dot_)edu
Senior Research Programmer http://ljordal.cso.uiuc.edu
Computing & Communications Services Office (217) 244-1253 phone
University of Illinois at Urbana-Champaign (217) 265-5635 fax
Rm. 103, MC 680, 2212 Fox Drive, Suite C Champaign, IL 61820
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen