perl-unicode

Re: Converting between UTF8 and local codepage without specifying local codepage

2005-11-08 07:31:32

dschlege(_at_)us(_dot_)ibm(_dot_)com said:
Is there some way to convert from "whatever" the local codepage is to UTF-8
and back again?

The Encode::encode and decode routines require passing a specific
codepage to do the conversion, but finding out what the "local codepage"
is can be very tricky across different platforms, particularly on UNIX.

Have you looked at the "perllocale" man page?  It's not clear to me that
figuring out the "local codepage" (i.e. the character set of the current
"locale") is particularly hard on Unix systems -- that's what the POSIX
locale mechanism (the LC_ALL, LC_CTYPE and LANG environment variables) is
for.  (I don't know how you would figure it out on MS-Windows systems, but
that's more a matter of me being blissfully ignorant of MS software
generally.)
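
As a rough sketch of the Unix case, and not a tested cross-platform recipe:
the core I18N::Langinfo module can report the character set of the current
locale, and that name can be handed to Encode.  The helper names
(local_encoding, local_to_utf8, utf8_to_local) are made up for this
example, and the Latin-1 fallback is an arbitrary assumption for the case
where Encode does not recognize the name the platform reports.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding encode decode);

# Hypothetical helper: return an Encode object for the locale's codeset.
sub local_encoding {
    # Adopt whatever locale the environment (LC_ALL, LC_CTYPE, LANG) names.
    setlocale(LC_CTYPE, '');

    # Ask the C library for that locale's character set, e.g. "UTF-8".
    my $codeset = langinfo(CODESET);

    my $enc;
    $enc = find_encoding($codeset) if defined $codeset && length $codeset;

    # Assumption: fall back to Latin-1 when Encode does not know the name.
    return defined $enc ? $enc : find_encoding('iso-8859-1');
}

# Bytes in the local codepage -> UTF-8 bytes.
sub local_to_utf8 {
    my ($bytes) = @_;
    my $text = local_encoding()->decode($bytes);
    return encode('UTF-8', $text);
}

# UTF-8 bytes -> bytes in the local codepage.
sub utf8_to_local {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    return local_encoding()->encode($text);
}
```

Characters that the local codepage cannot represent are substituted by
Encode's default handling on the way back, so the round trip is only
lossless for text the local character set actually covers.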

If you're dealing with data of unknown origin, and it's in some clearly
non-ASCII, non-Unicode encoding, then detecting its character set is
guesswork, especially for text in languages that use single-byte
encodings.

The "Encode::Guess" module can help in detecting any of the Unicode
encodings and most of the multi-byte non-Unicode sets (i.e. the legacy code
pages for Chinese, Japanese and Korean), but it can't help much when it
comes to correctly detecting, say, ISO Cyrillic vs. ISO Greek (vs. Thai vs.
Arabic ...), let alone "Latin1" vs. "Latin2": text in one single-byte set
usually decodes without error as another, so nothing rules the wrong
candidates out.
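
To illustrate the difference, here is a small sketch using guess_encoding
from Encode::Guess.  The guess_name wrapper and the sample strings are
invented for this example.  guess_encoding returns an encoding object when
exactly one candidate fits, and a plain error string otherwise; multi-byte
suspects such as euc-jp or shiftjis would be passed the same way as the
single-byte ones shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding

# Hypothetical wrapper: return the guessed encoding name, or undef when
# the guess fails or is ambiguous.
sub guess_name {
    my ($bytes, @suspects) = @_;
    my $guess = guess_encoding($bytes, @suspects);
    return ref($guess) ? $guess->name : undef;
}

# UTF-8 bytes: only the default "utf8" suspect decodes them cleanly.
my $utf8_bytes = encode('UTF-8', "caf\x{e9}");
my $utf8_name  = guess_name($utf8_bytes);

# One high byte: both single-byte suspects decode it, so the guess is
# ambiguous and the wrapper returns undef.
my $latin_bytes = "caf\xE9";
my $latin_name  = guess_name($latin_bytes, qw(iso-8859-1 iso-8859-2));

print "UTF-8 sample: ", (defined $utf8_name  ? $utf8_name  : "no guess"), "\n";
print "Latin sample: ", (defined $latin_name ? $latin_name : "no guess"), "\n";
```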

        David Graff