perl-unicode

Re: Converting between UTF8 and local codepage without specifying local codepage

2005-11-11 16:52:02
Having dug into this more on Unix, I can see that the aliasing mechanism 
helps fill in most of the gaps in mapping local codepages to encodings. 
But I have also come to realize that Perl does not use the underlying 
system codepages; it relies on its own encoding objects to handle 
conversions. Since only a small set of encoding objects is available by 
default, I would need to load additional CPAN modules to get encodings 
for other languages; otherwise my code would not run much outside of 
ASCII and English environments. Windows seemed to work OK with Simplified 
Chinese using the Encode package, but maybe the Windows implementation 
does use the underlying system codepages somehow?

So am I correct that I would need to load additional encodings, and that 
I cannot otherwise count on Perl to access the wide range of encodings 
available on the system? I just need to confirm that I am not 
misunderstanding something here.
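
For reference, a minimal sketch of how one might check what Encode can handle on a given installation, assuming a stock Encode module. It lists the encodings already loaded, the ones Encode can load on demand from installed modules, and shows how the alias mechanism maps a few codeset names to canonical encoding names. The names in the loop are only illustrative, and "no-such-codeset" is a deliberately bogus value.

```perl
use strict;
use warnings;
use Encode ();

# Encodings whose implementations are already loaded in this process.
my @loaded = Encode->encodings();

# Every encoding Encode can load on demand from the modules installed
# here (the bundled Encode::* modules plus any CPAN additions).
my @all = Encode->encodings(':all');

printf "loaded now: %d, loadable on demand: %d\n",
    scalar @loaded, scalar @all;

# Map locale-style codeset names to Encode's canonical names.
# resolve_alias() returns a false value when nothing matches.
for my $name (qw(latin1 ISO_8859-1 UTF-8 no-such-codeset)) {
    my $canonical = Encode::resolve_alias($name);
    printf "%-16s => %s\n", $name, $canonical || '(not available)';
}
```

If a codeset reported by the system comes back as "(not available)", that would be the case where an extra encoding module or an explicit alias is needed.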

Thanks very much,
Dave Schlegel

Nicholas Clark <nick(_at_)ccl4(_dot_)org> 
Sent by: Nicholas Clark <nick(_at_)flirble(_dot_)org>
11/09/2005 10:11 AM

To
David Schlegel/Lexington/IBM(_at_)IBMUS
cc
David Graff <graff(_at_)ldc(_dot_)upenn(_dot_)edu>, 
perl-unicode(_at_)perl(_dot_)org
Subject
Re: Converting between UTF8 and local codepage without specifying local codepage

On Wed, Nov 09, 2005 at 10:02:31AM -0500, David Schlegel wrote:
> That is helpful information. I have been spending time to determine the
> local page by other means but have consistently been challenged that this
> is the wrong approach and that Perl must know somehow. Getting a
> definitive answer is almost as helpful as getting a better answer.
>
> Based on what you are saying, there is no way to ask Perl what the "local
> codepage" is and hence there can be no variant of "Encode" which can be
> told to convert from "local codepage" to UTF8 without having to provide
> the "local codepage" value explicitly.

Yes. A good summary of the situation.

> Is I18N::Langinfo(CODESET()) the best way to determine the local codepage
> for Unix? Windows seems to reliably include the codepage number in the
> locale but Unix is all over the map.

I don't know. I have little to no experience of doing conversion of real
data, certainly for data outside of ISO-8859-1 and UTF-8, and I've never
used I18N::Langinfo. I hope that someone else on this list can give a
decent answer.

Nicholas Clark
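
The approach asked about above could be sketched roughly as follows. This is only an illustration, assuming a Unix-like system where I18N::Langinfo is available; it is not a confirmed best practice from this thread. The helper names locale_encoding and local_to_utf8 are hypothetical. The codeset string returned by the system is passed to Encode's find_encoding(), which returns undef when Encode has no encoding or alias for that name, so the caller still has to handle that case.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding decode encode);

# Hypothetical helper: return the Encode object matching the locale's
# codeset, or undef if Encode does not recognise the name.
sub locale_encoding {
    setlocale(LC_CTYPE, '');            # adopt the locale from the environment
    my $codeset = langinfo(CODESET);    # e.g. "UTF-8" or "ISO-8859-1"
    return undef unless defined $codeset && length $codeset;
    return find_encoding($codeset);
}

# Hypothetical helper: convert bytes in the given encoding to UTF-8 bytes.
sub local_to_utf8 {
    my ($bytes, $enc) = @_;
    my $text = decode($enc->name, $bytes);   # local bytes -> Perl characters
    return encode('UTF-8', $text);           # Perl characters -> UTF-8 bytes
}

my $enc = locale_encoding();
if ($enc) {
    printf "locale codeset maps to Encode's '%s'\n", $enc->name;
} else {
    warn "Encode has no encoding for this locale's codeset\n";
}
```

The undef branch is where the "all over the map" problem shows up: some systems report codeset names for which Encode defines no alias, and those would need an explicit mapping supplied by the caller.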