Having dug into this more on Unix, I can see that the aliasing mechanism
helps fill in most of the gaps in deciding which encoding to use for which
local codepage. But I have also come to realize that Perl does not use
the underlying system codepages; it relies on its own encoding objects to
handle conversions. Since only a small set of encoding objects is
available by default, I would need to load additional CPAN modules to get
more language encodings; otherwise my code wouldn't run in much beyond
ASCII and English environments. Windows seemed to work OK with Simplified
Chinese using the Encode package, but maybe the Windows implementation
does use the underlying system codepages somehow?
So am I correct that I would need to load additional encodings, and that
I couldn't otherwise count on Perl to access the wide range of encodings
available on the system? I just need to confirm that I am not
misunderstanding something here.
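One way to check that premise on a particular installation is to ask Encode itself what it can see. The minimal Perl sketch below uses only Encode's own introspection calls (`Encode->encodings` and `find_encoding`); the encoding names it probes are arbitrary examples, and it reports only Encode's tables, not the codepages the operating system provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encodings already loaded into this process (a small default set).
my @loaded = Encode->encodings();

# Every encoding Encode knows how to load on demand, including the CJK
# tables shipped with the Encode distribution (Encode::CN, Encode::JP, ...).
my @all = Encode->encodings(":all");

printf "%d loaded now, %d available on demand\n", scalar @loaded, scalar @all;

# find_encoding() resolves aliases and loads the table if needed;
# it returns undef when Encode has no table for the name.
for my $name (qw(cp936 euc-jp iso-8859-5 no-such-codepage)) {
    my $enc = find_encoding($name);
    printf "%-18s => %s\n", $name, $enc ? $enc->name : "not available";
}
```

If a name you need shows up as "not available", that is the point at which an extra module (or an alias definition) would be required.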
Thanks very much,
Dave Schlegel
Nicholas Clark <nick(_at_)ccl4(_dot_)org>
Sent by: Nicholas Clark <nick(_at_)flirble(_dot_)org>
11/09/2005 10:11 AM
To: David Schlegel/Lexington/IBM(_at_)IBMUS
cc: David Graff <graff(_at_)ldc(_dot_)upenn(_dot_)edu>,
    perl-unicode(_at_)perl(_dot_)org
Subject: Re: Converting between UTF8 and local codepage without specifying
    local codepage
On Wed, Nov 09, 2005 at 10:02:31AM -0500, David Schlegel wrote:
> That is helpful information. I have been spending time determining the
> local codepage by other means, but have consistently been challenged that
> this is the wrong approach and that Perl must know somehow. Getting a
> definitive answer is almost as helpful as getting a better answer.
> Based on what you are saying, there is no way to ask Perl what the
> "local codepage" is, and hence there can be no variant of "Encode" that
> can be told to convert from "local codepage" to UTF8 without the
> "local codepage" value being provided explicitly.
Yes. A good summary of the situation.
> Is I18N::Langinfo(CODESET()) the best way to determine the local codepage
> on Unix? Windows seems to reliably include the codepage number in the
> locale, but Unix is all over the map.
I don't know. I have little to no experience of converting real data,
certainly data outside ISO-8859-1 and UTF-8, and I've never used
I18N::Langinfo. I hope that someone else on this list can give a decent
answer.
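For reference, here is a minimal Perl sketch of the I18N::Langinfo approach asked about above, in the form the module actually exports (`langinfo(CODESET)`). It assumes a Unix system whose C library supports nl_langinfo(CODESET). The name returned still has to match one of Encode's own tables or aliases, and the fallback to `ascii` when it doesn't is an arbitrary choice for illustration, not a recommendation from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Initialise LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG).
setlocale(LC_CTYPE, "");

# Ask the C library for the locale's character set name; the spelling
# varies by platform (e.g. "UTF-8", "ISO8859-1", "ANSI_X3.4-1968").
my $codeset = langinfo(CODESET);

# Map that name onto one of Encode's tables via its alias mechanism.
# ASSUMPTION: falling back to ascii is purely illustrative.
my $enc = find_encoding($codeset);
unless ($enc) {
    warn "Encode has no table for codeset '$codeset'; using ascii\n";
    $enc = find_encoding("ascii");
}

# Characters -> local bytes -> characters. With the default check mode,
# characters the local encoding cannot represent are substituted, not fatal.
my $bytes = $enc->encode("caf\x{e9}");
my $text  = $enc->decode($bytes);

print "codeset=$codeset, Encode name=", $enc->name, "\n";
```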
Nicholas Clark