perl-unicode

Re: Converting between UTF8 and local codepage without specifying local codepage

2005-11-11 16:52:47
When I call Encode::encodings or Encode::encodings(":all"), I get back 
only a handful of encodings (ascii, ascii-ctrl, iso-8859-1, null, utf8).
In order to have more encodings I believe that you have to load additional 
perl modules that provide those encoding objects. This may be an issue of 
what has been built into your perl environment. 
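
A possible explanation for the short list, offered as a sketch rather 
than a confirmed diagnosis: the Encode man page documents encodings() as 
a class method, Encode->encodings(":all"). Assuming the stock Encode 
implementation, calling it as a plain function puts ":all" where the 
class name is expected, so the flag is not seen and only the encodings 
already loaded are returned. The snippet below compares the two call 
styles; the exact counts depend on the installed Encode version.

```perl
use strict;
use warnings;
use Encode ();

# Function-call form: ":all" lands in the slot where encodings()
# expects the class name, so the flag is not recognised.
my @as_function = Encode::encodings(":all");

# Method-call form, as documented in the Encode man page.
my @loaded = Encode->encodings();        # only encodings already loaded
my @all    = Encode->encodings(":all");  # everything Encode can load on demand

printf "function call:   %d encodings\n", scalar @as_function;
printf "method, loaded:  %d encodings\n", scalar @loaded;
printf "method, \":all\":  %d encodings\n", scalar @all;
```

If the third count is much larger than the first two, the extra 
encodings are installed and are simply loaded on demand.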

On a Japanese Solaris machine, I was able to find out from the 
I18N::Langinfo information that the local codeset was "PCK". There were 
no aliases defined for that name, and when I tried to use it, Perl 
reported that it had no encoding for it. 
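
One way to bridge a system codeset name that Encode does not recognise 
is to register an alias for it. The sketch below assumes that "PCK" is 
the Solaris name for Shift_JIS and maps it to Encode's "shiftjis"; that 
mapping is an assumption to verify on the machine in question, and 
encode_name_for is a hypothetical helper introduced only for 
illustration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Encode::Alias;                        # exports define_alias
use I18N::Langinfo qw(langinfo CODESET);

# Assumption: Solaris "PCK" is Shift_JIS, which Encode calls "shiftjis".
define_alias( PCK => 'shiftjis' );

# Hypothetical helper: map a system codeset name to an Encode name.
sub encode_name_for {
    my ($codeset) = @_;
    my $enc = find_encoding($codeset);    # undef if Encode has no match
    return $enc ? $enc->name : undef;
}

my $codeset = langinfo(CODESET);          # e.g. "PCK" on the Solaris machine
my $name    = encode_name_for($codeset);
print defined $name
    ? "$codeset -> $name\n"
    : "$codeset: no Encode object found\n";
```

Any other codeset names that turn out to be unknown to Encode would need 
their own define_alias lines, one per system-specific name.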





David Graff <graff(_at_)ldc(_dot_)upenn(_dot_)edu> 
11/11/2005 10:42 AM

To: David Schlegel/Lexington/IBM(_at_)IBMUS
cc: perl-unicode(_at_)perl(_dot_)org
Subject: Re: Converting between UTF8 and local codepage without specifying local codepage

dschlege(_at_)us(_dot_)ibm(_dot_)com said:
> ... I have also come to the realization that Perl is not using the
> underlying system code pages but is relying on its own encoding objects
> to handle conversions. Since only a small set of encoding objects are
> available by default this would mean that I would need to load up
> additional Perl CPAN modules to get additional language encodings,
> otherwise my code wouldn't be able to run much outside of ASCII and
> English environments. Windows seemed to work ok with Simplified Chinese
> using the Encode package but maybe the Windows implementation does use
> the underlying system codepages somehow?

I'm sorry, I'm not sure I understand what you mean by "only a small set of
encoding objects".  Regarding the Encode module and what it can handle in
a "default" installation, I see a total of 124 labels for supported
encodings, including:

 - 11 distinct labels for various unicode encodings
 - 2 relating to ascii
 - all the iso-8859's (1-16)
 - 38 different "cp\d+"
 - 2 each of "big5.*" and "gb\d+"
 - 3 each of "euc-??", "jis\d+" and "koi8-."
 - "shiftjis" and "7bit-jis"
 - a bunch of Mac codepages (some of which aren't really functional,
   but that's a separate topic)
 - and more...

(As you probably know, the Encode man page tells how to get a complete
list of installed encodings.  Presumably, some are synonyms for others.)

Are you referring to something other than codepages/encodings when you 
mention "only a small set of encoding objects"?  Or are you saying that 
124 is only a small set?

> So am I correct that I would need to load up additional encodings and I
> couldn't count on Perl to access the wide range of available system
> encodings otherwise? I just need to confirm that I am not
> misunderstanding something here.

If you could mention some specific items in the "wide range of available
system encodings" that do not show up within the Encode module's
inventory, that would help to clear things up.

                 Dave Graff