perl-unicode

Re: Resolving charset names with Encode

2004-10-24 11:30:09
Bjoern Hoehrmann <derhoermi(_at_)gmx(_dot_)net> writes:
Hi,

 What is currently the best way to resolve charset names to use them
with Encode.pm? I would have expected that e.g.

 Encode::decode('ebcdic-cp-us', '')

would just work, but it does not appear to know that alias. Then I
tried to use I18N::Charset as in

There are two parts to the problem:

1. The actual encoding map must exist.
2. There must be an alias from the name to the map.
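A minimal Perl sketch of checking the two parts separately, assuming a reasonably current Encode. `Encode::resolve_alias` and `Encode->encodings(':all')` are Encode's own routines. `canonical_name` and `has_map` are hypothetical helper names made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Part 2 (alias): Encode::resolve_alias returns the canonical name for a
# name Encode can resolve, or a false value otherwise.
# canonical_name() is a hypothetical helper, not part of Encode.
sub canonical_name {
    my ($name) = @_;
    return Encode::resolve_alias($name) || undef;
}

# Part 1 (map): is an encoding with this canonical name available at all?
# Encode->encodings(':all') lists every encoding Encode knows how to load.
# has_map() is a hypothetical helper, not part of Encode.
sub has_map {
    my ($canonical) = @_;
    my %known = map { lc($_) => 1 } Encode->encodings(':all');
    return exists $known{ lc $canonical };
}

for my $name ('ebcdic-cp-us', 'cp037', 'cp37', 'iso-8859-1') {
    my $canon = canonical_name($name);
    printf "%-14s => %s\n", $name, defined $canon ? $canon : '(no alias)';
}
```

A name that prints `(no alias)` while some other name for the same charset resolves points at a missing alias (part 2); if no name resolves at all, the map itself may be missing (part 1).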

Encode's charset list is largely based on the tables on the Unicode website,
with some additions inherited from its old Tcl/Tk roots, where text was
rendered in various font encodings.

EBCDIC encodings are probably a bit weak compared to, say, ICU (IBM is
_the_ EBCDIC shop and, if I remember correctly, the originator of ICU).

As with most open source projects, work gets done by volunteers with an
interest in, or need for, the function.

If you have authoritative tables and names, I am sure Dan would accept
patches to add charsets.


 Encode::decode(I18N::Charset::enco_charset_name('ebcdic-cp-us'), '')

which also fails. Then I tried something simpler, using the cp037
alias

 Encode::decode('cp037', '')
 Encode::decode(I18N::Charset::enco_charset_name('cp037'), '')

both of which fail, too. To use the encoding with Encode, it seems I
have to use "CP37", which is not registered in the IANA registry, so
this does not seem to work very well.

Encode's canonical names were intended to follow this order of preference:

1. What the main users (native writers/speakers) of the encoding call it.
2. MIME name
3. IANA name
4. De-facto name
5. Some other name? (But such an encoding seems obscure!)

The intent was to have the IANA and national standard names as aliases.

It works better in other cases, such as

 Encode::decode('l1', '')                                   # fails
 Encode::decode(I18N::Charset::enco_charset_name('l1'), '') # works

Now I would have hoped there would be a foo() in I18N::Charset that I
could use as in

 # foo() is the wished-for routine listing every name I18N::Charset knows
 foreach my $name (I18N::Charset::foo())
 {
   my $alias = I18N::Charset::enco_charset_name($name);
   Encode::Alias::define_alias($name => $alias) if defined $alias;
 }

that would make

 Encode::decode('l1', '');

work, but it seems that there is no such routine... What could be done
to improve this? Ideally I would like to reduce my code to deal with
this stuff to at most

 use I18N::Charset qw(...);

preferably less.
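Without a listing routine in I18N::Charset, one workaround is to register a small, hand-maintained alias table directly with `Encode::Alias::define_alias`. This is a sketch under stated assumptions: the hash `%extra_alias` and its entries are hypothetical, taken from the names in this thread rather than from any authoritative list, and each target is assumed to be an Encode canonical name. Targets whose map is not installed are skipped.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Alias ();

# Hypothetical site-local alias table: extra name => Encode canonical name.
# The entries are illustrative examples from this thread only.
my %extra_alias = (
    'ebcdic-cp-us' => 'cp37',
    'cp037'        => 'cp37',
    'l1'           => 'iso-8859-1',
);

for my $alias (sort keys %extra_alias) {
    my $target = $extra_alias{$alias};
    # Part 1: skip aliases whose encoding map is not available.
    next unless Encode::find_encoding($target);
    # Part 2: register the alias; a plain string target is looked up by
    # name, and newer aliases take precedence over older ones.
    Encode::Alias::define_alias($alias => $target);
}

# 'l1' now resolves to iso-8859-1, so this decodes Latin-1 bytes.
my $text = Encode::decode('l1', "caf\xe9");
```

With the table loaded once at startup, `Encode::decode('l1', $bytes)` should work without I18N::Charset at run time; the trade-off is keeping the table up to date by hand.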

regards.
