perl-unicode

RE: Resolving charset names with Encode

2004-10-25 01:30:22
Martin 'Kingpin' Thurn <mthurn(_at_)verizon(_dot_)net> writes:
> It seems to me that the main problem is that Encode does not use IANA
> registered names.

It is supposed to have IANA names as aliases.

> And ebcdic-cp-us didn't work because of a bug in
> I18N::Charset (sorry about that).
> The proper solution IMO is to use the &add_enco_alias() function of
> I18N::Charset.

Possibly true, but Encode has its own alias scheme as I wasn't aware
of I18N::Charset at the time.
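
For illustration, a minimal sketch of that alias scheme, assuming the EBCDIC tables that ship with Encode are installed under the name cp37. The two alias names are the ones from the original question; whether a given Encode release already knows them is not assumed either way.

```perl
use strict;
use warnings;
use Encode qw(decode resolve_alias);
use Encode::Alias qw(define_alias);

# Map the IANA-style names onto the name Encode itself uses.
# For a plain string alias the target is passed as-is, with no extra quoting.
define_alias('ebcdic-cp-us' => 'cp37');
define_alias('cp037'        => 'cp37');

# resolve_alias returns Encode's canonical name, or nothing if unresolved.
my $canonical = resolve_alias('ebcdic-cp-us');

# Byte 0xC1 is the capital letter A in EBCDIC code page 37.
my $text = decode('ebcdic-cp-us', "\xC1");

print "$canonical: $text\n";    # cp37: A
```

Aliases defined this way are process-wide, so they only need to be set up once at program start.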

> In the meantime, I have studied the Encode documentation and
> I have added some default aliases, and I will release a new version of
> I18N::Charset soon.
>
> - - Martin

-----Original Message-----
From: Bjoern Hoehrmann [mailto:derhoermi(_at_)gmx(_dot_)net]
Sent: Wednesday, October 20, 2004 15:53
To: perl-unicode(_at_)perl(_dot_)org
Cc: mthurn(_at_)verizon(_dot_)net
Subject: Resolving charset names with Encode


Hi,

  What is currently the best way to resolve charset names to use them
with Encode.pm? I would have expected that e.g.

  Encode::decode('ebcdic-cp-us', '')

would just work but it does not appear to know that alias. Then I've
tried to use I18N::Charset as in

  Encode::decode(I18N::Charset::enco_charset_name('ebcdic-cp-us'), '')

which also fails. Then I've tried something simpler using the cp037
alias

  Encode::decode('cp037', '')
  Encode::decode(I18N::Charset::enco_charset_name('cp037'), '')

which both fail, too. In order to use the encoding with Encode it seems
I have to use "CP37" which is not registered in the IANA registry... So
this does not seem to work very well. It works better in other cases
such as

  Encode::decode('l1', '')                                   # fails
  Encode::decode(I18N::Charset::enco_charset_name('l1'), '') # works
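
A quick way to check which of these names a given Encode installation resolves on its own is to ask resolve_alias directly. This is only a sketch: probe_names is a hypothetical helper, not part of Encode, and the results depend on the installed Encode version.

```perl
use strict;
use warnings;
use Encode qw(resolve_alias);

# Hypothetical helper: map each name to Encode's canonical name,
# or undef when Encode cannot resolve it.
sub probe_names {
    my @names = @_;
    my %result;
    for my $name (@names) {
        my $canonical = resolve_alias($name);
        $result{$name} = $canonical ? $canonical : undef;
    }
    return \%result;
}

# Names taken from the examples in this message.
my $seen = probe_names(qw(ebcdic-cp-us cp037 cp37 l1));
for my $name (sort keys %$seen) {
    my $canonical = $seen->{$name};
    printf "%-14s %s\n", $name,
        defined $canonical ? "-> $canonical" : 'not known to Encode';
}
```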

Now I would have hoped there is a foo() in I18N::Charset that I could
use as in

  foreach my $name (I18N::Charset::foo())
  {
    my $alias = I18N::Charset::enco_charset_name($name);
    Encode::Alias::define_alias($name, $alias) if defined $alias;
  }

that would make

  Encode::decode('l1', '');

work, but it seems that there is no such routine... What could be done
to improve this? Ideally I would like to reduce my code to deal with
this stuff to at most

  use I18N::Charset qw(...);

preferably less.
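
Until such a routine exists, something along these lines might serve as a stopgap. It is a sketch under assumptions: register_charset_aliases and the %fallback table are made up for illustration, the list of names has to be supplied by hand, and I18N::Charset is used only if it happens to be installed.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Encode::Alias qw(define_alias);

# Hypothetical helper, not part of I18N::Charset or Encode.
# $resolve maps a charset name to the name Encode uses, or undef.
sub register_charset_aliases {
    my ($resolve, @names) = @_;
    my @added;
    for my $name (@names) {
        next if find_encoding($name);    # Encode already knows this name
        my $target = $resolve->($name);
        next unless defined $target && find_encoding($target);
        define_alias($name => $target);  # plain string target, no quoting
        push @added, $name;
    }
    return @added;
}

# Illustrative fallback table, used only when I18N::Charset is missing.
my %fallback = (
    'ebcdic-cp-us' => 'cp37',
    'cp037'        => 'cp37',
    'l1'           => 'iso-8859-1',
);
my $resolve = eval { require I18N::Charset; 1 }
    ? sub { I18N::Charset::enco_charset_name($_[0]) }
    : sub { $fallback{ lc $_[0] } };

my @added = register_charset_aliases($resolve, qw(ebcdic-cp-us cp037 l1));
print "added alias: $_\n" for @added;
```

Names that Encode already resolves are skipped, so existing aliases are left untouched.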

regards.

