perl-unicode

Re: Encode::compat (was Re: Encode functionality for Perl 5.6.1)

2002-09-21 20:30:05
On Sun, Sep 22, 2002 at 11:06:27AM +0800, Autrijus Tang wrote:
Dan san,

  a) if it uses 'Encode' as module name it needs to work both in 5.8
  and 5.6.1.  Bottom line is that backported version will not breach
  what it is now.  If it ain't broke, don't fix it (and 5.6.1 was
  broke Unicode-wise)
  b) if you just implemented Encode functionality in perl 5.6.1 but
  incompatible w/ 5.8, give it a different name; i.e. Encode::Compat

Incidentally I have just finished a skeleton of Encode::compat, named
after Apache::compat (lower case) since it's a 'pragma' for Encode
usage, instead of a subcomponent of Encode.

It is available on CPAN, or at:

    http://www.autrijus.org/Encode-compat-0.01.tar.gz

All it does is translate whatever call it receives into Text::Iconv, or
(in the future) Unicode::MapUTF8 to perform the actual work.

The is_utf8(), _utf8_on() and _utf8_off() calls are performed by the
method native to the perl version -- 5.6.1 would use pack/unpack, 5.6.0
uses tr//CU, etc.
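
For reference, here is roughly what those three calls do under the real
Encode in perl 5.8, which is the behaviour a compat layer would have to
reproduce. This is only a sketch of the 5.8 side, not of how
Encode::compat implements it on 5.6.x; the variable names are made up
for the illustration.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 encoding of "caf" plus U+00E9, held as five plain bytes.
my $str = "caf\xC3\xA9";
my @seen;

# Flag off: the string is counted as bytes.
push @seen, [ Encode::is_utf8($str) ? 1 : 0, length $str ];

# Flip the flag only; the bytes themselves are not converted.
Encode::_utf8_on($str);
push @seen, [ Encode::is_utf8($str) ? 1 : 0, length $str ];

# Flip it back: same bytes, counted as bytes again.
Encode::_utf8_off($str);
push @seen, [ Encode::is_utf8($str) ? 1 : 0, length $str ];

printf "is_utf8=%d length=%d\n", @$_ for @seen;
```

The length goes from 5 to 4 and back to 5 while the underlying bytes
stay the same, which is why these calls can be emulated by whatever
flag-twiddling trick a given perl version offers.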

If GNU recode is installed, one can also fall back to
Convert::Recode, and if all else fails one can at least do the
Latin-1 <-> UTF-8 byte-level conversion rather trivially with these
two substitutions (the first encodes, the second decodes):

    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
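
To make the direction of each substitution explicit, they can be wrapped
in two small subs. The sub names latin1_to_utf8 and utf8_to_latin1 are
made up for this sketch, and both assume plain byte strings (no UTF-8
flag set) as input.

```perl
use strict;
use warnings;

# Hypothetical helper: Latin-1 bytes in, UTF-8 bytes out.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    # Each byte 0x80-0xFF becomes a lead byte (0xC2 or 0xC3)
    # followed by one continuation byte (0x80-0xBF).
    $bytes =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $bytes;
}

# Hypothetical helper: UTF-8 bytes in, Latin-1 bytes out.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    # Only the lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF;
    # any other sequence is left untouched.
    $bytes =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $bytes;
}

my $latin1 = "caf\xE9";                  # e-acute as one Latin-1 byte
my $utf8   = latin1_to_utf8($latin1);    # e-acute as the bytes C3 A9
my $back   = utf8_to_latin1($utf8);

printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

Note that the decoding direction silently passes through anything
outside U+0080..U+00FF, so input containing other multi-byte characters
comes out still UTF-8 encoded in those places.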

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is
'dead'." -- Jack Cohen