perl-unicode

Re: My favorite bug to fix for 5.8.0

2002-03-10 11:43:43
On Sun, Mar 10, 2002 at 06:35:50PM +0000, Markus Kuhn wrote:
> Jarkko Hietaniemi wrote on 2002-03-10 16:45 UTC:
> > Oh, then there is this in open.pm:
> >
> >             # Could do more heuristics based on the country and language
> >             # parts of LC_ALL and LANG (the parts before the dot (if any)),
> >             # since we have Locale::Country and Locale::Language available.
> >             # TODO: get a database of Language -> Encoding mappings
> >             # (the Estonian database at http://www.eki.ee/letter/
> >             # would be excellent!) --jhi
> >
> > I've got the said databases stashed somewhere.
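
As a rough illustration of "the parts before the dot", here is a minimal
Python sketch that splits a POSIX-style locale name of the form
language[_territory][.codeset][@modifier]. The helper names split_locale
and locale_from_env are made up for this example and are not part of
open.pm, and checking only LC_ALL and then LANG mirrors the comment above
rather than the full POSIX lookup order.

```python
import os


def split_locale(name):
    """Split 'language[_territory][.codeset][@modifier]' into its parts.

    Missing parts are returned as None.
    """
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")
    return {
        "language": language or None,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifier": modifier or None,
    }


def locale_from_env(env=None):
    """Return the parsed value of the first non-empty variable, or None.

    Assumption: only LC_ALL and LANG are consulted, as in the comment.
    """
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LANG"):
        value = env.get(var)
        if value:
            return split_locale(value)
    return None
```

When the codeset part is missing (for example LANG=et_EE), only the
language and territory remain, which is exactly the case where a
Language -> Encoding database would have to supply a guess.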

> My friend Mike Bond likes to say: "To really understand the wheel, you
> have to reinvent it". Nevertheless, this LC_* -> MIME charset conversion
> has already been done and widely tested
>
>   http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
>
> so why not use that? I am interacting with many communities like

The said database does not really do LC_* -> MIME charset mapping.
Given an ISO 639 language tag, it tells which Unicode characters the
language needs (and, if there are romanization schemes, which characters
those need), and from those characters one can then see which *legacy*
(non-Unicode) encodings might be in use.  So it is a one-to-many mapping.
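
The one-to-many step can be sketched in Python as follows: take the
characters a language needs and keep every legacy encoding that can
represent all of them. The Estonian character sample and the candidate
list below are illustrative assumptions and are not taken from the
database itself; candidate_encodings is a hypothetical helper name.

```python
# Assumption: a small sample of letters Estonian needs beyond ASCII.
ESTONIAN_SAMPLE = "šžõäöüŠŽÕÄÖÜ"

# Assumption: an arbitrary list of legacy encodings to test, using
# Python's own codec names.
LEGACY_ENCODINGS = [
    "iso8859_1",
    "iso8859_4",
    "iso8859_13",
    "iso8859_15",
    "cp1252",
    "cp1257",
    "koi8_r",
]


def candidate_encodings(required_chars, encodings=LEGACY_ENCODINGS):
    """Return the encodings that can encode every required character."""
    candidates = []
    for name in encodings:
        try:
            required_chars.encode(name)
        except UnicodeEncodeError:
            continue  # at least one required character is missing
        candidates.append(name)
    return candidates


if __name__ == "__main__":
    print(candidate_encodings(ESTONIAN_SAMPLE))
```

More than one encoding survives the check, so a language tag alone can
only narrow the choice down, not settle it. ISO 8859-1, for instance,
drops out because it has no s or z with caron.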

> yourself (Perl, TCL, Emacs, ...) and they are all facing the exact same
> backwards compatibility problems, therefore I decided to maintain such
> code for all of them, such that we have a single point for updating the
> necessary mappings.
>
> Markus
>
> -- 
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen