perl-unicode

Determine encoding from $LANG

2001-06-28 01:18:55
On Tue, 26 Jun 2001, Bruno Haible wrote:

     A program cannot be considered properly internationalized
     until it obeys the current locale (LC_ALL || LC_CTYPE || LANG).

The programs we are waiting for are:
[...]

Add to that list many of the programming languages that use Unicode
internally but do not yet set their default I/O encoding automatically
from LC_ALL || LC_CTYPE || LANG.
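The "LC_ALL || LC_CTYPE || LANG" rule can be sketched in a few lines. This
is a minimal illustration only: the helper name effective_ctype_locale is
hypothetical, it treats an empty variable as unset (as POSIX does), and it
assumes the fallback is the "C" locale. It finds only the locale *name*;
turning that name into an encoding is the job of setlocale() and
nl_langinfo(CODESET), not of string matching.

```python
def effective_ctype_locale(env):
    """Hypothetical helper: pick the locale name that governs LC_CTYPE.

    Mirrors the LC_ALL || LC_CTYPE || LANG rule: the first variable that
    is set to a non-empty value wins.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # empty counts as unset
            return value
    # Nothing set: assume the default "C" locale.
    return "C"


if __name__ == "__main__":
    import os
    print(effective_ctype_locale(os.environ))
```

For example, effective_ctype_locale({"LC_CTYPE": "ja_JP.eucJP",
"LANG": "en_US.UTF-8"}) returns "ja_JP.eucJP", because LC_CTYPE outranks
LANG.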

For example, TCL currently uses some primitive substring matching on LANG,
which gets only a few Japanese and Russian encodings right. The TCL
function unix/tclUnixInit.c:TclpSetInitialEncodings really should call
libcharset or nl_langinfo(CODESET) instead:
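In C, the recommended approach is setlocale(LC_CTYPE, "") followed by
nl_langinfo(CODESET) from <langinfo.h>. The sketch below does the same
through Python's locale module, which wraps those two calls on Unix
systems (nl_langinfo is not available on Windows). The helper name
locale_codeset is hypothetical, and note that setlocale() changes
process-wide state. The point is that the C library, not a hand-written
table of LANG substrings, decides the codeset: on a typical glibc system
"de_DE.UTF-8" yields "UTF-8", and the plain "C" locale yields
"ANSI_X3.4-1968".

```python
import locale


def locale_codeset():
    """Return the character encoding of the user's LC_CTYPE locale."""
    try:
        # An empty string asks setlocale() to consult the environment
        # (LC_ALL, then LC_CTYPE, then LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed; keep the current one.
        pass
    # Ask the C library which codeset the active LC_CTYPE locale uses.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(locale_codeset())
```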

  
https://sourceforge.net/tracker/?func=detail&aid=418645&group_id=10894&atid=110894

I suspect that Perl and Python are not much better, and that they do not
call nl_langinfo(CODESET), or the portable libcharset wrapper around it,
to determine the locale-dependent external encoding either.
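Once the codeset string is known, a language runtime still has to map it
onto its own converter names before it can use it as the default external
encoding. A minimal sketch, assuming Python's codec registry as the
target: the helper name external_encoding and the "ascii" fallback for
unrecognized codesets are hypothetical choices for illustration, not
behaviour of any existing Perl or Python release.

```python
import codecs


def external_encoding(codeset, fallback="ascii"):
    """Map a C library codeset name to a Python codec name.

    Hypothetical helper: codeset is a string such as nl_langinfo(CODESET)
    returns; names the codec registry does not know fall back to a
    conservative default.
    """
    try:
        # codecs.lookup() resolves common aliases, e.g.
        # "UTF-8" -> "utf-8" and "ANSI_X3.4-1968" -> "ascii".
        return codecs.lookup(codeset).name
    except LookupError:
        return fallback
```

The resulting name can then be passed wherever the runtime takes an
encoding, for instance open(path, encoding=...).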

References on how to determine the character encoding from the locale in a
safe and portable manner:

http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
http://clisp.cons.org/~haible/packages-libcharset.html
http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
