perl-unicode

Re: My favorite bug to fix for 5.8.0

2002-03-11 13:27:12
Nick Ing-Simmons writes:
: Markus Kuhn <Markus(_dot_)Kuhn(_at_)cl(_dot_)cam(_dot_)ac(_dot_)uk> writes:
: >Nick Ing-Simmons wrote on 2002-03-11 12:08 UTC:
: >> >>  http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
: >>
: >> For perl I think that we would want to treat "C" and "POSIX" as meaning
: >> iso-8859-1 rather than ASCII.
: >
: >Decide yourself. But understand that sooner or later, the "C" and
: >"POSIX" locales will be extended from "ASCII" to "UTF-8", and then you
: >will be faced with a backwards incompatible change if you had ISO 8859-1
: >in "C" so far. Backwards compatibility with a future extension of "C"/
: >"POSIX" to "UTF-8" is the reason why under glibc 2.2, "C" is
: >explicitly ASCII and not "ISO 8859-1" at the moment.
: 
: But we have a pile of legacy stuff in US and UK with no locale set at all
: (which defaults to "C" IIRC) - which have been happily processing iso-8859-1
: HTML etc. for years. To suddenly have them barf on £3 or 49¢ is not
: acceptable.

At least on input, there is little problem between ISO-8859-1 and UTF-8
if Perl does autorecognition.  And in Perl 5's internal representation,
there's not even any conversion--just flipping the UTF8 bit.  (:text
will have to work harder in other locales, of course.)
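
As a rough illustration of the autorecognition idea (a sketch in Python, not Perl's actual implementation), input can be tried as strict UTF-8 first and treated as ISO-8859-1 only if that fails. The function name decode_autodetect is hypothetical. The approach rests on the fact that a lone high byte such as 0xA3 or 0xA2 is not well-formed UTF-8, while every byte is a valid ISO-8859-1 character.

```python
def decode_autodetect(raw):
    """Return (text, encoding_used) for bytes that are UTF-8 or ISO-8859-1.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict decoding rejects malformed sequences, so a Latin-1 pound
        # sign (0xA3) followed by an ASCII digit raises here.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value maps to a character in ISO-8859-1, so this
        # fallback cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    for sample in (b"\xa33", b"49\xa2", b"\xc2\xa33", b"plain ASCII"):
        print(sample, decode_autodetect(sample))
```

The heuristic is not airtight: a Latin-1 byte pair such as 0xC2 0xA3 is also valid UTF-8 and would be read as a single pound sign, though such sequences are rare in real Latin-1 text.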

As usual, output is another matter.  In-place edits should probably
keep the current format, and other output should probably pay some
attention to locale or other environment variables that say what
environment you're running in.  (That's why they're called environment
variables, after all.)
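
One way to "pay attention to the environment" on output can be sketched as follows (again in Python, purely as an illustration). It checks LC_ALL, LC_CTYPE and LANG in the POSIX precedence order and pulls the codeset out of a name like en_GB.UTF-8. The function name output_encoding is hypothetical, and the fallback for "C", "POSIX", or an unset locale is an assumption: it defaults to ISO-8859-1, as suggested in the quoted message, but that is exactly the point under debate.

```python
import os


def output_encoding(env, fallback="iso-8859-1"):
    """Guess an output encoding from locale environment variables.

    Hypothetical helper; `fallback` is an assumed policy choice, not a
    documented Perl default.
    """
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # Empty values are treated as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            break
    else:
        return fallback

    if value in ("C", "POSIX"):
        return fallback

    # Locale names look like language[_territory][.codeset][@modifier].
    base = value.split("@", 1)[0]
    if "." in base:
        return base.split(".", 1)[1]

    # No explicit codeset: the real answer lives in the system's locale
    # database, which this sketch does not consult.
    return fallback


if __name__ == "__main__":
    print(output_encoding(os.environ))
```

A real implementation would more likely ask the C library, for example via nl_langinfo(CODESET) as in the langinfo.c linked above, rather than parse the variable by hand.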

Larry