perl-unicode

Re: Keeping byte-wise processing as an option

2004-01-02 16:30:05
"In future, Perl-level operations will be expected to work with characters rather than bytes."

I very much appreciate all your hard work on the internationalization of Perl.
However, recently I have been working on some things that led me to think
that the above statement, if taken literally, may be going somewhat too far.

I don't think there is any fear of Perl ever going that far.
There is just too much legacy code that would go bang.

All these were written assuming a simple bytes-in-bytes-out model.
At least the latter fails with Perl 5.8.1 when the PERL_UNICODE
environment variable is defined.

If you have set PERL_UNICODE you have explicitly requested that your
legacy code should go bang.

Jungshik has also reported that
it fails with Perl 5.8.0 with a UTF-8 locale.

Perl 5.8.0 was very broken with UTF-8 locales since it "auto-PERL_UNICODEd".
We saw (and keep seeing) a lot of that, since RedHat 8 and 9 had the
unfortunate combination of both Perl 5.8.0 _and_ UTF-8 locales (which the
users didn't expect/know about/care about).  Lots of code that expected to
produce e.g. 0xff started to produce 0xc3 0xbf.  Bang!
Use 5.8.1 or later instead.
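
To make the 0xff case concrete, here is a minimal sketch of the effect,
assuming Perl 5.8 or later with the core Encode module.  The helper name
hex_bytes is made up for illustration; it is not from any code discussed
in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render each octet of a string as 0xNN.
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '0x%02x', ord } split //, $octets;
}

my $char = chr(0xff);                # one character, code point 0xff
my $utf8 = encode('UTF-8', $char);   # the same character as UTF-8 octets

print hex_bytes($char), "\n";        # the single byte legacy code expects
print hex_bytes($utf8), "\n";        # the two bytes seen after UTF-8 encoding
```

The second line of output is what a UTF-8 output layer produces for that
character, which is the "Bang!" for code that counted on one byte.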

What I'm looking for is a very simple way to write perl programs
that work on byte streams. This should be possible without depending
on versions, working both on very old versions as well as future
versions.

Off-hand I can say that getting both 5.6 and 5.8 to work at the same time
may be impossible in spots simply because 5.6 was badly unfinished as
regards Unicode.  No, it won't get fixed.  Beyond 5.8, I don't know.
Some people may have some tricks they use to get Unicode code working both
in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in
both 5.6 and 5.8 that "I want bytes, darn it."
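
As a rough sketch of what "I want bytes" can look like in practice: the
bytes pragma for byte-wise string operations, plus binmode on the handles
so no layer touches the data.  The helper names octet_length and
copy_bytes are hypothetical, and whether this behaves identically on 5.6
is an assumption rather than something verified here.

```perl
use strict;
use warnings;

# Hypothetical helper: count octets rather than characters.
sub octet_length {
    my ($string) = @_;
    use bytes;                 # byte semantics until the end of this block
    return length($string);
}

# Hypothetical helper: copy a stream byte for byte, returning the octet count.
sub copy_bytes {
    my ($in, $out) = @_;
    binmode($in);              # no CRLF or encoding translation on input
    binmode($out);             # ... nor on output
    my $total = 0;
    while (my $n = read($in, my $buf, 8192)) {
        print {$out} $buf;
        $total += $n;
    }
    return $total;
}
```

Calling copy_bytes(\*STDIN, \*STDOUT) would turn this into a plain
bytes-in-bytes-out filter.  Note that "use bytes" is lexically scoped, so
it only affects the operations inside the block where it appears.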

--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen