Re: Keeping byte-wise processing as an option

"In future, Perl-level operations will be expected to work with characters rather than bytes."
I very much appreciate all your hard work on the internationalization of Perl.
However, recently I have been working on some things that make me think
that the above statement, if taken literally, may be going somewhat too far.


I don't think there is any fear of Perl ever going that far.
There is just too much legacy code that would go bang.

All these were written assuming a simple bytes-in-bytes-out model.
At least the latter fails with Perl 5.8.1 when the PERL_UNICODE
environment variable is defined.


If you have set PERL_UNICODE you have explicitly requested that your
legacy code should go bang.

Jungshik has also reported that
it fails with Perl 5.8.0 with a UTF-8 locale.

Perl 5.8.0 was very broken with UTF-8 locales since it "auto-PERL_UNICODEd".
We saw (keep seeing) a lot of that since RedHat 8 and 9 had the unfortunate
combination of both Perl 5.8.0 _and_ UTF-8 locales (which the users didn't
expect/know about/care about). Lots of code that expected to produce e.g.
0xff started to produce 0xc3 0xbf.  Bang!
Use rather 5.8.1 or later.
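
To make the 0xff case concrete, here is a minimal sketch (not part of the
original exchange). It assumes the Encode module that ships with 5.8, and
hex_bytes is a hypothetical helper introduced only for this illustration.
It encodes the single character 0xff both ways and prints the resulting bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper: render each byte of a string as 0xNN.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('0x%02x', $_) } unpack('C*', $bytes);
}

my $char     = chr(0xff);                    # one character, code point 0xff
my $as_bytes = encode('ISO-8859-1', $char);  # the single byte legacy code expects
my $as_utf8  = encode('UTF-8', $char);       # what a UTF-8 output layer writes

print hex_bytes($as_bytes), "\n";            # prints "0xff"
print hex_bytes($as_utf8),  "\n";            # prints "0xc3 0xbf"
```

The second line of output is the two-byte sequence that code written for a
bytes-in-bytes-out model does not expect.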

What I'm looking for is a very simple way to write perl programs
that work on byte streams. This should be possible without depending
on versions, working both on very old versions as well as future
versions.


Off-hand I can say that getting both 5.6 and 5.8 to work at the same time
may be impossible in spots simply because 5.6 was badly unfinished as
regards Unicode.  No, it won't get fixed.  Beyond 5.8, I don't know.

Some people may have some tricks they use to get Unicode code working both
in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in
both 5.6 and 5.8 that "I want bytes, darn it."
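
As a rough illustration of what the bytes pragma changes (a sketch only, not
a portability recipe), the snippet below compares length() with and without
"use bytes". char_and_byte_length is a hypothetical helper, and the character
0x263A is just an arbitrary example of something above 0xff:

```perl
use strict;
use warnings;

# char_and_byte_length is a hypothetical helper for this illustration.
sub char_and_byte_length {
    my ($str) = @_;
    my $chars = length($str);   # default: counts characters
    my $bytes = do {
        use bytes;              # lexically scoped: byte semantics in this block only
        length($str);           # counts bytes of the internal representation
    };
    return ($chars, $bytes);
}

my ($chars, $bytes) = char_and_byte_length(chr(0x263A));
print "chars=$chars bytes=$bytes\n";   # prints "chars=1 bytes=3"
```

Because the pragma is lexically scoped, it can be confined to the few places
that really need byte counts instead of being switched on for a whole program.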

--

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen