perl-unicode

Re: Keeping byte-wise processing as an option

2004-01-02 18:30:05
Hello Jarkko,

Many thanks for your very quick answer.

At 00:31 04/01/03 +0200, Jarkko Hietaniemi wrote:
"In future, Perl-level operations will be expected to work with characters rather than bytes."

I very much appreciate all your hard work on the internationalization of Perl.
However, I have recently been working on some things that make me think
that the above statement, if taken literally, may be going somewhat too far.

I don't think there is any fear of Perl ever going that far.
There is just too much legacy code that would go bang.

Very good. I didn't really assume there was, but I'd suggest
tweaking the sentence above a bit to make this clear.


Jungshik has also reported that
it fails with Perl 5.8.0 under a UTF-8 locale.

Perl 5.8.0 was very broken with UTF-8 locales since it "auto-PERL_UNICODEd".
We saw (and keep seeing) a lot of that since RedHat 8 and 9 had the unfortunate
combination of both Perl 5.8.0 _and_ UTF-8 locales (which the users didn't
expect/know about/care about).  Lots of code that expected to produce e.g.
0xff started to produce 0xc3 0xbf.  Bang!
Rather, use 5.8.1 or later.

If it were just me, that would be easy. But stating 'use Perl 5.8.1
or later' on an FAQ page for something that probably worked
even in Perl 4 doesn't look like a good idea.


What I'm looking for is a very simple way to write Perl programs
that work on byte streams. This should be possible without depending
on the Perl version, so that the same code works on very old versions
as well as on future ones.

Off-hand I can say that getting both 5.6 and 5.8 to work at the same time
may be impossible in spots simply because 5.6 was badly unfinished as
regards Unicode.  No, it won't get fixed.  Beyond 5.8, I don't.

Sorry, I think something is missing from the last sentence. Did you
mean to say "I don't know"?

Some people may have some tricks they use to get Unicode code working both
in 5.6 and 5.8, but _in_principle_ the bytes pragma should tell Perl in
both 5.6 and 5.8 that "I want bytes, darn it."

Yes, that seems to do the job. But is this available in 5.0 or earlier?
Or is it possible to write a little code at the start that says
something like:

if (eval "use bytes;") { use bytes; }

(without the actual invocation being restricted to the { ... } block)?
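
One way this might be sketched is below. It assumes that spelling out what "use" does (a require followed by an import, both inside BEGIN) has the same file-wide effect as "use bytes;" itself. The flag name $::HAVE_BYTES is made up for illustration. The bytes pragma first shipped with Perl 5.6 and BEGIN blocks need Perl 5, so on 5.0 through 5.005 the sketch simply carries on without the pragma, and it does nothing for Perl 4.

```perl
#!/usr/bin/perl
# Sketch only: enable byte semantics when bytes.pm exists, carry on otherwise.
# $::HAVE_BYTES is a hypothetical flag name, not part of any Perl API.

BEGIN {
    # "use bytes;" is equivalent to BEGIN { require bytes; bytes->import; }.
    # Spelling it out inside a block eval means a missing bytes.pm only
    # makes the eval return false instead of aborting compilation.
    # A string eval ("use bytes;") would not do: the pragma would then be
    # scoped to the eval'ed code itself, not to the rest of this file.
    $::HAVE_BYTES = eval { require bytes; bytes->import; 1 } ? 1 : 0;
}

# From here to the end of the file, byte semantics apply if the pragma loaded.
print "bytes pragma ", ($::HAVE_BYTES ? "enabled" : "not available"), "\n";
```

Because the import runs at compile time in the enclosing file scope, no { ... } block limits its effect. Code further down can consult the flag if it needs to behave differently on Perls without the pragma.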


Regards,    Martin.