On Mon, May 10, 2004 at 04:45:55PM +0100, Nick Ing-Simmons wrote:
: Larry Wall <larry(_at_)wall(_dot_)org> writes:
: >
: >Right now, the meaning of "text" is subject to severe distortions
: >due to legacy issues. But in the long run, "text" is going to mean
: >Unicode, and that probably means a UTF-8 file encoding at least in
: >the western world,
:
: Microsoft seem to be somewhat focused on some 16-bit form.

Yeah, well, they've never minded if you have to buy a new computer to
run their new software... :-)

: This thread started as complaint that perl5 can't read a
: script saved as UCS-2/UTF-16 or whatever Windows uses.

That's why I said "probably". And I probably should have said
"hopefully" instead. :-)

But my main point was that "text" will eventually mean "Unicode",
whether or not that means "UTF-8". (I probably should have
parenthesized the two subthoughts about what will end up the default
where.) Really, though, once you've guaranteed a Unicode view at the
appropriate input boundaries, the differences between the various UTFs
should be fairly insignificant from a language point of view, provided
you maintain the abstractions. The Perl 5 engine unfortunately doesn't
provide quite enough abstraction power to pull it off. We're hoping to
do a better job of pulling it off with Perl 6, but that implies a more
strongly typed string implementation underneath than Perl 5 provides.
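
As an illustration of the point about input boundaries, here is a minimal
sketch. It is written in Python purely for illustration and says nothing
about how Perl 5 or Perl 6 implement strings; the sample string and the
read_text helper are hypothetical names invented for the example. The
same text is stored under three UTF encodings, decoded once at the
boundary, and from then on the program sees identical Unicode strings.

```python
# Illustrative sketch only: one piece of text, three on-disk encodings.
text = "na\u00efve caf\u00e9 \u2603"  # hypothetical sample string

# Simulate files saved by tools that prefer different UTFs.
encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16": text.encode("utf-16"),  # Python prepends a byte-order mark
    "utf-32": text.encode("utf-32"),  # likewise
}


def read_text(raw: bytes, encoding: str) -> str:
    """Decode at the input boundary; callers only ever see Unicode."""
    return raw.decode(encoding)


decoded = {enc: read_text(raw, enc) for enc, raw in encoded.items()}

# The byte representations all differ from one another...
assert len(set(encoded.values())) == 3
# ...but past the boundary every view is the same code point sequence.
assert all(s == text for s in decoded.values())
```

The design choice being sketched is that the encoding is named exactly
once, where bytes enter the program; nothing downstream needs to know
which UTF the file used.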

Perl's always been about providing reasonable defaults, and will
continue to do so. But changing what's reasonable is tricky, and
sometimes you have to go through a period in which nothing can be
considered reasonable.

Larry