Re: BOM and principle of least surprise

On Mon, May 10, 2004 at 04:45:55PM +0100, Nick Ing-Simmons wrote:
: Larry Wall <larry(_at_)wall(_dot_)org> writes:
: >
: >Right now, the meaning of "text" is subject to severe distortions
: >due to legacy issues.  But in the long run, "text" is going to mean
: >Unicode, and that probably means a UTF-8 file encoding at least in
: >the western world, 
: 
: Microsoft seem to be somewhat focused on some 16-bit form.

Yeah, well, they've never minded if you have to buy a new computer to
run their new software... :-)

: This thread started as a complaint that perl5 can't read a
: script saved as UCS-2/UTF-16 or whatever Windows uses.

That's why I said "probably".  And I probably should have said
"hopefully" instead.  :-)

But my main point was that "text" will eventually mean "Unicode",
whether or not that means "UTF-8".  (I probably should have
parenthesized the two subthoughts about what will end up the default
where.)  Really, though, once you've guaranteed a Unicode view at the
appropriate input boundaries, the differences between the various UTFs
should be fairly insignificant from a language point of view, provided
you maintain the abstractions.  The Perl 5 engine unfortunately doesn't
provide quite enough abstraction power to pull it off.  We're hoping to
do a better job of pulling it off with Perl 6, but that implies a more
strongly typed string implementation underneath than Perl 5 provides.
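
As a rough illustration of "a Unicode view at the input boundary", here
is a minimal Python sketch, not anything Perl 5 or Perl 6 actually does.
The helper name decode_at_boundary and the UTF-8 fallback are
assumptions made for the example. It sniffs a byte order mark, decodes
once, and hands the rest of the program plain Unicode text.

```python
import codecs

# Longer marks are tested first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so the order of this table matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_at_boundary(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: decode raw file bytes to Unicode text.

    The BOM, if present, selects the encoding and is stripped.
    Without a BOM the bytes are assumed to be in `default`.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode(default)


# The same script text saved in three different UTFs.
text = 'print "h\u00e9llo";'
samples = [
    codecs.BOM_UTF8 + text.encode("utf-8"),
    codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
]

# Past the boundary, all three are the same Unicode string.
decoded = [decode_at_boundary(s) for s in samples]
```

Everything after the decode step sees identical strings, which is the
sense in which the choice of UTF stops mattering once the abstraction
holds. A real implementation would also need a policy for BOM-less
UTF-16 and for legacy 8-bit encodings, which this sketch ignores.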

Perl's always been about providing reasonable defaults, and will
continue to do so.  But changing what's reasonable is tricky, and
sometimes you have to go through a period in which nothing can be
considered reasonable.

Larry