perl-unicode

Re: BOM and principle of least surprise

2004-05-16 23:30:04
No.  The patch I submitted peeks at the beginning of a Perl script and
if it either sees a BOM or something that looks like raw BOMless UTF-16
(every other byte zero, every other not) of either endianness, Perl will
understand.
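
The detection described above can be sketched roughly as follows. This is
only an illustration in Python, not the patch itself: the function name
sniff_script_encoding, the 16-byte sample size, and the returned labels are
all assumptions made for the example. It also ignores UTF-32, whose
little-endian BOM begins with the same two bytes as the UTF-16 one.

```python
import codecs


def sniff_script_encoding(head: bytes):
    """Guess a script's encoding from its first bytes; None means 'no hint'.

    Hypothetical helper for illustration only; Perl's real check lives in
    its C source and may differ in detail.
    """
    # 1. An explicit BOM settles the question.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. No BOM: look for "every other byte zero, every other not".
    sample = head[:16]
    sample = sample[: len(sample) - len(sample) % 2]  # whole 16-bit units only
    if len(sample) < 4:
        return None
    even, odd = sample[0::2], sample[1::2]
    if all(b == 0 for b in even) and all(b != 0 for b in odd):
        return "utf-16-be"  # zero high byte comes first
    if all(b != 0 for b in even) and all(b == 0 for b in odd):
        return "utf-16-le"  # zero high byte comes second
    return None
```

The BOMless heuristic works for scripts because a Perl script normally
starts with ASCII characters, whose UTF-16 code units have a zero high byte.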


I think I understood that the change was only for the script as such. Let's
forget input files for the moment.

So Perl 5.8.5 will be able to read a UTF-16 file?

Assuming the perl maintainers approve that patch for the 5.8
maintenance branch, yes.

And if it sees a UTF-8 BOM, that will imply a "use utf8"?

Not quite 100% semantically the same (use utf8 does many things behind
the curtains), but for your purposes (that the script has been stored in
UTF-8), I think so.

Though I must say that personally I would avoid using BOM with UTF-8:
there is little reason to use a byte order mark with UTF-8 since UTF-8
is byte order independent.
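
A small illustration of that point, again in Python and only as a sketch:
the same character has two possible UTF-16 byte sequences depending on
endianness, but exactly one UTF-8 sequence, so a UTF-8 "BOM" can only act
as an encoding signature.

```python
import codecs

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf16_le = text.encode("utf-16-le")  # low byte first
utf16_be = text.encode("utf-16-be")  # high byte first
utf8 = text.encode("utf-8")          # one fixed byte sequence

# UTF-16 has two serialisations, so a byte order mark carries information.
print(utf16_le.hex(), utf16_be.hex())

# UTF-8 has only one, so its three-byte "BOM" says nothing about order.
print(utf8.hex(), codecs.BOM_UTF8.hex())
```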

Will this require that I specify an option to Perl, or will this be
the default behaviour?

The default.  It was supposed to be the default already in 5.8.0 but
it seems the feature wasn't tested well enough.  Having it optional
makes little sense because *without* the detection the script is simply
illegal Perl: UTF-16 doesn't parse as Perl, and the UTF-8 BOM doesn't
parse as Perl.

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.
It is 'dead'." -- Jack Cohen