perl-unicode

Re: BOM and principle of least surprise

2004-05-04 23:30:06
Paul Hoffman wrote:

At 2:17 PM -0800 3/31/04, Larry Wall wrote:

Perl 6 will assume that script is in some kind of recognizable Unicode
encoding, any of:

   UTF-8
   UTF-16
   UTF-32
   SCSU

Of those, probably only SCSU requires a BOM, since Perl scripts are almost
certain to be strict ASCII in the first few bytes where it matters.

If it starts parsing as UTF-8, and runs into trouble, it might or might
not try to intuit the real encoding.  Haven't really decided that yet.

You can always explicitly switch the encoding with "use encoding" or
some such.


Is it too late in the Perl 6 process to ask for fewer options here? 

Ummm, why?  Giving fewer options to users has never been a strong Perl
tradition :-)  Besides, recognizing the various Unicode encodings is
pretty trivial, especially if we know something that's likely to be
present in the first line, like "perl".
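
To illustrate the point about recognition, here is a minimal sketch in Python (chosen only for illustration; it is not how Perl 6 does or will do it). It checks for a BOM first and otherwise looks at where the zero bytes fall in the first four bytes, assuming the script opens with ASCII characters such as "#!". The function name `guess_encoding` and the returned labels are hypothetical, and "scsu" is only a label here, since Python ships no SCSU codec.

```python
import codecs

# BOM signatures. The UTF-32 entries come first so that the UTF-32 LE
# BOM (FF FE 00 00) is not mistaken for the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (b"\x0e\xfe\xff", "scsu"),          # SCSU signature; label only
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def guess_encoding(head: bytes) -> str:
    """Guess the encoding of a script from its first few bytes.

    Assumption: without a BOM, the first characters are plain ASCII.
    """
    for bom, name in BOMS:
        if head.startswith(bom):
            return name

    first4 = head[:4]
    if len(first4) == 4:
        # True where a byte is zero; ASCII text in UTF-16/32 leaves a
        # distinctive pattern of zero bytes.
        zeros = tuple(byte == 0 for byte in first4)
        if zeros == (True, True, True, False):
            return "utf-32-be"
        if zeros == (False, True, True, True):
            return "utf-32-le"
        if zeros == (True, False, True, False):
            return "utf-16-be"
        if zeros == (False, True, False, True):
            return "utf-16-le"

    # No BOM and no zero-byte pattern: fall back to UTF-8.
    return "utf-8"
```

The fallback to UTF-8 is itself a guess: a legacy 8-bit encoding with an ASCII first line would land in the same branch, which is the kind of case the rest of this thread argues about.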

Saying "it's always UTF-8, and if you want it different, you must
convert it yourself each time" would save lots and lots (and lots) of
problems with guessing. Predictability is good, yes?
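
For contrast, the "always UTF-8, no guessing" policy could be sketched as follows, again in Python purely for illustration. The function name `read_script_strict` is hypothetical. Two choices here are assumptions rather than anything stated in the thread: a leading UTF-8 BOM is tolerated and dropped, and NUL characters are rejected, because BOM-less UTF-16 of ASCII text is byte-for-byte valid UTF-8 and would otherwise slip through.

```python
def read_script_strict(raw: bytes) -> str:
    """Decode script source as UTF-8 or fail; never guess."""
    # 'utf-8-sig' strips a leading UTF-8 BOM if present (assumption:
    # the policy tolerates one). Invalid UTF-8 raises UnicodeDecodeError.
    text = raw.decode("utf-8-sig", errors="strict")

    # Assumption: NUL never appears in a legitimate script, so treat it
    # as a sign of an unconverted UTF-16 or UTF-32 file.
    if "\x00" in text:
        raise ValueError("NUL character found; source is probably not UTF-8")
    return text
```

Anything in another encoding has to be converted before it reaches this function, which is exactly the trade the proposal makes: less convenience, no ambiguity.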



-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen