perl-unicode

Re: BOM and principle of least surprise

2004-05-05 08:30:12
At 8:56 AM +0300 5/5/04, Jarkko Hietaniemi wrote:
Paul Hoffman wrote:

 At 2:17 PM -0800 3/31/04, Larry Wall wrote:

Perl 6 will assume that script is in some kind of recognizable Unicode
encoding, any of:

    UTF-8
    UTF-16
    UTF-32
    SCSU

Of those, probably only SCSU requires a BOM, since Perl scripts are almost
certain to be strict ASCII in the first few bytes where it matters.

If it starts parsing as UTF-8, and runs into trouble, it might or might
not try to intuit the real encoding.  Haven't really decided that yet.

You can always explicitly switch the encoding with "use encoding" or
some such.
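As an illustration of the BOM-based case described above, here is a minimal sketch in Python (not Perl 6, and not anything the Perl 6 design specifies). The `sniff_bom` function and the `BOMS` table are hypothetical names introduced for this example; the byte signatures themselves are the standard ones for each encoding form.

```python
# Byte-order-mark signatures. The four-byte UTF-32 marks are listed first
# so that UTF-32LE (FF FE 00 00) is not mistaken for UTF-16LE (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\x0e\xfe\xff", "SCSU"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name, len(signature)
    return None, 0
```

One caveat of this ordering: UTF-16LE data whose first character after the BOM is U+0000 would be reported as UTF-32LE. For a script that starts with ASCII text that case should not arise.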

 Is it too late in the Perl 6 process to ask for fewer options here?

Ummm, why?  Giving fewer options to users has never been a strong Perl
tradition :-)  Besides, recognizing the various Unicode encodings is
pretty trivial, especially if we know something that's likely to be
present in the first line, like "perl".
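To make the "pretty trivial" claim concrete, here is a rough Python sketch of guessing the encoding when there is no BOM. It assumes the file begins with at least two ASCII characters (such as "#!"), and it does not cover SCSU. The function names `guess_from_ascii_prefix` and `looks_like_perl` are made up for this example.

```python
def guess_from_ascii_prefix(data: bytes) -> str:
    """Guess the encoding from the zero-byte pattern of the first four bytes.

    Assumes no BOM and that the text starts with ASCII characters.
    Falls back to UTF-8 when no wider pattern matches.
    """
    head = data[:4]
    if len(head) == 4:
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be"   # 00 00 00 xx
        if head[1:] == b"\x00\x00\x00" and head[0] != 0:
            return "utf-32-le"   # xx 00 00 00
        if head[0] == 0 and head[1] != 0 and head[2] == 0 and head[3] != 0:
            return "utf-16-be"   # 00 xx 00 xx
        if head[0] != 0 and head[1] == 0 and head[2] != 0 and head[3] == 0:
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"


def looks_like_perl(data: bytes) -> bool:
    """Decode with the guessed encoding and look for "perl" on line one."""
    text = data.decode(guess_from_ascii_prefix(data), errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return "perl" in first_line
```

Finding "perl" on the first line after decoding serves as a confirmation that the guess was right, which is the point made above.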

My mistake. When I saw "script" in the first line, I assumed we were talking about subsections of Unicode, not "script as in a program". You're right that a guess based on looking for "perl" is pretty definitive.

My hope for fewer options is for reading input. That is, I'd like the default encoding for all inputs and outputs to be UTF-8, unless the data has been converted and that conversion is somehow flagged.
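A small Python sketch of the policy I have in mind, purely as an illustration and not a proposal for any actual Perl 6 API: text is treated as UTF-8 unless the caller explicitly flags a different encoding. The `open_input` helper is a hypothetical name.

```python
import io
from typing import Optional


def open_input(raw: bytes, encoding: Optional[str] = None) -> io.TextIOWrapper:
    """Wrap raw bytes as a text stream, defaulting to UTF-8.

    A different encoding is used only when the caller flags it explicitly.
    """
    return io.TextIOWrapper(io.BytesIO(raw), encoding=encoding or "utf-8")
```

With this shape, `open_input(data)` always means UTF-8, and anything else has to be spelled out at the call site, for example `open_input(data, encoding="latin-1")`.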