perl-unicode

Re: original_string method

1999-12-24 10:50:25
Bart Schuller writes:
: On Mon, Dec 20, 1999 at 09:17:55AM -0800, Larry Wall wrote:
: > Matt Sergeant writes:
: > : have a heredoc with Chinese writing embedded in your script - what a
: > : nightmare...) but I don't think you need to worry about that.
: > 
: > I wasn't going that far, though it may be possible in practice.  Generally,
: > if the script is in utf8, all the literals would default to being in utf8
: > as well.
: 
: "The script" can only be in one encoding at once, I hope.

Most scripts would only be in one encoding.

: >     use bytes;
: >     markoem <<"END";
: >     (Big5 stuff here.)
: >     END
: 
: But here you imply that the script as a whole starts out as valid utf8,
: but then when it sees "use bytes" it suddenly accepts any 8-bit value.
: That means that if you write a perl script to scan your source code, the
: script will complain about lots of malformed utf8 characters.
: 
: In other words, just the situation Matt was afraid of.

I wasn't recommending the above, nor trying to make it easy.  But
I'm not trying to make it impossible either.
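
To make the scanner concern above concrete, here is a minimal sketch, assuming the core Encode module that ships with later Perl releases. The helper names is_valid_utf8 and octal_dump are hypothetical, and the two bytes are only a stand-in for the "(Big5 stuff here.)" text, not a claim about any particular character.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying its argument.
    return eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    } ? 1 : 0;
}

# Hypothetical helper: each byte as a three-digit octal number.
sub octal_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%03o', $_ } unpack 'C*', $bytes;
}

my $ascii = 'use bytes;';   # plain ASCII is always valid UTF-8
my $oem   = "\xA4\xA4";     # stand-in for two bytes of Big5 text

printf "%-12s valid=%d\n", 'ascii', is_valid_utf8($ascii);
printf "%-12s valid=%d dump=%s\n", 'oem bytes',
       is_valid_utf8($oem), octal_dump($oem);
```

A byte such as 0xA4 can only appear in UTF-8 as a continuation byte, so a strict decoder rejects it at the start of a character. That is the "malformed utf8" complaint a source scanner would raise on such a here-document.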

: If "use bytes" *doesn't* do all that, then I don't see what the "(Big5
: stuff here.)" would look like in an octal dump.
: 
: Or is it the "markoem" that changes the document encoding halfway
: through the script?

No.  The "use bytes" is an explicit declaration of the old Perl semantics,
and that had better include the interpretation of literals.  Anybody
who intends to remain sane had better make sure the rest of the file
is in bare ASCII, but I'm not interested in enforcing that.
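
As a rough illustration of byte versus character semantics, here is a sketch based on the bytes pragma as it exists in later Perl releases. This is an assumption about the shipped behaviour, not a description of the design under discussion. In those releases "use bytes" switches built-ins such as length to byte semantics within its lexical scope, while the decoding of source literals is governed separately by "use utf8".

```perl
use strict;
use warnings;

# Two characters above U+00FF, written with escapes so that this
# file itself stays in bare ASCII.
my $s = "\x{4e2d}\x{6587}";

my $chars = length $s;      # character semantics

my $bytes;
{
    use bytes;              # lexically scoped: ends with this block
    $bytes = length $s;     # counts bytes of the internal UTF-8 form
}

print "$chars characters, $bytes bytes\n";
```

Each of the two characters takes three bytes in UTF-8, so the same string has length 2 outside the block and 6 inside it. Keeping the pragma inside a small block limits how much code sees the old semantics.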

: If you've decided you want the functionality then I think a new quoting
: and/or here-document syntax might be a better way to change the meaning
: of the bytes that make up the script.

Not worth the agony.  Most new scripts are just gonna be straight utf8.

Larry
