perl-unicode

Re: :encoding() layer modifies read-only scalars

2004-11-29 11:30:08
* Nick Ing-Simmons wrote:
 Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2,

 #!perl -w
 use strict;
 use warnings;
 use Encode;
 
 my $string = encode(UTF16 => "");
 
 for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/)
 {
   my $backup = $string;
   open F, "<:encoding($_)", \$backup;
   my $char;
   read F, $char, 1, 0;
   close F;
 
   die unless $backup eq $string;
 }

> There are no "readonly scalars" there.

According to `perldoc -f open`, the scalar is opened for input, which
suggests that it should not be written to. Nor would I expect the code
above to modify the scalar even if it were opened for both input and
output, since I do not ask for any write operation. At least I am not
aware of any documentation from which I should expect this behavior.

> Personally, I would be more motivated to fix whichever is at fault
> if you explain why you are trying to read one character from
> a stream with no characters in it (except a BOM).

I think I ran into this while looking for better ways to use Encode in my
HTML/XHTML/XML document encoding detection module, HTML::Encoding, which
needs to do some trial-and-error decoding. For example, the module checks
the document for a byte order mark, and currently does so by encoding
U+FEFF in each suspected encoding and comparing the resulting octets with
the octets at the start of the document. Reading the first character from
the string and comparing it to U+FEFF might make more sense than that, so
I looked into doing that and ran into this unexpected behavior.
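
The current approach described above (encoding U+FEFF and comparing
octets) could be sketched roughly as follows. This is only a minimal
sketch, not the actual HTML::Encoding code: the helper name detect_bom,
the list of candidate encodings and their ordering are assumptions made
for illustration.

```perl
#!perl -w
use strict;
use warnings;
use Encode;

# Hypothetical helper (not part of HTML::Encoding or Encode): returns the
# name of the first candidate encoding whose encoded U+FEFF matches the
# leading octets of $octets, or undef if none matches.
sub detect_bom {
    my ($octets) = @_;

    # Try the longer BOMs first: the UTF-32LE BOM (FF FE 00 00) begins
    # with the UTF-16LE BOM (FF FE), so the order matters.
    for my $name (qw/UTF-32LE UTF-32BE UTF-8 UTF-16LE UTF-16BE/) {
        # The LE/BE variants do not add a BOM of their own, so this
        # yields exactly the octets of U+FEFF in that encoding.
        my $bom = encode($name, "\x{FEFF}");
        return $name if substr($octets, 0, length $bom) eq $bom;
    }
    return undef;
}
```

Note that a document starting with FF FE 00 00 is ambiguous (a UTF-32LE
BOM, or a UTF-16LE BOM followed by U+0000); the sketch simply prefers
the longer match.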
-- 
Björn Höhrmann · mailto:bjoern(_at_)hoehrmann(_dot_)de · 
http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
