Lo all,
I was wondering if someone could help me out with this little problem.
A large part is probably down to my ignorance, anyway...
I have the following small script:
#!/usr/bin/perl -w
use Encode qw(is_utf8 _utf8_on encode_utf8 decode_utf8 decode encode);
use Devel::Peek;
my $data = "\xC3\x84";
_utf8_on($data);
print 'IS: ', is_utf8($data)?1:0,"\n",'ORD: ', ord $data, "\n";
print 'LENGTH: ', length $data, "\n";
print 'PEEK: ', Dump($data);
# Write the string through a raw (no encoding layer) handle.
open FH1, "> file" or die "Cannot open file: $!";
binmode FH1, ":raw";
print FH1 $data;
close FH1;
Basically, I have the two bytes \xC3 \x84 and tell Perl to treat them as UTF-8.
They are valid UTF-8, i.e. A with diaeresis (U+00C4).
This is the output and what Devel::Peek produces:
IS: 1
ORD: 196
LENGTH: 1
SV = PVMG(0x80ae27c) at 0x805af24
REFCNT = 1
FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x8051118 "\303\204"\0 [UTF8 "\x{c4}"]
CUR = 2
LEN = 3
MAGIC = 0x804ee78
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 1
I don't understand what the [UTF8 "\x{c4}"]
is telling me. The single byte \xC4 is not valid UTF-8. It is, however,
a valid Unicode code point, as U+00C4 is a precomposed character.
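To make the byte versus code point distinction concrete, here is a minimal sketch. It assumes the \x{c4} in the Peek output is a code point rather than a byte, and it uses decode() instead of _utf8_on() (my substitution, not what the script above does). The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two octets from the script above.
my $octets = "\xC3\x84";

# Decode them as UTF-8: the result is a string of one character.
my $chars = decode('UTF-8', $octets);

printf "length of octets: %d\n", length $octets;
printf "length of chars:  %d\n", length $chars;
printf "code point:       U+%04X\n", ord $chars;

# Encoding the character again gives back the original two octets.
my $again = encode('UTF-8', $chars);
printf "re-encoded:       %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $again;
```

If that reading is right, the string holds one character, U+00C4, and \xC3 \x84 is just how that character is stored in UTF-8.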
What's worse, the output file contains the single byte \xC4 and not
the UTF-8 sequence \xC3 \x84 that I expected.
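For comparison, here is a rough sketch of two ways that should put \xC3 \x84 into the file, assuming the cause is that a handle without an encoding layer writes characters below 256 as single bytes. The file names file1 and file2 are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $chars = decode('UTF-8', "\xC3\x84");   # one character, U+00C4

# Option 1: let an encoding layer convert the characters on output.
open my $fh1, '>:encoding(UTF-8)', 'file1' or die "open file1: $!";
print {$fh1} $chars;
close $fh1 or die "close file1: $!";

# Option 2: encode explicitly and write the octets through a raw handle.
open my $fh2, '>:raw', 'file2' or die "open file2: $!";
print {$fh2} encode('UTF-8', $chars);
close $fh2 or die "close file2: $!";

# Read both files back as raw bytes and show them in hex.
for my $name ('file1', 'file2') {
    open my $in, '<:raw', $name or die "open $name: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    printf "%s: %s\n", $name,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

Either way the conversion to UTF-8 is requested explicitly rather than left to the internal flag on the string.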
Could one of you kind souls give me some clue please?
Thx
John
Here's my perl -V:
Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
Platform:
osname=linux, osvers=2.6.10, archname=i686-linux-ld
uname='linux silent-running 2.6.10 #1 sat feb 19 23:23:07 gmt 2005 i686 unknown '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=define
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-fno-strict-aliasing -pipe -I/usr/include'
ccversion='', gccversion='3.2.2', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags ='-L/usr/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.6/i686-linux-ld/CORE'
cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'
Characteristics of this binary (from libperl):
Compile-time options: USE_LONG_DOUBLE USE_LARGE_FILES
Built under linux
Compiled at Mar 2 2005 15:03:34
@INC:
/usr/lib/perl5/5.8.6/i686-linux-ld
/usr/lib/perl5/5.8.6
/usr/lib/perl5/site_perl/5.8.6/i686-linux-ld
/usr/lib/perl5/site_perl/5.8.6
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
.