perl-unicode

is it utf8 or unicode?

2005-03-09 13:03:16
Lo all,

I was wondering if someone could help me out with this little problem.
A large part is probably down to my ignorance, anyway...
I have the following small script:

#!/usr/bin/perl -w
use Encode qw(is_utf8 _utf8_on encode_utf8 decode_utf8 decode encode);
use Devel::Peek;
my $data = "\xC3\x84";
_utf8_on($data);
print 'IS: ', is_utf8($data)?1:0,"\n",'ORD: ', ord $data, "\n";
print 'LENGTH: ', length $data, "\n";
print 'PEEK: ', Dump($data);
open FH1, "> file";
binmode FH1, ":raw";
print FH1 $data ;
close FH1;

Basically I have the two bytes \xC3 \x84 and tell perl to treat them as utf-8 (that is what _utf8_on does).
They are valid utf-8, i.e. A with diaeresis.
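
To spell out what I mean by "valid utf-8", here is a minimal sketch. It assumes Encode's decode and encode behave as documented, and the variable names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same two raw bytes as in the script above.
my $bytes = "\xC3\x84";

# Decode the bytes as UTF-8 into a character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;
printf "chars: length %d, code point U+%04X\n", length $chars, ord $chars;

# Encoding the character again should give the original bytes back.
my $again = encode('UTF-8', $chars);
print "round trip ", ($again eq $bytes ? "ok" : "FAILED"), "\n";
```

So the two bytes stand for one character, and that character's code point is 0xC4 (196), which matches the ORD value my script prints.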
This is the output and what Devel::Peek produces:
IS: 1
ORD: 196
LENGTH: 1
SV = PVMG(0x80ae27c) at 0x805af24
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x8051118 "\303\204"\0 [UTF8 "\x{c4}"]
  CUR = 2
  LEN = 3
  MAGIC = 0x804ee78
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 1

I don't understand what the [UTF8 "\x{c4}"] part
is telling me. The single byte \xC4 is not valid utf-8 on its own. It is, however,
a valid unicode code point: U+00C4 is the precomposed char.
What's worse is that the output file contains the single byte \xC4 and not
the two-byte utf-8 sequence I expected.
Could one of you kind souls give me some clue please?
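
For comparison, this is a minimal sketch of the output I was expecting. It assumes that calling encode_utf8 explicitly before the print is an acceptable way to get it; "file.utf8" is a made-up file name used only here:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# One character, U+00C4, held as a character string.
my $char = decode('UTF-8', "\xC3\x84");

# Turn the character back into UTF-8 bytes before writing.
my $octets = encode_utf8($char);

# Hypothetical file name, used only for this sketch.
my $path = 'file.utf8';
open my $out, '>:raw', $path or die "cannot write $path: $!";
print {$out} $octets;
close $out or die "cannot close $path: $!";

# Read the file back as raw bytes and show them in hex.
open my $in, '<:raw', $path or die "cannot read $path: $!";
my $written = do { local $/; <$in> };
close $in;
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $written)), "\n";
```

The difference from my script is that the bytes are produced explicitly instead of relying on the flag set by _utf8_on.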

Thx

John

Here's my perl -V:

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=linux, osvers=2.6.10, archname=i686-linux-ld
    uname='linux silent-running 2.6.10 #1 sat feb 19 23:23:07 gmt 2005 i686 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/include'
    ccversion='', gccversion='3.2.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-L/usr/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.6/i686-linux-ld/CORE'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'


Characteristics of this binary (from libperl): 
  Compile-time options: USE_LONG_DOUBLE USE_LARGE_FILES
  Built under linux
  Compiled at Mar  2 2005 15:03:34
  @INC:
    /usr/lib/perl5/5.8.6/i686-linux-ld
    /usr/lib/perl5/5.8.6
    /usr/lib/perl5/site_perl/5.8.6/i686-linux-ld
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    .


