perl-unicode

Segfault using HTML::Entities

2004-06-29 11:30:05
Hi,

The following script causes a segmentation fault on Red Hat but runs fine on OS X (see the perl -V output for both below). I'm not sure where to start looking, so any advice would be much appreciated.

I know that when you install HTML::Entities you are asked whether you want it to encode Unicode characters. I have no idea if that's relevant, but while I know it was compiled with that support on the OS X box, I don't know whether it was on the Red Hat box.

I apologize if this is common knowledge; Google did not enlighten me, and I'm always confused by Unicode. (Also, is there a searchable archive of this list? The one linked at list.perl.org is down.)

Thanks,

richard

#### Script

#!/usr/bin/perl
use strict;
use warnings;

use XML::RAI;
use HTML::Entities;

my $xml = do { local $/; <DATA> };
my $r = XML::RAI->parse( $xml );

foreach ( @{$r->items} ) {
  my $t = $_->title;
  print "$t\n";

  $t = decode_entities($t);
  print "$t\n";

  $t = encode_entities($t);
  print "$t\n";

}


__DATA__
<?xml version="1.0" ?>
        <rss version="0.91">
        <channel>
                <title>Smartmoney.com - Consumer Action</title>
                <link>http://www.smartmoney.com/consumer/?nav=RSS091</link>
                <description>Investing, Saving and Personal Finance</description>
                <language>en-us</language>
                <copyright>Copyright 2004 Smartmoney.com, joint venture of Dow Jones &amp; Co. and Hearst Communications, Inc.</copyright>


                <item>
                        <title>The Modern R&amp;eacute;sum&amp;eacute;</title>
                        <link>http://www.smartmoney.com/consumer/index.cfm?story=20040505&amp;nav=RSS091</link>
                        <description>R&amp;eacute;sum&amp;eacute;s that worked even a few years ago aren&apos;t effective today. Here are five essential updates.</description>
                </item>


        </channel>
        </rss>


#### output
[jollyr@devbox jollyr]$ ./test.pl
The Modern R&eacute;sum&eacute;
Wide character in print at ./test.pl line 16, <DATA> line 1.
The Modern R?sum?
Malformed UTF-8 character (unexpected end of string) at /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/HTML/Entities.pm line 435, <DATA> line 1.
Malformed UTF-8 character (unexpected non-continuation byte 0x73, immediately after start byte 0xe9) in substitution iterator at /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/HTML/Entities.pm line 435, <DATA> line 1.
Segmentation fault



#### broken on this
> perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.21-9.elsmp, archname=i386-linux-thread-multi
    uname='linux bugs.devel.redhat.com 2.4.21-9.elsmp #1 smp thu jan 8 17:08:56 est 2004 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dversion=5.8.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.2 5.8.1 5.8.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -march=i386 -mcpu=i686',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.3.2 20031218 (Red Hat Linux 3.3.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.2'

#### fine on this
11 ~>perl -V
Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=darwin, osvers=7.3.0, archname=darwin-2level
    uname='darwin noras-computer.local 7.3.0 darwin kernel version 7.3.0: fri mar 5 14:22:55 pst 2004; root:xnuxnu-517.3.15.obj~4release_ppc power macintosh powerpc '
    config_args='-des -Dprefix=/opt/local -Dccflags=-I'/opt/local/include' -Dldflags=-L/opt/local/lib -Dvendorprefix=/opt/local'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-I/opt/local/include -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include -I/opt/local/include',
    optimize='-Os',
    cppflags='-no-cpp-precomp -I/opt/local/include -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include -I/opt/local/include'
    ccversion='', gccversion='3.3 20030304 (Apple Computer, Inc. build 1495)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/opt/local/lib -L/usr/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-lgdbm -ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dyld.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: USE_LARGE_FILES
  Built under darwin
  Compiled at Jun 24 2004 19:12:14
  %ENV:
    PERL5LIB="/opt/local/lib/perl5/site_perl/5.8.2/"
  @INC:
    /opt/local/lib/perl5/site_perl/5.8.2/
    /opt/local/lib/perl5/5.8.4/darwin-2level
    /opt/local/lib/perl5/5.8.4
    /opt/local/lib/perl5/site_perl/5.8.4/darwin-2level
    /opt/local/lib/perl5/site_perl/5.8.4
    /opt/local/lib/perl5/site_perl
    /opt/local/lib/perl5/vendor_perl/5.8.4/darwin-2level
    /opt/local/lib/perl5/vendor_perl/5.8.4
    /opt/local/lib/perl5/vendor_perl
    .
