perl-unicode

Re: Segfault using HTML::Entities

2004-06-30 09:30:08
On 30 June 04, at 14:46, Richard Jolly wrote:

> In my original mail the offending line was:
>
> <title>The Modern R&amp;eacute;sum&amp;eacute;</title>
>
> Now this is a bit off, because it is RSS, therefore utf8, but it's got encoded latin1 entities (&eacute;) in there, with the & further encoded for xml safety.

I'm no XML expert, but this doesn't look right. An e acute is &eacute;
whereas &amp;eacute; decodes to the literal text &eacute;. It's not
"safer", it's different.
IMHO the double encoding is in the XML data itself.
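
To make the difference concrete, here is a minimal sketch using Python's standard html module as a stand-in for HTML::Entities (an assumption on my part; the decoding calls in the original script are not shown in this thread). One decoding pass only undoes the &amp; layer and leaves a literal entity behind; a second pass is needed to get the accented character:

```python
import html

# The title text exactly as it appears in the feed (double encoded).
title = "The Modern R&amp;eacute;sum&amp;eacute;"

# First pass: &amp; becomes &, which leaves the literal text "&eacute;".
once = html.unescape(title)

# Second pass: the now-exposed &eacute; becomes U+00E9.
twice = html.unescape(once)

print(once)
print(twice)
```

So a consumer that decodes once, as a conforming parser would, ends up displaying the entity text rather than the accented letter.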

Also, calling &eacute; et al. "latin1" entities doesn't make
sense to me: entities are a way to encode non-ASCII characters
in an ASCII representation, which is orthogonal to the XML document's
encoding and to the XML parser's output encoding.
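
A rough illustration of that point, again sketched with Python's html module rather than any particular XML parser (the variable names are purely illustrative): the entity form is plain ASCII, so it reads the same whether the bytes are treated as UTF-8 or Latin-1, and the character it names can then be written out in either encoding.

```python
import html

markup = "R&eacute;sum&eacute;"

# The entity form is pure ASCII, so this encode cannot fail.
raw = markup.encode("ascii")

# The same bytes mean the same text under either document encoding.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

# Decoding the entity yields one abstract character, U+00E9.
decoded = html.unescape(as_utf8)

# Only at output time does an encoding choice affect the bytes.
out_utf8 = decoded.encode("utf-8")
out_latin1 = decoded.encode("latin-1")

print(as_utf8 == as_latin1, out_utf8, out_latin1)
```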

--
Eric Cholet
