It looks like a simple explanation - you were using a product with a
serious bug in it.
Michael Kay
Software AG
home: Michael(_dot_)H(_dot_)Kay(_at_)ntlworld(_dot_)com
work: Michael(_dot_)Kay(_at_)softwareag(_dot_)com
-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of
mark_fletcher(_at_)peoplesoft(_dot_)com
Sent: 23 April 2003 18:01
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Character entities in attribute values
Hi Mike (and others who have responded),
First, I've found and fixed the problem. I'm using
Arbortext's E3 product to do my processing and there was an
instruction in their internal code to write out non-ASCII
characters as numeric character references. So, that's how
the accented Unicode characters in the tag attributes became
character references. Once I fixed that problem, the HTML
output was fine, as there were no ampersands in any of the
attribute values.
However, it still sounds like you're all saying that even
when a character reference does exist in an attribute value,
I should not be seeing escaped ampersands when that attribute
value is output as text. Well, if anyone's interested (and
I'm not sure why you would be, at this point ;-) here's a
sample of my previous input and output data and my xsl code
that demonstrates the problem I was having:
source xml tag:
<xref linkend="i090f42a68009c2c9" book_code="cmkt"
book_title="Guide Marketing du système GRC de
PeopleSoft, version 8.8" chapter_title="Définition des
entités de l'application Marketing de PeopleSoft"
XREF_type="3" target_title="Définition des entités
de l'application Marketing de PeopleSoft"
chapter_type="Chapitre" file_name="cmkt03.htm"/>
xsl template for this element:
<xsl:template name="xref">
<A
HREF="../../{@book_code}/htm/{@file_name}#{@linkend}"><xsl:value-of
select="@target_title"/></A>
</xsl:template>
html output:
<A
HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition
des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
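For comparison, the expected behaviour can be checked with any conforming
parser. The following is only a minimal sketch using Python's standard
xml.etree.ElementTree, not Arbortext's E3 or an XSLT processor. The cut-down
xref element and the hand-built link are hypothetical stand-ins for the
sample above. It suggests that a numeric character reference in an attribute
value is resolved at parse time, so no ampersand remains to be escaped when
the value is written out as text.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down version of the xref element, with the accented
# letter written as a numeric character reference in the attribute value.
source = '<xref file_name="cmkt03.htm" target_title="D&#xe9;finition"/>'
xref = ET.fromstring(source)

# The parser has already resolved &#xe9; to the single character U+00E9.
title = xref.get("target_title")

# Build the link by hand, roughly mirroring what the template does.
a = ET.Element("A", HREF=xref.get("file_name"))
a.text = title
html = ET.tostring(a, encoding="unicode")

print(len(title))     # 10
print("&" in html)    # False
```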
Mark Fletcher
PeopleSoft Language Engineering
925.694.3753
mark_fletcher(_at_)peoplesoft(_dot_)com
"Mike Brown"
<mike(_at_)skew(_dot_)org>
Sent by: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
cc:
Subject: Re: [xsl] Character entities in attribute values
04/23/2003 06:05 AM
Please respond to xsl-list
mark_fletcher(_at_)peoplesoft(_dot_)com wrote:
the output text looks something like this: &amp;eacute; instead of
this: é
First, please realize that when you output XML or HTML, the
XSLT processor is (effectively, if not literally) running a
node tree through a serializer, and the serializer is what
escapes "&" and "<" and certain other characters appearing
in places where they would otherwise be confused with markup.
If you're getting &amp;eacute; in the output, then you must
have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";"
into an attribute node (or text node, but you mentioned
attribute) in your result tree, perhaps by copying this text
from the source tree. Since you told the processor you wanted the
*node* to contain those 8 characters, rather than 1 entity
reference, it serialized the node in such a way that you'd
get the same 8 characters back when the output document is
parsed. In other words, it preserved the semantics of the data,
clearly distinguishing between character data and the structures
implied by markup.
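To make the serializer's role concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree as a stand-in for an XSLT
processor's serializer. The element name "p" is just a placeholder, and a
real XSLT processor may differ in details. It shows the "&" being escaped
on output and the original 8 characters coming back when that output is
parsed again.

```python
import xml.etree.ElementTree as ET

# Stand-in for a result-tree node holding the 8 literal characters.
literal = ET.Element("p")
literal.text = "&eacute;"      # character data, not an entity reference

# The same node holding the single character U+00E9 instead.
single = ET.Element("p")
single.text = "\u00e9"

# Serializing escapes the "&" so that a parser reading the output
# recovers character data rather than seeing markup.
literal_out = ET.tostring(literal, encoding="unicode")
single_out = ET.tostring(single, encoding="unicode")

# Parsing the serialized form gives back exactly the 8 characters.
round_trip = ET.fromstring(literal_out).text

print(literal_out)                  # <p>&amp;eacute;</p>
print(round_trip == literal.text)   # True
print("&" in single_out)            # False
```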
Given that the XML parser feeding parsed data to the XSLT
processor would have interpreted "&eacute;" in your original
source document as a reference to the entity named eacute,
there's no way the 8 characters could have ended up in your
result tree unless you did one of the following:
- explicitly constructed that string in your stylesheet
- copied text that was originally written like &amp;eacute;
- copied text that was originally written like <![CDATA[&eacute;]]>
Both of the latter two mean exactly the same thing. Since one
of the most common misconceptions on this list concerns what
CDATA sections are, I'm going to guess that
whoever made your XML decided to try to use it as a transport
for entity-laden, non-well-formed HTML, saying that this data
is just text, not markup. Then you tried to use XSLT to copy
it through, and were surprised to see that you can't use XSLT
to pretend character data is actually markup.
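The equivalence of the escaped form and the CDATA form can be checked with
a short sketch, again using Python's standard xml.etree.ElementTree rather
than an XSLT processor, with "p" as a placeholder element name. A numeric
character reference is included for contrast. A named reference such as
eacute would need a DTD declaring it, so it is left out here.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data: the 8 characters "&eacute;".
escaped = ET.fromstring("<p>&amp;eacute;</p>")
cdata = ET.fromstring("<p><![CDATA[&eacute;]]></p>")

# For contrast, a real character reference is resolved by the parser.
reference = ET.fromstring("<p>&#xe9;</p>")

print(escaped.text == cdata.text)    # True
print(len(cdata.text))               # 8
print(len(reference.text))           # 1

# Copying the CDATA text through to output keeps it as character data,
# so the "&" is escaped again.
print(ET.tostring(cdata, encoding="unicode"))   # <p>&amp;eacute;</p>
```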
However, as others have mentioned, this is just a wild guess.
Explain more about what you're doing, with sample code (brief).
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list