RE: Character entities in attribute values

It looks like a simple explanation: you were using a product with a
serious bug in it.

Michael Kay
Software AG
home: Michael(_dot_)H(_dot_)Kay(_at_)ntlworld(_dot_)com
work: Michael(_dot_)Kay(_at_)softwareag(_dot_)com

-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com 
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of 
mark_fletcher(_at_)peoplesoft(_dot_)com
Sent: 23 April 2003 18:01
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Character entities in attribute values



Hi Mike (and others who have responded),

First, I've found and fixed the problem.  I'm using 
Arbortext's E3 product to do my processing and there was an 
instruction in their internal code to write out non-ASCII 
characters as numeric character references.  So, that's how 
the accented Unicode characters in the tag attributes became 
character references.  Once I fixed that problem, the HTML 
output was fine, as there were no ampersands in any of the 
attribute values.
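Numeric character references are harmless when they appear as markup and harmful only when they become data. The sketch below shows the harmless case. It uses Python's standard xml.etree.ElementTree as a stand-in serializer; this is an assumption, since E3's internals are not shown in this thread. The code writes an attribute containing "è" with US-ASCII output encoding, which forces the serializer to emit a character reference, and then parses the result back.

```python
import xml.etree.ElementTree as ET

# An attribute value holding a real accented character (one code point, U+00E8).
el = ET.Element("xref", book_title="Guide Marketing du syst\u00e8me GRC")

# With US-ASCII output encoding, ElementTree must write non-ASCII
# characters as numeric character references in the serialized bytes.
ascii_bytes = ET.tostring(el, encoding="us-ascii")
print(ascii_bytes)

# Parsing turns the reference back into the single character it stands for.
reparsed = ET.fromstring(ascii_bytes)
print(reparsed.get("book_title") == el.get("book_title"))
```

The reference exists only in the serialized bytes. Once the output is parsed, the attribute holds one character again. The trouble described in this thread starts only when the text of the reference itself ends up inside an attribute's value.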

However, it still sounds like you're all saying that even 
when a character reference does exist in an attribute value, 
I should not be seeing escaped ampersands when that attribute 
value is output as text.  Well, if anyone's interested (and 
I'm not sure why you would be, at this point ;-) here's a 
sample of my previous input and output data and my XSL code 
that demonstrates the problem I was having:

source xml tag:

<xref linkend="i090f42a68009c2c9" book_code="cmkt"
      book_title="Guide Marketing du syst&#xe8;me GRC de PeopleSoft, version 8.8"
      chapter_title="D&#xe9;finition des entit&#xe9;s de l'application Marketing de PeopleSoft"
      XREF_type="3"
      target_title="D&#xe9;finition des entit&#xe9;s de l'application Marketing de PeopleSoft"
      chapter_type="Chapitre" file_name="cmkt03.htm"/>

xsl template for this element:

<xsl:template name="xref">
  <A HREF="../../{@book_code}/htm/{@file_name}#{@linkend}"><xsl:value-of select="@target_title"/></A>
</xsl:template>

html output:

<A HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition
des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
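The output above can be reproduced without an XSLT processor. The sketch below is a rough Python analogue of the template, not the stylesheet itself; `xref_to_link` is a hypothetical helper built on xml.etree.ElementTree. It runs twice. In the first run the source writes `&#xe9;` as markup. In the second run the ampersand is part of the data, which is assumed here to model what the buggy E3 step produced.

```python
import xml.etree.ElementTree as ET

def xref_to_link(xref: ET.Element) -> str:
    """Hypothetical stand-in for the xsl:template: build <A HREF=...> from
    the xref attributes and use @target_title as the link text."""
    href = "../../{}/htm/{}#{}".format(
        xref.get("book_code"), xref.get("file_name"), xref.get("linkend"))
    a = ET.Element("A", HREF=href)
    a.text = xref.get("target_title")
    return ET.tostring(a, encoding="unicode")

# Well-formed source: the parser turns each &#xe9; into the character U+00E9.
good = ET.fromstring(
    '<xref linkend="i090f42a68009c2c9" book_code="cmkt" file_name="cmkt03.htm" '
    'target_title="D&#xe9;finition des entit&#xe9;s"/>')

# Assumed model of the buggy input: the attribute value itself contains
# the characters "&#xe9;", so the ampersand is data rather than markup.
bad = ET.fromstring(
    '<xref linkend="i090f42a68009c2c9" book_code="cmkt" file_name="cmkt03.htm" '
    'target_title="D&amp;#xe9;finition des entit&amp;#xe9;s"/>')

print(xref_to_link(good))  # link text contains real accented characters
print(xref_to_link(bad))   # link text contains escaped ampersands
```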




Mark Fletcher
PeopleSoft Language Engineering
925.694.3753
mark_fletcher(_at_)peoplesoft(_dot_)com



                                                              
                                                            
"Mike Brown" <mike(_at_)skew(_dot_)org>
Sent by: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
04/23/2003 06:05 AM
Please respond to xsl-list

To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
cc:
Subject: Re: [xsl] Character entities in attribute values

mark_fletcher(_at_)peoplesoft(_dot_)com wrote:

> the output text looks something like this: &amp;eacute; instead of
> this: &eacute;


First, please realize that when you output XML or HTML, the
XSLT processor is effectively (though not necessarily literally)
running a node tree through a serializer, and it is the serializer
that escapes "&", "<" and certain other characters wherever they
appear in places where they would otherwise be confused with markup.
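The division of labour between tree and serializer can be seen in a minimal sketch. Python's xml.etree.ElementTree stands in here for an XSLT processor's output stage. This is an assumption: other serializers may choose different but equivalent escapes, such as `&#38;`. The tree holds the plain characters, and the escapes appear only in the serialized text.

```python
import xml.etree.ElementTree as ET

# A tiny result tree whose *data* contains markup-significant characters.
p = ET.Element("p", title='say "x < y & z"')
p.text = "x < y & z"

# The tree still holds the raw characters...
print(p.text)

# ...and the serializer is what introduces &lt;, &amp; (and &quot; inside
# the double-quoted attribute) so the output cannot be mistaken for markup.
print(ET.tostring(p, encoding="unicode"))
```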

If you're getting &amp;eacute; in the output, then you must
have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";"
into an attribute node (or a text node, but you mentioned an
attribute) in your result tree, perhaps by copying this text
from the source tree. Since you told the processor you wanted the
*node* to contain those 8 characters, rather than one entity
reference, it serialized the node in such a way that you'd get
those same 8 characters back when the output document is parsed.
In other words, it preserved the semantics of the data, clearly
distinguishing between character data and the structures
implied by markup.
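The round trip described above can be checked directly. As before, ElementTree is only a stand-in for the XSLT serializer. The sketch puts the 8 characters into an attribute node, serializes it, and parses the output again.

```python
import xml.etree.ElementTree as ET

eight_chars = "&eacute;"          # the 8 characters & e a c u t e ;
a = ET.Element("a", title=eight_chars)

serialized = ET.tostring(a, encoding="unicode")
print(serialized)                 # the ampersand is written as &amp;

# Parsing the output yields exactly the 8 characters that were in the node:
# the data is preserved instead of being reinterpreted as an entity reference.
round_trip = ET.fromstring(serialized).get("title")
print(round_trip == eight_chars, len(round_trip))
```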

Given that the XML parser feeding parsed data to the XSLT
processor would have interpreted "&eacute;" in your original
source document as a reference to the entity named eacute,
there's no way the 8 characters could have ended up in your
result tree unless you did one of the following:
 - explicitly constructed that string in your stylesheet
 - copied text that was originally written like &amp;eacute;
 - copied text that was originally written like <![CDATA[&eacute;]]>
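The sketch below parses each kind of source text with ElementTree, which stands in for the parser in front of the XSLT processor. It shows that the `&amp;eacute;` form and the CDATA form produce the same 8 characters. `&eacute;` becomes a single character only when an entity declaration exists. The internal DTD subset is a hypothetical addition for illustration; the value U+00E9 is the one HTML gives `eacute`.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<a>&amp;eacute;</a>").text
cdata = ET.fromstring("<a><![CDATA[&eacute;]]></a>").text
print(escaped == cdata == "&eacute;")      # same 8 characters either way

# A genuine entity reference needs a declaration (hypothetical internal subset).
declared = ET.fromstring(
    '<!DOCTYPE a [<!ENTITY eacute "&#233;">]><a>&eacute;</a>').text
print(declared == "\u00e9", len(declared))

# Without any declaration, the parser (expat) rejects the document as not well-formed.
try:
    ET.fromstring("<a>&eacute;</a>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```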

The latter two mean exactly the same thing, and since the most
common FAQ and misconception on this list (well, one of the most
common) concerns what CDATA sections are, I'm going to guess that
whoever made your XML decided to try to use a CDATA section as a
transport for entity-laden, non-well-formed HTML, since a CDATA
section declares that its content is just text, not markup. Then
you tried to use XSLT to copy it through, and were surprised to
see that you can't use XSLT to pretend character data is actually
markup.
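The following sketch shows that copy-through. It assumes a hypothetical source element `doc` whose content is HTML wrapped in a CDATA section, and it again uses ElementTree in place of an XSLT processor. The parser delivers a single text node, so copying it produces text, not elements. The serializer then escapes every markup character in that text.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: HTML-ish content tunneled through a CDATA section.
src = ET.fromstring("<doc><![CDATA[<b>caf&eacute;</b>]]></doc>")

# "Copy it through": the CDATA content is just a text node, so that is
# all that can be copied into the result tree.
out = ET.Element("div")
out.text = src.text

print(len(out))                           # no child elements were created
print(ET.tostring(out, encoding="unicode"))
```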

However, as others have mentioned, this is just a wild guess. 
Explain more about what you're doing, with sample code (brief).


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list