xsl-list

RE: Multiple CDATA tags...again

2005-05-09 19:20:46
What processor are you using?  With Xalan, for the following XML:


<data>
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il
</Field>
</data>

By applying the following XSL:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="1.0">
   <xsl:output method="xml" omit-xml-declaration="no"
           indent="yes" cdata-section-elements="TEXT" />

   <xsl:template match="Field">
       <xsl:if test="contains ('TEXT', @OutputName)">
           <xsl:element name="{@OutputName}">
               <xsl:copy-of select="."/>
           </xsl:element>
       </xsl:if>
   </xsl:template>
</xsl:stylesheet>

I get this result:

<?xml version="1.0" encoding="UTF-8"?>
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo'ndude] Comarade Big John Il
</Field>

Regards,

--A

From: mylistaddress(_at_)canada(_dot_)com
Reply-To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Multiple CDATA tags...again
Date: Mon, 09 May 2005 18:02:41 -0700 (PDT)

Hi,
Thanks for responding. I am pretty much ready to throw
myself off of a bridge...but I guess I can't complain
about learning on the job.

OK, here's the deal. I am sending XML requests via Java
1.4 to a library DB called STAR XML (made by Cuadra)
which sends back a very verbose XML response of a news
item. I have no control over the format of the output.
I was able to make sense out of it (thanks to your
responses) and transform it into a format more
acceptable to the Verity search indexing spider.

When the output from STAR XML is HTML, the < and >
characters are converted to &lt; and &gt; and so on.
Oddly, it also appears to convert a quote to &amp;quot;
instead of &quot;. When I try to index the resulting XML
document without placing CDATA tags (not really a tag,
right?) around the content, the indexer fails.
The content also contains [ and ] and non-English text.

So, I added the cdata-section-elements declaration to
my xsl:output and this is when I encountered the
multiple CDATA tags. At first I suspected they appeared
wherever there is a line-break, but this does not
appear to be the case.

Here is a portion of the XML response from STAR XML:
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il
</Field>

Here is a portion of the XSL dealing with the TEXT
element:
<xsl:output method="xml" omit-xml-declaration="no"
indent="yes" cdata-section-elements="TEXT" />
<xsl:strip-space elements="*" />
...
<xsl:template match="Field">
<xsl:if test="contains ('TEXT', @OutputFieldName)">
<xsl:element name="{@OutputFieldName}">
<xsl:apply-templates/>
</xsl:element>
</xsl:if>
</xsl:template>
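
For comparison, a stricter variant of the template above might look like
the sketch below. It assumes the attribute is spelled outputName, as in
the sample response, and it tests for equality instead of using
contains(), because contains('TEXT', @x) is also true when @x is empty
or missing. Whether writing the content with a single xsl:value-of
changes how a particular processor splits CDATA sections is an
assumption here, not something that has been verified.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" omit-xml-declaration="no"
              indent="yes" cdata-section-elements="TEXT"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: the attribute is named outputName, as in the sample XML.
       Only Field elements whose outputName is exactly TEXT match here;
       other Field elements fall through to the built-in templates. -->
  <xsl:template match="Field[@outputName = 'TEXT']">
    <!-- Literal result element; its name is listed in
         cdata-section-elements, so its text is serialized as CDATA. -->
    <TEXT>
      <!-- Emit the string value of the element as one text node. -->
      <xsl:value-of select="."/>
    </TEXT>
  </xsl:template>
</xsl:stylesheet>
```

Note that adjacent CDATA sections are equivalent to a single one for any
conforming XML parser, so the split output is still well-formed XML with
the same text content.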

Resulting XML:
<TEXT>
<![CDATA[2010 &quot;We
     ]]><![CDATA[       Respectfully Wish
Hea]]><![CDATA[lth of the great leader
    ]]><![CDATA[      [yo'ndude] Brother ]]><![CDATA[
Big John Il]      ]]>
</TEXT>

As you can see, the CDATA sections are appearing all over
the place. This is just a small clip. The actual doc has
dozens. Also notice how the &quot; (no more &amp;
before the quot;) appears now. Do I have to transform
them again? My literal [ and ] are intact.
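
On the &quot; question: the source contains &amp;quot;, which the parser
reads as the six literal characters &quot;. Nothing is escaped inside a
CDATA section, so those characters are written out unchanged. If a real
quotation mark is wanted, the string has to be replaced during the
transform. XSLT 1.0 has no built-in replace function, so a recursive
named template is a common workaround. The following is only a sketch:
the template name replace-all is hypothetical, and the attribute name
outputName is assumed from the sample response.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes" cdata-section-elements="TEXT"/>

  <!-- Hypothetical helper: copies $text, replacing every occurrence
       of $from with $to. $from must not be empty. -->
  <xsl:template name="replace-all">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <xsl:when test="contains($text, $from)">
        <!-- Emit everything before the match, then the replacement. -->
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <!-- Recurse on the remainder of the string. -->
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text"
                          select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumption: the attribute is named outputName, as in the sample. -->
  <xsl:template match="Field[@outputName = 'TEXT']">
    <TEXT>
      <xsl:call-template name="replace-all">
        <xsl:with-param name="text" select="."/>
        <!-- The literal characters &quot; as they appear in the parsed text. -->
        <xsl:with-param name="from" select="'&amp;quot;'"/>
        <!-- A real double-quote character. -->
        <xsl:with-param name="to" select="'&quot;'"/>
      </xsl:call-template>
    </TEXT>
  </xsl:template>
</xsl:stylesheet>
```

The same template could be called again for other doubly escaped
entities, if the response turns out to contain any.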

I visited dpawson.co.uk and read up on the d-o-e
(disable-output-escaping) material, but am still stuck.
Could anyone recommend a book? XSLT Cookbook? I borrowed
the O'Reilly XML Hacks book (and noticed your name) but it
is slim on XSL.

Thanks so much for any help.

JW



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--