xsl-list

RE: Multiple CDATA tags...again

2005-05-10 03:28:23
This CDATA problem is odd, but it's essentially a distraction. The root cause
of your problem is that you're getting some very peculiar XML out of the
database.

I don't know if this is the fault of the database vendor; it's entirely
possible that the rot started with the data that was put into the database
in the first place. You should be trying to identify where the special
characters such as ampersand got double-escaped, and fix the problem at its
origin.
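One way to start hunting for that origin is to scan the raw response text, before any XML parsing, for an escaped ampersand immediately followed by another entity or character reference. The Java sketch below is only an illustrative heuristic: the class and method names are hypothetical, it is not part of STAR XML or of any XSLT processor, and it will also flag legitimate text such as a literal "&amp;T;".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleEscapeFinder {
    // Heuristic: "&amp;" immediately followed by something shaped like a
    // named, decimal or hex reference, e.g. "&amp;quot;" or "&amp;#233;".
    private static final Pattern DOUBLE_ESCAPED = Pattern.compile(
        "&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);");

    /** Counts suspected double-escaped references in raw, unparsed XML text. */
    public static int countDoubleEscapes(String rawXml) {
        Matcher m = DOUBLE_ESCAPED.matcher(rawXml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String raw = "<Field outputName=\"TEXT\">2010 &amp;quot;We "
                   + "[yo&apos;ndude]</Field>";
        // "&amp;quot;" is flagged; the singly escaped "&apos;" is not.
        System.out.println(countDoubleEscapes(raw)); // prints 1
    }
}
```

Running the same check on the data before it goes into the database and on the response from STAR XML would show at which step the extra escaping first appears.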

Meanwhile, if you want to tidy up the rubbish that you're getting from the
database, I would think a good start would be to get rid of the
double-escaping using something like:

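<!-- Assumes xmlns:saxon="http://saxon.sf.net/" (the Saxon 8
     extension namespace) is declared on xsl:stylesheet -->
<xsl:template match="text()">
  <!-- saxon:parse() takes a string, so build the markup as a string:
       wrapping the text in <x>...</x> makes it a well-formed document,
       and parsing it decodes the second level of escaping -->
  <xsl:value-of select="saxon:parse(concat('&lt;x&gt;', ., '&lt;/x&gt;'))"/>
</xsl:template>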

That's a Saxon-specific solution of course, but it's probably the easiest.
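Without Saxon, the same re-parse idea can be applied in Java before the XSLT step, using only the JAXP parser that ships with the JDK. This is a sketch of the idea, not code from this thread: the class name is hypothetical, it assumes Java 5 or later for getTextContent(), and it fails on text containing a bare & or < (text that was not really double-escaped). If the text also holds HTML markup that was escaped only once, the re-parse turns that markup into elements and getTextContent() keeps only their text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReparseUnescape {
    /**
     * Strips one further level of escaping by parsing the string again as
     * XML content. The text is wrapped in a dummy <x> element so that it
     * forms a well-formed document.
     */
    public static String reparse(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader("<x>" + text + "</x>")));
            // Concatenated text of <x>; any markup that appeared is dropped.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // e.g. a bare '&' or '<', or unbalanced markup
            throw new IllegalArgumentException("Cannot re-parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Value of the TEXT field after the normal (first) parse
        System.out.println(reparse("2010 &quot;We [yo'ndude]"));
        // prints: 2010 "We [yo'ndude]
    }
}
```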

Michael Kay
http://www.saxonica.com/


-----Original Message-----
From: mylistaddress(_at_)canada(_dot_)com 
[mailto:mylistaddress(_at_)canada(_dot_)com] 
Sent: 10 May 2005 02:03
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Multiple CDATA tags...again

Hi,
Thanks for responding. I am pretty much ready to throw
myself off of a bridge...but I guess I can't complain
about learning on the job.

OK, here's the deal. I am sending XML requests via Java
1.4 to a library DB called STAR XML (made by Cuadra)
which sends back a very verbose XML response of a news
item. I have no control over the format of the output.
I was able to make sense out of it (thanks to your
responses) and transform it into a format more
acceptable to the Verity search indexing spider.

When the output from STAR XML is HTML, the < and > of the tags
are converted to &lt; and &gt; and so on. Oddly, it also
appears to convert a quote to &amp;quot; instead
of &quot;. When I try to index the resulting XML
document without placing CDATA tags (not really a tag,
right?) around the content, the indexer fails.
The content also contains [ and ] and non-English text.

So, I added the cdata-section-elements declaration to
my xsl:output, and this is when I encountered the
multiple CDATA tags. At first I suspected they appeared
wherever there was a line break, but this does not
appear to be the case.

Here is a portion of the XML response from STAR XML:
<Field outputName="TEXT">
2010 &amp;quot;We
respectfully Wish the health of the great leader
[yo&apos;ndude] Comarade Big John Il 
</Field>

Here is a portion of the XSL dealing with the TEXT
element:
<xsl:output method="xml" omit-xml-declaration="no"
    indent="yes" cdata-section-elements="TEXT" />
<xsl:strip-space elements="*" />
...
<xsl:template match="Field">
  <xsl:if test="@outputName = 'TEXT'">
    <xsl:element name="{@outputName}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:if>
</xsl:template>

Resulting XML:
<TEXT>
<![CDATA[2010 &quot;We
     ]]><![CDATA[       Respectfully Wish
Hea]]><![CDATA[lth of the great leader
    ]]><![CDATA[      [yo'ndude] Brother ]]><![CDATA[  
Big John Il]      ]]>
</TEXT> 

As you can see, the CDATA sections are appearing all over the
place. This is just a small clip; the actual doc has
dozens. Also notice that &quot; (without the &amp; in
front of quot;) now appears. Do I have to transform
them again? My literal [ and ] are intact.
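Why the &quot; survives can be checked with a cut-down stylesheet run through the JDK's built-in XSLT processor (javax.xml.transform). The parser has already decoded &amp;quot; once, to the six characters &quot;, and cdata-section-elements only controls how that string is serialized, so the CDATA section shows it verbatim; a second decoding pass is still needed. A hedged sketch: the class name is hypothetical, and the stylesheet is an assumed reduction of the one above that tests @outputName = 'TEXT' to match the sample input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CdataDemo {
    // Assumed reduction of the stylesheet in the question
    private static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' cdata-section-elements='TEXT'/>"
      + "<xsl:template match='Field'>"
      + "<xsl:if test=\"@outputName = 'TEXT'\">"
      + "<xsl:element name='{@outputName}'><xsl:apply-templates/></xsl:element>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over an XML string and returns the serialized result. */
    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<Field outputName=\"TEXT\">2010 &amp;quot;We</Field>";
        // The parser has already turned &amp;quot; into the text &quot;,
        // and the CDATA section writes that text verbatim.
        System.out.println(transform(raw));
    }
}
```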

I visited dpawson.co.uk and read up on the d-o-e
(disable-output-escaping) stuff, but am still stuck.
Could anyone recommend a book? The XSLT Cookbook? I
borrowed O'Reilly's XML Hacks (and noticed your name),
but it is slim on XSL.

Thanks so much for any help.

JW

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--