Re: [xsl] remove tags + CDATA tag out of big xml file

2010-01-29 09:18:30
On 29/01/2010 11:02, bw wrote:
Hello,

I have a big xml feed out of my content management system that
includes wysiwyg html tags inside CDATA tags.

I am looking for a way to remove the CDATA and only get the text.
CURRENT:
<add>
    <doc>
       <some_title>My title</some_title>
          <content><![CDATA[
<p>The<strong>keyword</strong>  is nice to have but is not needed to
include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
border="1" width="100%"><tbody><tr><td>&#201;tape 1&nbsp;:</td></tr>
]]></content>
    </doc>
    <doc>
       ....
    </doc>
</add>

WANTED:
<add>
    <doc>
       <some_title>My title</some_title>
          <content>The keyword is nice to have but is not needed to
include in a solr feed</content>
    </doc>
    <doc>
       ....
    </doc>
</add>

Cheers


XSLT has no access to any tags in the input file, and that includes the CDATA markers: they are all resolved by the XML parser before XSLT sees the input.

So, as far as XSLT is concerned, your input is


   <content>
&lt;p>The&lt;strong>keyword&lt;/strong> is nice to have but is not needed to include in a solr feed&lt;/p>&lt;p>&lt;table cellspacing="2" cellpadding="2" border="1" width="100%">&lt;tbody>&lt;tr>&lt;td>&amp;#201;tape 1&amp;nbsp;:&lt;/td>&lt;/tr>
 </content>


The best way to get from such a string to an XML element tree is to parse the string. Saxon and some other systems have extensions to do that:

<xsl:copy-of select="saxon:parse(content)"/>

for example.
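
As a rough sketch of how that call might sit in a complete stylesheet (not a tested solution for this feed): the identity template copies everything, and the content template parses the string and keeps only its text, which is what the WANTED output asks for. The wrapper element name r is made up for illustration; it is there because the string holds several top-level elements and the parser needs a single root. This assumes a Saxon 9 edition that provides saxon:parse, and it assumes the embedded markup is well-formed XML. The sample as excerpted is not: &nbsp; is not declared in XML, and the p, table and tbody elements are never closed, so the parse would fail on it until the feed is cleaned up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Parse the escaped markup and keep only the text of the parsed tree.
       The wrapper element "r" is a hypothetical name; it only supplies the
       single root element that a well-formed document requires. -->
  <xsl:template match="content">
    <xsl:copy>
      <xsl:value-of
          select="normalize-space(saxon:parse(concat('&lt;r>', ., '&lt;/r>')))"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Taking the string value simply concatenates the text nodes, so adjacent inline elements such as The<strong>keyword</strong> run together with no space between them. If that matters, apply templates to the parsed tree instead and insert separators where needed.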

Otherwise, as a deprecated and non-portable alternative, you may be able to get away with

<xsl:value-of disable-output-escaping="yes" select="content"/>

which doesn't create the element nodes, but just makes them appear in the serialised result.
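
For completeness, a minimal sketch of a stylesheet built around that instruction, assuming the processor serialises its own output (disable-output-escaping is ignored when the result tree is handed on as nodes rather than written out). It does not strip the HTML tags: it only turns the escaped string back into markup in the output file, so removing the tags would take a second pass over that file, and the file is only well-formed if the embedded string is.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write the text of content without escaping it, so the serialised
       output contains <p>, <strong> and so on as literal markup. -->
  <xsl:template match="content">
    <xsl:copy>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```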


david