xsl-list

Re: [xsl] text extraction

2006-10-12 06:50:25
<E1> text1 <E2> text2 </E2> text3 </E1>

I want to have something like:
text1 text2 text3

As others have pointed out, you can rely on the built-in template
rules that XSLT defines, which copy text nodes to the output by
default. For your example XML, an otherwise empty stylesheet like
this would emit what you wanted (apart from some extra whitespace):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
</xsl:stylesheet>
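
For reference, the empty stylesheet above works because of the
built-in template rules. Written out explicitly, they amount to
roughly the following. This is only a sketch of the defaults as I
understand them from the XSLT spec, not something you need to add
yourself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <!-- Document node and elements: just recurse into the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text and attribute nodes: copy the string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Comments and processing instructions: emit nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Since every text node is copied verbatim, the whitespace around
text1, text2 and text3 in the input shows up in the output as well.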

But if your markup were more complicated, say with embedded elements
within E1 whose text you wanted to ignore, you could walk through each
node in the document and, for text nodes with the proper parent, emit
the normalized string:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="node()">
    <xsl:apply-templates select="node()"/>
  </xsl:template>
  <xsl:template match="text()[parent::*[self::E1|self::E2]]">
    <xsl:sequence select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>

That would let you handle, for example, something like the following,
where the text inside baz and flober is skipped:

<V><E1>text<E2>text2<baz>smorth</baz></E2>text3<flober>chum</flober></E1></V>
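
If you would rather not walk the tree with templates, the same
selection can probably be written as a single XPath 2.0 expression.
This is an alternative sketch, not the stylesheet above; it assumes
an XSLT 2.0 processor and the same E1/E2 element names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Select every text node whose parent is E1 or E2, normalize
         each one, and join the results with a single space. -->
    <xsl:value-of
      select="//text()[parent::E1 or parent::E2]/normalize-space(.)"
      separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document above I would expect this to give
"text text2 text3", since the text inside baz and flober has a
different parent and is never selected.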

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       
jim(_dot_)robinson(_at_)stanford(_dot_)edu
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)
