xsl-list

Re: [xsl] XSLT to remove characters and whitespaces

2006-07-07 08:28:58
Hi Georg,

A couple of things:

I'm not sure why you use translate() (in two separate operations) to convert CRs and tabs to spaces and strip line feeds, and only then normalize the space. Why not simply call normalize-space(), since it already takes care of line feeds and tabs? (The parser should already have normalized CRs away, so they shouldn't even be there.)
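To see why normalize-space() alone is enough, here is a small Python sketch (standard library only). It mimics the two XPath functions as the XPath spec describes them; the helper names `normalize_space`, `xpath_translate` and `original_expression` are made up for illustration and are not part of any XSLT processor.

```python
import re

# XPath normalize-space() treats exactly these four characters as whitespace:
# space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA).
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Mimic XPath normalize-space(): collapse whitespace runs to one space, then trim."""
    return _XML_WS.sub(" ", s).strip(" ")


def xpath_translate(s: str, src: str, dst: str) -> str:
    """Mimic XPath translate(): src[i] becomes dst[i]; characters in src with
    no counterpart in dst are deleted. Only the first occurrence in src counts."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:
            table[ch] = dst[i] if i < len(dst) else None
    return s.translate({ord(ch): rep for ch, rep in table.items()})


def original_expression(s: str) -> str:
    """The expression from the question (hypothetical Python rendering):
    normalize-space(translate(translate(., '&#x0d;&#x0a;', ' '), '&#09;', ' '))"""
    return normalize_space(xpath_translate(xpath_translate(s, "\r\n", " "), "\t", " "))


sample = "Hey,\thow are\ntricks?"
print(repr(normalize_space(sample)))      # 'Hey, how are tricks?'
print(repr(original_expression(sample)))  # 'Hey, how aretricks?'
```

One side effect worth noticing: because translate() deletes the line feed instead of turning it into a space, the original expression runs words together across line breaks, whereas normalize-space() on its own keeps them apart.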

More basically, and this is what accounts for your problem: you are matching elements, creating new elements with the same names (any reason not to use the simpler xsl:copy instruction?), writing out their string values (i.e. all the text inside the elements) and then descending the tree to do the same. This results in your string values being written out over and over again, every time an ancestor element gets processed.

So if your input were

<greeting>
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>

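you'll get something like

<greeting>GeorgXSL-ListHey, how are tricks?
  <to>GeorgGeorg</to>
  <from>XSL-ListXSL-List</from>
  <text>Hey, how are tricks?Hey, how are tricks?</text>
</greeting>

since the greeting element gets its whole string value written out before its own element contents are traversed. Each leaf element's text then appears twice: once from your xsl:value-of and once more from the built-in template rule for text nodes, which your xsl:apply-templates invokes.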
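The mechanism can be reproduced outside XSLT. The following Python sketch (standard library only; the function names are hypothetical) models just the parts of the stylesheet that matter here: xsl:strip-space, the match="*" template with its xsl:value-of, and the built-in rule that copies text nodes during xsl:apply-templates.

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(s):
    """Mimic XPath normalize-space()."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def strip_space(elem):
    """Mimic <xsl:strip-space elements="*"/>: drop whitespace-only text nodes."""
    if elem.text is not None and not elem.text.strip(" \t\r\n"):
        elem.text = None
    for child in elem:
        strip_space(child)
        if child.tail is not None and not child.tail.strip(" \t\r\n"):
            child.tail = None


def question_template(elem):
    """Mimic the question's match="*" template applied to one element."""
    out = ET.Element(elem.tag)  # xsl:element name="{name()}" (attributes are not copied)
    # xsl:value-of: the string value of the WHOLE element, i.e. all descendant text
    out.text = normalize_space("".join(elem.itertext()))
    # xsl:apply-templates: the built-in rule copies each text node unchanged ...
    out.text += elem.text or ""
    for child in elem:
        # ... and each child element goes through this same template again
        new_child = question_template(child)
        new_child.tail = child.tail
        out.append(new_child)
    return out


source = """<greeting>
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>"""

root = ET.fromstring(source)
strip_space(root)
print(ET.tostring(question_template(root), encoding="unicode"))
```

Run against the greeting example, it prints the same content on a single line (this sketch does no indenting): the concatenated string value right after the greeting start tag, followed by `<to>GeorgGeorg</to>` and the other doubled leaves.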

Instead, you want to normalize only the values of the *text* nodes, letting element nodes take care of themselves ... so:

<xsl:template match="text()">
  <xsl:value-of select="normalize-space()"/>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

... as you can see, fairly simple, and a garden-variety near-identity transform.
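For comparison, here is a Python sketch of the same near-identity idea (standard library only; `near_identity` is a hypothetical name, and this models the two templates above rather than running XSLT): every element is copied with its attributes, and only text nodes are passed through normalize-space.

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(s):
    """Mimic XPath normalize-space()."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def near_identity(elem):
    """Mimic the two templates: copy each element and its attributes,
    and normalize only the text nodes."""
    out = ET.Element(elem.tag, dict(elem.attrib))  # xsl:copy + xsl:copy-of select="@*"
    if elem.text is not None:  # text node before the first child
        out.text = normalize_space(elem.text)
    for child in elem:
        new_child = near_identity(child)  # xsl:apply-templates on the child element
        if child.tail is not None:  # text node following this child
            new_child.tail = normalize_space(child.tail)
        out.append(new_child)
    return out


source = """<greeting lang="en">
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey,   how are
        tricks?</text>
</greeting>"""

print(ET.tostring(near_identity(ET.fromstring(source)), encoding="unicode"))
# <greeting lang="en"><to>Georg</to><from>XSL-List</from><text>Hey, how are tricks?</text></greeting>
```

Whitespace-only text nodes normalize to empty strings, so this approach works with or without xsl:strip-space. One trade-off to keep in mind: each text node is normalized on its own, so in mixed content such as `<p>Hello <b>world</b></p>` the space before `<b>` is trimmed away.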

Cheers,
Wendell

 At 06:33 AM 7/7/2006, you wrote:
Hello,

I have an XML file whose content contains some unwanted carriage
returns and whitespace. Now I'm trying to write a stylesheet that
makes an exact copy of the source file, but without the returns
and extra whitespace. I thought this should work:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output name="stripped" method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
  <xsl:result-document format="stripped" href="result.xml">
     <xsl:apply-templates/>
  </xsl:result-document>
</xsl:template>
<xsl:template match="*">
  <xsl:element name="{name()}">
     <xsl:value-of select="normalize-space(translate(translate(.,
'&#x0d;&#x0a;', ' '), '&#09;', ' '))"/>
  <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
</xsl:stylesheet>

But the output is a mess in parts. What am I doing wrong?


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
