xsl-list

Re: [xsl] Removing unwanted space

2021-06-04 06:36:39
Hey Charles,

A couple of techniques I use in this situation:

text()[. is ancestor::p/descendant::text()[1]] - matches the first text
node in a p, no matter how deep.
text()[. is ancestor::p/descendant::text()[last()]] - same for the end

text()[not(matches(.,'\S'))] - text that has no non-whitespace character

replace($str,'^\s+','') - strip *leading whitespace only* from a string.
replace($str,'\s+$','') - same for trailing whitespace
(Note \s+ and not \s* in both: replace() raises an error if the pattern
can match a zero-length string.)

Et sim.
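
If you want the two replace() calls under a name, here is a minimal sketch
as XSLT 2.0 stylesheet functions. The my: prefix, its namespace URI and
the function names ltrim/rtrim are made up for illustration; nothing in
XSLT defines them.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: strip leading whitespace only.
       \s+ (not \s*) so the pattern can never match a zero-length string. -->
  <xsl:function name="my:ltrim" as="xs:string">
    <xsl:param name="str" as="xs:string?"/>
    <xsl:sequence select="replace($str, '^\s+', '')"/>
  </xsl:function>

  <!-- Hypothetical helper: strip trailing whitespace only. -->
  <xsl:function name="my:rtrim" as="xs:string">
    <xsl:param name="str" as="xs:string?"/>
    <xsl:sequence select="replace($str, '\s+$', '')"/>
  </xsl:function>

</xsl:stylesheet>
```

With these in scope, my:ltrim(.) inside a text() template does the same
job as the inline replace(), and my:rtrim(my:ltrim(.)) trims both ends
while leaving interior whitespace alone.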

I am not sure I would use xsl:analyze-string here since as you observe it
can be (um) pesky. I might do something as simple as

<xsl:template match="text()[. is ancestor::p/descendant::text()[1]]">
  <xsl:value-of select="replace(., '^\s+', '')"/>
</xsl:template>

But the match might have to be greedier if the inline markup is also
deep, and this is only the front end.
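
For what it's worth, here is a rough sketch of both ends handled together
in a complete XSLT 2.0 stylesheet. It assumes p elements are not nested
and that everything else should be copied through unchanged (hence the
identity template, which is my addition). The variable names are
arbitrary.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any text node inside a p, at any depth. -->
  <xsl:template match="p//text()">
    <xsl:variable name="p" select="ancestor::p[1]"/>
    <!-- Is this the first / last text node of the paragraph? -->
    <xsl:variable name="is-first" select=". is ($p//text())[1]"/>
    <xsl:variable name="is-last" select=". is ($p//text())[last()]"/>
    <!-- Trim the front only on the first text node. -->
    <xsl:variable name="front"
        select="if ($is-first) then replace(., '^\s+', '') else string(.)"/>
    <!-- Trim the back only on the last text node. -->
    <xsl:value-of
        select="if ($is-last) then replace($front, '\s+$', '') else $front"/>
  </xsl:template>

</xsl:stylesheet>
```

On the two sample paragraphs this should leave the single space between
bold and italic alone, since that text node is neither first nor last. It
is still not greedy enough for something like <p> <bold> The rain</bold></p>,
where the first text node is whitespace-only and the space inside bold
survives.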

This is not an easy problem since the (very smart) computer doesn't know
the difference between "white space that matters" and "white space that
doesn't matter". Indeed its whole notion of "white space" is somewhat
problematic. So I'm not sure who's actually smarter. :-)

Cheers, Wendell


On Thu, Jun 3, 2021 at 7:54 PM Charles O'Connor 
coconnor(_at_)ariessys(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

OK, I've tried this a bunch of ways and failed (using XSLT 2.0).

The XML I'm working with has a bunch of unwanted whitespace in all sorts
of places, but looking just at paragraphs, it can have

<p>
        The rain in <bold>Spain</bold> <italic>is</italic> wet.
</p>

Or

<p>
        <bold>The rain in Spain is wet.</bold>
</p>

What I and any semi-sane person wants is (TBH, it's the online XML editor
that wants it):

<p>The rain in <bold>Spain</bold> <italic>is</italic> wet.</p>

Or

<p><bold>The rain in Spain is wet.</bold></p>

In some places the XML actually starts this way, but it's not consistent
at all.

One track I went down dead-ended at regular expressions not being able to
be constructed in a way that could return an empty string. Me, I'd have
been fine with the occasional empty string, because it would have been an
empty string of things I did not want, if that makes any sense (and it does
not).

Anyway, my attempt to get around that was to look at the first text node
and see if it started with spaces and if so to get rid of them:

    <xsl:template match="p/text()[1]">
        <xsl:choose>
            <xsl:when test="matches(.,'^\s+.*')">
                 <xsl:analyze-string select="." regex="^\s+(\S?.*)">
                    <xsl:matching-substring>
                        <xsl:value-of select="regex-group(1)"/>
                    </xsl:matching-substring>
                </xsl:analyze-string>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

And sure, I know that the first text node might in reality come after some
content in a child of <p>, but I was willing to cross that bridge when I
actually mangled some content. But for this template, I got a warning: "The
child axis starting at a text node will never select anything", which
is rather dreary.

Anyway, I'm a little loopy with banging my head against this, but one way
or another, I'm missing this. I'm only treating the text node as a string,
not as a node with children, but apparently I only think that and I am
wrong, because the machine is smarter than I am.

Any help for how to get rid of the space at the beginning and end of
paragraphs without getting rid of the space between elements within the
paragraph would be appreciated.

Thanks!
Charles


Charles O'Connor l Business Systems Analyst
Pronouns: He/Him
Aries Systems Corporation l www.ariessys.com
50 High Street, Suite 21 l North Andover, MA l 01845 l USA


Main: +1 (978) 975-7570
Cell: +1 (802) 585-5655


-- 
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...