xsl-list

Re: [xsl] why not match text()? (fork from "Novice Question - matching entire text children")

2010-12-20 18:07:18
I didn't save this group’s messages, but I remember someone saying on this list today that 99.99 percent of all real-life stylesheets don’t use text() matching. (I’ll look that up tomorrow in the archives, but I think it was Wendell G. Carlisle (wink).)

Being able to do upconversions is the single most valuable “macrofeature” of XSLT 2. In my daily life, text() is quite important:

find . -name \*xsl | xargs grep -ch 'text()' | sed -e 's/^/.+/' | bc | tail -1

yields the result: 2278. Try that in your home directory. OK, there may be duplicates, for example from svn externals that are included more than once or from generated stylesheets, and text() may be used outside of patterns, but the figure is still meaningful. To put it on a scale: `find . -name \*xsl | wc -l` reports 1016 xsl files in my home dir, so approx. two text() per xsl file.
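For anyone who wants to repeat the measurement without GNU bc, here is a minimal sketch of the same count. Note that `grep -c` counts matching lines, not individual matches, so the figure above is a line count. The one-liner sums with bc's `.` shorthand for the last value, which, as far as I know, is a GNU extension. The function names `count_text_lines` and `count_text_occurrences` are made up for this illustration, and `grep -o` is itself an extension found in GNU and BSD grep rather than a POSIX option.

```shell
# Count lines containing "text()" in all *xsl files below a directory
# (default: current directory). This is what grep -c measures.
count_text_lines() {
  find "${1:-.}" -type f -name '*xsl' -exec grep -ch 'text()' {} + |
    awk '{ total += $1 } END { print total + 0 }'
}

# Count individual "text()" occurrences instead, so that a line such as
# select="text() | b/text()" contributes two.
count_text_occurrences() {
  find "${1:-.}" -type f -name '*xsl' -exec grep -oh 'text()' {} + |
    wc -l | tr -d ' '
}
```

Calling `count_text_lines ~` should correspond to the pipeline above, while `count_text_occurrences ~` will usually give a somewhat higher number. Neither function distinguishes match patterns from other uses of text().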

-Gerrit

On 21.12.2010 00:40, Wendell Piez wrote:
Syd,

These all fall into the category of text processing or upconversion, not
tree processing. While XSLT 2.0 provides a nice set of tools for doing
this sort of thing, these are tasks that demonstrate the principle being
discussed precisely in the way they violate it.

That is, it's not the fact that it contains mixed content, but the fact
that your text isn't clean coming in, and has to be ameliorated as it
passes through, that makes these examples fall out.

It's also why, in an advanced architecture, we might like to do this
kind of processing in separate transformations from formatting. Note
that these are things you often want to happen as data comes into your
data set, not as it goes out.

Cheers,
Wendell

On 12/20/2010 6:05 PM, Syd Bauman wrote:
Wendell is absolutely correct, a lot of learning goes on 'round here
just "watching over each other's shoulders", which I don't get to do
as much as I'd like.

Watching this conversation, I found myself interested in how many
experts say there is rarely a need to match text nodes. I do that
all the time. Some examples include:
* transforming straight quotes to curly quotes
* conditionally changing "-" to en-dash or em-dash
* converting PUA characters to <tei:g> elements
* dealing with soft hyphens
* ditching the punctuation that follows a <list>
* finding the character before a footnote

So I'm guessing either
a) there's a better way for me to be doing these things, or
b) those experts who spoke of the rare need for matching text()
either don't deal with data like mine -- TEI, i.e., mostly mixed
content --, or deal with so much more of it that my needs for
matching text() are, in fact, pretty rare from their point of view

I'm hoping someone will post the better way if it's (a). :-)

--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler

