Hi David,
At 02:20 AM 10/28/2004, you wrote:
I am marking up foreign words so that I can display the Japanese/Chinese
characters in my text, but only for the first usage of a term, and
also to have the stylesheet generate a glossary at the end. Cutting and
pasting and rewriting, it is easy to forget if I have already "glossed" a
term or not, and I cannot figure out a correct xsl test to see if the term
is properly marked up the first time it is used. That is to say, I need to
test to see if the word has been used before (in document order) in the
source tree, before the first glossary markup.
Okay. Be on notice that XSLT 1.0 isn't industrial-strength at string
handling, but it should be able to manage this, particularly if your
input documents aren't huge.
It seems like it should be easy to check with a simple <xsl:if test---> but
I do not seem to be able to get it to work, and I do not see anything
relevant in the archives.
The test may not be so simple, depending on the requirements. (See above.)
If I have this source, to make the smallest example:
<p>The Buddhist way is zazen. When first practicing sitting meditation
<mygloss><gr>zazen</gr><gk>==Japanese Characters==</gk></mygloss> select a
quiet place. </p>
And my stylesheet includes:
<xsl:template match="mygloss">
  <xsl:if test="contains(ancestor::text(), gr) or
                contains(preceding::text(), gr)">
    XXX ERROR-- NOT FIRST USAGE XXX
  </xsl:if>
  <!-- rest of template to put in italic, proper font, check for first
       gloss markup of this word etc -->
</xsl:template>
The above test does not find and output the error of the earlier first use,
i.e. when I forgot to gloss it when I first used in the beginning of the
paragraph (or anywhere earlier, to make the general case).
Right. You are falling into a couple of traps here. The second is really
subtle.
The first (not so subtle) is that no nodes in the tree ever have ancestor
text nodes, since text nodes are always "leaf" nodes -- at the "bottom" of
the tree. So the first part of the test will never test true.
The second (the subtle one) is that the contains() function takes two
strings as input, and the XSLT 1.0 rule for converting a node-set to a
string is to take the string value of the first node in the set, in
document order, and ignore the rest. So converting the set
(preceding::text()) will always give you the first text node in your
document -- which is not the one you want to test.
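To see the conversion rule for yourself, a small diagnostic stylesheet along these lines may help. It is only a sketch: the element name mygloss comes from your example, the output wording is made up, and it assumes an XSLT 1.0 processor (a 2.0 processor treats xsl:value-of differently).

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//mygloss">
      <!-- How many text nodes the preceding:: axis really selects -->
      <xsl:text>preceding text nodes: </xsl:text>
      <xsl:value-of select="count(preceding::text())"/>
      <xsl:text>&#10;</xsl:text>
      <!-- What the set becomes as a string: only its first node,
           in document order -->
      <xsl:text>as a string: [</xsl:text>
      <xsl:value-of select="string(preceding::text())"/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

However many text nodes the count reports, the bracketed string should be the same first text node for every mygloss in the document.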
I tried this test:
contains(ancestor::*, gr) or contains(preceding::*, gr)
But the test is always true, even if there is no prior use.
It is always true because the first element ancestor, in document order, of
the matching mygloss element (i.e. the document element) always contains
the gr child of the same mygloss (as well as all the others).
This following test works if the word is not further marked up in the
paragraph, but of course it fails to look in end notes and such:
contains(ancestor::p/text(), gr) or contains(preceding::p/text(), gr)
Right. It's also prone to give you false hits sometimes -- if the first
text node in the document inside a p happens to contain the string.
So, in sum, how do I do this properly? And of course, a kind explanation to
get me out of my confusion about the xpath and xsl would be very much
appreciated.
I think the root of the confusion is in knowing a bit about XPath node
tests and axes (hint: terms to look up :-), plus the rule I cited about how
contains() works and how a node set is converted to a string.
Really, it's this rule that is causing the trouble. You don't want your
contains() to work on a *particular* text node, much less the first such
text node in the document. Rather, you want it to work on an aggregation of
*all* earlier text nodes.
You can get this -- you need it as a string -- by collecting the text nodes
you want into a variable:
<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()"/>
  <!-- copies all preceding text nodes into a result-tree-fragment, where
       they are concatenated into one since there are no elements to
       keep them separate -->
</xsl:variable>
and then
<xsl:if test="contains($preceding-text, gr)">
  XXX ERROR-- NOT FIRST USAGE XXX
</xsl:if>
<!-- rest of template to put in italic, proper font, check for first
     gloss markup of this word etc -->
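Put together, a complete stylesheet might look something like the following. Treat it as a sketch, not your finished template: the italic element, the parentheses around the characters, and the xsl:message are my guesses at what you want the rest of the template to do.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="mygloss">
    <!-- All earlier text nodes, aggregated into one result tree fragment -->
    <xsl:variable name="preceding-text">
      <xsl:copy-of select="preceding::text()"/>
    </xsl:variable>

    <!-- The fragment converts to a string as a whole, so contains()
         sees everything that came before this mygloss -->
    <xsl:if test="contains($preceding-text, gr)">
      <xsl:message>Term used before its gloss: <xsl:value-of select="gr"/></xsl:message>
      <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX</xsl:text>
    </xsl:if>

    <!-- Assumed presentation: term in italics, characters in parentheses -->
    <i><xsl:value-of select="gr"/></i>
    <xsl:text> (</xsl:text>
    <xsl:value-of select="gk"/>
    <xsl:text>)</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against your sample paragraph, the test should fire, since "zazen" appears in the text before the mygloss element. Note that an earlier mygloss for the same term will also trigger it, because its gr text is part of the preceding text too.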
Note that this technique of aggregating the text nodes first allows quite a
bit of flexibility. A result tree fragment is converted to a string
differently from a node set: the rule is to take the text value of the
entire fragment, not the text value of its first node (in XSLT 1.0 you
can't reach inside such a fragment to pick out nodes anyway). This means
you can do things inside of that variable declaration, or in the select
expression of the copy-of, to refine how that string is made. So, for example,
<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()[not(ancestor::note)]"/>
</xsl:variable>
... would leave out the text nodes that had a note ancestor from the set of
text nodes gathered together and then tested. (So you could mention "zazen"
inside a note and not throw the error.)
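As an illustration of doing things inside the variable declaration, here is one possible refinement: making the check case-insensitive, so that an earlier "Zazen" at the start of a sentence is caught as well. This is an assumption about what you might want, not something you asked for. The variable names upper and lower are my own, and translate() as used here only folds unaccented Latin letters.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:template match="mygloss">
    <!-- Build the aggregate string node by node, lowercasing each text
         node and skipping any that sit inside a note -->
    <xsl:variable name="preceding-text">
      <xsl:for-each select="preceding::text()[not(ancestor::note)]">
        <xsl:value-of select="translate(., $upper, $lower)"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Compare against the lowercased term -->
    <xsl:if test="contains($preceding-text, translate(gr, $upper, $lower))">
      <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX</xsl:text>
    </xsl:if>

    <!-- rest of template as before -->
    <xsl:value-of select="gr"/>
  </xsl:template>

</xsl:stylesheet>
```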
I hope that explains it all sufficiently. Note that the preceding:: axis
can be expensive, so on large documents things may get slow. If performance
suffers, there's a trick that can be helpful: passing an aggregation of
earlier text nodes down through the templates as a parameter so it doesn't
have to be regathered all the time. Ask again if you need to see this.
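In case it is useful, here is a rough sketch of what that trick could look like. Everything in it beyond your mygloss and gr elements is hypothetical: the mode name walk, the parameter name seen, and the choice to copy ordinary elements through unchanged. It walks the document one node at a time in document order, handing each node the text accumulated so far.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start at the document element with nothing seen yet -->
  <xsl:template match="/">
    <xsl:apply-templates select="(* | text())[1]" mode="walk">
      <xsl:with-param name="seen" select="''"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Ordinary elements: copy, descend with the same $seen, then hand
       $seen plus this element's own text to the next sibling -->
  <xsl:template match="*" mode="walk">
    <xsl:param name="seen"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="(* | text())[1]" mode="walk">
        <xsl:with-param name="seen" select="$seen"/>
      </xsl:apply-templates>
    </xsl:copy>
    <xsl:apply-templates mode="walk"
        select="(following-sibling::* | following-sibling::text())[1]">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Text nodes: output, then pass the longer string along -->
  <xsl:template match="text()" mode="walk">
    <xsl:param name="seen"/>
    <xsl:value-of select="."/>
    <xsl:apply-templates mode="walk"
        select="(following-sibling::* | following-sibling::text())[1]">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- The gloss itself: test against $seen instead of preceding:: -->
  <xsl:template match="mygloss" mode="walk">
    <xsl:param name="seen"/>
    <xsl:if test="contains($seen, gr)">
      <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX</xsl:text>
    </xsl:if>
    <i><xsl:value-of select="gr"/></i>
    <xsl:apply-templates mode="walk"
        select="(following-sibling::* | following-sibling::text())[1]">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats with this sketch. Comments and processing instructions are skipped by the walk, so they drop out of the output. And since each sibling is processed from inside the template for the one before it, a very long run of siblings can run into a processor's recursion limit. Whether it is actually faster than regathering preceding::text() will depend on your processor and your documents.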
Happy sitting!
Wendell
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================