Re: Prior Instance of term in main text, before first glossary markup

Hi David,

At 02:20 AM 10/28/2004, you wrote:

I am marking up foreign words so that I can display the Japanese/Chinese
characters in my text, but only for the first usage of a term, and
also to have the stylesheet generate a glossary at the end. Cutting and
pasting and rewriting, it is easy to forget if I have already "glossed" a
term or not, and I cannot figure out a correct xsl test to see if the term
is properly marked up the first time it is used. That is to say, I need to
test to see if the word has been used before (in document order) in the
source tree, before the first glossary markup.

Okay. Be on notice that XSLT 1.0 isn't industrial-strength on string handling; but it should be able to manage this okay, particularly if your input documents aren't huge.

It seems like it should be easy to check with a simple <xsl:if test---> but
I do not seem to be able to get it to work, and I do not see anything
relevant in the archives.


The test may not be so simple, depending on the requirements. (See above.)

If I have this source, to make the smallest example:

<p>The Buddhist way is zazen. When first practicing sitting meditation
<mygloss><gr>zazen</gr><gk>==Japanese Characters==</gk></mygloss> select a
quiet place. </p>

And my stylesheet includes:

<xsl:template match="mygloss">

<xsl:if test="contains(ancestor::text(), gr) or contains(preceding::text(), gr)">

       XXX ERROR-- NOT FIRST USAGE XXX
     </xsl:if>
<!-- rest of template to put in italic, proper font, check for first
gloss markup of this word etc -->
</xsl:template>

The above test does not find and output the error of the earlier first use,

i.e. when I forgot to gloss it when I first used it at the beginning of the paragraph (or anywhere earlier, to make the general case).

Right. You are falling into a couple of traps here. The second is really subtle.

The first (not so subtle) is that no nodes in the tree ever have ancestor text nodes, since text nodes are always "leaf" nodes -- at the "bottom" of the tree. So the first part of the test will never test true.

The second (the subtle one) is that the contains() function takes two strings as input, and the XSLT 1.0 rule for converting a node-set to a string is to take the first node in the set, in document order. Taking the first text node from the set (preceding::text()) will always return the first text node in your document -- which is not the one you want to test.
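
To see that conversion rule in action, here is a small diagnostic sketch (not part of your stylesheet; the message text is just for illustration). It prints what contains() actually receives as its first argument when given preceding::text():

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="mygloss">
    <!-- string() of a node-set is the string value of its first node in
         document order; this is all that contains() would get to look at. -->
    <xsl:message>
      <xsl:text>contains() sees only: "</xsl:text>
      <xsl:value-of select="string(preceding::text())"/>
      <xsl:text>"</xsl:text>
    </xsl:message>

    <!-- The same node, selected explicitly. The parentheses matter:
         preceding::text()[1] without them picks the NEAREST preceding text
         node, because a predicate inside a step on a reverse axis counts
         backwards from the context node. -->
    <xsl:message>
      <xsl:value-of select="(preceding::text())[1]"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

Both messages should report the same text: the very first text node of the document that precedes the mygloss, however far away it is.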

I tried this test:

contains(ancestor::*, gr) or contains(preceding::*, gr)

But the test is always true, even if there is no prior use.

It is always true because the first element ancestor, in document order, of the matching mygloss element (i.e. the document element) always contains the gr child of the same mygloss (as well as all the others).

This following test works if the word is not further marked up in the
paragraph, but of course it fails to look in end notes and such:

contains(ancestor::p/text(), gr) or contains(preceding::p/text(), gr)

Right. It's also prone to give you false hits sometimes -- if the first text node in the document inside a p happens to contain the string.

So, in sum, how do I do this properly? And of course, a kind explanation to
get me out of my confusion about the xpath and xsl would be very much appreciated.

I think the root of the confusion is in knowing a bit about XPath node tests and axes (hint: terms to look up :-), plus the rule I cited about how contains() works and how a node set is converted to a string.

Really, it's this rule that is causing the trouble. You don't want your contains() to work on a *particular* text node, much less the first such text node in the document. Rather, you want it to work on an aggregation of *all* earlier text nodes.

You can get this -- you need it as a string -- by collecting the text nodes you want into a variable:


<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()"/>
  <!-- copies all preceding text nodes into a result-tree-fragment, where
       they are concatenated into one since there are no elements to
       keep them separate -->
</xsl:variable>

and then

<xsl:if test="contains($preceding-text, gr)">
  XXX ERROR-- NOT FIRST USAGE XXX
</xsl:if>
<!-- rest of template to put in italic, proper font, check for first
gloss markup of this word etc -->
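
Put together, the two pieces above might look like the following minimal stylesheet. This is a sketch under assumptions: the xsl:message wording and the <i> wrapper are placeholders I made up to stand in for the rest of your template, not anything your stylesheet has to use.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="mygloss">
    <!-- Copy every text node that precedes this mygloss into a result tree
         fragment. Its string value is the concatenation of all of them. -->
    <xsl:variable name="preceding-text">
      <xsl:copy-of select="preceding::text()"/>
    </xsl:variable>

    <!-- gr is converted to a string here, so this asks: does the term
         occur anywhere in the text gathered above? -->
    <xsl:if test="contains($preceding-text, gr)">
      <xsl:message>
        <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX: </xsl:text>
        <xsl:value-of select="gr"/>
      </xsl:message>
    </xsl:if>

    <!-- Placeholder for the real formatting of the glossed term. -->
    <i><xsl:value-of select="gr"/></i>
  </xsl:template>

</xsl:stylesheet>
```

Two things to keep in mind with this sketch. contains() does plain substring matching, so a term that happens to occur inside a longer word will also trigger the error. And preceding::text() includes the text inside any earlier mygloss elements, so a second markup of the same term will be flagged too, which may or may not be what you want.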

Note that this technique of aggregating the text nodes first allows quite a bit of flexibility. Since the rule for converting a result tree fragment into a string, unlike that for a node set, is to take the text value of the entire fragment (not the text value of the first node in the fragment, since such fragments aren't transparent in XSLT 1.0), you can do things inside of that variable declaration, or in the select expression of the copy-of, to refine how that string is made. So, for example,


<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()[not(ancestor::note)]"/>
</xsl:variable>

... would leave out the text nodes that had a note ancestor from the set of text nodes gathered together and then tested. (So you could mention "zazen" inside a note and not throw the error.)

I hope that explains it all sufficiently. Note that the preceding:: axis can be expensive, so on large documents things may get slow. If performance suffers, there's a trick that can be helpful: passing an aggregation of earlier text nodes down through the templates as a parameter so it doesn't have to be regathered all the time. Ask again if you need to see this.
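
In case it saves a round trip, here is a rough sketch of one possible shape for that parameter-passing trick. Everything in it is an assumption for illustration: the mode name "walk", the parameter name "seen", the identity-style copying of other elements, and the <i> placeholder. It walks the document one node at a time (first child, then next sibling) and hands the text seen so far to each template.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start at the first child of the root, with nothing seen yet. -->
  <xsl:template match="/">
    <xsl:apply-templates select="node()[1]" mode="walk">
      <xsl:with-param name="seen" select="''"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Any element: copy it, descend into its first child with the same
       $seen, then move to the next sibling with this element's whole text
       value appended. -->
  <xsl:template match="*" mode="walk">
    <xsl:param name="seen"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()[1]" mode="walk">
        <xsl:with-param name="seen" select="$seen"/>
      </xsl:apply-templates>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="walk">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- mygloss: test against $seen instead of re-reading preceding::. -->
  <xsl:template match="mygloss" mode="walk">
    <xsl:param name="seen"/>
    <xsl:if test="contains($seen, gr)">
      <xsl:message>
        <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX: </xsl:text>
        <xsl:value-of select="gr"/>
      </xsl:message>
    </xsl:if>
    <i><xsl:value-of select="gr"/></i>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="walk">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Text: output it, then pass it along as part of $seen. -->
  <xsl:template match="text()" mode="walk">
    <xsl:param name="seen"/>
    <xsl:value-of select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="walk">
      <xsl:with-param name="seen" select="concat($seen, .)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Comments and processing instructions contribute no text. -->
  <xsl:template match="comment()|processing-instruction()" mode="walk">
    <xsl:param name="seen"/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="walk">
      <xsl:with-param name="seen" select="$seen"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that the recursion runs as deep as your longest run of siblings, which some processors handle better than others, so treat this as a starting point to measure rather than a guaranteed win.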


Happy sitting!
Wendell


======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================