Hi, I need help finding resources (examples and/or XSL) for the
situation below; I haven't found quite the right recipe in my
searches of the list archives.
Given an XML file containing a list of terms and another file
containing narrative content (elements with text, plus some inline
markup for emphasis and footnotes), I was asked whether I could find
every occurrence of each term in the narrative content and wrap each
occurrence in a tag. My first thought is to load each document into a
variable. But I don't know what the most effective method of string
comparison would be, given that the narrative document might have the
term's words with different capitalization. If anyone can point me in
the right direction, I'd appreciate it.
Also, I would like to know whether there is a practical limit to how
large a narrative file I can process with about 150 terms to find in
the text. And if a different approach would work better, such as
writing Java to do a brute-force search and replace, please tell me
so. (I work with a Java programmer. Everything looks like a Java
problem to her and an XSL problem to me.)
-- Dorothy
Note: I'm using Saxon B 9.1.0.7. The set of terms and the (bad)
sample sentence are made up for illustration.
Example of terms (indexTerms.xml):
<?xml version="1.0" encoding="UTF-8"?>
<terms>
<term index1="anxiety">Anxiety</term>
<term index1="children">Children</term>
<term index1="children" index2="illness">Children, illness</term>
<term index1="children" index2="nightmare">Children, nightmare</term>
<term index1="cure" index2="fever">Cure fever</term>
<term index1="cure" index2="illness">Cure illness</term>
<term index1="anxiety" index2="nightmare">Nightmare</term>
<term index1="children" index2="illness">Sick children</term>
<term index1="anxiety" index2="phobia">Worries, phobias and anxiety</term>
</terms>
Example of narrative (sampleTopic.xml):
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
"http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
<topic id="sampleTopic">
<title>sampleTopic</title>
<body>
<p>markup for sample terms testing a set of phrases to match to
the content of index terms:</p>
<p>Texttexttext text some of the terms are already in &lt;ph&gt;
i.e. <ph id="cure_fever">curing fever</ph>, <ph
id="children_illness">sick children</ph> and sometime the same terms
occur, <i>but different case</i>, not in a ph: Curing fever and
<b>Sick children</b>. I need to get all the occurrences of each of
the term element strings marked up with &lt;ph&gt; </p>
</body>
</topic>
Desired result:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
"http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
<topic id="sampleTopic">
<title>sampleTopic</title>
<body>
<p>markup for sample terms testing a set of phrases to match to
the content of index terms:</p>
<p>Texttexttext text some of the terms are already in &lt;ph&gt;
i.e. <ph id="cure_fever">curing fever</ph>, <ph
id="children_illness">sick children</ph> and sometime the same terms
occur, <i>but different case</i>, not in a ph: <ph
id="cure_fever">Curing fever</ph> and <b><ph
id="children_illness">Sick children</ph></b>. I need to get all the
occurrences of each of the term element strings marked up with &lt;ph&gt; </p>
</body>
</topic>
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:param name="indexFile">indexTerms.xml</xsl:param>
<xsl:param name="textFile">sampleTopic.xml</xsl:param>
<xsl:variable name="termsDocument"
select="document($indexFile)"></xsl:variable>
<xsl:variable name="textDocument" select="document($textFile)"></xsl:variable>
<xsl:template match="*" name="test1"><xsl:result-document
href="matchText-test.xml" method="xml">
<!-- proof that I can get the terms -->
<xsl:text> </xsl:text><xsl:comment><xsl:text>first term is
</xsl:text><xsl:value-of select="$termsDocument/terms/term[1]"/></xsl:comment>
<xsl:text> </xsl:text><xsl:comment><xsl:text>second term is
</xsl:text><xsl:value-of select="$termsDocument/terms/term[2]"/></xsl:comment>
<xsl:text> </xsl:text><xsl:comment><xsl:text>third term is
</xsl:text><xsl:value-of select="$termsDocument/terms/term[3]"/></xsl:comment>
<!-- now how do I find them in the $textDocument elements and mark them up? -->
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
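One possible direction for the markup step, offered as a minimal
sketch rather than a tested solution: an identity transform applied to
sampleTopic.xml as the principal source document, with
xsl:analyze-string (flags="i") scanning text nodes against a single
alternation built from the term strings. It assumes the sample file is
well-formed (literal mentions of ph in the text escaped), that every
term is non-empty, and that ids are formed as index1_index2 as in the
desired result. It matches literal term strings only, so "Sick
children" would be wrapped but "Curing fever" would not match the term
"Cure fever". XPath regular expressions also have no word-boundary
anchor, so "Children" would match inside a longer word. The names
terms and termsRegex are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="2.0">

  <xsl:param name="indexFile">indexTerms.xml</xsl:param>

  <!-- Hypothetical name: all term elements from the index file. -->
  <xsl:variable name="terms" select="document($indexFile)/terms/term"/>

  <!-- Hypothetical name: one alternation of all term strings.
       Longest terms come first so "Children, illness" is tried before
       "Children". Regex metacharacters are backslash-escaped, and each
       space becomes \s+ so a term can span a line break.
       Assumption: at least one term, and no empty terms. -->
  <xsl:variable name="termsRegex" as="xs:string">
    <xsl:variable name="escaped" as="xs:string*">
      <xsl:for-each select="$terms">
        <xsl:sort select="string-length(.)" order="descending"/>
        <xsl:sequence select="replace(replace(normalize-space(.),
            '[\.\\\?\*\+\|\^\$\{\}\(\)\[\]]', '\\$0'), ' ', '\\s+')"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join($escaped, '|')"/>
  </xsl:variable>

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Scan body text that is not already inside a ph element. -->
  <xsl:template match="body//text()[not(ancestor::ph)]">
    <xsl:analyze-string select="." regex="{$termsRegex}" flags="i">
      <xsl:matching-substring>
        <!-- Look up the term whose text equals the match, ignoring
             case and whitespace differences. -->
        <xsl:variable name="hit" select="lower-case(normalize-space(.))"/>
        <xsl:variable name="term"
            select="$terms[lower-case(normalize-space(.)) = $hit][1]"/>
        <ph id="{string-join(($term/@index1, $term/@index2), '_')}">
          <xsl:value-of select="."/>
        </ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because all terms go into one regular expression, each text node is
scanned once, so the number of terms should matter less than the size
of the narrative, though this is untested. A term split across inline
markup (for example, half of it inside a b element) would not be found,
since each text node is scanned on its own.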