Susan,
> I'm sorry for the delay in responding. A large tree fell on my house
> about 1 AM Tuesday morning and I have been away from work finding
> a tree service and contractors, etc. It's quite a challenge.
Wow, I can believe that... and I thought this stylesheet was quite a challenge!
I have been thinking about the sorting problem:
1. if a record doesn't have a title, we can look it up (by its doc-number),
let's call it "found-title"
2. the sort procedure should use the "found-title" rather than the actual title
(or, more precisely, the "found-title-without-stopwords").
3. the output shows the actual title (empty, if it's empty)
Problem: we can't use variables or if-constructs, because xsl:sort must be the
first child of xsl:for-each. The solution so far sorts on the
"actual-title-without-stopwords" (which can be empty) by means of the
"Becker method" [1]:
<xsl:sort select="concat(
    substring(substring-after(., ' '),
      0 div boolean($stop-words[starts-with(
        translate(current(), $uppercase, $lowercase),
        concat(translate(., $uppercase, $lowercase), ' '))])),
    substring(.,
      0 div not($stop-words[starts-with(
        translate(current(), $uppercase, $lowercase),
        concat(translate(., $uppercase, $lowercase), ' '))])))"/>
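In case the idiom is unfamiliar, here is a stripped-down sketch of how it picks one of two strings inside a single XPath expression. The variable name has-title and the two literal strings are made up for illustration only. It relies on substring(s, 0) returning the whole string and substring(s, NaN) returning the empty string, where 0 div true() is 0 and 0 div false() is NaN:

```xml
<!-- hypothetical flag; any boolean expression would do -->
<xsl:variable name="has-title" select="boolean(string(title))"/>

<!-- yields 'titled' when $has-title is true, 'untitled' otherwise:
     exactly one of the two substring() calls returns its full argument,
     the other returns the empty string -->
<xsl:value-of select="concat(
    substring('titled',   0 div $has-title),
    substring('untitled', 0 div not($has-title)))"/>
```

The xsl:sort above has the same shape: the condition is "the title starts with a stop word", and the two candidate strings are the title with and without its first word.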
I tried to put a "found-title" inside the xsl:sort select, but I couldn't make
it work.
> The processor is Saxon but it's being called from within another application.
> I do not believe I can do a two-step process.
But Saxon does support exsl:node-set [2] so it should be possible to generate a
temporary tree (pun not intended!!) and transform that in a second pass,
within one stylesheet. You could create a global variable with a structure
like
<sort-titles>
  <title doc-number="53690">american artist</title>
  <title doc-number="57769">american city &amp; country</title>
  <title doc-number="58345">american demographics</title>
  <title doc-number="58615">forbes.</title>
</sort-titles>
and then use
<xsl:sort select="exsl:node-set($sort-titles)/*[@doc-number=$doc-number]"/>
Using exsl:node-set also means that you don't need the "Becker hack" anymore,
improving maintainability. Here's a stylesheet that sorts titles correctly:
<xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:sw="http://my.stopwords/sw"
    extension-element-prefixes="exsl sw">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <!-- leading words to ignore when sorting -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <xsl:variable name="stop-words"
      select="document('')/xsl:stylesheet/sw:stop/sw:word"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <!-- temporary tree: one lowercased sort key (leading stop word removed)
       per record that has a title, looked up by doc-number -->
  <xsl:variable name="sort-titles">
    <xsl:for-each select="//section-02">
      <xsl:if test="string(title)">
        <title doc-number="{doc-number}">
          <xsl:variable name="lower-title"
              select="translate(title, $uppercase, $lowercase)"/>
          <xsl:choose>
            <xsl:when test="$stop-words[starts-with($lower-title,
                concat(translate(., $uppercase, $lowercase), ' '))]">
              <xsl:value-of select="substring-after($lower-title, ' ')"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$lower-title"/>
            </xsl:otherwise>
          </xsl:choose>
        </title>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>sort without stop words</title></head>
      <body>
        <table border="1">
          <tr>
            <th>doc-number</th>
            <th>title</th>
            <th>description</th>
            <th>arrival-date</th>
          </tr>
          <xsl:for-each select="//section-02/title">
            <!-- primary key: the looked-up sort title for this doc-number -->
            <xsl:sort select="exsl:node-set($sort-titles)/*[@doc-number =
                current()/../doc-number]"/>
            <!-- secondary key: MM/DD/YYYY rearranged to YYYYMMDD, newest first -->
            <xsl:sort select="number(concat(substring(../arrival-date, 7, 4),
                substring(../arrival-date, 1, 2), substring(../arrival-date, 4, 2)))"
                order="descending"/>
            <tr>
              <td><xsl:value-of select="../doc-number"/></td>
              <td><xsl:value-of select="."/></td>
              <td><xsl:value-of select="../description"/></td>
              <td><xsl:value-of select="../arrival-date"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
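I don't have your exact input in front of me, so for reference this is roughly the shape the stylesheet assumes. The element names section-02, doc-number, title, description and arrival-date come from the XPaths above and the values from the output below; the wrapper element name "records" is only a placeholder, and I am assuming that untitled records carry an empty title element (they must have one, or //section-02/title would not select them):

```xml
<!-- "records" is a hypothetical wrapper; the real root may differ -->
<records>
  <section-02>
    <doc-number>53690</doc-number>
    <title>American Artist</title>
    <description>v.68:no.738(2004:Jan.)</description>
    <arrival-date>02/26/2004</arrival-date>
  </section-02>
  <section-02>
    <doc-number>57769</doc-number>
    <title>The American city &amp; country</title>
    <description>v.119:no.1(2004:Jan.)</description>
    <arrival-date>02/11/2004</arrival-date>
  </section-02>
  <section-02>
    <doc-number>57769</doc-number>
    <!-- no title of its own: sorted via the title found for doc-number 57769 -->
    <title/>
    <description>v.119:no.3(2004:Mar.)</description>
    <arrival-date>03/25/2004</arrival-date>
  </section-02>
</records>
```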
Saxon 6.5.3 output:
doc-number  title                        description               arrival-date
53690       American Artist              v.68:no.738(2004:Jan.)    02/26/2004
57769                                    v.119:no.3(2004:Mar.)     03/25/2004
57769       The American city & country  v.119:no.1(2004:Jan.)     02/11/2004
58345                                    v.26:no.3(2004:Apr.)      04/12/2004
58345                                    v.26:no.2(2004:Mar.)      03/06/2004
58345       American demographics        v.26:no.1(2004:Feb.)      02/05/2004
58615                                    v.173:no.5(2004:Mar.15)   03/15/2004
58615                                    v.173:no.2(2004:Feb. 02)  01/21/2004
58615       Forbes.                      v.173:no.1(2004:Jan. 12)  01/12/2004
The records without a title now sort into their correct positions.
One problem seems to remain: the titles tend to display in the last record,
rather than the first, because the dates are sorted descending. But that
shouldn't be too difficult to solve.
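One possible approach, as an untested sketch: print the found title only on the record with the latest arrival-date for its doc-number, which is the first row of each group under the descending date sort. The variable names doc and date are my own, and the sketch assumes that all records sharing a doc-number belong to one title. It would replace the td that prints the title inside the xsl:for-each:

```xml
<xsl:variable name="doc" select="../doc-number"/>
<xsl:variable name="date"
    select="number(concat(substring(../arrival-date, 7, 4),
        substring(../arrival-date, 1, 2), substring(../arrival-date, 4, 2)))"/>
<td>
  <!-- print the title only if no record with this doc-number arrived later -->
  <xsl:if test="not(//section-02[doc-number = $doc]
      [number(concat(substring(arrival-date, 7, 4),
          substring(arrival-date, 1, 2),
          substring(arrival-date, 4, 2))) &gt; $date])">
    <!-- first non-empty title among the records with this doc-number -->
    <xsl:value-of select="//section-02[doc-number = $doc]/title[string(.)]"/>
  </xsl:if>
</td>
```

Note that this deviates from point 3 above (the row shows the looked-up title rather than the record's own), and if two records with the same doc-number share the latest date, both rows would show the title.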
I hope this works in your application... and I wish you strength and
whatever else you need to solve the other tree challenge too!
Best regards
Anton Triest
[1] http://www.biglist.com/lists/xsl-list/archives/200008/msg00525.html
[2] http://exslt.org/exsl/functions/node-set/index.html