Recently I needed to merge several back-of-the-book indexes that were marked up
in XML. After experimenting a bit, I decided that, given an appropriate
collation, the following sequence of (XSLT 3.0) xsl:sort instructions was an
adequate approximation of what indexers call the "letter-by-letter" style of
alphabetizing:

    <xsl:sort select="replace(., '[\s\p{P}-[(,]]', '') ! replace(., ',.*|\(.*', '')"/>
    <xsl:sort select="matches(., '^[^(]+,')"/>
    <xsl:sort select="replace(., '[\s\p{P}-[(,]]+', '')"/>

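
To make the role of each key easier to see, here is a minimal, self-contained sketch that wraps the three instructions in xsl:perform-sort. It is not the full script mentioned in this message. The sample entries, the variable names, and the use of lower-case() with the codepoint collation as a stand-in for "an appropriate collation" are all assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical sample entries, deliberately out of order. -->
  <xsl:variable name="entries" as="element(entry)*">
    <entry>newborn</entry>
    <entry>New, Zoe</entry>
    <entry>New Deal</entry>
    <entry>NEW (Neural Earth Worm)</entry>
    <entry>new-12 compound</entry>
    <entry>New, Arthur</entry>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <xsl:variable name="sorted" as="element(entry)*">
      <xsl:perform-sort select="$entries">
        <!-- Key 1: drop whitespace and all punctuation except "(" and ",",
             then cut the string at the first "," or "(".
             ASSUMPTION: lower-case() plus the codepoint collation stands in
             for a real case-insensitive collation. -->
        <xsl:sort
            select="lower-case(replace(., '[\s\p{P}-[(,]]', '')) ! replace(., ',.*|\(.*', '')"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <!-- Key 2: boolean tie-breaker. false sorts before true, so a heading
             with no comma before any parenthesis precedes an inverted
             "Surname, Forename" heading. -->
        <xsl:sort select="matches(., '^[^(]+,')"/>
        <!-- Key 3: the whole stripped string, to order entries that are
             still tied after keys 1 and 2. -->
        <xsl:sort
            select="lower-case(replace(., '[\s\p{P}-[(,]]+', ''))"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
      </xsl:perform-sort>
    </xsl:variable>
    <!-- One entry per line. -->
    <xsl:value-of select="$sorted" separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

Run from the initial template in an XSLT 3.0 processor, this should list the entries as NEW (Neural Earth Worm); New, Arthur; New, Zoe; new-12 compound; newborn; New Deal. The first three all reduce to "new" under key 1, key 2 puts the parenthesized heading ahead of the two inverted names, and key 3 orders Arthur before Zoe.
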
If anyone wants to test it out with the examples that are used in the Chicago
Manual of Style to illustrate the system, I've put the full script, data, and
relevant chunk of the CMS up here: http://lister.ei.virginia.edu/~drs2n/alpha/ .
(Suggested refinements/improvements would be welcome.)

I spent a bit of time trying to figure out how one might implement the
word-by-word system (described at the above URL) using xsl:sort, but I'm not
sure it's possible--it seems that word-by-word would require a full-blown
recursive sorting routine. I'm happy to be proven wrong, though, by anyone who
has tackled this before or is cleverer than I am about such things.

David
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell(_at_)virginia(_dot_)edu Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--