You can strip the accents by applying Unicode decomposition (NFKD) and
then removing the combining diacritical marks (U+0300 to U+036F):
<xsl:for-each-group select="index-0"
    group-by="substring(
                upper-case(
                  replace(
                    normalize-unicode(heading, 'NFKD'),
                    '[&#x300;-&#x36F;]',
                    ''
                  )
                ), 1, 1
              )">
  <xsl:sort select="current-grouping-key()"/>
  <!-- write the group to its output file here -->
</xsl:for-each-group>
When writing the group (= starting letter) to an output file further
down in your template, you should sort its entries by the upper-case(…)
expression as the first sort key, then by the actual heading as a second
(tie-breaker) sort key.
So it’s best to make a function (call it, e.g., my:sortkey) out of
upper-case(…).
In that function, you can also do other useful stuff, such as
eliminating stop words or replacing all numbers with a zero, so that
everything that starts with a number will be in the same group.
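A minimal sketch of such a function and how it could be used for both
grouping and sorting. The namespace URI bound to the my prefix is a
placeholder, treating only a leading run of digits as "a number" is an
assumption, and stop-word removal and the index metadata from the
original stylesheet are left out to keep it short:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper; the my namespace URI above is a placeholder. -->
  <xsl:function name="my:sortkey" as="xs:string">
    <xsl:param name="heading" as="xs:string?"/>
    <!-- Decompose, drop combining marks U+0300 to U+036F, upper-case. -->
    <xsl:variable name="plain" as="xs:string"
        select="upper-case(
                  replace(normalize-unicode($heading, 'NFKD'),
                          '[&#x300;-&#x36F;]', ''))"/>
    <!-- Assumption: a leading run of digits collapses to a single 0,
         so all headings that start with a number share the group "0". -->
    <xsl:sequence select="replace($plain, '^[0-9]+', '0')"/>
  </xsl:function>

  <xsl:template match="/wkna-shared-cms/index">
    <xsl:for-each-group select="index-0"
        group-by="substring(my:sortkey(heading), 1, 1)">
      <!-- Process the groups in key order. -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:result-document
          href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
        <wkna-shared-cms>
          <index>
            <!-- Within a group: sort key first, heading as tie-breaker. -->
            <xsl:for-each select="current-group()">
              <xsl:sort select="my:sortkey(heading)"/>
              <xsl:sort select="heading"/>
              <xsl:copy-of select="."/>
            </xsl:for-each>
          </index>
        </wkna-shared-cms>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

With this, E, É and Ê all yield the grouping key "E", because the
combining accents are removed before the first character is taken.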
Gerrit
On 2012-04-21 02:03, Graydon wrote:
So I've got an XML index file, which is too large for some downstream
processing to be entirely pleased with. The requirement is to split the
file up, grouping index entries (index-0 elements; the index element is
the overall container element) by the first character of their child
heading element.
Using XSLT 2.0, this is pretty easy:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet exclude-result-prefixes="xs xd" version="2.0"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/wkna-shared-cms/index">
    <xsl:for-each-group group-by="substring(heading,1,1)" select="index-0">
      <xsl:sort select="./heading"/>
      <xsl:result-document
          href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
        <wkna-shared-cms>
          <index area="{/wkna-shared-cms/index/@area}"
              xml:lang="{/wkna-shared-cms/index/@xml:lang}">
            <num cite="Topical Index {current-grouping-key()}">
              <xsl:sequence select="current-grouping-key()"/>
            </num>
            <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
            <xsl:copy-of select="current-group()"/>
          </index>
        </wkna-shared-cms>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
The problem is that some of the initial characters of the headings have
accents, and it's desired that the accented characters and the
unaccented characters group together, so that E and É and Ê, etc. all
group together in a group with a current-grouping-key() of "E".
I can imagine doing this in a painful way with conditional statements
and an exhaustive list of characters, but I'm hoping someone can tell me
there's a better way.
Thanks!
-- Graydon
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler