I have built a custom collation, and the language I am working in has
a number of multigraphs. Here is a sample of the sort sequence (minus
non-ASCII characters) from the Java collation class:
("='-';'=';'*' " + /** -,=,* are used to indicate various types of
affixes and clitics. These should be ignored.*/
"< a,A " +
"< '''a,'''A " + /** 'a,'A*/
"< aa,Aa " +
"< b,B " +
"< c,C " +
"< d,D " +
"< dz,Dz " +
"< e,E " +
"< '''e,'''E " + /** 'e,'E*/
"< ee,Ee " +
"< f,F " +
"< g,G " +
"< gw,Gw " +
"< gy,Gy " +
"< h,H " +
"< i,I " +
"< '''i,'''I " + /** 'i,'I*/
"< ii,Ii " +
"< k,K " +
"< k''',K''' " + /** k',K'*/
"< kw,Kw " +
"< ky,Ky " +
"< k'''w,K'''w " + /** k'w,K'w */
"< k'''y,K'''y " + /** k'y,K'y */
"< l,L " +
etc.
"< '''y,'''Y ")
Desired output is something like this:
a,A
**********
-ana
atata
'a,'A
**********
'ap
'atata
etc.
k,K
**********
kaba
kopii
ks=
-ks
ksa
k',K'
*********
k'aba
k'ol
kw,Kw
*********
kwduun
kwtaxs
k'w,K'w
*********
k'was
k'wiss
kwiloolag
The source XML structure for each entry looks like this:
<dictionary>
<entry>
<lexical-unit>
<form lang="tsi"><text>kaba=</text></form>
</lexical-unit>
<trait name="morph-type" value="proclitic"/>
<sense>
<grammatical-info value="prenominal"/>
<gloss lang="en"><text>small</text></gloss>
</sense>
</entry>
<!-- more entries ... -->
</dictionary>
Any suggestions as to how to most efficiently group the data according
to the parameters of the custom collation?
Currently, I build a regular expression by hand, putting the longest
multigraphs first so that the alternation matches the longest possible
string before trying a shorter one. I use this with xsl:analyze-string
to add @alphaGroupKey to each entry as shown below.
<xsl:attribute name="alphaGroupKey">
<xsl:analyze-string select="lexical-unit/form[@lang='tsi']/text"
regex="^[-=]*((aa|Aa)|(a|A)|(kw|Kw)|(ky|Ky)|(k|K)|(ḵ|Ḵ))"
default-collation="http://saxon.sf.net/collation?class=com.lhtrees.xslt.LangXCollation;">
<xsl:matching-substring>
<xsl:analyze-string select="." regex="[^-=\*]+$">
<xsl:matching-substring>
<xsl:value-of select="."/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:attribute>
I can then do the grouping of entries using for-each-group with the
attribute alphaGroupKey.
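
The regex itself need not be assembled by hand. A minimal Java sketch, assuming the sort units are available as a plain list, sorts them longest first and joins them into the same kind of alternation. The class name AlphaGroupRegex and the UNITS list are hypothetical (UNITS is only a subset of the real sequence), and matching is made case-insensitive here instead of listing each upper-case variant, which is slightly looser than the hand-built regex.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaGroupRegex {
    // Hypothetical subset of the sort units; the full list would mirror the collation rules.
    static final List<String> UNITS =
            List.of("a", "'a", "aa", "k", "k'", "kw", "ky", "k'w", "k'y");

    // Build "^[-=*]*(k'w|k'y|...|a|k)" with the longest units first, because
    // regex alternation takes the first alternative that matches, not the longest.
    static Pattern build(List<String> units) {
        List<String> sorted = new ArrayList<>(units);
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        StringBuilder alternation = new StringBuilder();
        for (String unit : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(unit));
        }
        return Pattern.compile("^[-=*]*(" + alternation + ")",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Return the lower-cased leading sort unit, or "" when no unit matches.
    static String groupKey(Pattern pattern, String headword) {
        Matcher m = pattern.matcher(headword);
        return m.find() ? m.group(1).toLowerCase() : "";
    }
}
```

The resulting pattern string could then be passed to the stylesheet as a parameter instead of being typed into the regex attribute.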
But I am wondering if there is a pre-existing way to use the custom
collation class to do the grouping so I don't need to build the regex
string. It seems like all of the information that is needed is already
in that class.
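
One possibility for reusing the collation itself might be to ask a java.text.RuleBasedCollator for the first collation element of each headword and group on its primary weight, since a multigraph defined in the rules comes back as a single element. The sketch below assumes the custom class is, or wraps, a RuleBasedCollator. The class name CollationGroupKey and the cut-down RULES string are hypothetical, and the leading affix markers are stripped by hand rather than relying on the ignorable-character rule. It has not been tried against the full rule set.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class CollationGroupKey {
    // Hypothetical, cut-down rule string; the real rules live in the custom collation class.
    static final String RULES = "< a,A < aa,Aa < b,B < k,K < kw,Kw < ky,Ky";

    static RuleBasedCollator collator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Primary weight of the first collation element, or -1 for an empty stem.
    // Units that differ only in case share a primary weight, so "kw" and "Kw" group together.
    static int primaryKey(RuleBasedCollator collator, String headword) {
        String stem = headword.replaceAll("^[-=*]+", ""); // drop leading affix markers
        CollationElementIterator it = collator.getCollationElementIterator(stem);
        int element = it.next();
        if (element == CollationElementIterator.NULLORDER) {
            return -1;
        }
        return CollationElementIterator.primaryOrder(element);
    }
}
```

The integer is only a grouping key, not a heading label, so the text shown above each group would still have to come from somewhere else.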
Larry