[xsl] Calculating groups of repeating elements

2008-12-10 15:16:05
Hello,

I'm trying to find every group of two or more elements (words, in the sample data below) that appear together in more than one place. Ideally, I'd like to sort the results in descending order both by group length (5-word groups, 4-word groups, etc.) and by the number of places where each group occurs (100 places, 99 places, etc.). I also need to be able to list the place numbers where each group occurs.

I started doing it manually, as shown here, but the number of possible combinations quickly made that unmanageable:

<xsl:template match="/">
  <xsl:value-of select="count(atlas/place/place_number[../words/word='Aa'] intersect atlas/place/place_number[../words/word='C'])"/>
</xsl:template>
(adding more "intersect" operands as necessary, and removing the count() call to see the place numbers)

Here's a sample of the data. Almost every word appears in multiple places, but each word appears only once in the index. I've used the index for matching in other applications, so that the stats for a word don't have to be recalculated over and over. Any help would be wonderful!

<atlas>
<place>
<place_number>1</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Qqq</word>
</words>
</place>

<place>
<place_number>2</place_number>
<words>
<word>Aa</word>
<word>Bbbb</word>
<word>C</word>
<word>W</word>
<word>Zz</word>
</words>
</place>

<place>
<place_number>3</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Bb</word>
<word>Qqq</word>
<word>Wwww</word>
<word>Zz</word>
</words>
</place>

[etc]

<index>

<index_entry>
<underlying_word>A</underlying_word>
<word>A</word>
<word>Aa</word>
<word>Aaa</word>
</index_entry>

<index_entry>
<underlying_word>B</underlying_word>
<word>Bb</word>
<word>Bbbb</word>
</index_entry>

[etc]

</index>
</atlas>
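
One possible way to generalize the intersect test above, offered as a minimal XSLT 2.0 sketch and not a tested solution: build groups recursively, adding one word at a time and keeping only the places that contain every word chosen so far. The function name f:grow, the urn:example:groups namespace, and the output element and attribute names (groups, group, size, places, words, place_numbers) are all made up for illustration. The sketch assumes that a "group" means any set of words that occur in the same place, in any order, and that [etc] in the sample stands for further well-formed place and index_entry elements. It does not use the index.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:groups"
    exclude-result-prefixes="xs f">

  <xsl:output indent="yes"/>

  <!-- f:grow is a hypothetical helper, not a built-in function.
       $group      = words chosen so far
       $places     = places that contain every word in $group
       $candidates = words that may still be added (later in the fixed
                     vocabulary order, so each set is built only once) -->
  <xsl:function name="f:grow" as="element(group)*">
    <xsl:param name="group" as="xs:string*"/>
    <xsl:param name="places" as="element(place)*"/>
    <xsl:param name="candidates" as="xs:string*"/>

    <!-- Report the current group once it holds at least two words. -->
    <xsl:if test="count($group) ge 2">
      <group size="{count($group)}" places="{count($places)}"
             words="{$group}" place_numbers="{$places/place_number}"/>
    </xsl:if>

    <!-- Try to extend the group with each remaining candidate word. -->
    <xsl:for-each select="1 to count($candidates)">
      <xsl:variable name="i" select="."/>
      <xsl:variable name="next" select="$candidates[$i]"/>
      <xsl:variable name="shared" select="$places[words/word = $next]"/>
      <!-- Recurse only while the words still share two or more places. -->
      <xsl:if test="count($shared) ge 2">
        <xsl:sequence select="f:grow(($group, $next), $shared,
                                     subsequence($candidates, $i + 1))"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:function>

  <xsl:template match="/">
    <xsl:variable name="places" select="atlas/place"/>
    <!-- Any fixed order of the distinct words will do. -->
    <xsl:variable name="vocab" as="xs:string*"
        select="distinct-values($places/words/word/string(.))"/>
    <groups>
      <!-- Longest groups first, then groups found in the most places. -->
      <xsl:perform-sort select="f:grow((), $places, $vocab)">
        <xsl:sort select="xs:integer(@size)" order="descending"/>
        <xsl:sort select="xs:integer(@places)" order="descending"/>
      </xsl:perform-sort>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

A branch is abandoned as soon as fewer than two places remain, but the number of groups can still grow exponentially with the number of shared words, so this may be too slow for a large atlas. It also reports every qualifying subset, not only the largest groups: for the three sample places, Aa and C alone would be reported for places 1, 2 and 3, and Aa, C and Qqq together for places 1 and 3.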
