xsl-list

Re: [xsl] Calculating groups of repeating elements

2008-12-10 19:42:51
Hi Ken,

Thanks for looking at this! Sorry for not being as clear as I could've been with what I'm looking for. For the example data set, I'm trying to automatically generate an output something like this:

Aa + C + Qqq: 2 places (1, 3)
Aa + C: 3 places (1, 2, 3)
Aa + Zz: 2 places (2, 3)
C + Qqq: 2 places (1, 3)

So it lists all the groups of 2+ words that appear together in 2+ places. This list is sorted by length of the group (3 words is the maximum number of words that occurs in 2+ places in the sample data), but it'd be nice to also be able to sort by number of places:

Aa + C: 3 places (1, 2, 3)
Aa + Zz: 2 places (2, 3)
etc.

I was using intersect to get the places with Aa AND C (<xsl:value-of select="count(atlas/place/place_number[../words/word='Aa'] intersect atlas/place/place_number[../words/word='C'])"/>), adding another term for Aa AND C AND Qqq, but got overwhelmed by the number of ways I'd have to plug in all the different words to go through the data exhaustively: the real data has 250+ places and 75+ words.
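
One way to avoid plugging the words in by hand might be a recursive template that grows each group one word at a time and abandons a group as soon as it is found in fewer than 2 places. What follows is only a sketch under assumptions, not a tested solution for the real data: the template name "extend", its parameters and the temporary <group> elements are all made up for illustration, and the input is assumed to have the atlas/place/words/word shape of the sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
  <!--every distinct word, sorted so each group is built in one order only-->
  <xsl:variable name="all-words" as="xs:string*">
    <xsl:perform-sort select="distinct-values(atlas/place/words/word)">
      <xsl:sort select="."/>
    </xsl:perform-sort>
  </xsl:variable>
  <!--collect the groups as temporary elements so they can be sorted-->
  <xsl:variable name="groups" as="element(group)*">
    <xsl:call-template name="extend">
      <xsl:with-param name="places" select="atlas/place"/>
      <xsl:with-param name="candidates" select="$all-words"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:for-each select="$groups">
    <!--longest groups first, then most places first-->
    <xsl:sort select="count(word)" order="descending"/>
    <xsl:sort select="count(place)" order="descending"/>
    <xsl:value-of select="word" separator=" + "/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="count(place)"/>
    <xsl:text> places (</xsl:text>
    <xsl:value-of select="place" separator=", "/>
    <xsl:text>)&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

<!--hypothetical helper: try to extend $group with each remaining word-->
<xsl:template name="extend">
  <xsl:param name="group" as="xs:string*" select="()"/>
  <xsl:param name="places" as="element(place)*"/>
  <xsl:param name="candidates" as="xs:string*"/>
  <xsl:for-each select="$candidates">
    <xsl:variable name="w" select="."/>
    <xsl:variable name="pos" select="position()"/>
    <xsl:variable name="new-group" select="($group, $w)"/>
    <!--places that have every word of the group so far and also $w-->
    <xsl:variable name="new-places" select="$places[words/word = $w]"/>
    <!--prune: a group in fewer than 2 places cannot grow into a wanted one-->
    <xsl:if test="count($new-places) ge 2">
      <xsl:if test="count($new-group) ge 2">
        <group>
          <xsl:for-each select="$new-group">
            <word><xsl:value-of select="."/></word>
          </xsl:for-each>
          <xsl:for-each select="$new-places">
            <place><xsl:value-of select="place_number"/></place>
          </xsl:for-each>
        </group>
      </xsl:if>
      <!--only words later in the sorted list may be added next-->
      <xsl:call-template name="extend">
        <xsl:with-param name="group" select="$new-group"/>
        <xsl:with-param name="places" select="$new-places"/>
        <xsl:with-param name="candidates"
                        select="$candidates[position() gt $pos]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the sample data this should report every qualifying group, so it would also list Aa + C + Zz, Aa + Qqq and C + Zz, which the hand-written list above leaves out. Swapping the two xsl:sort elements would sort by number of places first. The pruning keeps the search far below "all combinations of 75 words", but the cost still depends on how many groups really do recur, so it would need trying on the real data.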



G. Ken Holman wrote:
At 2008-12-10 14:15 -0600, Quinn Dombrowski wrote:
I'm trying to calculate all of the groups of 2+ elements (in the sample data below, words) that appear together in more than one place. Ideally, I'd like to be able to sort descending both by length of group (5-word group, 4-word groups, etc.) and by number of places the groups occur (100 places, 99 places, etc.). I also need to be able to list the place numbers where they occur.

You don't show how these places are to be listed, so I guessed.

I started doing it manually this way but the number of possible combinations quickly became too big a task:

<xsl:template match="/">
<xsl:value-of select="count(atlas/place/place_number[../words/word='Aa'] intersect atlas/place/place_number[../words/word='C'])"/>
</xsl:template>
(adding more "intersects" as necessary, and getting rid of the "count" to see the place numbers)
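
The chain of intersects can probably be written once for any list of words by using an XPath 2.0 quantified expression instead. A minimal sketch, assuming a hypothetical stylesheet parameter named "wanted" that holds the words to look for (the name and its default value are only for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

<xsl:output method="text"/>

<!--hypothetical parameter: the words that must all be present in a place-->
<xsl:param name="wanted" as="xs:string*" select="('Aa', 'C', 'Qqq')"/>

<xsl:template match="/">
  <!--a place qualifies only if each wanted word is among its words-->
  <xsl:variable name="hits"
       select="atlas/place[every $w in $wanted satisfies words/word = $w]"/>
  <xsl:value-of select="$wanted" separator=" + "/>
  <xsl:text>: </xsl:text>
  <xsl:value-of select="count($hits)"/>
  <xsl:text> places (</xsl:text>
  <xsl:value-of select="$hits/place_number" separator=", "/>
  <xsl:text>)&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

With the default value and the sample data this should give "Aa + C + Qqq: 2 places (1, 3)". It answers the question for one given group only; it does not discover which groups are worth asking about.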

Not sure where you are going with the intersects, so I approached this as a grouping problem.

Here's a sample of the data. Almost every word appears in multiple places, but each appears only once in the index, which I've used in other applications for matching to avoid re-calculating stats for the word over and over. Any help would be wonderful!

I hope the code below helps, though I am a bit unclear on what you want so my comments should reveal what I think you want.

. . . . . . .  Ken


T:\ftemp>type quinn.xml
<atlas>
<place>
<place_number>1</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Qqq</word>
</words>
</place>

<place>
<place_number>2</place_number>
<words>
<word>Aa</word>
<word>Bbbb</word>
<word>C</word>
<word>W</word>
<word>Zz</word>
</words>
</place>

<place>
<place_number>3</place_number>
<words>
<word>Aa</word>
<word>C</word>
<word>Bb</word>
<word>Qqq</word>
<word>Wwww</word>
<word>Zz</word>
</words>
</place>
</atlas>

T:\ftemp>type quinn.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<!--keep track for counting purposes-->
<xsl:key name="words" match="word" use="substring(.,1,1)"/>

<xsl:template match="atlas">
  <!--process the document element as is-->
  <xsl:next-match/>
  <!--add an index at the end-->
  <index>
    <!--basing the "underlying word" as the first character-->
    <xsl:for-each-group select="//word" group-by="substring(.,1,1)">
      <!--sort descending by the number of words in the group-->
      <xsl:sort select="count(key('words',substring(.,1,1)))"
                order="descending"/>
      <!--sort descending by the number of places for the word group-->
      <xsl:sort select="count(key('words',substring(.,1,1))/../..)"
                order="descending"/>
      <!--create the index entry for the word group-->
      <index_entry>
        <!--embed some diagnostics-->
        <xsl:comment select="current-grouping-key(),'=',
                             'Words:',count(current-group()),
                             'Places:',count(current-group()/../..)"/>
        <xsl:text>
</xsl:text>
        <!--what underlying word are we at?-->
        <underlying_word>
          <xsl:value-of select="current-grouping-key()"/>
        </underlying_word>
        <!--which words are related?-->
        <xsl:for-each-group select="current-group()" group-by=".">
          <word><xsl:value-of select="."/></word>
        </xsl:for-each-group>
        <!--where are these words used?-->
        <places>
          <xsl:for-each select="current-group()/../..">
            <place><xsl:value-of select="place_number"/></place>
          </xsl:for-each>
        </places>
      </index_entry>
    </xsl:for-each-group>
  </index>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
T:\ftemp>call xslt2 quinn.xml quinn.xsl quinn.out

T:\ftemp>type quinn.out
<?xml version="1.0" encoding="UTF-8"?>
<atlas>
   <place>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Qqq</word>
      </words>
   </place>

   <place>
      <place_number>2</place_number>
      <words>
         <word>Aa</word>
         <word>Bbbb</word>
         <word>C</word>
         <word>W</word>
         <word>Zz</word>
      </words>
   </place>

   <place>
      <place_number>3</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Bb</word>
         <word>Qqq</word>
         <word>Wwww</word>
         <word>Zz</word>
      </words>
   </place>
</atlas>
<index>
   <index_entry><!--A = Words: 3 Places: 3-->
<underlying_word>A</underlying_word>
      <word>Aa</word>
      <places>
         <place>1</place>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--C = Words: 3 Places: 3-->
<underlying_word>C</underlying_word>
      <word>C</word>
      <places>
         <place>1</place>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--Q = Words: 2 Places: 2-->
<underlying_word>Q</underlying_word>
      <word>Qqq</word>
      <places>
         <place>1</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--B = Words: 2 Places: 2-->
<underlying_word>B</underlying_word>
      <word>Bbbb</word>
      <word>Bb</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--W = Words: 2 Places: 2-->
<underlying_word>W</underlying_word>
      <word>W</word>
      <word>Wwww</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
   <index_entry><!--Z = Words: 2 Places: 2-->
<underlying_word>Z</underlying_word>
      <word>Zz</word>
      <places>
         <place>2</place>
         <place>3</place>
      </places>
   </index_entry>
</index>


--
G. Ken Holman                 mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


