Re: [xsl] Calculating groups of repeating elements

2008-12-11 12:59:04
Wendell Piez wrote:

It seems to me that if you are wanting to collect groups of 2+ words
that appear in 2+ places, a useful first step would be to collect the
set of intersections of words occurring in every pairing of places.
This would be a large number, n(n-1)/2 for n places, but not the huge
exponent of 2 cited by Michael, and hence possibly a more direct route
to your goal.

Great! This looks like a much more useful approach to the problem!

yields this result:

<?xml version="1.0" encoding="UTF-8"?>
<collection>
   <common_words>
      <place_number>2</place_number>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
      </words>
   </common_words>
   <common_words>
      <place_number>3</place_number>
      <place_number>1</place_number>
      <words>
         <word>Aa</word>
         <word>C</word>
         <word>Qqq</word>
      </words>
   </common_words>
   [...]

Now, generating all interesting subsets of "words/word/string()" can be
done far more efficiently, as the input sets are probably *much* smaller
on average.
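
As an illustration of the pairwise step, here is a minimal sketch in Python
rather than XSLT. It is not the stylesheet that produced the result above; the
place numbers and word lists are hypothetical data chosen so that the
intersections resemble that output, and `pairwise_common` is a made-up name.

```python
from itertools import combinations

# Hypothetical input: place number -> set of words found at that place.
places = {
    1: {"Aa", "C", "Qqq", "Zz"},
    2: {"Aa", "C", "Bb"},
    3: {"Aa", "C", "Qqq"},
}

def pairwise_common(places):
    """Map each unordered pair of places to the words both contain."""
    return {
        (a, b): places[a] & places[b]
        for a, b in combinations(sorted(places), 2)
    }

common = pairwise_common(places)
for pair, words in common.items():
    print(pair, sorted(words))
```

For n places this visits n(n-1)/2 pairs, so three places give three
intersections.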

While this isn't quite what you want, the results you want could be
derived by grouping these lists further, skipping pairings that
contain fewer than two 'word' elements, and collecting together those
that have the same sets (and thus represent sets of words that occur
in more than two places).
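
The grouping described here might look roughly like the following Python
sketch. The `common` mapping is hypothetical sample data standing in for the
pairwise results, and `group_by_word_set` and `min_words` are names invented
for this illustration.

```python
from collections import defaultdict

# Hypothetical pairwise results: (place, place) -> words common to both.
common = {
    (1, 2): {"Aa", "C"},
    (1, 3): {"Aa", "C", "Qqq"},
    (2, 3): {"Aa", "C"},
    (1, 4): {"Zz"},
}

def group_by_word_set(common, min_words=2):
    """Collect, for each shared word set, every place taking part in it."""
    groups = defaultdict(set)
    for pair, words in common.items():
        if len(words) < min_words:
            continue  # skip pairings with fewer than two words
        groups[frozenset(words)].update(pair)
    return dict(groups)

groups = group_by_word_set(common)
for words, where in groups.items():
    print(sorted(words), "occurs in places", sorted(where))
```

A word set whose group lists more than two places occurs in more than two
places.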

Yes. But I think you must still generate the subsets, because if you
have, say, three occurrences of (a,b,c) and two of (a,b,d), you have
five occurrences of (a,b), which is interesting, if my understanding of
the requirement is correct.
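
To make the arithmetic concrete, here is a small Python sketch that counts
every subset of two or more words across the occurrences from this example.
The function name `count_subsets` and its `min_size` parameter are assumptions
for illustration only. Enumerating subsets is still exponential in the size of
each set, which is why small input sets matter.

```python
from collections import Counter
from itertools import combinations

# The example above: three occurrences of (a, b, c) and two of (a, b, d).
occurrences = [("a", "b", "c")] * 3 + [("a", "b", "d")] * 2

def count_subsets(occurrences, min_size=2):
    """Count how often each subset of at least min_size words occurs."""
    counts = Counter()
    for words in occurrences:
        unique = sorted(set(words))
        for size in range(min_size, len(unique) + 1):
            # combinations() yields sorted tuples, usable as Counter keys
            counts.update(combinations(unique, size))
    return counts

counts = count_subsets(occurrences)
print(counts[("a", "b")])       # occurrences of the pair (a, b)
print(counts[("a", "b", "c")])  # occurrences of the full set (a, b, c)
```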

This continues to be interesting.

Michael Ludwig
