xsl-list
[Top] [All Lists]

Re: Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method

2004-12-21 13:30:18
Cheers Sergiu,

This is a very well stated response and follow-up post.  If one thing
is for sure the members of the XSL community and specifically those
who post often to this list tend to have a fondness to towards those
in whom make a concentrated effort at pushing the XSLT envelope.  I
hope you continue to do so as it is pushing the envelope that brings
innovation and new ideas while also showcasing new problems to be
solved that may not have been realized or considered or in some cases
specifically designed and optimized for one purpose knowing full well
that it wasnt a general purpose solution.  The world of XML data is
small in comparison to what it will be in the next 5 years as RDF and
the Semantic Web become less of a vision and much more a reality.

The Semantic Web allows this amazing framework of interconnected data
sources that allow complex relationship descriptions to be created via
mechanisms of both man and machine...  I believe it is important for
us to note that ideas generally come from real recognition of need,
necessity being the mother of all invention.  What problems are solved
now with one mechanism may or may not be appropriate in a world where
interconnection of disconnected data sources will be brought together
and woven into amazing structures of complex relationships made simple
by those in whom take it upon themselves to find new ways to make the
old ways better or new ways to solve the yet unforeseen problems that
we didn't realize would even be problems when the current solutions
were developed.  Nearly all solutions have come from the same muse in
which we all have come to know and love: trial and error, failure and
success, tragedy and triumph.  Its the way of the hacker and its the
wave of the future just as it is a reflection of the pioneers of our
past.

Keep pushing the envelope!

<M:D/>


On Tue, 21 Dec 2004 19:54:26 +0200, Sergiu Ignat 
<sergiu(_dot_)ignat(_at_)dyve(_dot_)com> wrote:
Hello everyone.
Many thanks to all the people that reviewed this algorithm. Special thanks
to Dimitre Novachev for performed tests.

From: "Michael Kay" <mike(_at_)saxonica(_dot_)com>
The use of xsl:key is essential to Muenchian grouping, so there seems to
be
some misunderstanding here.
Muenchian grouping without key is presented in
http://www.jenitennison.com/xslt/grouping/muenchian.html and the example is
contact[not(surname = preceding-sibling::contact/surname)]
It is of course a very slow method.

The recursion grouping algorithm was born from ignorance. I didn't use
xsl:key because I was not aware of its implementation. I thought it is
evaluated every time key() function is called. I did not know that it
creates an index, so I did not expect any performance improvment. I just
tried to find a faster algorithm, and it works faster than Muenchian without
xsl:key .

Howether, later M. David Peterson found it usefull to group data from
external data sources. This is indeed a problem hard to solve using
Muenchian method (or maybe I am wrong). Wendell Piez found it easy to
understand for the newbie and fast enough for many real life situations.
There may be also cases when grouping rules are too difficult to described
using xsl:key, and recursion grouping will be an easyer solution.

So I hope it will find its niche and will be useful for many people.

Best regards,
Sergiu Ignat



The advantage of recursive grouping is that it is possible to define
grouping rules that are much more complex than one accepted by the use
attribute of xsl:key or by generate-id() function. For
example if in a list
of items that have attributes price and quantity the grouping
should be done
by price*quantity.

Muenchian grouping makes it easy to group on the result of any path
expression, e.g.

<xsl:key name="m" match="order-line" group="@price * @qty"/>

If there is a limitation, it is that Muenchian grouping is inconvenient
when
the node-set to be grouped is anything other than "all nodes in a single
document that match pattern P": that is, it's inconvenient when the
population spans multiple documents, or when it is scoped to a particular
subtree within a document.


A recoursive grouping with xsl:key is presented below.


The downside here is:

    <xsl:with-param name="list"
select="$list[not(@author=$group-identifier)]"/>

which performs a serial search of the grouping population (actually, on
average, half the population) once for each distinct value of the grouping
key. The algorithm therefore has order (at least) O(P*G) where P is the
size
of the population and G the number of groups. In applications where the
number of groups increases with the population (e.g. grouping employees by
surname) this is effectively O(N^2).

I agree that the algorithm is viable in cases where the number of groups
is
small and almost fixed, e.g. grouping sales by continent.

But of course if the number of groups is completely fixed, the simplest
approach is:

for-each continent
  for-each P[continent=current()]


Michael Kay
http://www.saxonica.com/


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




-- 
:: M. David Peterson ::
XML & XML Transformations, C#, .NET, and Functional Languages Specialist

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--