xsl-list
[Top] [All Lists]

Re: Recursive grouping - simple, XSLT 1.0, fast non-Muenchian grouping method

2004-12-20 14:27:35
If I could throw in a quick curve ball to this conversation.  It seems
that what Surgiu has worked out is that by using variables to match
more than just the value of the attribute or element (e.g. element
name, for example, when using such information to determine proper
mapping to another XML file, similar to that of aspect weaving.)

Surgiu, you seem like a bright guy who would enjoy wrapping his teeth
around  a concept such as bringing together any number of external
data sources and weaving them together using commonalities between
these sources, whether through Regular Expressions (WooHoo XSLT 2.0!
:) or mapping of simpler comarisons using the translate() and
contains() functions to search for a particular straight string of
characters.  AspectXML is a project I worked on with one of my very
best friends Russ Miles who lives over in the UK (for now... just got
engaged to his LA based girlfriend so we'll see where he ends up)...  
While I think the concept is really cool and the code is Ok (refering
to the XSLT that I wrote and not the Java that Russ wrote... his Java
is near to perfect, reference his new release from Oreilly >
http://www.aspectjcookbook.com to see what I mean :) and while it has
technically been recommended as a production ready product by AOSD.net
there's still a TON more to do with it such as all the schema work.
You can access it at http://www.AspectXML.org and if you have a chance
to do so I would be interested to hear your take on how we could
improve it to incorporates some of your thoughts and ideas on this
topic, as this is where it seems you really seem to be headed and it
seems you have some good ideas on how to make that type of data
grouping really, really, fast...

I'll look forward to hearing from you if you have a chance to take a
look at it... you can come find me over at my blog, xsltblog.com or
post back to the list.  Ive been heads down for over a week and havent
even so much as opened up any email other than my Gmail account during
that time... but with the holidays here I hope to slow down so I will
keep an eye out for you here as well.

Best regards to you!

<M:D/>


On Mon, 20 Dec 2004 21:03:25 -0000, Michael Kay <mike(_at_)saxonica(_dot_)com> 
wrote:
The presented algorithm works much faster than Muenchian
grouping without xsl:key

The use of xsl:key is essential to Muenchian grouping, so there seems to be
some misunderstanding here.


The advantage of recursive grouping is that it is possible to define
grouping rules that are much more complex than one accepted by the use
attribute of xsl:key or by generate-id() function. For
example if in a list
of items that have attributes price and quantity the grouping
should be done
by price*quantity.

Muenchian grouping makes it easy to group on the result of any path
expression, e.g.

<xsl:key name="m" match="order-line" group="@price * @qty"/>

If there is a limitation, it is that Muenchian grouping is inconvenient when
the node-set to be grouped is anything other than "all nodes in a single
document that match pattern P": that is, it's inconvenient when the
population spans multiple documents, or when it is scoped to a particular
subtree within a document.


A recoursive grouping with xsl:key is presented below.


The downside here is:

    <xsl:with-param name="list"
select="$list[not(@author=$group-identifier)]"/>

which performs a serial search of the grouping population (actually, on
average, half the population) once for each distinct value of the grouping
key. The algorithm therefore has order (at least) O(P*G) where P is the size
of the population and G the number of groups. In applications where the
number of groups increases with the population (e.g. grouping employees by
surname) this is effectively O(N^2).

I agree that the algorithm is viable in cases where the number of groups is
small and almost fixed, e.g. grouping sales by continent.

But of course if the number of groups is completely fixed, the simplest
approach is:

for-each continent
  for-each P[continent=current()]


Michael Kay
http://www.saxonica.com/

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




-- 
:: M. David Peterson ::
XML & XML Transformations, C#, .NET, and Functional Languages Specialist

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



<Prev in Thread] Current Thread [Next in Thread>