
RE: [xsl] How To Calculate Set of Unique Values Across a Tree of Input Documents

2008-03-21 12:11:58
There was a recent thread on processing graphs in XSLT 2.0, see

http://markmail.org/message/tlletsiznepd5no6

I provided a (sketch of a) solution that involved listing all the paths
starting at a given node (while avoiding looping in the event of a cycle); a
simple adaptation of that will give you all the nodes reachable from a given
node. In your case the node identifiers can be obtained using
document-uri(); you then simply need to apply distinct-values() to the
returned set of URIs.
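
Untested, but the adaptation might look something like the sketch below.
Note that local:reachable is an arbitrary name, and treating every @href
attribute as a resolvable reference to an XML document is an assumption
about your vocabulary; real DITA references would need filtering for
external or non-XML targets.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:x-local"
    exclude-result-prefixes="xs local">

  <!-- All document URIs reachable from $doc. $seen holds the URIs on
       the current path, so a reference cycling back to an ancestor
       terminates the recursion; the same document may still be visited
       along several distinct paths, hence the distinct-values() call
       at the point of use. -->
  <xsl:function name="local:reachable" as="xs:string*">
    <xsl:param name="doc" as="document-node()"/>
    <xsl:param name="seen" as="xs:string*"/>
    <xsl:variable name="uri" select="string(document-uri($doc))"/>
    <xsl:if test="not($uri = $seen)">
      <xsl:sequence select="$uri"/>
      <xsl:for-each select="$doc//*[@href]">
        <xsl:sequence select="local:reachable(
            doc(resolve-uri(@href, base-uri(.))), ($seen, $uri))"/>
      </xsl:for-each>
    </xsl:if>
  </xsl:function>

  <!-- Emit one URI per line for each distinct reachable document. -->
  <xsl:template match="/">
    <xsl:value-of select="distinct-values(local:reachable(., ()))"
        separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>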

Michael Kay
http://www.saxonica.com/ 

-----Original Message-----
From: Eliot Kimber [mailto:ekimber@reallysi.com]
Sent: 21 March 2008 18:52
To: xsl-list@lists.mulberrytech.com
Subject: [xsl] How To Calculate Set of Unique Values Across a Tree of Input Documents

I have a tree of DITA map documents where each map references 
zero or more other map or topic documents. The same map or 
topic could be referenced multiple times.

I need to calculate the "bounded object set" of unique
documents referenced from within the compound map so that I
can then use an XSLT process to create new copies of each
document. Since I can't write to a given result more than
once, I have to remove any duplicates first.

Each target document is referenced by a relative URI that can 
be different for different references to the same file (and 
in fact will almost always be different in my particular data set).

I am using XSLT 2.0.

Because key() tables are bound to individual input documents,
I don't think I can build a single table of references indexed
by target document URI (that is, the absolute URI of the
target of the reference). If I could, I would simply build
that table and then process the first member of each entry.

I can't think of any other efficient way to approach this.
The best idea I can come up with is to build an intermediate
document that reflects each document reference and then use
something like for-each-group on it to treat the references
as a set, so that each referenced file is processed exactly
once. If I build a flat list of elements containing the
document URI of each reference, I can easily sort the values
and remove duplicates. So maybe that's as efficient as
anything else would be.
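
Concretely, I have in mind something like the following rough
sketch (here $maps, the <ref> element, and the "copy" mode are
invented names, and I'm assuming every reference is an @href
attribute resolvable against its base URI):

  <!-- One <ref> element per reference, carrying the absolute
       target URI; $maps is assumed to hold the root map plus
       all submaps. -->
  <xsl:variable name="refs" as="element(ref)*">
    <xsl:for-each select="$maps//*[@href]">
      <ref uri="{resolve-uri(@href, base-uri(.))}"/>
    </xsl:for-each>
  </xsl:variable>

  <!-- Group by target URI so each referenced document is
       processed exactly once, regardless of how many
       references point at it. -->
  <xsl:for-each-group select="$refs" group-by="@uri">
    <xsl:result-document href="{concat('out/',
        tokenize(current-grouping-key(), '/')[last()])}">
      <xsl:apply-templates select="doc(current-grouping-key())"
          mode="copy"/>
    </xsl:result-document>
  </xsl:for-each-group>

Since the grouping can run over a variable, the intermediate
document would only need to be serialized if I split the work
into separate pipeline stages.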

My other challenge is that my input data set is very large,
so I have the potential to run into memory issues. It may be
that writing out an intermediate file as part of a
multi-stage, multi-transform pipeline is the best process,
but my current processor will handle the entire data set in
one pass for the purpose of applying the (mostly) identity
transform to the map set.

Can anyone suggest other solution approaches to this problem?

Once again I feel like I might be missing a clever solution 
hidden in the haze of my XSLT 1 brain damage.

Thanks,

Eliot

--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com

