Create a variable that contains the element counts for each document,
something like:
<xsl:variable name="foo">
  <xsl:for-each select="collection()">
    <doc name="{document-uri(.)}">
      <xsl:for-each-group select="//*" group-by="name()">
        <elem name="{current-grouping-key()}"
              count="{count(current-group())}"/>
      </xsl:for-each-group>
    </doc>
  </xsl:for-each>
</xsl:variable>
That will give you:
<doc name="doc1.xml">
<elem name="foo" count="20"/>
<elem name="bar" count="44"/>
</doc>
<doc name="doc2.xml">
<elem name="baz" count="1"/>
...
Then just use grouping again to generate the report.
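As a rough sketch of that second pass (untested against your data, and
assuming the variable is still called $foo and that a named template
"main" is the entry point, as in your example), something like this
would turn the counts into one table with a row per document and a
column per element name:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- First pass: per-document element counts, as in the variable above. -->
  <xsl:variable name="foo">
    <xsl:for-each select="collection()">
      <doc name="{document-uri(.)}">
        <xsl:for-each-group select="//*" group-by="name()">
          <elem name="{current-grouping-key()}"
                count="{count(current-group())}"/>
        </xsl:for-each-group>
      </doc>
    </xsl:for-each>
  </xsl:variable>

  <!-- Second pass: group the <elem> entries again to build the table. -->
  <xsl:template name="main">
    <table>
      <row rend="label">
        <cell>document</cell>
        <!-- One column per distinct element name across all documents. -->
        <xsl:for-each-group select="$foo/doc/elem" group-by="@name">
          <xsl:sort select="current-grouping-key()"/>
          <cell><xsl:value-of select="current-grouping-key()"/></cell>
        </xsl:for-each-group>
      </row>
      <xsl:for-each select="$foo/doc">
        <xsl:variable name="doc" select="."/>
        <row>
          <cell><xsl:value-of select="@name"/></cell>
          <!-- Same grouping and sort as the header, so the cells line up. -->
          <xsl:for-each-group select="$foo/doc/elem" group-by="@name">
            <xsl:sort select="current-grouping-key()"/>
            <cell>
              <!-- Fall back to 0 when this document has no such element. -->
              <xsl:value-of
                select="($doc/elem[@name = current-grouping-key()]/@count, 0)[1]"/>
            </cell>
          </xsl:for-each-group>
        </row>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

For the per-attribute tables you describe, the same shape should work if
the first pass groups on the attribute value (for example
group-by="@type" over //seg) instead of on name().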
cheers
andrew
On 01/02/2008, James Cummings <cummings(_dot_)james(_at_)gmail(_dot_)com> wrote:
Hiya,
I'm using the collection() function and Saxon to produce some
statistics about how many elements of each type occur in a
particular set of documents.
Let's say that document one has something like:
<p xml:id="doc1" type="hypothetical">
There is some text with <seg type="foo">some foo</seg> and
occasionally <seg type="blort">blort</seg> and <other
type="wibble">wibble</other></p>
and document two (and up to some really large number) is like:
<p xml:id="doc2">
There is another doc with <seg type="foo">some foo</seg> and
occasionally <seg type="notBlort">notBlort</seg> and <other
type="fluffy">fluffy other</other> and <some
name="thing">someThing</some></p>
What I want to produce are tables of counts of specific elements, by
document and type. So something like the following (though using
table/row/cell xml markup):
table: other
document | fluffy | wibble | stuff
doc1 | 0 | 1 | 0
doc2 | 1 | 0 | 0
doc3 | 20 | 12 | 54
table: seg
document | blort | foo | notBlort
doc1 | 1 | 1 | 0
doc2 | 0 | 1 | 1
doc3 | 23 | 44 | 58
table: some
document | thing | else | now
doc1 | 0 | 0 | 0
doc2 | 1 | 0 | 0
doc3 | 12 | 5 | 24
I can build this manually (and for one element I have done so) by doing:
<xsl:variable name="docs" select="collection('../../working/xml/docs.xml')"/>
<xsl:template name="main">
<table><head>seg by type</head>
<row rend="label">
<cell>document</cell>
<cell>blort</cell>
<cell>foo</cell>
<cell>notBlort</cell>
</row>
<xsl:for-each select="$docs//p"> <!-- let's pretend p is the root element -->
<row>
<xsl:variable name="doc" select="@xml:id"/>
<cell><xsl:value-of select="$doc"/></cell>
<cell><xsl:value-of select="count(.//seg[@type='blort'])"/></cell>
<cell><xsl:value-of select="count(.//seg[@type='foo'])"/></cell>
<cell><xsl:value-of select="count(.//seg[@type='notBlort'])"/></cell>
</row>
</xsl:for-each>
</table>
</xsl:template>
But that isn't really the point now is it? I tried to use <xsl:key>
but I ran into the problem of it not liking the collection() function
as part of the match.
What I want to do is be able to say for-each doc, build me a table of
all the (let's pretend unknown) values of this attribute on this
element. So something like:
<xsl:for-each select="$docs//p">
<xsl:value-of select="my:function(other/@type, seg/@type, some/@name,
new/@type)"/>
</xsl:for-each>
and without knowing the values of @type in advance it makes a table
like above of them (using distinct-values()?) and counting their
occurrences.
This is a case where I know it must be possible, and I could just go
and do it manually, (in reality there are about 10 elements with a
number of attributes, with around 20 values each), but it just seems
*wrong* to do it that way. ;-)
Suggestions?
Thanks,
-James
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/