xsl-list

[xsl] Collect word count with xslt2.0 on saxon 8

2006-05-15 16:48:39

I have the following structure, and I need to collect a
word count for each element whose class attribute
contains " topic/topic ", without counting the words in
its child elements that also have " topic/topic " in
their class attribute.



<root>
    <topic class=" topic/topic foo/bar ">
        <p> communications and information theory</p>
        <title> top element</title>
        <relinfo> elements can be nested</relinfo>
        Generalized Markup Language defined by ISO 8879.
        <concept class=" topic/topic foo/bar ">
            <p> communications and information theory</p>
            <title> top element</title>
            <relinfo> elements can be nested</relinfo>
            (for a number of technical reasons beyond the scope of this article).
            <topic class=" topic/topic foo/bar ">
                <p> communications and information theory</p>
                <title> top element</title>
                <relinfo> elements can be nested</relinfo>
                maintain repositories of structured documentation for more than a decade, but it is not well
                <concept class=" topic/topic foo/bar ">
                    But the metrics for XML on the Web
                    <p> communications and information theory</p>
                    <title> top element</title>
                    <relinfo> elements can be nested</relinfo>
                    measures, or are a little polluted by voodoo ideology about good
                </concept>
            </topic>
        </concept>
    </topic>
</root>

I have this template, which gets the word count for each
such element together with all of its child elements,
including the child elements whose class attribute also
contains " topic/topic ".

    <xsl:template match="*[contains(@class, 'topic/topic ')]">
        <xsl:variable name="level"
            select="count(ancestor::*[contains(@class, 'topic/topic ')]) + 1"/>
        <xsl:variable name="ct" select="if ($level = 1) then concat(title,' ') else ' '"/>
        <xsl:variable name="h1" select="if ($level = 2) then concat(title,' ') else ' '"/>
        <xsl:variable name="h2" select="if ($level = 3) then concat(title,' ') else ' '"/>
        <xsl:variable name="h3" select="if ($level = 4) then concat(title,' ') else ' '"/>

        <xsl:variable name="wc"
            select="count(tokenize(lower-case(.),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>

        <xsl:apply-templates/>
    </xsl:template>


I added another template that produces the count for each
of its child elements:

    <xsl:template match="*[contains(@class, 'topic/topic ')]" mode="filterCount">
        <sum>
            <xsl:value-of
                select="count(tokenize(lower-case(.),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
        </sum>
    </xsl:template>

I store the result in a variable and then subtract it from
the total within the first template above:

        <xsl:variable name="childcounts">
            <sums>
                <xsl:apply-templates mode="filterCount"/>
            </sums>
        </xsl:variable>

        <xsl:variable name="total-child" select="sum($childcounts/sums/sum)"/>
        <xsl:variable name="total-roman" select="$wc - $total-child"/>


I would like to find a more elegant approach, because
there are also other attributes in this content that
need the same technique applied to them.

Would it be a better approach to copy the elements to
another document node and then perform the word count
there, applying it recursively to all child elements to
arrive at the count? If so, what would the template
match look like?
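
One possible direction, offered only as an untested sketch and not as a confirmed answer: instead of counting everything and subtracting the nested counts, select just the text nodes whose nearest ancestor with 'topic/topic ' in its class is the element being processed, and tokenize those. The variable name own-text and the output elements counts and topic-count are hypothetical, introduced for illustration; the class test and the tokenize pattern are the same ones used in the templates above.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical wrapper element for the per-element results. -->
    <xsl:template match="/">
        <counts>
            <xsl:apply-templates/>
        </counts>
    </xsl:template>

    <xsl:template match="*[contains(@class, 'topic/topic ')]">
        <xsl:variable name="level"
            select="count(ancestor::*[contains(@class, 'topic/topic ')]) + 1"/>

        <!-- Text nodes that belong to this element only: their nearest
             'topic/topic ' ancestor must be the current element, so text
             inside nested topic/topic elements is excluded. -->
        <xsl:variable name="own-text"
            select=".//text()[ancestor::*[contains(@class, 'topic/topic ')][1] is current()]"/>

        <!-- Same tokenize pattern as the original template, applied to the
             joined own text instead of the element's full string value. -->
        <xsl:variable name="wc"
            select="count(tokenize(lower-case(string-join($own-text, ' ')),
                    '(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>

        <!-- Hypothetical output element; replace with whatever is needed. -->
        <topic-count level="{$level}" words="{$wc}"/>

        <!-- Descend so that nested topic/topic elements get their own count. -->
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <!-- Suppress the built-in copying of text to the output. -->
    <xsl:template match="text()"/>
</xsl:stylesheet>
```

With this shape no second mode, temporary tree, or subtraction is needed. If other attributes need the same treatment, the contains() test is the only part that would have to change, so it could presumably be moved into a stylesheet parameter or a stylesheet function.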


