xsl-list

[xsl] Which is less expensive: group-by or distinct-values?

2016-07-15 14:19:28
So I have a large document from which I need to pull a list of the unique
values of a given element. These are taxonomy and term tag values from a
4,000-topic collection of DITA content.

Without knowing how distinct-values() and xsl:for-each-group are
implemented, is there something I should be able to intuit just from the
spec? This is some code that I inherited, and it isn't how I would have
attacked the problem:

<xsl:variable name="TermList">
   <xsl:value-of select="distinct-values(.//term[not(@keyref)])"
                 separator=", " />
</xsl:variable>
<data type="topicreport" name="WDTermList">
   <xsl:for-each select="tokenize(normalize-space($TermList), ', ')">
      <xsl:sort select="." />
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">, </xsl:if>
   </xsl:for-each>
</data>

If this hadn't existed in the stylesheet already, I would have probably
done something like:

<xsl:for-each-group select=".//term[not(@keyref)]" group-by=".">
   <xsl:sort select="current-grouping-key()" />
   <xsl:value-of select="current-grouping-key()"/>
   <xsl:if test="position() != last()">, </xsl:if>
</xsl:for-each-group>

Currently the process (with a bunch of other checks) runs for a very long
time due to the size of the file I'm processing and the number of checks.
Recently, after I added a couple more checks, it keeps running out of
memory and requiring the Java heap to be increased.

I don't think the above is my major time sink in this process, but it is
one class of things that I'm reporting. I think the real processing-time
issue comes from the large amount of string analysis/parsing that occurs.

I'll probably run a physical test in a simple stylesheet with this content
to try and time any significant difference, but I was wondering what your
thoughts would be.
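
For reference, a minimal sketch of what such a test stylesheet might look
like, assuming an XSLT 2.0 processor. The "method" parameter is a
hypothetical switch added only for this test, and the distinct-values
branch sorts the values directly instead of joining and re-tokenizing them
as the inherited code does, so it also skips the normalize-space() call:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical switch (not in the original stylesheet):
       pass method=group to time the grouping variant -->
  <xsl:param name="method" as="xs:string" select="'distinct'"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$method = 'group'">
        <!-- Variant 1: one group per distinct string value of term -->
        <xsl:for-each-group select=".//term[not(@keyref)]" group-by=".">
          <xsl:sort select="current-grouping-key()"/>
          <xsl:value-of select="current-grouping-key()"/>
          <xsl:if test="position() != last()">, </xsl:if>
        </xsl:for-each-group>
      </xsl:when>
      <xsl:otherwise>
        <!-- Variant 2: iterate over the distinct atomic values and sort them -->
        <xsl:for-each select="distinct-values(.//term[not(@keyref)])">
          <xsl:sort select="."/>
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">, </xsl:if>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Running it once with method=distinct and once with method=group against
the same input, with whatever timing option the processor offers, should
show whether the two differ enough to matter.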