RE: [xsl] distinct-values() optimization, sorting by frequency

2008-02-08 07:49:00
In the alphabetical list,

count($persNames[normalize-space(lower-case(.)) = $current-name])

could be optimized by:

(a) using keys

(b) using Saxon-SA which will optimize it to use a key automatically

(c) using xsl:for-each-group rather than distinct-values(), though that will
require some restructuring of your code.
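
As a rough sketch of option (a), the stylesheet below indexes each persName under its normalized form and replaces the filter expression with a key lookup. The key name persName-by-norm is made up for illustration, and the TEI namespace URI is an assumption, since the original message does not show the binding for the tei prefix.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <!-- Assumption: tei is bound to the TEI P5 namespace; adjust to match your documents. -->

  <xsl:variable name="docs"
      select="collection('../../working/xml/files.xml')"/>

  <!-- Hypothetical key name: indexes every persName under its
       whitespace-normalized, lower-cased text. -->
  <xsl:key name="persName-by-norm"
      match="tei:text//tei:persName"
      use="normalize-space(lower-case(.))"/>

  <xsl:template name="main">
    <xsl:variable name="persNames" select="$docs//tei:text//tei:persName"/>
    <list type="unordered">
      <xsl:for-each
          select="distinct-values($persNames/normalize-space(lower-case(.)))">
        <xsl:sort select="."/>
        <xsl:variable name="current-name" select="."/>
        <!-- $docs/key(...) performs the lookup once in each document
             of the collection and combines the results. -->
        <item>
          <xsl:value-of select="concat($current-name, '  --  ',
              count($docs/key('persName-by-norm', $current-name)))"/>
        </item>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>
```

The lookup is written as $docs/key(...) because the context item inside the for-each is a string, and the two-argument form of key() needs a node to identify which document to search.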

In the frequency-sorted list, I think for-each-group would definitely be
better:

<xsl:for-each-group select="$persNames" group-by="lower-case(.)">
  <xsl:sort select="count(current-group())"/>
  ...

(Note also that a case-blind collation could be used rather than
lower-case(), as discussed in another thread today.)
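
Filled out, the frequency list might look something like the sketch below. It assumes the same $docs collection and output elements as the original message; the TEI namespace URI, the normalize-space() in the grouping key (added to match the original normalization), and the secondary sort on the name are assumptions of this sketch rather than part of the fragment above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <!-- Assumption: tei is bound to the TEI P5 namespace; adjust to match your documents. -->

  <xsl:variable name="docs"
      select="collection('../../working/xml/files.xml')"/>

  <xsl:template name="main">
    <div>
      <head>Frequency List</head>
      <list type="unordered">
        <!-- One group per distinct normalized name. -->
        <xsl:for-each-group select="$docs//tei:text//tei:persName"
            group-by="normalize-space(lower-case(.))">
          <!-- Primary sort key: the size of the group. -->
          <xsl:sort select="count(current-group())"/>
          <!-- Secondary sort key (an addition): the name, so ties come out in a predictable order. -->
          <xsl:sort select="current-grouping-key()"/>
          <item>
            <xsl:value-of select="concat(count(current-group()),
                '  --  ', current-grouping-key())"/>
          </item>
        </xsl:for-each-group>
      </list>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Because each group already holds its members, count(current-group()) serves both the sort and the output, and no rescan of $persNames is needed for each distinct name. Add order="descending" to the first xsl:sort if the most frequent names should come first.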

Michael Kay
http://www.saxonica.com/


 

-----Original Message-----
From: James Cummings [mailto:cummings(_dot_)james(_at_)gmail(_dot_)com] 
Sent: 08 February 2008 14:28
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] distinct-values() optimization, sorting by frequency

Hiya,

I'm wondering what the best way is to optimize a
distinct-values()-based transformation.  What I'm basically doing is:
======
<xsl:variable name="docs"  
select="collection('../../working/xml/files.xml')"/>

<xsl:template name="main" >
 <xsl:variable name="persNames" 
select="$docs//tei:text//tei:persName"/>
 <xsl:variable name="norm-persNames"
select="$persNames/normalize-space(lower-case(.))"/>
 <xsl:variable name="distinct-persNames"
select="distinct-values($norm-persNames)"/>
<!-- I realize that I could be more specific on the
$persNames variable, but doing so doesn't seem to affect
speed much at all. -->
<div type="main">

<!-- Some overall counts -->
<div><head>Overall Counts</head>
<list type="unordered">
  <item>Number of <gi>persName</gi> elements total:
    <xsl:value-of select="count($persNames)"/></item>
  <item>Number of <gi>persName</gi> elements which have a  
@key attribute total: <xsl:value-of 
select="count($persNames[@key])"/></item>
<item>Number of distinct-value <gi>persName</gi> elements total:
<xsl:value-of select="count($distinct-persNames)"/></item>
</list></div>

<!-- An Alphabetical List -->
<div><head>Alphabetical List</head>
  <list type="unordered">
    <xsl:for-each select="$distinct-persNames">
      <xsl:sort select="."/>
      <xsl:variable name="current-name" select="."/>
      <xsl:variable name="count-distinct-current-name"
     select="count($persNames[normalize-space(lower-case(.)) 
=$current-name])"/>
      <item><xsl:value-of select="concat($current-name,
          '  --  ', $count-distinct-current-name)"/></item>
      </xsl:for-each>
   </list>
</div>

<!-- A Frequency Sorted List  -->
<div>
  <head>Frequency List</head>
  <list type="unordered">
    <xsl:for-each select="$distinct-persNames">
      <xsl:sort 
select="count($persNames[normalize-space(lower-case(.))
        = .])"/>
<!-- I think it is this sort statement which slows things 
down, since I have to repeat it twice. -->
      <xsl:variable name="current-name" select="."/>
      <xsl:variable name="count-distinct-current-name"
        select="count($persNames[normalize-space(lower-case(.))
        = $current-name])"/>
      <item><xsl:value-of select="concat($count-distinct-current-name,
          '  --  ', $current-name)"/> </item>
    </xsl:for-each>
  </list>
</div>
</div>
</xsl:template>
======

I think the real slow-down comes in the second xsl:for-each,
where I want to sort by frequency of distinct-value by doing:

<xsl:sort
select="count($persNames[normalize-space(lower-case(.)) = .])"/>

I have to have it for the sort, and then I have to
re-do it for the output inside the <item> element.  I'm
obviously not allowed a variable between the for-each and the
sort... but I have a feeling I'm missing some clever
optimization here.

Although this is for a pre-generated transformation, it 
currently takes a *hugely* long time, and I'm thinking I must 
be able to optimize it somehow.

Any suggestions appreciated,

-James

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


