xsl-list

Re: Statistics - Calculating Standard Deviation

2003-06-13 10:17:33

"Andrew Welch" <AWelch(_at_)piper-group(_dot_)com> wrote in message
news:3BAAB77DB787FC4C961601D815DAF1E50E6C41(_at_)piper7(_dot_)Piper(_dot_)Internal(_dot_)(_dot_)(_dot_)
The performance is the thing that is worrying me most.  Ideally the
target processor is MSXML 4.0, but that is open to negotiation...

Well, using Saxon 7.x (use the latest) and exslt/math you could use the
following simple stylesheet.  I'm just wondering how much of this can be
done using straight XSLT 2 now... Is there a square root function? I had a
quick look but didn't see anything.


The solution I posted earlier today runs OK without any modifications in
XSLT 2.0 (Saxon 7.5):

http://aspn.activestate.com/ASPN/Mail/Message/XSL-List/1670297




<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/math">

<xsl:variable name="mean" select="sum(/root/node) div count(/root/node)"/>

<xsl:variable name="diffs">
  <root>
    <xsl:for-each select="/root/node">
      <node squaredDiff="{exsl:power($mean - .,2)}">


Why is exsl:power necessary here? Multiplying a number by itself in pure
XSLT will probably not be any slower.


         <xsl:copy-of select="."/>
      </node>
    </xsl:for-each>
  </root>
</xsl:variable>

<xsl:variable name="mean.Of.Sum.Of.Diffs">
  <xsl:for-each select="$diffs">
    <xsl:value-of select="sum(/root/node/@squaredDiff) div (count(/root/node) - 1)"/>
  </xsl:for-each>
</xsl:variable>

<xsl:template match="/">
  standard deviation: <xsl:value-of select="exsl:sqrt(number($mean.Of.Sum.Of.Diffs))"/>
</xsl:template>

</xsl:stylesheet>
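
As a rough illustration of the question raised inline about exsl:power, the
$diffs variable could be written with plain multiplication, as sketched
below. This is only a sketch: the helper variable name "d" is hypothetical
and does not appear in the stylesheet above, and no timing has been done to
confirm that it is faster.

```xml
<!-- Drop-in replacement for the $diffs variable in the stylesheet above.
     "d" is a hypothetical helper name introduced for this sketch. -->
<xsl:variable name="diffs">
  <root>
    <xsl:for-each select="/root/node">
      <!-- difference between the mean and the current node's value -->
      <xsl:variable name="d" select="$mean - ."/>
      <!-- square by multiplication instead of calling exsl:power -->
      <node squaredDiff="{$d * $d}">
        <xsl:copy-of select="."/>
      </node>
    </xsl:for-each>
  </root>
</xsl:variable>
```

The rest of the stylesheet stays as it is; only the call to exsl:power goes
away, while exsl:sqrt is still needed for the final result.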


This solution uses 2 * N units of memory, which may limit its
applicability, especially when processing long node-sets.
It may also require from three to five traversals of a node-set of length N,
the length of the initial node-set (one each for sum() and count() when
computing the mean, one for the xsl:for-each that builds $diffs, and
possibly one each for the second sum() and count()).

An advantage (in efficiency) is that it does not require any recursion.

However, I guess it would be much more efficient if sequences were
used/built instead of node-sets.
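
A minimal sketch of what a sequence-based version might look like follows.
It assumes the XPath 2.0 "for" expression as it appears in the final
Recommendation, so a draft-era processor such as Saxon 7.5 may need
different syntax. The variable names (values, n, squaredDiffs, variance) are
made up for this sketch. XPath 2.0 defines no square root function, so the
sketch still assumes that the processor exposes the EXSLT math extension.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/math">

<xsl:output method="text"/>

<xsl:template match="/">
  <!-- a sequence of numbers, not a temporary tree of copied nodes -->
  <xsl:variable name="values"
    select="for $x in /root/node return number($x)"/>
  <xsl:variable name="n" select="count($values)"/>
  <xsl:variable name="mean" select="sum($values) div $n"/>
  <!-- squared differences, also held as a plain sequence -->
  <xsl:variable name="squaredDiffs"
    select="for $v in $values return ($v - $mean) * ($v - $mean)"/>
  <!-- sample variance: divide by n - 1, as in the stylesheet above -->
  <xsl:variable name="variance"
    select="sum($squaredDiffs) div ($n - 1)"/>
  <xsl:text>standard deviation: </xsl:text>
  <!-- assumption: exsl:sqrt is available as an extension function -->
  <xsl:value-of select="exsl:sqrt($variance)"/>
</xsl:template>

</xsl:stylesheet>
```

No nodes are copied here, so the second structure holds only N numbers
rather than N new element nodes, and there is still no recursion.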


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list