xsl-list
[Top] [All Lists]

Re: [xsl] A super-efficient way to compute the sum of A[i] * B[i] for i=1 to n?

2020-05-09 12:40:14
On Sat, 2020-05-09 at 12:00 +0000, Costello, Roger L.
costello(_at_)mitre(_dot_)org wrote:
Hi Folks,

I need a super-efficient way to compute the sum of A[i] * B[i] for
i=1 to n.

The most efficient way to do something in computing is often to avoid
doing it. So if it's a performance bottleneck, see if you can avoid it
altogether.

I did some timings (see below) but found overall that building the XML
tree dominated: a stylesheet that just produced a message took 15
elapsed seconds on a 700MByte input file, including JVM startup and
compiling the stylesheet; the computation took another 4 or 5 seconds.


Timings:

For what it's worth, i tried with slightly different XML -- a v element
having a and b children,
  <v><a>.438743</a><b>.4874343</b></v>
  <v>...

Saxon 9 EE took 19 seconds to process 10,000,000 v elements, of which
the first 1.5 seconds or so was JVM startup and stylesheet complilaion
(based on a test file with only 3 v elements).

    <xsl:template match="/">
        <r>
          <xsl:message select="'count: ' || count(//v)"/>
          <xsl:value-of select="sum( //v ! my:pair(./a, ./b))" />
        </r>
    </xsl:template>

    <xsl:function name="my:pair">
        <xsl:param name="a" as="xs:double" />
        <xsl:param name="b" as="xs:double" />
        <xsl:sequence select="$a * $b"/>
    </xsl:function>

If the element names were longer, it'd take more time (my test file was
half a gigabyte).

Eliminating the function and using
   <xsl:value-of select="sum( //v ! (./a * ./b) )" />
was about the same time or a second slower, BUT the first time i had a
typo, a , instead of a *, and this was not an error. The function
version is more robust against errors.

Using (./a treat as xs:double) * ./b treat as xs:double) produced a
runtime error after 14.5 seconds (using an XML schema would have
removed the error) showing that constructing the sequence of v elemnts
is taking most of the time.

Using cast instead of treat (xs:facepalm) worked and took 19 to 21
seconds (running it multiple times).

Changing //v to /s/v made a small improvement (18 or 19 seconds instead
of 20).

Using a template to match v elements and return a * b, storing tha tin
a variable of tytpe xs:double*, and returning sum() on it, was about
the same speed.

Using <double-value> instead of <v> to have a 763 MByte file took only
slighty longer.

I did notice, however, that the process was getting about twice as many
CPU seconds as elapsed seconds, showing evidence of multi-threading.

For what it's worth i tried constructing two sequences, instead of
reading elements from an XML file; it took about 5 or 6 seconds, or,
with the same input file as before, about 18 seconds - in other words
it's not the XML part that's taking time.


Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>