xsl-list

RE: Grouping problem with large files in .Net

2004-06-07 16:49:32
Hi Frederik,

I've not used XslTransform yet for the type of transformation you're doing
so I cannot comment on that.

If the problems with XslTransform are of this magnitude (and your results look
quite severe), then you could fall back to using MSXML 4.0 as a COM object in
your .NET application.

I realize that this option may not be the best, but it could serve as a
temporary workaround.
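
A minimal sketch of what calling MSXML from C# could look like, assuming MSXML 4.0 is installed and registered on the machine under the ProgID "Msxml2.DOMDocument.4.0". Late binding through reflection is only one way to do this (adding a COM reference to the MSXML type library is another), the input and stylesheet paths are the ones from the original post, and the output file name is made up for illustration:

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Text;

class MsxmlWorkaround
{
    // Creates a late-bound MSXML 4.0 DOM document and loads a file into it.
    // Assumption: MSXML 4.0 is registered under this ProgID.
    static object LoadDom(string path)
    {
        Type domType = Type.GetTypeFromProgID("Msxml2.DOMDocument.4.0", true);
        object dom = Activator.CreateInstance(domType);

        // Load synchronously so that load() returns only after parsing has finished.
        domType.InvokeMember("async", BindingFlags.SetProperty,
            null, dom, new object[] { false });

        bool loaded = (bool)domType.InvokeMember("load", BindingFlags.InvokeMethod,
            null, dom, new object[] { path });
        if (!loaded)
            throw new ApplicationException("MSXML could not load " + path);
        return dom;
    }

    static void Main()
    {
        string folder = @"D:\Test\grouping\";
        object input = LoadDom(folder + "FlatInput.xml");
        object stylesheet = LoadDom(folder + "FlatInput2Grouped.xslt");

        // transformNode runs the stylesheet and returns the result as a string.
        string result = (string)input.GetType().InvokeMember("transformNode",
            BindingFlags.InvokeMethod, null, input, new object[] { stylesheet });

        // The result is a string, so it is written as UTF-16 here rather than
        // in the UTF-8 encoding that the stylesheet's xsl:output asks for.
        // The output file name is hypothetical.
        using (StreamWriter writer = new StreamWriter(
            folder + "groupedOutput_msxml.xml", false, Encoding.Unicode))
        {
            writer.Write(result);
        }
    }
}
```

Holding the whole result in one string may be a concern for a 30MB input, so this is a starting point to measure rather than a tuned solution.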

HTH,
<prs/>

-----Original Message-----
From: Frederik Willaert [mailto:f(_dot_)w(_at_)advalvas(_dot_)be] 
Sent: Sunday, June 06, 2004 6:47 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Grouping problem with large files in .Net

Hi,

 

I have a problem with grouping large record-style XML documents using the
.Net XslTransform class.

 

My source document has the following structure:

 

<REPORT>

    <ROW>

        <CUSTOMER>XXX</CUSTOMER>

        <ACCOUNT>YYY</ACCOUNT>

        <HOURNUMBER>1</HOURNUMBER>

        <VALUE1>...</VALUE1>

        <VALUE2>...</VALUE2>

        <VALUE3>...</VALUE3>

        <!-- ... -->

    </ROW>

    <ROW>

            <!-- ... -->

    </ROW>

    <!-- ... -->

</REPORT>

 

 

The stylesheet I'm executing is the following:

 

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="rows-by-customer" match="/REPORT/ROW" use="CUSTOMER"/>

<xsl:key name="rows-by-customer-and-account" match="/REPORT/ROW" use="concat(CUSTOMER,'+',ACCOUNT)"/>

<xsl:template match="/REPORT">

    <Report>

        <xsl:for-each select="ROW[generate-id() = generate-id(key('rows-by-customer', CUSTOMER)[1])]">

            <xsl:variable name="customer" select="CUSTOMER" />

            <Customer Name="{$customer}">

                <xsl:for-each select="key('rows-by-customer', $customer)[generate-id() = generate-id(key('rows-by-customer-and-account', concat(CUSTOMER,'+',ACCOUNT))[1])]">

                    <xsl:variable name="account" select="ACCOUNT" />

                    <Account Name="{$account}">

                        <xsl:for-each select="key('rows-by-customer-and-account', concat(CUSTOMER,'+',$account))">

                            <xsl:copy-of select="." />

                        </xsl:for-each>

                    </Account>

                </xsl:for-each>

            </Customer>

        </xsl:for-each>

    </Report>

</xsl:template>

</xsl:stylesheet>

 

This performs a two-level grouping: by Customer, then by Account.
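
To make the intended result concrete, the same two-level grouping can be sketched procedurally in C#. This is only an illustration of what the stylesheet computes, not something proposed in this thread. It assumes every ROW has a CUSTOMER and an ACCOUNT child, and the class and method names are made up. Groups appear in order of first occurrence, as they do with the key-based stylesheet.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

class GroupingSketch
{
    // Groups ROW elements by CUSTOMER, then by ACCOUNT, in first-seen order.
    public static string Group(string reportXml)
    {
        XmlDocument source = new XmlDocument();
        source.LoadXml(reportXml);

        ArrayList customerOrder = new ArrayList(); // customer names, first-seen order
        Hashtable accountOrder = new Hashtable();  // customer -> ArrayList of accounts
        Hashtable rowsByKey = new Hashtable();     // "customer+account" -> ArrayList of ROWs

        foreach (XmlNode row in source.SelectNodes("/REPORT/ROW"))
        {
            string customer = row.SelectSingleNode("CUSTOMER").InnerText;
            string account = row.SelectSingleNode("ACCOUNT").InnerText;
            string key = customer + "+" + account; // same composite key as the stylesheet

            if (!accountOrder.ContainsKey(customer))
            {
                customerOrder.Add(customer);
                accountOrder[customer] = new ArrayList();
            }
            if (!rowsByKey.ContainsKey(key))
            {
                ((ArrayList)accountOrder[customer]).Add(account);
                rowsByKey[key] = new ArrayList();
            }
            ((ArrayList)rowsByKey[key]).Add(row);
        }

        // Write Report / Customer / Account and copy each ROW unchanged.
        StringWriter text = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(text);
        writer.WriteStartElement("Report");
        foreach (string customer in customerOrder)
        {
            writer.WriteStartElement("Customer");
            writer.WriteAttributeString("Name", customer);
            foreach (string account in (ArrayList)accountOrder[customer])
            {
                writer.WriteStartElement("Account");
                writer.WriteAttributeString("Name", account);
                foreach (XmlNode row in (ArrayList)rowsByKey[customer + "+" + account])
                    row.WriteTo(writer);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
        writer.Flush();
        return text.ToString();
    }
}
```

Each row is visited once and looked up in a hash table, which is roughly the work a key-based grouping is expected to do for an input of this shape.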

 

The source document can contain several tens of thousands of rows.

 

 

=> When performing this transformation using MSXML, performance is very
acceptable: < 1 minute for a file with 60000 records.

=> However, the same transformation using .Net (1.1) XslTransform seems to
take forever - I haven't been able to have it processed completely so far...
Unfortunately, .Net is the intended platform.

 

==> Am I doing something wrong, is this a known problem, and/or can
something be done about this?

 

Remarks:

- I have also tried with the count(. | key('rows-by-customer', CUSTOMER)[1])
= 1 approach, same problem.

- I've found a document on MSDN mentioning that the xsl:key implementation
had a performance problem. However, this seems to apply to .Net v1.0 (?)

- Following recommendations, I'm using XPathDocument for the input file, and
a stream for the output - or would there be better options?

- I've included the source code for the transformation, and the timings of
several transformations (using MSXSL and XslTransform) below.

 

Any help would be greatly appreciated...

 

Thanks in advance,

Frederik

 

*****************

C# code to do transformation:

 

string folder = @"D:\Test\grouping\";

string inputUri = folder + "FlatInput.xml";

string stylesheet1uri = folder + "FlatInput2Grouped.xslt";

 

string outputUri = folder + "groupedOutput_XslTransform.xml";

 

DateTime beforeStart = DateTime.Now;

DateTime afterLoadingInput, afterLoadingStylesheet, afterTransform;

using(FileStream output = new FileStream(outputUri, FileMode.Create, FileAccess.Write, FileShare.Read))

{

XPathDocument inputDocument = new XPathDocument(inputUri);

afterLoadingInput = DateTime.Now;

 

XslTransform transform = new XslTransform();

 

transform.Load(

new XPathDocument(stylesheet1uri), 

null,

this.GetType().Assembly.Evidence);

afterLoadingStylesheet = DateTime.Now;

 

transform.Transform(inputDocument,null,output,null);

afterTransform = DateTime.Now;

}

 

******************

Timings:

 

MSXSL:

 

groupedOutput_verysmall_msxsl.xml (approx. 48 records)

---------------------------------

Source document load time: 27.68 milliseconds

Stylesheet document load time: 1.810 milliseconds

Stylesheet compile time: 1.266 milliseconds

Stylesheet execution time: 6.178 milliseconds

 

groupedOutput_small_msxsl.xml (144 records)

-----------------------------

Source document load time: 45.77 milliseconds

Stylesheet document load time: 2.145 milliseconds

Stylesheet compile time: 1.297 milliseconds

Stylesheet execution time: 48.66 milliseconds

 

groupedOutput_medium_msxsl.xml (approx. 10000 records)

------------------------------

Source document load time: 1507 milliseconds

Stylesheet document load time: 11.85 milliseconds

Stylesheet compile time: .648 milliseconds

Stylesheet execution time: 1634 milliseconds

 

groupedOutput_msxsl.xml (approx. 60000 records, 30MB file size)

-----------------------

Source document load time: 11276 milliseconds

Stylesheet document load time: 3.053 milliseconds

Stylesheet compile time: .652 milliseconds

Stylesheet execution time: 40403 milliseconds

 

============

 

XSLTRANSFORM:

(timings of second transformation, to exclude JIT compilation time)

 

groupedOutput_verysmall_XslTransform.xml (48 records)

----------------------------------------

Source document load time: 30 milliseconds

Stylesheet document load time: 10 milliseconds

Stylesheet execution time: 130 milliseconds

 

groupedOutput_small_XslTransform.xml (144 records)

------------------------------------

Source document load time: 50 milliseconds

Stylesheet document load time: 10 milliseconds

Stylesheet execution time: 270 milliseconds

 

groupedOutput_medium_XslTransform.xml (approx. 10000 records)

-------------------------------------

[SEVERAL HOURS]

 

groupedOutput_XslTransform.xml (approx. 60000 records, 30MB file size)

------------------------------

[FOREVER ?]

--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--+--

