xsl-list

Coding Optimization for big files

2004-03-10 06:38:48
Hi all,
I would like your help with a performance problem I have run into.
I'm sure there are workarounds for this problem, for example physically 
splitting the input XML file into several chunks (I have already tried this 
and it works fine).
But what I really need is only a logical split, or just better use of 
variables/keys in the XSL code.

The execution time for a small-to-medium file (size: 1.5 MByte, containing 100 
contracts and 1500 Gr22) is around 100 seconds.
The execution time for the biggest (worst-case) file (size: 25 MByte, containing 
1800 <contracts>, where each contract has several <Gr22>, for a total of around 
30000 Gr22!!!) is 6 hours!!!
I tried to analyse the problem, and it appears to be related to loading the 
variables allSUMGr22 and allContracts into memory and to the way the XSLT 
processor accesses them.
The goal would be, for example, to generate input XML files grouped by group of 
contracts (<SUM groupId='1'>) or to generate different tags for each group 
(<SUM1>, <SUM2>, etc.).
I guess I would need to define variables that don't require too much memory 
and that are able to filter the 30000 items.
But I'm not sure that I can avoid defining big variables.

Are there any suggestions for this optimization?

Thanks in advance
Diego

TRANSFORM.XML

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output version="1.0" method="xml" indent="yes"/>

        <xsl:key name="gr22CustomerContractKey" match="/TIMM-MESSAGE/SUM/Gr22" 
use="concat(@customer,'|',@contract)"/>

        <xsl:variable name="allSUMGr22" select="/TIMM-MESSAGE/SUM/Gr22"/>
        <xsl:variable name="allContracts" 
select="/TIMM-MESSAGE/SUM/Gr22[count(. | key('gr22CustomerContractKey', 
concat(@customer,'|',@contract))[1])=1][IMD/servicecodeid='DNNUM'][IMD/productdes='8']"/>

        <xsl:template match="/">
                <doc_result>
                        <xsl:for-each select="$allContracts">
                        <xsl:variable name="indexCustomer" select="@customer"/>
                        <xsl:variable name="indexContract" select="@contract"/>
                        <contract>
                                <xsl:call-template name="getNumber">
                                        <xsl:with-param name="pIndexCustomer" 
select="$indexCustomer"/>
                                        <xsl:with-param name="pIndexContract" 
select="$indexContract"/>
                                </xsl:call-template>
                        </contract>
                        </xsl:for-each>
                </doc_result>
        </xsl:template>

        <xsl:template name="getNumber">
                <xsl:param name="pIndexCustomer"/>
                <xsl:param name="pIndexContract"/>
                <contract_number>
                                <xsl:value-of 
select="$allSUMGr22[@customer=$pIndexCustomer][@contract=$pIndexContract]/IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
                </contract_number>
        </xsl:template>
</xsl:stylesheet>
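
For illustration only, here is one possible variation of the stylesheet above in which getNumber looks up the Gr22 elements through the key that is already declared, instead of filtering $allSUMGr22 with two predicates for every contract. This is an untested sketch, not a measured solution: it assumes the processor builds the key index once per document, and it has not been timed against the 25 MByte file. No new names are introduced; the two global variables are simply dropped.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output version="1.0" method="xml" indent="yes"/>

  <!-- Same key as in TRANSFORM.XML: indexes every Gr22 by "customer|contract" -->
  <xsl:key name="gr22CustomerContractKey" match="/TIMM-MESSAGE/SUM/Gr22"
           use="concat(@customer,'|',@contract)"/>

  <xsl:template match="/">
    <doc_result>
      <!-- First Gr22 of each customer|contract group, with the same filters as before -->
      <xsl:for-each select="/TIMM-MESSAGE/SUM/Gr22
          [count(. | key('gr22CustomerContractKey', concat(@customer,'|',@contract))[1])=1]
          [IMD/servicecodeid='DNNUM'][IMD/productdes='8']">
        <contract>
          <xsl:call-template name="getNumber">
            <xsl:with-param name="pIndexCustomer" select="@customer"/>
            <xsl:with-param name="pIndexContract" select="@contract"/>
          </xsl:call-template>
        </contract>
      </xsl:for-each>
    </doc_result>
  </xsl:template>

  <xsl:template name="getNumber">
    <xsl:param name="pIndexCustomer"/>
    <xsl:param name="pIndexContract"/>
    <contract_number>
      <!-- Key lookup replaces the scan over all Gr22 nodes in $allSUMGr22 -->
      <xsl:value-of select="key('gr22CustomerContractKey',
          concat($pIndexCustomer,'|',$pIndexContract))
          /IMD[productdes='8'][servicecodeid='DNNUM']/fulldesc"/>
    </contract_number>
  </xsl:template>
</xsl:stylesheet>
```

The key lookup selects the same Gr22 nodes as the two predicates on $allSUMGr22, so the output should be unchanged. Whether the run time actually improves depends on how the processor implements keys.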

XML structure

<TIMM-MESSAGE>
<SUM>
...other tags...
<Gr22 customer='1' contract='1'>
<IMD>
        <productdes>8</productdes>
        <servicecodeid>DNNUM</servicecodeid>
        <shortdesc></shortdesc>
        <fulldesc>number1</fulldesc>
</IMD>
...other tags...
</Gr22>
...other Gr22 related to the customer='1' contract='1'...

<Gr22 customer='1' contract='2'>
<IMD>
        <productdes>8</productdes>
        <servicecodeid>DNNUM</servicecodeid>
        <shortdesc></shortdesc>
        <fulldesc>number2</fulldesc>
</IMD>
...other tags...
</Gr22>
...other Gr22 related to the customer='1' contract='2'...

...other Gr22 related to the customer='1' for all the other contracts...

</SUM>
</TIMM-MESSAGE>



