xsl-list

RE: Merging 2 XML's in to 1 output XML (Performance Issue)

2005-10-12 08:09:58
This is basically a join query. Most XSLT processors don't do very much in
the way of join optimization, so you tend to get quadratic performance
(double the file size, and it takes four times as long).

There's a join optimizer in Saxon-SA, so you could try that; or if you've
got more time than money, you could hand-optimize it using keys. Take a look
at xsl:key and the key() function.
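
As a rough sketch of what the hand-optimized version might look like (untested
against your data): declare a key on the composite DFFileId/RSNo value and
look records up with key() instead of scanning the second document for every
record. The namespace URIs below are placeholders, since your post doesn't
show the real ones, and the key name "vehicle-by-id" and the '|' separator
are just my choices. Note that in XSLT 1.0, key() only searches the document
containing the context node, hence the xsl:for-each to switch context.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:st="urn:example:st"
    xmlns:xdd="urn:example:xdd">
  <!-- ASSUMPTION: the st and xdd URIs above are placeholders;
       replace them with the namespaces your documents really use. -->

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="aggregator" select="document('InputXML2.xml')"/>

  <!-- Index every st:VehicleTT by its combined DFFileId|RSNo value.
       The index is built once per document, on first use. -->
  <xsl:key name="vehicle-by-id" match="st:VehicleTT"
           use="concat(xdd:DFFileId, '|', xdd:RSNo)"/>

  <xsl:template match="/">
    <st:VehicleTDoc>
      <xsl:apply-templates select="st:VehicleTDoc/st:VehicleTT"/>
    </st:VehicleTDoc>
  </xsl:template>

  <xsl:template match="st:VehicleTT">
    <!-- Compute the lookup value while the context is still the
         record in the first document. -->
    <xsl:variable name="id"
                  select="concat(xdd:DFFileId, '|', xdd:RSNo)"/>
    <st:VehicleTT>
      <xsl:copy-of select="*"/>
      <!-- Switch the context to the second document so that key()
           searches its index rather than the first document's. -->
      <xsl:for-each select="$aggregator">
        <xsl:copy-of select="key('vehicle-by-id', $id)[1]/xdd:SName"/>
      </xsl:for-each>
    </st:VehicleTT>
  </xsl:template>

</xsl:stylesheet>
```

With the key in place each record costs one index lookup rather than a scan
of the whole second file, so the run time should grow roughly in proportion
to the file size instead of with its square.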

Actually, it's worth trying it on Saxon-B (the open source version) just to
see how much that helps even without doing anything else. It's been known to
go ten times faster than Xalan.

Michael Kay
http://www.saxonica.com/
 

-----Original Message-----
From: Kusunam, Srinivas [mailto:SKusunam(_at_)rlpt(_dot_)com] 
Sent: 12 October 2005 15:52
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Merging 2 XML's in to 1 output XML (Performance Issue)

Hi Group,

This is my first message to this group. It looks like the group is
very active and helpful, so please suggest something on this.


I have InputXML1 and InputXML2, which need to be combined into
OutputXML based on some conditions: for each record in InputXML1 with a
key match in InputXML2, get all the child elements from InputXML2 and
append them to InputXML1's node.

Size of Input Files can go up to 100MB (MAX).
Environment:   JDK-1.4.2_06   \   xalan-j_2_7_0 

I have written a simple stylesheet to do this, but processing a 10MB file
takes around 14 minutes, which is definitely not acceptable. Please
tell me whether this is common with XSLT (everybody says that XSLT is
not efficient for large input files). What are the alternatives?

InputXML1:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<st:VehicleTDoc>

      <st:VehicleTT>
              <xdd:DFFileId>213044</xdd:DFFileId>
              <xdd:RSNo>10</xdd:RSNo>
              <xdd:SNameAddGroup>
                      <xdd:SAddRoleCode>P</xdd:SAddRoleCode>
              </xdd:SNameAddGroup>
      </st:VehicleTT>
</st:VehicleTDoc>

InputXML2:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<st:VehicleTDoc>

      <st:VehicleTT>
              <xdd:DFFileId>213044</xdd:DFFileId>
              <xdd:RSNo>10</xdd:RSNo>
              <xdd:SName>
                      <xdd:RoleCode>Primary</xdd:RoleCode>
              </xdd:SName>
      </st:VehicleTT>
</st:VehicleTDoc>

XSL:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" />

<xsl:variable name="aggregator" select="document('InputXML2.xml')" />

<xsl:template match="/">
      <st:VehicleTDoc>
              <xsl:apply-templates/>
      </st:VehicleTDoc>
</xsl:template>

<xsl:template match="//st:VehicleTT">

    <st:VehicleTT>
          <xsl:copy-of select="*"/>
    <xsl:apply-templates
        select="$aggregator/st:VehicleTDoc/st:VehicleTT[xdd:DFFileId=current()/xdd:DFFileId
                and xdd:RSNo=current()/xdd:RSNo][1]/xdd:SName" />
    </st:VehicleTT>
</xsl:template>

<xsl:template match="xdd:SName">
      <xsl:copy-of select="current()" />
</xsl:template>
      
</xsl:stylesheet>

Instead of hard-coding some of the tags I could get them from the
current node, but my main problem is the performance of the main task.
It looks like this call is taking a lot of time:

<xsl:apply-templates
    select="$aggregator/st:VehicleTDoc/st:VehicleTT[xdd:DFFileId=current()/xdd:DFFileId
            and xdd:RSNo=current()/xdd:RSNo][1]/xdd:SName" />

I really appreciate any comments and suggestions.

Thanks,
Sree


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




