xsl-list

RE: [xsl] Joining two XML-files, can be a O(n)?

2007-03-11 02:09:53
Saxon-SA will optimize this kind of join for you automatically.
Alternatively, you can do it by hand using xsl:key and the key() function.
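
A minimal sketch of the by-hand approach. It assumes an XSLT 1.0 processor such as xsltproc and reuses the templates of join.xslt from the quoted message. The key name mmx-by-id and the variable names mmx_doc and id are illustrative choices, not part of the original. In XSLT 1.0, key() only searches the document that contains the context node, so the lookup switches the context to the loaded .mmx document with xsl:for-each.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

  <xsl:param name="mmx_file"/>

  <!-- Illustrative key name: index every <node> element by its ID.
       The index is built per document, including documents loaded
       with document(). -->
  <xsl:key name="mmx-by-id" match="node" use="@ID"/>

  <!-- Load the .mmx document once, as a global variable -->
  <xsl:variable name="mmx_doc" select="document($mmx_file)"/>

  <xsl:template match="map">
    <map>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </map>
  </xsl:template>

  <xsl:template match="node">
    <!-- Save the ID before the context moves to the other document -->
    <xsl:variable name="id" select="string(@ID)"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- XSLT 1.0 key() searches the context node's document, so make
           the .mmx root the context for the lookup. Attributes copied
           here replace same-named ones copied above. -->
      <xsl:for-each select="$mmx_doc">
        <xsl:copy-of select="key('mmx-by-id', $id)/@*"/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy any other element unchanged, with its subtree -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

It takes the same --stringparam mmx_file x.mmx argument as the original. How fast each lookup is depends on how the processor implements keys, but the index over x.mmx is built once instead of searching the whole file for every node.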

Michael Kay
http://www.saxonica.com/ 


-----Original Message-----
From: Jiang Xin [mailto:worldhello(_dot_)net(_at_)gmail(_dot_)com] 
Sent: 10 March 2007 17:55
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Joining two XML-files, can be a O(n)?

A year ago I wrote an XSLT stylesheet to join two XML files, but I
can no longer put up with its poor performance, so I am asking for
help here.

It is part of a hack to FreeMind.
If you would like to know what mmx_file is, see the following pages:
    * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_save_extra_attributes_outof_mmfile
    * http://freemind.sourceforge.net/wiki/index.php/User:Jiangxin/Patch_load_mm_file_with_mmx_file

========== XSLT file: join.xslt ==========
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes" />

    <!-- Path to the .mmx file, passed in with xsltproc -\-stringparam -->
    <xsl:param name="mmx_file" />

    <xsl:template match="map">
        <map>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates />
        </map>
    </xsl:template>

    <!-- Copy each node and merge in the attributes of the node with the
         same ID in the .mmx file. The // search scans the whole .mmx
         document once for every node processed here. -->
    <xsl:template match="node">
        <xsl:param name="mmx_node" select="document($mmx_file)" />
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:copy-of select="$mmx_node//node[@ID=current()/@ID]/@*" />
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

    <!-- Copy any other element unchanged, with its subtree -->
    <xsl:template match="*">
      <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>

========== XML file 1: x.mm ==========
<?xml version="1.0" encoding="UTF-8"?>
<map version="0.9.0_Beta_8">
<node ID="Freemind_Link_1439916855" TEXT="something">
  <node FOLDED="true" ID="_" POSITION="right" TEXT="...">
    <node ID="Freemind_Link_1446446787" TEXT="..."/>
    <node ID="Freemind_Link_1864715670" TEXT="..."/>
  </node>
</node>
... ...
another 8000 nodes!
... ...
</map>

========== XML file 2: x.mmx ==========
<?xml version="1.0" encoding="UTF-8"?>
<map version="0.9.0_Beta_8">
<node CREATED="1173523728454" ID="Freemind_Link_1439916855" MODIFIED="1173523728454">
  <node FOLDED="FALSE" CREATED="1173523728455" ID="_" MODIFIED="1173523881485">
    <node CREATED="1173523728456" ID="Freemind_Link_1446446787" MODIFIED="1173523888376"/>
    <node CREATED="1173523728457" ID="Freemind_Link_1864715670" MODIFIED="1173523894471"/>
  </node>
  <node CREATED="1173523728458" ID="Freemind_Link_1476641610" MODIFIED="1173523728458"/>
</node>
... ...
another 8000 nodes!
... ...
</map>

========== xsltproc test  ==========
When run on a large XML file (more than 8000 nodes), it takes 8 minutes!
# time xsltproc --stringparam mmx_file x.mmx join.xslt x.mm > /dev/null

real    8m7.242s
user    7m48.237s
sys     0m0.084s

========== O(n^2) ==========
I know the problem is that the process is O(n^2):
    <xsl:param name="mmx_node" select="document($mmx_file)" />
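
A rough count of the work in join.xslt shows where the quadratic cost comes from. Let n be the number of node elements in x.mm and m the number in x.mmx (both roughly 8000 here). The node template runs n times, and each evaluation of $mmx_node//node[@ID=current()/@ID] walks every node element of x.mmx, assuming the processor does not index that predicate on its own (an assumption; an optimizer such as the one in Saxon-SA may do so):

$$
C_{\text{scan}} \approx n \cdot m \approx 8000 \times 8000 = 6.4 \times 10^{7} \ \text{ID comparisons}
$$

With xsl:key, the index over x.mmx is built once; if the processor backs keys with a hash table (an implementation detail, not something the XSLT specification requires), each lookup costs roughly constant time:

$$
C_{\text{key}} \approx m + n \approx 8000 + 8000 = 1.6 \times 10^{4}
$$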

Is there a solution within XSLT itself?
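
With an XSLT 2.0 processor (for example Saxon; xsltproc implements only XSLT 1.0), key() accepts an optional third argument naming the tree to search, so no context switch is needed. A minimal sketch, assuming such a processor; the key name mmx-by-id and the variable mmx_doc are illustrative, not from the original.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <xsl:param name="mmx_file"/>

  <!-- Illustrative key name: index <node> elements by ID -->
  <xsl:key name="mmx-by-id" match="node" use="@ID"/>

  <!-- Load the .mmx document once -->
  <xsl:variable name="mmx_doc" select="doc($mmx_file)"/>

  <xsl:template match="map">
    <map>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </map>
  </xsl:template>

  <xsl:template match="node">
    <xsl:copy>
      <!-- The third argument of key() selects the document to search.
           The .mmx attributes come last, so on a name clash they win,
           as in join.xslt. -->
      <xsl:copy-of select="@*, key('mmx-by-id', @ID, $mmx_doc)/@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy any other element unchanged, with its subtree -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```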

Thanks.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


