xsl-list

Re: [xsl] Efficient way to do an identity transform, eliminating duplicate elements, in XSLT 1.0?

2013-12-13 08:58:03
Dear Roger,

I think you need to do some more diagnostics.

Try a stylesheet with a single empty template:

<xsl:template match="/"/>

If you still run out of memory, the problem is not the copying: the
processor cannot even parse the input and build its tree. (You are
probably just putting too much stuff in your blender. Try switching
processors and/or allocating more RAM. :-) )

If this runs, you still don't know that the copying is to blame. Try:

<xsl:template match="/">
  <xsl:copy-of select="/"/>
</xsl:template>
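
Both templates above are fragments. To run either test they need the
usual stylesheet wrapper; a minimal sketch of the second test as a
complete file (the wrapper is standard XSLT 1.0, nothing here is
specific to your setup):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Second diagnostic: deep-copy the whole input tree to the output.
       For the first diagnostic, leave this template empty instead. -->
  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>

</xsl:stylesheet>
```

Run it with the same processor and the same memory settings as the
failing run, so the only thing that changes is the stylesheet.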

If this also runs, then you can look more closely at the logic of your
stylesheet.
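
For comparison when you review that logic, here is a minimal sketch of
one way to write the transform in XSLT 1.0: an identity template plus a
key that suppresses repeats. It is not necessarily what your stylesheet
does, and it rests on assumptions of mine: a "duplicate" is a childless
element with the same name and the same string value as an earlier
element under the same parent, and attributes are ignored in the
comparison. The key name "dup" is arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index childless elements by parent identity, element name, and
       string value. Neither generate-id() results nor element names
       can contain '|', so the separator keeps the three parts apart. -->
  <xsl:key name="dup" match="*[not(*)]"
           use="concat(generate-id(..), '|', name(), '|', .)"/>

  <!-- Identity template: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a childless element unless it is the first node in its key
       group, i.e. the first with that name and value under that parent -->
  <xsl:template match="*[not(*)][generate-id() !=
      generate-id(key('dup',
                      concat(generate-id(..), '|', name(), '|', .))[1])]"/>

</xsl:stylesheet>
```

The whitespace text nodes around a dropped element are still copied, so
the output will show blank lines where the duplicates were. Note also
that this only tidies the logic: a 1.0 processor will typically still
build the whole input tree, and the key index costs memory on top of
that, so it will not by itself get a 1 GB document through.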

Does "efficient" mean "uses less RAM"? If so, consider pipelining. But
if the document doesn't even go through your 1.0 pipe, you may need to
use another technology to split it into chunks first.

I hear there is this new feature coming on line, XSLT streaming.... :-)

Cheers, Wendell
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Dec 13, 2013 at 9:33 AM, Costello, Roger L. <costello(_at_)mitre(_dot_)org> wrote:
Hi Folks,

I need to do an identity transform on XML files like this:

<Document>
    <First>
        <id>A</id>
        <blah>B</blah>
        <id>A</id>
    </First>
    <Second>
        <id>C</id>
        <blah>D</blah>
        <id>C</id>
    </Second>
</Document>

I want the identity transform to remove duplicate elements in <First> and 
remove duplicate elements in <Second>. So the output should be:

<Document>
    <First>
        <id>A</id>
        <blah>B</blah>
    </First>
    <Second>
        <id>C</id>
        <blah>D</blah>
    </Second>
</Document>

I need to use XSLT 1.0 to implement this.

I created an implementation, but it uses <copy> statements. The actual XML 
document that I am transforming is huge, nearly 1 GB. When I run my XSLT 
implementation the processor runs out of memory. I think it's due to the 
<copy> statements. I need a very efficient implementation. Any suggestions?

/Roger

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
