xsl-list

Re: [xsl] copying a node

2006-03-07 18:07:50
At 2006-03-08 00:44 +0100, lists(_at_)bitfaeule(_dot_)net wrote:
>I have a simple problem but I'm not sure whether xslt is the proper tool.
>
>I have thousands of XML files

For such a simple task it may make sense to avoid the overhead of building an input tree for each of thousands of files, so I would question the use of XSLT.

Below are both an XSLT solution, using a derivative of the identity transform, and a Python solution that buffers the group element and re-emits it with a modified attribute. Note that I have made a number of assumptions in the Python code that may or may not hold for your real files as opposed to this test.

The advantage of the Python implementation is speed: it uses the SAX streaming interface and does not incur the overhead of building the input tree. This might help with your thousands of files.
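The streaming point can be illustrated with a minimal Python 3 sketch (not part of the original solution): a SAX ContentHandler that merely counts the `<group section="basic">` elements in each file, driven over a batch of files with `glob`. The handler sees each start tag once and keeps nothing, so no input tree is ever built. `TargetCounter`, `count_targets`, `report` and the glob pattern are hypothetical names chosen for illustration.

```python
import glob
import sys
import xml.sax


class TargetCounter(xml.sax.ContentHandler):
    """Count <group section="basic"> start tags as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # called once per start tag; nothing is retained afterwards
        if name == "group" and attrs.get("section") == "basic":
            self.count += 1


def count_targets(source):
    """source may be a filename or a binary file object."""
    handler = TargetCounter()
    xml.sax.parse(source, handler)
    return handler.count


def report(pattern):
    # pattern is a hypothetical glob such as "filters/*.xml"
    for path in sorted(glob.glob(pattern)):
        try:
            print(path, count_targets(path))
        except xml.sax.SAXParseException as err:
            print(path, "is not well-formed:", err, file=sys.stderr)
```

Calling `report("filters/*.xml")` would print one line per matching file. A real batch job would swap the counting handler for one that writes out the modified document.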

Note that switching from XSLT to SAX will also help if the input files are very large. For my UBL schema analysis work I had simple transforms on input XML files of 165MB, and rewriting my initial XSLT solution in Python/SAX brought performance to an acceptable level (in one case it turned a one-hour run into less than a minute).

I hope this helps.

. . . . . . . Ken

T:\ftemp>type bitfaeule.xml
<?xml version="1.0"?>
<filter name="ARMCheckTest">
  <group section="mini">
  </group>
  <group section="basic">
    <test name="testCheck1"></test>
    <test name="testCheck2"></test>
  </group>
</filter>

T:\ftemp>xslt bitfaeule.xml bitfaeule.xsl con
<?xml version="1.0" encoding="utf-8"?><filter name="ARMCheckTest">
  <group section="mini">
  </group>
  <group section="basic">
    <test name="testCheck1"/>
    <test name="testCheck2"/>
  </group><group section="basic-alt">
    <test name="testCheck1"/>
    <test name="testCheck2"/>
  </group>
</filter>
T:\ftemp>type bitfaeule.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:template match="group[@section='basic']">
  <xsl:copy-of select="."/>
  <xsl:copy>
    <xsl:attribute name="section">basic-alt</xsl:attribute>
    <xsl:copy-of select="node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="@*|node()"><!--identity for all other nodes-->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
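For readers who think more easily in terms of tree operations, here is roughly what the stylesheet does, sketched with Python 3's `xml.etree.ElementTree` (not part of the original reply): find each `group` whose `section` is `basic`, deep-copy it, change the attribute on the copy and insert the copy right after the original. Like an XSLT processor, it builds the whole input tree first. `duplicate_basic_groups` is a hypothetical name, and the sketch assumes `basic` groups are not nested inside one another.

```python
import copy
import xml.etree.ElementTree as ET


def duplicate_basic_groups(root):
    """After each <group section="basic">, insert a deep copy whose section
    attribute is "basic-alt". Assumes such groups are not nested."""
    for parent in list(root.iter()):  # snapshot, so inserted copies are not revisited
        # walk children backwards so inserting at index + 1 never shifts
        # the positions of children still to be examined
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "group" and child.get("section") == "basic":
                clone = copy.deepcopy(child)
                clone.set("section", "basic-alt")
                # hand the whitespace after the original to the copy, so the
                # copy starts right after </group> as in the XSLT output
                clone.tail, child.tail = child.tail, None
                parent.insert(index + 1, clone)
    return root


SAMPLE = """<filter name="ARMCheckTest">
  <group section="mini">
  </group>
  <group section="basic">
    <test name="testCheck1"></test>
    <test name="testCheck2"></test>
  </group>
</filter>"""

root = duplicate_basic_groups(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

ElementTree serializes the empty `test` elements as `<test name="testCheck1" />`, which is equivalent markup to the processor output in the transcript.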

T:\ftemp>python bitfaeule.py <bitfaeule.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<filter name="ARMCheckTest">
  <group section="mini">
  </group>
  <group section="basic">
    <test name="testCheck1"></test>
    <test name="testCheck2"></test>
  </group><group section="basic-alt">
    <test name="testCheck1"></test>
    <test name="testCheck2"></test>
  </group>
</filter>
T:\ftemp>type bitfaeule.py
# A Python program to capture and repeat generated XML syntax

from xml.sax import parse, SAXParseException
from xml.sax.xmlreader import AttributesImpl
from xml.sax.saxutils import XMLGenerator
import sys

false = 0
true = not false

# define a class that both buffers and outputs strings based on status
# Note: this does not support nested elements being copied, only one at a time

class copyOut:
    def __init__(this, out):
        if out is None:
            out = sys.stdout
        this._out = out       # remember to whom writing is being done
        this._buffer = false  # start off with no buffering of writing
        this._output = true   # start off with all writing to output
        this._store = ""      # local store of the copy

    # an opportunity to change the direction of writing
    def target( this, buffer, output ):
        this._buffer = buffer
        this._output = output

    # accommodate a writing request
    def write( this, str ):
        if this._buffer: # then buffer the string
            this._store += str
        if this._output: # then write out the string
            this._out.write( str )

    # the store of data is ready to be written
    def flushStore( this ):
        this._out.write( this._store )
        this._store = "" # nothing need be remembered

class myGenerator( XMLGenerator ):
    def __init__(this, out=None, encoding="iso-8859-1"):
        # take advantage of existing generator, but override output
        this._copyOut = copyOut( out )
        XMLGenerator.__init__(this, this._copyOut, encoding)

    def startElement( this, name, attrs):
        # put out the current element regardless
        XMLGenerator.startElement( this, name, attrs )
        # determine if this is the element to be copied
        if name == "group":
            # might be it, check attributes
            for ( attr, value ) in attrs.items():
                if ( attr, value ) == ( "section", "basic" ):
                    # yes, this is the element to be copied, so buffer
                    # modified start tag (assume only one attribute)
                    this._copyOut.target( true, false )
                    XMLGenerator.startElement( this, name,
                        AttributesImpl( { "section":"basic-alt" } ) )
                    # now buffer and write all content of element
                    this._copyOut.target( true, true )

    def endElement( this, name ):
        # put out the end of the current element regardless
        XMLGenerator.endElement( this, name )
        # determine if this is the element being copied
        if name == "group":
            # it may or may not be, but it won't hurt to flush empty buffer
            this._copyOut.flushStore()
            this._copyOut.target( false, true )

gen = myGenerator()

#=============================================================================
#
# Main logic

try: # processing the input file using the defined SAX events
    parse( sys.stdin, gen )
except IOError, err:
    sys.exit( "I/O error: %s" % err )
except SAXParseException, err:
    sys.exit( "Input does not parse as well-formed XML: %s" % err )

sys.exit( )

# end of file


--
Upcoming XSLT/XSL-FO hands-on courses: Washington,DC 2006-03-13/17
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
