At 2006-03-08 00:44 +0100, lists(_at_)bitfaeule(_dot_)net wrote:
> I have a simple problem but I'm not sure whether xslt is the proper tool.
> I have thousands of XML files
For such a simple issue it might make sense not to incur the overhead
of building the tree for thousands of files, so I would question the
use of XSLT.
Below are both an XSLT solution, derived from the identity
transform, and a Python solution that buffers the group element and
re-emits it with a modified attribute. Note that I have made a
number of assumptions in the Python that may or may not hold in your
actual situation, as opposed to this test.
The advantage of the Python implementation is speed: it uses the
SAX streaming interface and does not incur the overhead of building
the input tree. This might help for your thousands of files.
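As a rough illustration of applying a streaming handler to a whole directory of files, here is a minimal Python 3 sketch. The names transform_many and make_handler are hypothetical, as is the one-directory-in, one-directory-out layout; by default it uses the standard XMLGenerator as a pass-through handler, so any handler with the same constructor arguments could be substituted.

```python
import glob
import os
from xml.sax import parse
from xml.sax.saxutils import XMLGenerator


def transform_many(src_dir, dst_dir, make_handler=XMLGenerator):
    """Stream every *.xml file in src_dir through a SAX handler.

    make_handler is a hypothetical factory, called as
    make_handler(out, encoding); it must return a SAX ContentHandler
    that writes its result to the text stream `out`.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for src in sorted(glob.glob(os.path.join(src_dir, "*.xml"))):
        dst = os.path.join(dst_dir, os.path.basename(src))
        with open(dst, "w", encoding="utf-8") as out:
            # parse() fires SAX events while reading; no tree is built
            parse(src, make_handler(out, "utf-8"))
        written.append(dst)
    return written
```

Called as transform_many("in", "out"), this would copy each file through the generator unchanged; passing a filtering subclass of XMLGenerator as make_handler is where the actual modification would happen.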
Note that switching to SAX from XSLT will also help if the input
files are very large. For my UBL schema analysis work I had simple
transforms for input XML files of 165 MB, and rewriting my initial XSLT
solution in Python/SAX improved performance to an acceptable level
(in one case it changed a one-hour invocation to less than a minute).
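To show why streaming suits very large inputs, here is a minimal Python 3 sketch of a SAX handler that tallies element names as they go by. ElementCounter and count_elements are hypothetical names for illustration, not the code used in the UBL work; the point is only that the handler keeps a small counter and never holds the document in memory.

```python
from collections import Counter
from xml.sax import parse
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Count element names; the only state kept is the tally itself."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_elements(source):
    """source may be a filename or a file-like object opened for reading."""
    handler = ElementCounter()
    parse(source, handler)
    return handler.counts
```'
        b'<test name="b"></test></group></filter>')
counts = count_elements(io.BytesIO(data))
assert counts["filter"] == 1
assert counts["group"] == 2
assert counts["test"] == 2
</test>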
I hope this helps.
. . . . . . . Ken
T:\ftemp>type bitfaeule.xml
<?xml version="1.0"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>
T:\ftemp>xslt bitfaeule.xml bitfaeule.xsl con
<?xml version="1.0" encoding="utf-8"?><filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"/>
<test name="testCheck2"/>
</group><group section="basic-alt">
<test name="testCheck1"/>
<test name="testCheck2"/>
</group>
</filter>
T:\ftemp>type bitfaeule.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="group[@section='basic']">
<xsl:copy-of select="."/>
<xsl:copy>
<xsl:attribute name="section">basic-alt</xsl:attribute>
<xsl:copy-of select="node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()"><!--identity for all other nodes-->
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
T:\ftemp>type bitfaeule.xml
<?xml version="1.0"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>
T:\ftemp>python bitfaeule.py <bitfaeule.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<filter name="ARMCheckTest">
<group section="mini">
</group>
<group section="basic">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group><group section="basic-alt">
<test name="testCheck1"></test>
<test name="testCheck2"></test>
</group>
</filter>
T:\ftemp>type bitfaeule.py
# A Python program to capture and repeat generated XML syntax
from xml.sax import parse, SAXParseException
from xml.sax.xmlreader import AttributesImpl
from xml.sax.saxutils import XMLGenerator
import sys

false = 0
true = not false

# define a class that both buffers and outputs strings based on status
# Note: this does not support nested elements being copied, only one at a time
class copyOut:
    def __init__(this, out):
        if out is None:
            out = sys.stdout
        this._out = out       # remember to whom writing is being done
        this._buffer = false  # start off with no buffering of writing
        this._output = true   # start off with all writing to output
        this._store = ""      # local store of the copy

    # an opportunity to change the direction of writing
    def target( this, buffer, output ):
        this._buffer = buffer
        this._output = output

    # accommodate a writing request
    def write( this, str ):
        if this._buffer:  # then buffer the string
            this._store += str
        if this._output:  # then write out the string
            this._out.write( str )

    # the store of data is ready to be written
    def flushStore( this ):
        this._out.write( this._store )
        this._store = ""  # nothing need be remembered

class myGenerator( XMLGenerator ):
    def __init__(this, out=None, encoding="iso-8859-1"):
        # take advantage of existing generator, but override output
        this._copyOut = copyOut( out )
        XMLGenerator.__init__(this, this._copyOut, encoding)

    def startElement( this, name, attrs):
        # put out the current element regardless
        XMLGenerator.startElement( this, name, attrs )
        # determine if this is the element to be copied
        if name == "group":
            # might be it, check attributes
            for ( attr, value ) in attrs.items():
                if ( attr, value ) == ( "section", "basic" ):
                    # yes, this is the element to be copied, so buffer
                    # modified start tag (assume only one attribute)
                    this._copyOut.target( true, false )
                    XMLGenerator.startElement( this, name,
                                               AttributesImpl( {
                                                 "section":"basic-alt" } ) )
                    # now buffer and write all content of element
                    this._copyOut.target( true, true )

    def endElement( this, name ):
        # put out the end of the current element regardless
        XMLGenerator.endElement( this, name )
        # determine if this is the element being copied
        if name == "group":
            # it may or may not be, but it won't hurt to flush empty buffer
            this._copyOut.flushStore()
            this._copyOut.target( false, true )

gen = myGenerator()
#=============================================================================
#
# Main logic
try:  # processing the input (read from stdin) using the defined SAX events
    parse( sys.stdin, gen )
except IOError, (errno, strerror):
    sys.exit( "I/O error(%s): %s" % (errno, strerror) )
except SAXParseException:
    sys.exit( "Input does not parse as well-formed XML" )
sys.exit( )
# end of file
--
Upcoming XSLT/XSL-FO hands-on courses: Washington,DC 2006-03-13/17
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0 +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05 http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal