Re: [xsl] Using 'collection'
2015-08-29 11:48:20
Hi Michael,
When I attempted the stylesheet, I got a namespace error, so I added to
the list of namespaces: xlmns:saxon="http://saxon.sf.net/" and got the
error:
[Saxon-PE 9.5.1.3] The prefix "xlmns" for attribute "xlmns:saxon"
associated with an element type "xsl:stylesheet" is not bound.
I have never used an extension before so am unsure how to proceed.
Mark
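The error message quotes the prefix as "xlmns", while the namespace declaration attribute is spelled "xmlns"; with the misspelling, the parser sees an ordinary attribute whose prefix has no binding. A minimal sketch of the corrected declaration follows. The version="2.0" and exclude-result-prefixes attributes are assumptions and are not taken from the stylesheet under discussion.

```xml
<!-- xmlns, not xlmns: the misspelled form is parsed as a normal attribute with an unbound prefix -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">
  <!-- templates go here -->
</xsl:stylesheet>
```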
On 8/29/2015 9:38 AM, Mark Wilson pubs(_at_)knihtisk(_dot_)org wrote:
Hi Michael,
I downloaded BaseX as suggested earlier, but can see that it and my
XQuery learning curve are going to be steep. Since I will only be
doing this process once (for the Royal Philatelic Society London,
actually), I think I would rather try XSLT.
Thanks for the tip,
Mark
On 8/29/2015 8:59 AM, Michael Kay mike(_at_)saxonica(_dot_)com wrote:
It’s worth putting the data in an XML database such as BaseX if
you’re going to use it often enough to justify the cost of database
loading. If you just want to use it once, e.g. to extract a subset of
the data, then collection() should do the job - either in XQuery or
XSLT.
To keep memory usage down, assuming you’re implementing with Saxon,
the simplest way is to ensure that each document is unloaded from
memory as soon as it has been processed, which you can do with
saxon:discard-document:
<xsl:for-each select="collection('docs?select=*.xml')">
  <xsl:apply-templates select="saxon:discard-document(.)"/>
</xsl:for-each>
discard-document() is a pseudo-function that returns a document
unchanged, but with the side effect that it is marked as available
for garbage collection.
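Put together with a named entry template, the fragment above might look like the following sketch. It assumes Saxon-PE or Saxon-EE (extension functions in the saxon namespace are not available in the Home Edition), and the element names title and date, the wrapper elements extract and record, and the file names in the comment are hypothetical placeholders, not details from this thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: started with Saxon's -it:run option, so no source document is needed.
       Assumed invocation (jar and file names are placeholders):
       java -cp saxon9pe.jar net.sf.saxon.Transform -it:run -xsl:extract.xsl -o:extract.xml -->
  <xsl:template name="run">
    <extract>
      <xsl:for-each select="collection('docs?select=*.xml')">
        <!-- discard-document() returns the document unchanged and marks it
             as available for garbage collection once it has been processed -->
        <xsl:apply-templates select="saxon:discard-document(.)"/>
      </xsl:for-each>
    </extract>
  </xsl:template>

  <!-- One record per input file; 'title' and 'date' stand in for the two wanted elements -->
  <xsl:template match="/">
    <record source="{base-uri(.)}">
      <xsl:copy-of select="(//title)[1] | (//date)[1]"/>
    </record>
  </xsl:template>

</xsl:stylesheet>
```

If the wanted elements are in a namespace, the paths in the copy-of instruction would need a matching prefix or an xpath-default-namespace declaration.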
Streamed processing is an alternative - but unfortunately in Saxon
(until the next release) streaming can’t be used together with
collection().
Michael Kay
Saxonica
On 29 Aug 2015, at 15:25, Mark Wilson pubs(_at_)knihtisk(_dot_)org
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi Eliot,
I have never used XQuery or BaseX and will look into that, but what
you have said about the XSLT looks good. I will try to sort this out
and see where it goes. Thanks for taking the time.
Regards,
Mark
On 8/29/2015 7:13 AM, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
This sounds like a job better done using XQuery. A quick solution would be
to install BaseX and use its GUI to load your XML files and then apply the
query you need to the loaded docs. If you have to do complex
transformations on the things you find you can have the XQuery emit an XML
file that you can then apply an XSLT to, rather than trying to implement
the transform entirely in XQuery.
With XSLT and Saxon you could do something like:
<xsl:stylesheet ...>
  <xsl:template name="run">
    <xsl:apply-templates select="collection('docs?select=*.xml')"/>
  </xsl:template>
  <xsl:template match="/">
    <!-- do stuff to find what you want in each doc -->
  </xsl:template>
</xsl:stylesheet>
Then use the -it option for Saxon to specify the initial template to run
(-it:run).
The size of the documents shouldn't be a big issue, especially if you can
allocate sufficient memory to the processor. You could probably also take
advantage of the new streaming features in XSLT 3, which are implemented in
the latest Saxon versions.
For something like this you might have to see how much virtual memory the
process requires by running it and, if it fails with an out-of-memory
error, giving it more until it either runs or you've run out of available
real memory.
Cheers,
Eliot
----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com
On 8/29/15, 8:36 AM, "Mark Wilson pubs(_at_)knihtisk(_dot_)org"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
I have been asked to isolate two elements each from a set of individual
XML files containing hundreds of elements. I thought collection() would
work, but each individual file is very large (36,000+ lines) and there
are 8,000 of them. I have no idea how to begin. I would include a
sample file, but as I said, they are very large. Where might I look to
get ideas?
Thanks,
Mark