
[xsl] Your recent posting to XSL-List

2015-08-29 14:00:13
You recently sent a posting to the XSL-List administrative address. I cannot tell 
whether you meant to send it to the whole list or to the individual who appears on 
the address line. 

Please resend the message, either to:  
<xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
or to: Mark Wilson <pubs(_at_)knihtisk(_dot_)org>

whichever you intended. 

— Tommie



Begin forwarded message:

From: BIGLIST Assistant 
<biglist-assistant(_at_)lists(_dot_)mulberrytech(_dot_)com>
Subject: [BigList Fwd] Aw: Re: [xsl] Using 'collection' 
Date: August 29, 2015 at 2:56:18 PM EDT
To: <listmaster(_at_)mulberrytech(_dot_)com>
Reply-To: <Martin(_dot_)Honnen(_at_)gmx(_dot_)de>

The following message was originally sent to "Mark Wilson 
pubs(_at_)knihtisk(_dot_)org" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>

Subscription information on the sender at:
http://lists.mulberrytech.com/list/xsl-list/roster/sub/?match=Martin(_dot_)Honnen(_at_)gmx(_dot_)de

From: "Martin Honnen" <Martin(_dot_)Honnen(_at_)gmx(_dot_)de>
Subject: Aw: Re: [xsl] Using 'collection'
Date: August 29, 2015 at 2:56:07 PM EDT
To: "Mark Wilson pubs(_at_)knihtisk(_dot_)org" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Reply-To: "Martin Honnen" <Martin(_dot_)Honnen(_at_)gmx(_dot_)de>


Try with
java -jar c:\saxon\saxon9.jar -xsl:read1.xsl -it:runit
-- 
This message was sent from my Android mobile phone with GMX Mail.



"Mark Wilson pubs(_at_)knihtisk(_dot_)org" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Not sure what I am doing wrong here.

Using this batch file:
set SAXON_HOME=C:\saxon
set SAXON_JAR=%SAXON_HOME%\saxon9.jar
java -jar c:\saxon\saxon9.jar read1.xsl -it:runit

I get this error.
P:\British Library>set SAXON_HOME=C:\saxon
P:\British Library>set SAXON_JAR=C:\saxon\saxon9.jar
P:\British Library>java -jar c:\saxon\saxon9.jar read1.xsl -it:runit
Stylesheet file -it:runit does not exist

Using this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/" xmlns:mets="http://www.loc.gov/METS/"
xmlns:blprocess="http://bl.uk/namespaces/blprocess";
exclude-result-prefixes="xs" version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template name="runit">
<xsl:apply-templates select="collection('docs?select=*.xml')"/>
<xsl:for-each select="collection('docs?select=*.xml')">
<xsl:apply-templates select="saxon:discard-document(.)"/>
</xsl:for-each>
</xsl:template>

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="mets:amdSec">
<xsl:if test="@ID eq 'amd0002'">
<xsl:copy-of select="descendant::blprocess:processMetadata"
copy-namespaces="no"/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
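Editor's note: a minimal corrected sketch of the stylesheet above, assuming the same docs folder and namespaces. The original calls collection() twice in the runit template, so every file is processed twice and the first pass is never discarded; it also relies on the built-in template rules, which copy the text of every element other than mets:amdSec to the output. This version processes each document once, applies templates only to the matching mets:amdSec, and wraps everything in one root element. The `results` element name is a hypothetical choice, and saxon:discard-document is a Saxon extension whose availability depends on the Saxon version and edition in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:blprocess="http://bl.uk/namespaces/blprocess"
    exclude-result-prefixes="#all">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Entry point, invoked with -it:runit (no source document needed) -->
  <xsl:template name="runit">
    <!-- "results" is a hypothetical wrapper so the output has a single root element -->
    <results>
      <xsl:for-each select="collection('docs?select=*.xml')">
        <!-- Visit each document once; discard-document marks it as
             available for garbage collection after processing -->
        <xsl:apply-templates
            select="saxon:discard-document(.)//mets:amdSec[@ID eq 'amd0002']"/>
      </xsl:for-each>
    </results>
  </xsl:template>

  <!-- Only the selected amdSec elements reach this rule, so no other text is copied -->
  <xsl:template match="mets:amdSec">
    <xsl:copy-of select="descendant::blprocess:processMetadata"
                 copy-namespaces="no"/>
  </xsl:template>
</xsl:stylesheet>
```

It would be run the same way as in Martin's reply: java -jar c:\saxon\saxon9.jar -xsl:read1.xsl -it:runit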


On 8/29/2015 8:59 AM, Michael Kay mike(_at_)saxonica(_dot_)com wrote:
It’s worth putting the data in an XML database such as BaseX if you’re 
going to use it often enough to justify the cost of database loading. If 
you just want to use it once, e.g. to extract a subset of the data, then 
collection() should do the job - either in XQuery or XSLT.

To keep memory usage down, assuming you’re implementing with Saxon, the 
simplest way is to ensure that each document is unloaded from memory as 
soon as it has been processed, which you can do with saxon:discard-document:

<xsl:for-each select="collection('docs?select=*.xml')">
<xsl:apply-templates select="saxon:discard-document(.)"/>
</xsl:for-each>

discard-document() is a pseudo-function that returns a document unchanged, 
but with the side effect that it is marked as available for garbage 
collection.
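Editor's note: if the same stylesheet might also run where saxon:discard-document is not available (another processor, or a Saxon edition without Saxon extension functions), one option, sketched here under that assumption, is XSLT 2.0's standard use-when attribute together with function-available(), which keeps exactly one of two alternative instructions at compile time. The template name "run" and the empty extraction template are placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="#all">

  <xsl:template name="run">
    <xsl:for-each select="collection('docs?select=*.xml')">
      <!-- Kept only if the extension exists: process, then release the document -->
      <xsl:apply-templates select="saxon:discard-document(.)"
          use-when="function-available('saxon:discard-document', 1)"/>
      <!-- Fallback: plain processing; documents may stay in memory longer -->
      <xsl:apply-templates select="."
          use-when="not(function-available('saxon:discard-document', 1))"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="/">
    <!-- placeholder: extract what is needed from each document here -->
  </xsl:template>
</xsl:stylesheet>
```

Because the excluded instruction is removed before compilation, the call to the unknown function causes no static error on processors that lack it.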

Streamed processing is an alternative - but unfortunately in Saxon (until 
the next release) streaming can’t be used together with collection().

Michael Kay
Saxonica


On 29 Aug 2015, at 15:25, Mark Wilson pubs(_at_)knihtisk(_dot_)org 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi Eliot,
I have never used XQuery or BaseX and will look into that, but what you 
have said about the XSLT looks good. I will try to sort this out and see 
where it goes. Thanks for taking the time.
Regards,
Mark

On 8/29/2015 7:13 AM, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
This sounds like a job better done using XQuery. A quick solution would be
to install BaseX and use its GUI to load your XML files and then apply the
query you need to the loaded docs. If you have to do complex
transformations on the things you find you can have the XQuery emit an XML
file that you can then apply an XSLT to, rather than trying to implement
the transform entirely in XQuery.

With XSLT and Saxon you could do something like:

<xsl:stylesheet ...>

<xsl:template name="run">
<xsl:apply-templates select="collection('docs?select=*.xml')"/>
</xsl:template>

<xsl:template match="/">
<!-- do stuff to find what you want in each doc -->
</xsl:template>
</xsl:stylesheet>

Then use Saxon's -it option (e.g. -it:run) to specify the initial template
to run ("run").
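Editor's note: as one hypothetical way to fill in the "do stuff" template above, this sketch writes a small result file per input document with xsl:result-document, keeping each extract separate instead of building one large result. The out/ directory, the `extract` element and the file naming are assumptions; the namespaces and element names are the ones used elsewhere in this thread. It also assumes the processor sets document-uri() for documents read through collection(), and the relative href is resolved against the base output URI, so check the processor documentation for where the files land.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:blprocess="http://bl.uk/namespaces/blprocess"
    exclude-result-prefixes="#all">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="run">
    <xsl:apply-templates select="collection('docs?select=*.xml')"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- Last path segment of the input URI, e.g. .../docs/a.xml gives a.xml -->
    <xsl:variable name="name" select="tokenize(document-uri(.), '/')[last()]"/>
    <!-- One output file per input document (out/ is a hypothetical folder) -->
    <xsl:result-document href="out/{$name}">
      <extract source="{document-uri(.)}">
        <xsl:copy-of
            select="//mets:amdSec[@ID eq 'amd0002']//blprocess:processMetadata"
            copy-namespaces="no"/>
      </extract>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```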

The size of the documents shouldn't be a big issue, especially if you can
allocate sufficient memory to the processor. You could probably also take
advantage of the new streaming features in XSLT 3 that are implemented in
the latest Saxon versions.

For something like this you may need to find out how much virtual memory
the process requires by running it: if it fails with an out-of-memory
error, give it more (for a Java processor such as Saxon, via the JVM's
-Xmx option) until it either runs or you've run out of available real memory.

Cheers,

Eliot

----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 8/29/15, 8:36 AM, "Mark Wilson pubs(_at_)knihtisk(_dot_)org"
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

I have been asked to isolate two elements from each of a set of individual
XML files containing hundreds of elements. I thought collection() would
work, but each individual file is very large (36,000+ lines) and there
are 8,000 of them. I have no idea how to begin. I would include a
sample file, but as I said, they are very large. Where might I look to
get ideas?
Thanks,
Mark








====================================================================== 
B. Tommie Usdin                        
mailto:btusdin(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com    
17 West Jefferson Street                           Phone: 301/315-9631 
Suite 207                                    Direct Line: 301/315-9634 
Rockville, MD  20850                                 Fax: 301/315-8285 
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in XML and SGML               
======================================================================
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
