Re: [xsl] Transform a million XML documents

2017-02-13 09:23:28
I can report that collection() worked fine on my smaller test set of about 50K
documents. I will run a test against the full data set of 1 million documents
in the next day or two. Again, this is a Saxon-specific feature.
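
For illustration, a minimal sketch of the approach discussed in this thread:
iterate over a directory tree with collection() and write one result per input
with xsl:result-document. The directory paths, the parameter names, and the
identity template are hypothetical placeholders, not taken from anyone's actual
stylesheet. The "?select=...;recurse=yes" query parameters on the collection
URI are the Saxon-specific part; other processors may not accept them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical locations; replace with your own directories. -->
  <xsl:param name="input-dir" select="'file:/data/input'"/>
  <xsl:param name="output-dir" select="'file:/data/output'"/>

  <!-- Entry point: invoke as a named initial template (e.g. -it:main),
       since there is no single principal source document. -->
  <xsl:template name="main">
    <!-- Saxon-specific collection URI: "select" is a file name pattern,
         "recurse=yes" descends into subdirectories. -->
    <xsl:for-each
        select="collection(concat($input-dir, '?select=*.xml;recurse=yes'))">
      <!-- Path of this document relative to the input directory.
           Assumption: document-uri() reports the same URI prefix as
           $input-dir; if it does not, $rel will be empty and the
           prefix must be adjusted. -->
      <xsl:variable name="rel"
          select="substring-after(document-uri(.), concat($input-dir, '/'))"/>
      <xsl:result-document href="{$output-dir}/{$rel}">
        <xsl:apply-templates select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Placeholder transformation: an identity copy. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Mirroring the input directory structure under the output directory keeps file
names from colliding when the same name occurs in more than one subdirectory.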

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 


On 2/13/17, 8:39 AM, "Matthew Stoeffler matthew(_dot_)stoeffler(_at_)ithaka(_dot_)org" <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

    I've done this on a smaller scale: about 44,000 input documents, minimum of
    2K per doc. I chose to loop with the collection() function and send each
    input node to a result tree written out with xsl:result-document to a
    temporary working directory, and to generate directly from the loop a shell
    script that then moved all the temp files to a final location. I did this
    because I had a lot of related asset files that also needed to move. I was
    able to run this with Saxon PE. I don't remember the run time, but it
    didn't seem excessive.
    
    m./
    
    > On Feb 10, 2017, at 4:52 PM, Michael Kay mike(_at_)saxonica(_dot_)com <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
    >
    >> Here is a summary of the ensuing discussion.
    >>
    >> Scenario: There are a million XML documents that need to be transformed.
    >> Each file is in the 1-4KB range. The files are organized into directories
    >> about 4 or 5 deep and some directories have 100s or 1000s of files.
    >>
    >> Transforming a million files is easily handled by Saxon-EE,
    >
    > That is in no way a summary of what I wrote on that thread. I wrote, much
    > more cautiously "I can't see any particular reason why collection()
    > shouldn't handle it".
    >
    > Michael Kay
    > Saxonica
    
    
    
    