RE: [xsl] Merging two sets of files

2012-04-03 09:32:05
Apologies for the terse reply - I meant to also say thank you very much for the 
two-pass suggestion, Emmanuel! :)


-----Original Message-----
From: Emma Burrows [mailto:Emma(_dot_)Burrows(_at_)rpharms(_dot_)com]
Sent: 03 April 2012 15:30
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Merging two sets of files

In fact the outputs are as follows:

- a ditamap and a set of standalone DITA topic files based on all the records 
in book.xml except drug records with catalogue entries
- standalone documents from book.xml for drug records that don't have 
corresponding drug information
- a new version of each of the catalogue files that has a corresponding 
book.xml record

What I didn't mention is that I already have a process to convert book.xml into 
a ditamap and create all the standalone documents. This is tried and tested and 
I am not messing with it two weeks before delivery. :)

The merging is a new, late requirement - I was hoping to just bolt it on to the 
existing transform by hooking it into the drug record matching template, but 
maybe that's not sensible. I am investigating doing a completely separate 
transform.


-----Original Message-----
From: Emmanuel Bégué [mailto:medusis(_at_)gmail(_dot_)com]
Sent: 03 April 2012 10:50
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Merging two sets of files

Do I understand your requirements correctly -- you need to output
- a new version of book.xml with associated catalog information from the drug 
database
- standalone documents from book.xml for topics that don't have corresponding 
drug info
- a new version of each of the 10,000 catalogue files for which information 
can be found in book.xml (not sure about that last part -- requirements 3 
and 4?)

I would build two temporary documents:
- a new book.xml with all associated catalog information
- a new big catalog file (from all the relevant little drug files), with all 
associated book information

and then in a second pass, cut those big documents to output the required 
result files as needed.
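
As a rough, language-neutral illustration of that two-pass shape (Python here rather than XSLT, only to show the data flow), the sketch below builds one big temporary catalogue in a first pass and cuts it into per-file outputs in a second. The element names, ids, and the first_pass/second_pass helpers are all invented for the example; the real book.xml and lookup.xml structures are not shown in this thread.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for book.xml and lookup.xml (structure assumed).
BOOK = ET.fromstring(
    "<book>"
    "<record id='b1'>Aspirin</record>"
    "<record id='b2'>Aspirin, paediatric</record>"
    "<record id='b3'>Glossary</record>"
    "</book>"
)
LOOKUP = [("b1", "c10"), ("b2", "c10"), ("b1", "c11")]  # (book id, catalogue id)


def first_pass(book, lookup):
    """Build one big temporary catalogue holding every linked book record."""
    records = {r.get("id"): r for r in book.iter("record")}
    big = ET.Element("catalogue")
    entries = {}
    for book_id, cat_id in lookup:
        entry = entries.get(cat_id)
        if entry is None:
            entry = ET.SubElement(big, "entry", id=cat_id)
            entries[cat_id] = entry
        # Copy, so a record linked to several catalogue ids appears in each.
        entry.append(copy.deepcopy(records[book_id]))
    return big


def second_pass(big):
    """Cut the temporary document into one output per catalogue entry."""
    return {
        entry.get("id") + ".xml": ET.tostring(entry, encoding="unicode")
        for entry in big
    }


outputs = second_pass(first_pass(BOOK, LOOKUP))
```

Because every output file is assembled once from the complete temporary document, nothing ever has to reopen a file written earlier in the same run.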

One way to do two passes with Saxon is to use the saxon:next-in-chain 
extension attribute on xsl:output (holding the intermediate documents in 
variables is really painful).

Hope this helps.
Regards,
EB


2012/4/3 Emma Burrows <Emma(_dot_)Burrows(_at_)rpharms(_dot_)com>:
I'm currently using XSLT 2.0 (using Saxon 9.3 via Oxygen 12) to merge two 
sets of XML files together based on a third file which is a kind of lookup 
table. However, I'm coming across a problem when I need to effectively merge 
two source files into the same output file, and I need some suggestions on a 
change of approach.

I have the following XML files:

- Main document - let's call it book.xml. This contains various types of 
topics, including about 4000 topics related to drugs, each identified by a 
unique id.

- Ancillary drug information files auto-generated from an online drug 
database.
These are about 10,000 little XML files, each named after the unique id of 
the drug information in the online catalogue.

- An XML file - let's call it lookup.xml - that is essentially a look-up 
table, matching ids in book.xml to one or more drug catalogue ids, and vice 
versa. However, not all records in book.xml have an entry in lookup.xml.
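
To make the many-to-many nature of that table concrete, here is a minimal Python sketch (not XSLT) that indexes the pairs in both directions. The ids and the build_indexes helper are hypothetical; in XSLT 2.0 the rough equivalent would presumably be a pair of xsl:key declarations over lookup.xml.

```python
from collections import defaultdict

# Hypothetical (book id, catalogue id) pairs, as lookup.xml might yield them.
PAIRS = [("b1", "c10"), ("b2", "c10"), ("b1", "c11")]


def build_indexes(pairs):
    """Return (book id -> catalogue ids, catalogue id -> book ids)."""
    by_book = defaultdict(list)
    by_catalogue = defaultdict(list)
    for book_id, cat_id in pairs:
        by_book[book_id].append(cat_id)
        by_catalogue[cat_id].append(book_id)
    # Plain dicts, so a missing id raises KeyError instead of silently
    # creating an empty entry.
    return dict(by_book), dict(by_catalogue)


by_book, by_catalogue = build_indexes(PAIRS)
```

A book id that is absent from by_book is exactly a record with no entry in the look-up table.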

Now my requirement is to convert book.xml from its current proprietary format 
into a DITA-based specialisation, and while I'm doing that:

1- Output the records with no corresponding catalogue entry as standalone 
documents.

2- Merge each drug record in book.xml that has catalogue entries with the 
corresponding auto-generated catalogue file(s), based on lookup.xml.

3- If a record in book.xml has more than one catalogue id in lookup.xml, I 
need to copy the book.xml record into every one of the corresponding 
auto-generated files.

4- If more than one record in book.xml corresponds to one catalogue id in 
lookup.xml, I need to merge all the book.xml records with that same catalogue 
file.

5- Make sure the converted and merged files are referenced in the correct 
location in the book's hierarchy.

I expect we'll ultimately do something more sensible like use conref rather 
than tamper with the auto-generated files, but merging them is my current 
brief as it stands.

Point 4 is the immediate stumbling block because my solution to fulfilling 
points 2 and 3 was as follows:

1. Convert the book.xml drug record into the desired DITA format and place 
that in a variable.
I'm doing this based on a matched template, so this happens whenever the 
processor "encounters" a drug record as it traverses book.xml. This ensures 
that I can export records with no catalogue id and keep track of where the 
record was in the hierarchy.

2. Use the lookup.xml file to find corresponding catalogue ids for that 
record.

3. For each catalogue id, open the corresponding catalogue file using 
document(), and write it to a new file with xsl:result-document, with the 
contents of the variable inserted in the XML.

The problem is that in step 3, I can't reopen a document that was previously 
created by the transform, so I can't "add" a new book.xml record to the 
contents of an already generated catalogue file, even by outputting a new 
file with a different name.

I can see that I'll probably need a process with an intermediate step, 
perhaps using lookup.xml to guide the processing so I can group records with 
the same catalogue id. But the only trouble with that is what to do with 
records that don't appear in lookup.xml...

Anyway, I hope all this is clear and I'm open to ideas. :)


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

