xsl-list

[xsl] Merging two sets of files

2012-04-03 03:55:53
I'm currently using XSLT 2.0 (Saxon 9.3 via Oxygen 12) to merge two sets of XML
files based on a third file that acts as a lookup table. However, I've run into
a problem where I need to merge two source files into the same output file, and
I'd like some suggestions for a different approach.

I have the following XML files:

- Main document - let's call it book.xml
This contains various types of topics, including about 4000 topics related to 
drugs, each identified by a unique id.

- Ancillary drug information files auto-generated from an online drug database.
These are about 10,000 little XML files, each named after the unique id of the 
drug information in the online catalogue.

- An XML file - let's call it lookup.xml - that is essentially a look-up table, 
matching ids in book.xml to one or more drug catalogue ids, and vice versa. 
However, not all records in book.xml have an entry in lookup.xml.
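As a sketch of what "and vice versa" implies: the lookup is a many-to-many
relation, so it helps to index it in both directions. The Python below assumes
lookup.xml has already been flattened into (book id, catalogue id) pairs. Its
real element structure isn't shown here, and the ids are made up. In XSLT 2.0
the same job would normally fall to xsl:key declarations over lookup.xml.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Index (book_id, catalogue_id) pairs in both directions.

    ASSUMPTION: lookup.xml has already been flattened into pairs;
    its actual element names are not known here.
    """
    book_to_cat = defaultdict(list)
    cat_to_book = defaultdict(list)
    for book_id, cat_id in pairs:
        book_to_cat[book_id].append(cat_id)
        cat_to_book[cat_id].append(book_id)
    return dict(book_to_cat), dict(cat_to_book)


# Hypothetical sample: B1 maps to two catalogue ids,
# and catalogue id C2 is shared by B1 and B2.
pairs = [("B1", "C1"), ("B1", "C2"), ("B2", "C2")]
book_to_cat, cat_to_book = build_indexes(pairs)
print(book_to_cat)  # {'B1': ['C1', 'C2'], 'B2': ['C2']}
print(cat_to_book)  # {'C1': ['B1'], 'C2': ['B1', 'B2']}
```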

Now my requirement is to convert book.xml from its current proprietary format 
into a DITA-based specialisation, and while I'm doing that:

1- Output the records with no corresponding catalogue entry as standalone 
documents.

2- Merge each drug record in book.xml that has catalogue entries with the 
corresponding auto-generated catalogue file(s), based on lookup.xml.

3- If a record in book.xml has more than one catalogue id in lookup.xml, I need 
to copy the book.xml record into every one of the corresponding auto-generated 
files.

4- If more than one record in book.xml corresponds to one catalogue id in 
lookup.xml, I need to merge all the book.xml records with that same catalogue 
file.

5- Make sure the converted and merged files are referenced in the correct 
location in the book's hierarchy.
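Taken together, points 1 to 4 amount to grouping by catalogue id rather than by
book record. A minimal Python sketch, again assuming the lookup has been
flattened into (book id, catalogue id) pairs; plan_outputs and the sample ids
are hypothetical.

```python
from collections import defaultdict


def plan_outputs(record_ids, pairs):
    """Decide which output files are needed and what each must contain.

    record_ids: drug record ids from book.xml, in document order.
    pairs: (book_id, catalogue_id) tuples flattened from lookup.xml (assumed).
    Returns (standalone, merged):
      standalone: records with no catalogue id (point 1)
      merged: catalogue id -> book record ids for that file (points 2-4)
    """
    book_to_cat = defaultdict(list)
    for book_id, cat_id in pairs:
        book_to_cat[book_id].append(cat_id)

    standalone = []
    merged = defaultdict(list)
    for rec in record_ids:
        cat_ids = book_to_cat.get(rec, [])
        if not cat_ids:
            standalone.append(rec)      # point 1: no catalogue entry
        for cat_id in cat_ids:
            merged[cat_id].append(rec)  # points 2-4: one entry per catalogue file
    return standalone, dict(merged)


standalone, merged = plan_outputs(
    ["B1", "B2", "B3"], [("B1", "C1"), ("B1", "C2"), ("B2", "C2")]
)
print(standalone)  # ['B3']
print(merged)      # {'C1': ['B1'], 'C2': ['B1', 'B2']}
```

Because merged is keyed by catalogue id, each catalogue file appears exactly
once however many book records map to it, which is what point 4 needs.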

I expect we'll ultimately do something more sensible, like using conref rather
than tampering with the auto-generated files, but for now my brief is to merge
them.

Point 4 is the immediate stumbling block, because my solution for points 2 and
3 was as follows:

1. Convert the book.xml drug record into the desired DITA format and store the
result in a variable.
I'm doing this in a matched template, so it happens whenever the processor
"encounters" a drug record as it traverses book.xml. This lets me export
records that have no catalogue id and keep track of where each record sits in
the hierarchy.

2. Use the lookup.xml file to find corresponding catalogue ids for that record.

3. For each catalogue id, open the corresponding catalogue file using
document() and write it out to a new file with xsl:result-document, inserting
the contents of the variable into the XML.
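Modelled outside XSLT, the per-record approach in steps 1 to 3 is roughly the
loop below (Python; the output/&lt;id&gt;.xml naming and the helper names are
hypothetical). Writing is triggered once per book record, so as soon as two
records share a catalogue id, that file becomes the target of two separate
writes, which a single XSLT 2.0 transformation does not allow.

```python
def per_record_writes(record_ids, book_to_cat):
    """Model the per-record approach: each matched drug record writes one
    output file per catalogue id it maps to.

    Returns target file names in the order they would be written.
    'output/<id>.xml' is a hypothetical naming scheme.
    """
    writes = []
    for rec in record_ids:
        for cat_id in book_to_cat.get(rec, []):
            writes.append(f"output/{cat_id}.xml")
    return writes


def duplicate_targets(writes):
    """Return every target that is written more than once (repeats only)."""
    seen, dups = set(), []
    for target in writes:
        if target in seen:
            dups.append(target)
        seen.add(target)
    return dups


book_to_cat = {"B1": ["C1", "C2"], "B2": ["C2"]}
writes = per_record_writes(["B1", "B2", "B3"], book_to_cat)
print(writes)                     # ['output/C1.xml', 'output/C2.xml', 'output/C2.xml']
print(duplicate_targets(writes))  # ['output/C2.xml']
```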

The problem is that in step 3 I can't reopen a document that the transform has
already created, so I can't "add" another book.xml record to an already
generated catalogue file, even by writing a new file with a different name.

I can see that I'll probably need a process with an intermediate step, perhaps
using lookup.xml to drive the processing so that I can group records with the
same catalogue id. The trouble with that is what to do with records that don't
appear in lookup.xml...
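One way the intermediate step might look, sketched in Python under stated
assumptions (hypothetical ids and file names, and a stand-in convert function
in place of the DITA conversion). The first pass is still the traversal of
book.xml, so records with no lookup entry are emitted there as standalone
documents and every record's hierarchy position is noted. Only records that do
have catalogue ids are held back and grouped, and the second pass produces each
catalogue file exactly once.

```python
from collections import defaultdict


def two_pass(records, book_to_cat, convert):
    """records: (record_id, hierarchy_path) pairs in book.xml document order.
    book_to_cat: book id -> list of catalogue ids (from lookup.xml).
    convert: hypothetical stand-in for the DITA conversion.

    Returns (files, placement):
      files: output file name -> converted records it contains
      placement: output file name -> hierarchy paths that reference it
    """
    pending = defaultdict(list)  # catalogue id -> converted records
    files = {}
    placement = defaultdict(list)

    # Pass 1: walk book.xml once, converting each record exactly once.
    for rec_id, path in records:
        converted = convert(rec_id)
        cat_ids = book_to_cat.get(rec_id, [])
        if not cat_ids:
            # No lookup entry: emit as a standalone document in this pass.
            name = f"{rec_id}.dita"
            files[name] = [converted]
            placement[name].append(path)
        for cat_id in cat_ids:
            pending[cat_id].append(converted)
            placement[f"{cat_id}.xml"].append(path)

    # Pass 2: produce each catalogue file once, with all its records merged.
    for cat_id, converted_records in pending.items():
        files[f"{cat_id}.xml"] = converted_records
    return files, dict(placement)


records = [("B1", "drugs/a"), ("B2", "drugs/b"), ("B3", "drugs/c")]
book_to_cat = {"B1": ["C1", "C2"], "B2": ["C2"]}
files, placement = two_pass(records, book_to_cat, lambda r: f"<topic id='{r}'/>")
```

In XSLT 2.0 the second pass could plausibly be an xsl:for-each-group over the
converted records held in a variable, with a group-by expression that returns
all of a record's catalogue ids; group-by lets one item join several groups,
which covers point 3.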

Anyway, I hope all this is clear and I'm open to ideas. :)


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

