xsl-list

Re: Create several XML files

2004-11-22 16:31:56
Depending on what's actually in the two-or-more concatenated XML documents in a single file, it may constitute a well-formed external parsed entity, and can be parsed as such.

That is, if we have in dox.xml (which does not parse as an XML document, since it has more than one top-level element):

<?xml version='1.0'?>
<doc>...</doc>
<doc>...</doc>
<doc>...</doc>

the entirety can be parsed if it is called into a shell or "wrapper" document like so:

<!DOCTYPE wrapper [
<!ENTITY content SYSTEM "dox.xml">
]>
<wrapper>
  &content;
</wrapper>

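... which can be parsed (and processed with XSLT, which could be used to split the pieces back out). One caveat: at the head of an external entity, <?xml version='1.0'?> is read as a text declaration, and the XML Recommendation requires a text declaration to carry an encoding declaration, so a strict parser may insist on something like encoding='UTF-8' being added there (or on the line being dropped).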
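As a rough illustration of the wrapper idea, here is a minimal Python sketch using the standard library's expat binding. Everything named in it (WRAPPER, element_names_via_wrapper, the handler names) is made up for the example, and it makes two simplifying assumptions: the entity content is handed over as bytes rather than read from dox.xml, and it carries no leading declaration.

```python
import xml.parsers.expat

# The wrapper document from the message, as bytes. The system identifier
# "dox.xml" is never opened here; the handler supplies the content directly.
WRAPPER = b"""<!DOCTYPE wrapper [
<!ENTITY content SYSTEM "dox.xml">
]>
<wrapper>
  &content;
</wrapper>"""


def element_names_via_wrapper(entity_bytes):
    """Parse WRAPPER, feeding entity_bytes in as the external entity.

    Returns the element names in the order their start tags are seen.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # A child parser handles the entity's replacement text; it inherits
        # the handlers already set on the parent parser.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(entity_bytes, True)
        return 1  # non-zero tells expat to carry on

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(WRAPPER, True)
    return names
```

Called with b"<doc>a</doc>\n<doc>b</doc>\n", this should see the wrapper element followed by each doc element, even though the same bytes would fail as a standalone document.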

If there are XML declarations sprinkled throughout, as in

<?xml version='1.0'?>
<doc>...</doc>
<?xml version='1.0'?>
<doc>...</doc>
<?xml version='1.0'?>
<doc>...</doc>

... then you have to work a little harder. (Some less-than-conformant parsers may let those errant XML declarations pass, taking them for processing instructions; most will report an error, since an XML declaration is allowed only at the very start of an entity.) Pre-processing to remove or alter them would work, but if you can identify them dependably, you could as easily split the file at those points and not have the problem. (You could alter them to something innocuous like a PI, parse the file and then use XSLT to clean up the mess, but that would be embarrassing: I wouldn't announce it to the list if I were planning that.)
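The "split at that point" option might look something like the following Python sketch. The names XML_DECL and split_concatenated are hypothetical, and the sketch assumes the declarations can be found with a simple pattern, which would misfire if the same text ever appeared inside a comment or CDATA section. It also discards any encoding information the declarations carried.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version='1.0'?>.
# The required whitespace after "xml" keeps PIs like <?xml-stylesheet ...?>
# from matching.
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def split_concatenated(text):
    """Split text at each XML declaration and parse every non-empty piece."""
    pieces = [piece.strip() for piece in XML_DECL.split(text)]
    return [ET.fromstring(piece) for piece in pieces if piece]
```

Each returned element is the root of one of the original documents, so writing them out to separate files is a loop over the result.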

If the demarcators aren't XML declarations but really PIs:

<?xml version='1.0'?>
<doc>...</doc>
<?separator?>
<doc>...</doc>
<?separator?>
<doc>...</doc>

then wrapping the file as an external entity, as above, would work.
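For the PI-separated case, a cruder textual variant of the same wrapping can be sketched in Python. This is not the entity mechanism itself: it assumes the whole file fits in memory as a string, drops the single leading declaration, and wraps the rest in a literal wrapper element. LEADING_DECL and docs_between_separators are names invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# The one XML declaration allowed at the very start of the file.
LEADING_DECL = re.compile(r"^\s*<\?xml\s[^?]*\?>")


def docs_between_separators(text):
    """Wrap PI-separated documents in one element and return its children."""
    body = LEADING_DECL.sub("", text, count=1)
    wrapper = ET.fromstring("<wrapper>" + body + "</wrapper>")
    # ElementTree skips processing instructions by default, so only the
    # individual document elements remain as children.
    return list(wrapper)
```

The separator PIs are legal inside the wrapper, so the parser has nothing to complain about; they simply do not appear in the resulting tree.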

That is, how hard you have to work very much depends on the particulars of the format of the concatenation.

Have we been told exactly those particulars?

Cheers,
Wendell


 At 03:58 PM 11/22/2004, M.D. wrote:
Two concatenated XML files (each being well formed in its own
right) would equal one non-well-formed XML file so that's not going to
work for you either.  Who on earth is joining XML files and giving
them to you in such a format anyway?  I feel for you on this one...
that bites!


======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================




