Depending on what's actually in the two-or-more concatenated XML documents
in a single file, it may constitute a well-formed external parsed entity,
and can be parsed as such.
That is, suppose dox.xml contains the following (which does not parse as an XML document):
<?xml version='1.0'?>
<doc>...</doc>
<doc>...</doc>
<doc>...</doc>
the whole thing can be parsed if it is pulled into a shell or "wrapper"
document through an external entity reference, like so:
<!DOCTYPE wrapper [
<!ENTITY content SYSTEM "dox.xml">
]>
<wrapper>
&content;
</wrapper>
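... which can be parsed (and processed with XSLT, which could be used to
split the pieces back out). One caveat: at the top of an external entity,
that leading declaration counts as a text declaration, and the XML
Recommendation requires a text declaration to include an encoding
declaration (<?xml version='1.0' encoding='UTF-8'?>, say), so a strict
parser may refuse dox.xml exactly as written above.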
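For anyone who wants to try the wrapper trick outside XSLT, here is a minimal Python sketch using the standard library's SAX reader (built on expat). It assumes dox.xml holds three small `<doc>` elements. To keep it self-contained, the file is served from memory by a hypothetical `InMemoryResolver` rather than read from disk, and `DocCollector` is likewise made up for illustration. Python's SAX reader only follows external entities if `feature_external_ges` is switched on (it has been off by default since Python 3.7.1), and the leading declaration carries an encoding because expat insists on one in a text declaration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# In-memory stand-in for dox.xml (an assumption for the sketch). Note the
# encoding declaration: expat rejects a text declaration without one.
DOX_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<doc>one</doc>
<doc>two</doc>
<doc>three</doc>
"""

# The wrapper document from the post.
WRAPPER = b"""<!DOCTYPE wrapper [
<!ENTITY content SYSTEM "dox.xml">
]>
<wrapper>
&content;
</wrapper>
"""

class InMemoryResolver(EntityResolver):
    """Hypothetical resolver: serves dox.xml from DOX_XML instead of disk."""
    def resolveEntity(self, publicId, systemId):
        if systemId == "dox.xml":
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(DOX_XML))
            return source
        return systemId  # anything else: fall back to the default lookup

class DocCollector(ContentHandler):
    """Collects the text content of each <doc> element, in order."""
    def __init__(self):
        super().__init__()
        self.docs = []
        self._in_doc = False

    def startElement(self, name, attrs):
        if name == "doc":
            self._in_doc = True
            self.docs.append("")

    def endElement(self, name):
        if name == "doc":
            self._in_doc = False

    def characters(self, content):
        if self._in_doc:
            self.docs[-1] += content

def parse_wrapped():
    parser = xml.sax.make_parser()
    # External general entities must be enabled explicitly.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(InMemoryResolver())
    collector = DocCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(WRAPPER))
    return collector.docs

print(parse_wrapped())  # ['one', 'two', 'three']
```

Writing each collected piece out to its own file would complete the round trip.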
If there are XML declarations sprinkled throughout, as in
<?xml version='1.0'?>
<doc>...</doc>
<?xml version='1.0'?>
<doc>...</doc>
<?xml version='1.0'?>
<doc>...</doc>
... then you have to work a little harder. (Some less-than-conformant
parsers may not care about those errant XML declarations, taking them for
processing instructions; but most will reject them, since an XML declaration
may appear only at the very start of an entity.) Pre-processing to remove or alter
them would work, but if you could identify them dependably, you could as
easily split the files at that point and not have the problem. (You could
alter them to something innocuous like a PI, parse the file and then use
XSLT to clean up the mess, but that would be embarrassing: I wouldn't
announce it to the list if I were planning that.)
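When the declarations can be found dependably, splitting at them is short work. Here is a minimal Python sketch, assuming every XML declaration sits at the start of one of the concatenated documents and never inside a comment, CDATA section, or attribute value (the naive regular expression cannot tell the difference). The helper names are hypothetical. `wrap_as_is` stands in for the wrapper trick with plain string concatenation and shows expat rejecting the interior declarations. `split_documents` splits at each declaration and parses the pieces one by one.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: three documents, each with its own XML declaration.
CONCATENATED = """<?xml version='1.0'?>
<doc>one</doc>
<?xml version='1.0'?>
<doc>two</doc>
<?xml version='1.0'?>
<doc>three</doc>
"""

# Naive pattern for an XML declaration: "<?xml" followed by whitespace, so
# <?xml-stylesheet ...?> does not match. It knows nothing about comments or
# CDATA sections, so a "<?xml " inside one of those would fool it.
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")

def wrap_as_is(text):
    """Wrap the raw text in a root element; interior declarations break it."""
    try:
        ET.fromstring("<wrapper>" + text + "</wrapper>")
        return True
    except ET.ParseError:
        return False

def split_documents(text):
    """Split at each XML declaration and parse every piece separately."""
    pieces = (piece.strip() for piece in XML_DECL.split(text))
    return [ET.fromstring(piece) for piece in pieces if piece]

print(wrap_as_is(CONCATENATED))                         # False
print([d.text for d in split_documents(CONCATENATED)])  # ['one', 'two', 'three']
```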
If the separators aren't XML declarations but genuine processing instructions (PIs), as in:
<?xml version='1.0'?>
<doc>...</doc>
<?separator?>
<doc>...</doc>
<?separator?>
<doc>...</doc>
then it would work to wrap the file in an entity.
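With PI separators, the wrapper can take the file unchanged. Here is a minimal Python sketch, assuming the file holds three `<doc>` elements separated by `<?separator?>` PIs. A hypothetical `InMemoryResolver` serves dox.xml from memory instead of disk. External entity loading is switched on explicitly because Python's SAX reader leaves it off by default since 3.7.1. The leading declaration includes an encoding because expat requires one in a text declaration. The made-up `EventRecorder` handler shows the separators arriving as ordinary processing-instruction events in document order, so a later step (XSLT, say) could group the `<doc>` elements by them.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# In-memory stand-in for the concatenated file, with PI separators.
DOX_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<doc>one</doc>
<?separator?>
<doc>two</doc>
<?separator?>
<doc>three</doc>
"""

WRAPPER = b"""<!DOCTYPE wrapper [
<!ENTITY content SYSTEM "dox.xml">
]>
<wrapper>
&content;
</wrapper>
"""

class InMemoryResolver(EntityResolver):
    """Hypothetical resolver: serves dox.xml from memory instead of disk."""
    def resolveEntity(self, publicId, systemId):
        if systemId == "dox.xml":
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(DOX_XML))
            return source
        return systemId

class EventRecorder(ContentHandler):
    """Records element starts and processing instructions in document order."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target))

def record_events():
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # off by default
    parser.setEntityResolver(InMemoryResolver())
    recorder = EventRecorder()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(WRAPPER))
    return recorder.events

for event in record_events():
    print(event)
```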
That is, how hard you have to work very much depends on the particulars of
the format of the concatenation.
Have we been told exactly those particulars?
Cheers,
Wendell
At 03:58 PM 11/22/2004, M.D. wrote:
Two concatenated XML files (each being well formed in their own
regard) would equal one non-well-formed XML file, so that's not going to
work for you either. Who on earth is joining XML files and giving
them to you in such a format anyway? I feel for you on this one...
that bites!
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--