xsl-list

Re: [xsl] Design of XML so that it may be efficiently stream-processed

2013-11-26 18:03:54
Firstly, I question the premise that XML should be designed to enable streamed 
transformation. One could equally well argue that you should design it so it 
doesn't need to be transformed at all. Transformation is only necessary because 
the data isn't in the form you want; designing it so that it can easily be 
transformed into that form seems a little odd. Unless perhaps you 
are thinking of designing the intermediate formats in a processing pipeline.


> 1. Use lots of attributes. Store in them the data needed for processing the 
> node.

Certainly for data that can conveniently be represented as attributes, this 
will make streamed processing easier. But don't overdo it.
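To see why attributes help, here is a minimal sketch in Python, using the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for an XSLT 3.0 streaming processor. The sample data and function names are hypothetical. With attributes, each `item` can be processed as soon as its start tag is read; with child elements, the values are only guaranteed to be present at the end tag, so the item's subtree has to be built first.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records encoded two ways.
ATTRS = b'<items><item id="1" price="9.50" qty="2"/><item id="2" price="3.25" qty="4"/></items>'
CHILDREN = (b"<items><item><id>1</id><price>9.50</price><qty>2</qty></item>"
            b"<item><id>2</id><price>3.25</price><qty>4</qty></item></items>")


def totals_from_attributes(data: bytes) -> list[float]:
    """Compute price * qty at each <item> start tag."""
    results = []
    for _, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "item":
            # iterparse guarantees attributes are complete at a "start" event,
            # so no lookahead into the element's content is needed.
            results.append(float(elem.get("price")) * int(elem.get("qty")))
    return results


def totals_from_children(data: bytes) -> list[float]:
    """Child-element values are only safe to read at the <item> end tag."""
    results = []
    for _, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            results.append(float(elem.findtext("price")) * int(elem.findtext("qty")))
            elem.clear()  # free the processed item's children and text
    return results
```

Both functions return `[19.0, 13.0]` for their respective inputs; the difference is how much of each record must be read before any work can start.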

> 2. Have one child element only.

No, if there are two things that should naturally be represented as child 
elements, then represent them that way. There are plenty of techniques still 
available for streamed processing: accumulators, xsl:iterate, fold-left, 
xsl:fork.
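The XSLT 3.0 facilities named above are not shown here. As a rough analogue of the fold-left approach only, this Python sketch (stdlib `iterparse` and `functools.reduce`; the `order`/`line`/`discount` vocabulary is hypothetical) folds every child of an `order`, of two different kinds, into a single accumulator in one pass, clearing each element once it has been consumed.

```python
import io
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical order with two naturally distinct kinds of child element.
ORDER = b"""<order>
  <line sku="A" amount="10"/>
  <discount amount="3"/>
  <line sku="B" amount="25"/>
  <discount amount="2"/>
</order>"""


def stream_children(data: bytes):
    """Yield each completed child element, then discard it."""
    for _, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag != "order":
            yield elem
            elem.clear()  # runs after the consumer has processed elem


def step(acc: dict, elem: ET.Element) -> dict:
    """Fold function: return new running state after one child element."""
    if elem.tag == "line":
        return {**acc, "gross": acc["gross"] + int(elem.get("amount")),
                "lines": acc["lines"] + 1}
    if elem.tag == "discount":
        return {**acc, "discounts": acc["discounts"] + int(elem.get("amount"))}
    return acc


def summarize(data: bytes) -> dict:
    # Left fold over the stream: both kinds of sibling handled in one pass.
    acc = reduce(step, stream_children(data), {"gross": 0, "discounts": 0, "lines": 0})
    acc["net"] = acc["gross"] - acc["discounts"]
    return acc
```

`summarize(ORDER)` gives `{'gross': 35, 'discounts': 5, 'lines': 2, 'net': 30}`; mixed siblings need only one piece of running state, not a restructured document.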
 

> So, to enable efficient stream processing, design XML like this:
>
> <root a="..." b="..." c="...">
>      <node d="..." e="..." f="...">
>            <node g="..." h="..." i="...">
>                  <node j="..." k="..." l="...">
>                        <node m="..." n="..." o="...">
>                              <node p="..." q="..." r="...">
>                                    ...
>                              </node>
>                        </node>
>                  </node>
>            </node>
>      </node>
> </root>
>
> This results in a massively deep tree. For gigabyte-sized XML files, the 
> nesting could be a billion levels deep (or more).

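No, such a design is completely bizarre and defeats the whole purpose of 
streaming, which is to reduce memory use. A streaming processor has to retain 
information about every ancestor of the node it is currently processing, so a 
tree a billion levels deep needs about as much memory as building the whole 
tree.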
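One way to make the memory argument concrete is to count how many elements are open at once while streaming, since a streaming parser has to keep at least that chain of ancestors. This Python sketch uses the stdlib `iterparse` on two hypothetical documents containing the same number of `node` elements; it counts open elements, not bytes.

```python
import io
import xml.etree.ElementTree as ET


def max_open_elements(data: bytes) -> int:
    """Largest number of simultaneously open elements while streaming."""
    depth = peak = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            depth += 1
            peak = max(peak, depth)
        else:
            depth -= 1
            elem.clear()  # drop the finished element's children and attributes
    return peak


def flat_doc(n: int) -> bytes:
    """n sibling <node> elements under <root>."""
    return b"<root>" + b'<node a="x"/>' * n + b"</root>"


def deep_doc(n: int) -> bytes:
    """n <node> elements, each nested inside the previous one."""
    return b"<root>" + b'<node a="x">' * n + b"</node>" * n + b"</root>"
```

For `n = 1000`, the flat document peaks at 2 open elements and the deep one at 1001, so the deep design's working set grows with the size of the document, much as an in-memory tree would.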

I would add some more important design criteria. Put metadata and reference 
information (stuff that's needed for reference throughout document processing) 
at the start of the document rather than the end, or in a separate document. 
Use hierarchic nesting for relationships rather than id/idref style pointers 
(perhaps even if it means holding the data redundantly).
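The benefit of putting reference information first can be sketched in Python in the same way (stdlib `iterparse`; the `report`/`rates`/`sale` vocabulary and the rates themselves are hypothetical). Sales that arrive before the exchange rates they depend on must be held back until the rates turn up, so placing the rates at the end forces buffering that a header-first layout avoids.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical documents: identical content, reference data first vs. last.
HEADER_FIRST = b"""<report><rates><rate cur="EUR" to-usd="1.5"/></rates>
<sale cur="EUR" amount="10"/><sale cur="USD" amount="4"/></report>"""
HEADER_LAST = b"""<report><sale cur="EUR" amount="10"/><sale cur="USD" amount="4"/>
<rates><rate cur="EUR" to-usd="1.5"/></rates></report>"""


def usd_total(data: bytes) -> tuple[float, int]:
    """Return (total in USD, max number of sales held back waiting for rates)."""
    rates = {"USD": 1.0}
    rates_seen = False
    pending = []  # sales that arrived before the reference data
    peak_pending = 0
    total = 0.0
    for _, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "rate":
            rates[elem.get("cur")] = float(elem.get("to-usd"))
        elif elem.tag == "rates":
            rates_seen = True
            # Reference data complete: convert anything buffered so far.
            total += sum(amount * rates[cur] for cur, amount in pending)
            pending.clear()
        elif elem.tag == "sale":
            cur, amount = elem.get("cur"), float(elem.get("amount"))
            if rates_seen:
                total += amount * rates[cur]
            else:
                pending.append((cur, amount))
                peak_pending = max(peak_pending, len(pending))
        elem.clear()  # attributes were read above; discard the element
    return total, peak_pending
```

Both layouts yield a total of `19.0`, but the header-last version had to hold back 2 sales (in a large document, potentially all of them). A forward id/idref reference creates the same kind of buffering problem, which is one reason to prefer nesting.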

Michael Kay
Saxonica


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--