
Re: [xsl] Nesting a flat XML structure

2018-10-30 09:02:35
The DITA for Publishers Word-to-DITA transformation framework 
(https://github.com/dita4publishers/org.dita4publishers.word2dita) provides a 
general two-stage processing pipeline to go from Word XML to DITA. It produces 
an intermediate XML format that simplifies the original Word XML and then 
applies transforms to that to infer the hierarchical structure based on a 
separate style-to-tag mapping file.

 

If nothing else, it demonstrates heavy use of for-each-group to do non-trivial 
grouping, including dynamic adjustment of the result hierarchy based on 
effective levels determined from clues in the source, so your style names do 
not have to encode every level explicitly.
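
For anyone who has not seen the general idiom, here is a minimal sketch of recursive grouping in XSLT 2.0. It is not taken from the DITA for Publishers code. It assumes the style names from the question quoted below (h1, h2, bullet_level1, and so on), hard-codes the style tests instead of reading a mapping file, and uses the made-up template names "sections", "blocks", and "list" purely for illustration. It also emits each list as a sibling of the preceding paragraph rather than nesting it inside that paragraph.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <xsl:template match="doc">
    <doc>
      <xsl:call-template name="sections">
        <xsl:with-param name="items" select="p"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </doc>
  </xsl:template>

  <!-- Hypothetical: split $items at each heading of $level, then recurse. -->
  <xsl:template name="sections">
    <xsl:param name="items" as="element(p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$items"
        group-starting-with="p[@style = concat('h', $level)]">
      <xsl:choose>
        <!-- Group begins with a heading of this level: open a section. -->
        <xsl:when test="self::p[@style = concat('h', $level)]">
          <section>
            <title><xsl:value-of select="."/></title>
            <xsl:call-template name="sections">
              <xsl:with-param name="items" select="current-group()[position() gt 1]"/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </section>
        </xsl:when>
        <!-- No heading of this level, but deeper headings remain (skipped level). -->
        <xsl:when test="current-group()[matches(@style, '^h[0-9]+$')]">
          <xsl:call-template name="sections">
            <xsl:with-param name="items" select="current-group()"/>
            <xsl:with-param name="level" select="$level + 1"/>
          </xsl:call-template>
        </xsl:when>
        <!-- No headings left: only paragraphs and list items. -->
        <xsl:otherwise>
          <xsl:call-template name="blocks">
            <xsl:with-param name="items" select="current-group()"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Hypothetical: separate runs of bullet paragraphs from ordinary ones. -->
  <xsl:template name="blocks">
    <xsl:param name="items" as="element(p)*"/>
    <xsl:for-each-group select="$items"
        group-adjacent="starts-with(@style, 'bullet_level')">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <xsl:call-template name="list">
            <xsl:with-param name="items" select="current-group()"/>
            <xsl:with-param name="level" select="1"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:for-each select="current-group()">
            <p><xsl:value-of select="."/></p>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Hypothetical: each item of $level starts an li; deeper items nest inside it. -->
  <xsl:template name="list">
    <xsl:param name="items" as="element(p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <ul>
      <xsl:for-each-group select="$items"
          group-starting-with="p[@style = concat('bullet_level', $level)]">
        <li>
          <xsl:value-of select="."/>
          <xsl:if test="current-group()[2]">
            <xsl:call-template name="list">
              <xsl:with-param name="items" select="current-group()[position() gt 1]"/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:if>
        </li>
      </xsl:for-each-group>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is that the nesting depth never appears in the stylesheet: each named template handles one level and calls itself with level + 1 on whatever is left in the current group, so the recursion goes as deep as the source styles do. A real implementation would replace the hard-coded style tests with lookups against the mapping file.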

 

It does require that named styles be used (the transform comes from publishing 
requirements where well-prepared manuscripts are the rule) but it could be 
adapted to be more general if needed. I’ve recently adapted the intermediate 
format to also generate high-quality DOCX files through the Wordinator project 
(https://github.com/drmacro/wordinator) so you could, in theory, round-trip from 
Word to XML and back to Word (although nobody that I know of has actually tried 
to do that and I’m not sure it would ever actually make sense to do so).

 

While it’s designed to generate DITA XML as a result, it can be adapted to 
produce any XML, either directly or through a follow-on transform.

 

It is currently implemented in XSLT 2 (the code has been around for almost a 
decade now).

 

Cheers,

 

Eliot

--

Eliot Kimber

http://contrext.com

 

 

 

From: "ian(_dot_)proudfoot(_at_)itp-x(_dot_)co(_dot_)uk" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Reply-To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Date: Monday, October 29, 2018 at 1:13 PM
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Subject: [xsl] Nesting a flat XML structure

 

Hello XSLT experts.

 

I need to transform flat XML that was sourced from a word processor file 
and generate the implied structure. My efforts have been partially successful 
for one level of nesting but I’m finding it difficult to manage the process for 
an arbitrary number of nested levels.

 

Here’s a much simplified, but representative example of typical source XML:

 

<doc>
    <p style="h1">title text</p>
    <p style="para">body text</p>
    <p style="para">body text</p>
    <p style="bullet_level1">list text</p>
    <p style="bullet_level1">list text</p>
    <p style="bullet_level2">list text</p>
    <p style="bullet_level2">list text</p>
    <p style="bullet_level2">list text</p>
    <p style="h2">title text</p>
    <p style="para">body text</p>
    <p style="para">body text</p>
</doc>

 

I need to generate a nested structure that looks similar to this (again 
much simplified):

<doc>
    <section>
        <title>title text</title>
        <p>body text</p>
        <p>body text <ul>
                <li>list text</li>
                <li>list text<ul>
                        <li>list text</li>
                        <li>list text</li>
                        <li>list text</li>
                    </ul>
                </li>
            </ul>
        </p>
        <section>
            <title>title text</title>
            <p>body text</p>
            <p>body text</p>
        </section>
    </section>
</doc>

 

There is no way to know in advance how many levels of nesting may be needed. 
The rules for nesting are provided by a separate mapping file. The mapping file 
is used as the input to generate a document-specific XSLT file that handles 
element naming and adds all of the necessary attributes. The format and content 
of that mapping file are under my control but the source documents are not.

 

My initial efforts used xsl:for-each-group with group-adjacent to identify and 
nest the first level and that works nicely, but I’ve got myself tied up in 
knots trying to work out how to make it work for any further nesting. Perhaps 
I’m overthinking it? I tried to create a recursive template to do the work, but 
that’s where I got stuck. 

 

I’m using the latest version of Saxon via the version 9 API.

 

Thanks in advance.

Ian Proudfoot

Isle of Wight, UK 

 

 

--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--