xsl-list

RE: [xsl] Transforming flat "WordML" source to a hierarchical XML output.

2007-09-12 03:39:34
There's an example of XSLT 2.0 code for converting a hierarchy expressed as
a flat structure with level numbers into a real XML hierarchy at

http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html

Michael Kay
http://www.saxonica.com/
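
As a rough illustration of that kind of technique applied to the WordML quoted below (this is not the code from the paper, only a minimal sketch), the stylesheet here first splits the paragraphs into runs with and without a w:ilvl element, so that paragraphs lacking the level do not upset the grouping, and then nests each list run recursively by level. The assumptions are: the w: prefix is bound to the Word 2003 WordML namespace, the paragraphs are siblings inside some container element, the list type is taken from the w:pStyle of the first paragraph at each level, and the <Document> wrapper is a hypothetical root element added only so the result is well-formed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="xs w">

  <xsl:output method="xml" indent="yes"/>

  <!-- Suppress text that is reached only through the built-in templates. -->
  <xsl:template match="text()"/>

  <!-- Assumption: any element with w:p children is treated as the container. -->
  <xsl:template match="*[w:p]">
    <!-- Hypothetical wrapper element, not part of the requested output. -->
    <Document>
      <!-- The key is a boolean, so paragraphs without w:ilvl simply form
           their own runs instead of producing an empty grouping key. -->
      <xsl:for-each-group select="w:p"
          group-adjacent="exists(w:pPr/w:listPr/w:ilvl)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras" select="current-group()"/>
              <xsl:with-param name="level" select="0"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:for-each select="current-group()">
              <Paragraph>
                <xsl:value-of select="w:r/w:t" separator=""/>
              </Paragraph>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </Document>
  </xsl:template>

  <!-- Builds one List for the given level; deeper paragraphs that follow
       an item are passed down and nested inside that item. -->
  <xsl:template name="make-list">
    <xsl:param name="paras" as="element(w:p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <!-- Assumption: style "Bulleted" means bulleted, anything else numbered. -->
    <List type="{if ($paras[1]/w:pPr/w:pStyle/@w:val = 'Bulleted')
                 then 'bulleted' else 'numbered'}">
      <!-- Each paragraph at the current level starts a new group. -->
      <xsl:for-each-group select="$paras"
          group-starting-with="w:p[w:pPr/w:listPr/w:ilvl/@w:val = $level]">
        <Item>
          <!-- The context item is the first paragraph of the group. -->
          <xsl:value-of select="w:r/w:t" separator=""/>
          <xsl:if test="count(current-group()) gt 1">
            <xsl:call-template name="make-list">
              <xsl:with-param name="paras"
                  select="current-group()[position() gt 1]"/>
              <xsl:with-param name="level" select="$level + 1"/>
            </xsl:call-template>
          </xsl:if>
        </Item>
      </xsl:for-each-group>
    </List>
  </xsl:template>

</xsl:stylesheet>
```

The recursion always drops the first paragraph of a group before descending, so it terminates even if the level numbers skip a step; in that case the skipped level would still produce an extra List wrapper, which may or may not be what is wanted.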
 

-----Original Message-----
From: David Medley [mailto:DAVEMEDLEY(_at_)uk(_dot_)ibm(_dot_)com] 
Sent: 11 September 2007 15:27
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Transforming flat "WordML" source to a hierarchical XML output.

I am using the following:

Saxon XSLT processor, version 8.9

XSLT 2.0


I am trying to process XML source generated by Microsoft Word (WordML).

WordML has no concept of hierarchy, so each paragraph in the source
appears as shown below:

        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p> 
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="0"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Top Level List</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Bulleted"/>
                        <w:listPr>
                                <w:ilvl w:val="1"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 1</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="2"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 2</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="3"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 3</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="4"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:i/>
                        </w:rPr>
                        <w:t>Nested List Level 4</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:b/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Number"/>
                        <w:listPr>
                                <w:ilvl w:val="5"/>
                        </w:listPr>
                </w:pPr>
                <w:r>
                        <w:rPr>
                                <w:u w:val="single"/>
                        </w:rPr>
                        <w:t>Nested List Level 5</w:t>
                </w:r>
        </w:p>
        <w:p>
                <w:pPr>
                        <w:pStyle w:val="Normal"/>
                </w:pPr>
                <w:r>
                        <w:rPr/>
                        <w:t>Normal Paragraph</w:t>
                </w:r>
        </w:p>

This displays in Word as follows:

Normal Paragraph
1.      Top Level List
2.      Top Level List
        *       Nested List Level 1
        *       Nested List Level 1
                1.      Nested List Level 2
                        a.      Nested List Level 3
                                i.      Nested List Level 4
                                ii.     Nested List Level 4
                                        1.      Nested List Level 5
                                        2.      Nested List Level 5
Normal Paragraph


I need the outcome to be as follows:

        <Paragraph>Normal Paragraph</Paragraph>
        <List type="numbered">
                <Item>Top Level List</Item>
                <Item>Top Level List
                        <List type="bulleted">
                                <Item>Nested List Level 1</Item>
                                <Item>Nested List Level 1
                                        <List type="numbered">
                                                <Item>Nested List Level 2
                                                        <List type="numbered">
                                                                <Item>Nested List Level 3
                                                                        <List type="numbered">
                                                                                <Item>Nested List Level 4</Item>
                                                                                <Item>Nested List Level 4
                                                                                        <List type="numbered">
                                                                                                <Item>Nested List Level 5</Item>
                                                                                                <Item>Nested List Level 5</Item>
                                                                                        </List>
                                                                                </Item>
                                                                        </List>
                                                                </Item>
                                                        </List>
                                                </Item>
                                        </List>
                                </Item>
                        </List>
                </Item>
        </List>
        <Paragraph>Normal Paragraph</Paragraph>


I think what is required is a grouping procedure that groups the
paragraphs by the value of the XPath 'w:pPr/w:listPr/w:ilvl/@w:val'
for each paragraph.
My attempt to do this has been unsuccessful: not all paragraphs have
a node at 'w:pPr/w:listPr/w:ilvl/@w:val', and so the grouping falls over.

I hope you can help me with this matter. Thank you for reading.


Thank you,
David Medley
IT Specialist

Application Services, GBS
IBM Office Internal: 299263 External: +44 (0) 1252 55 9263
Mobile: +44 (0) 7790-778801
E-mail: davemedley(_at_)uk(_dot_)ibm(_dot_)com
Notes: David Medley/UK/IBM(_at_)IBMGB







Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales 
with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, 
Hampshire PO6 3AU







--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



