xsl-list

RE: split OpenOffice 1.1 documents (flat xml)

2003-07-30 08:26:02
It appears that the beginning and end of a chapter are not marked by an 
element; that is to say, there is no element that contains a chapter. Is that 
correct?

If so, how can you determine where a chapter begins and ends? If you can answer 
that question, you have moved a long way toward solving the problem.

It appears that you can identify the beginning of a chapter with an XPath 
expression along these lines: 
/office:document-content/office:body/text:h[@text:level='1']. It also 
seems that all following sibling nodes of a particular level-1 <text:h> 
element, up to but not including the next level-1 <text:h> sibling node, are 
part of the chapter. Is that correct?
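
If both assumptions hold, the idea above can be sketched in plain XSLT 1.0
roughly as follows. This is only a sketch, not a tested solution: the
"chapter" parameter is a name invented here for illustration, the stylesheet
extracts one chapter per run (XSLT 1.0 has no standard way to write several
output files), and it assumes that everything in <office:body> before the
first level-1 heading (such as <text:sequence-decls>) should be kept in every
split file. The DOCTYPE declaration is not copied and would need to be
re-created with xsl:output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">

  <!-- Hypothetical parameter: 1-based number of the chapter to extract. -->
  <xsl:param name="chapter" select="1"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="office:body">
    <!-- The level-1 heading that starts the requested chapter. -->
    <xsl:variable name="head"
        select="text:h[@text:level='1'][number($chapter)]"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Assumption: content before the first level-1 heading
           belongs in every split file. -->
      <xsl:apply-templates
          select="*[not(self::text:h[@text:level='1'])]
                   [not(preceding-sibling::text:h[@text:level='1'])]"/>
      <xsl:if test="$head">
        <xsl:apply-templates select="$head"/>
        <!-- Following siblings whose nearest preceding level-1 heading
             is $head, i.e. everything up to the next level-1 heading. -->
        <xsl:apply-templates
            select="$head/following-sibling::*
                      [not(self::text:h[@text:level='1'])]
                      [generate-id(preceding-sibling::text:h[@text:level='1'][1])
                       = generate-id($head)]"/>
      </xsl:if>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Running this once per chapter, with "chapter" set to 1, 2, and so on, should
give one file per level-1 heading. Lower-level headings (text:level="2" and
below) stay inside their chapter because only level-1 headings are treated as
boundaries.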
-- 
Charles Knell
cknell(_at_)onebox(_dot_)com - email



-----Original Message-----
From:     "Linnemann, Victor" <Linnemann(_at_)euroscript(_dot_)ch>
Sent:     Wed, 30 Jul 2003 15:50:51 +0200
To:       XSL-List(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject:  [xsl] split OpenOffice 1.1 documents (flat xml)

Hello everybody,
my question is about splitting large OpenOffice 1.1 documents (the
content.xml that you see once you unzip the *.sxw) into single
chapters for translation purposes.
It's flat xml, and because of this I already looked in the XSL-FAQ under
http://www.dpawson.co.uk/xsl/sect2/flatfile.htm "Convert a flat XML
document", but I was not able to apply the suggested solution to my problem.
Each of the split files has to be a valid OpenOffice document and must
contain exactly one chapter (begins with <text:h ...>bla</text:h> and ends
just before the next <text:h ...>bla</text:h>).
***********************************************************
XML (sorry, very odd content):
***********************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content PUBLIC
        "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
<office:document-content 
        xmlns:office="http://openoffice.org/2000/office"
        xmlns:style="http://openoffice.org/2000/style"
        xmlns:text="http://openoffice.org/2000/text"
        (...)
        xmlns:script="http://openoffice.org/2000/script"
        office:class="text" office:version="1.0">
<office:script/>
<office:font-decls>
        <style:font-decl style:name="Arial" fo:font-family="Arial"
style:font-family-generic="swiss" style:font-pitch="variable"/>
</office:font-decls>
<office:automatic-styles/>
<office:body>
        <text:sequence-decls>
                <text:sequence-decl text:display-outline-level="0"
text:name="Illustration"/>
                <text:sequence-decl text:display-outline-level="0"
text:name="Table"/>
                <text:sequence-decl text:display-outline-level="0"
text:name="Text"/>
                <text:sequence-decl text:display-outline-level="0"
text:name="Drawing"/>
        </text:sequence-decls>
        <text:h text:style-name="Heading 1" text:level="1">Kapitel
1</text:h>
        <text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
        <text:h text:style-name="Heading 1" text:level="1">Kapitel
2</text:h>
        <text:p text:style-name="Standard">Vor jedem neuen Kapitel soll
gesplittet werden.</text:p>
</office:body>
</office:document-content>
***********************************************************
desired result:
***********************************************************
The same document structure, but split file 1 has as its content

        <text:h text:style-name="Heading 1" text:level="1">Chapter
1</text:h>
        <text:p text:style-name="Standard">This is my content.</text:p>

whereas split file 2 has as its content

        <text:h text:style-name="Heading 1" text:level="1">Chapter
2</text:h>
        <text:p text:style-name="Standard">This is my other
content.</text:p>

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



