xsl-list

RE: [xsl] Grouping Word 2007 content by customXml nodes

2007-01-15 04:55:55
In this expression:

<xsl:value-of select="//w:p/w:r/w:t"/>

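"//" selects from the root of the document, so every group gets the text of
the whole document. You want to select relative to what's selected by
xsl:for-each-group, that is current-group(). The items in each group are
already the w:p elements, so try:

<xsl:value-of select="current-group()/w:r/w:t"/>

In practice you probably want to do further processing of this content,
something like the following, where the predicate skips the title paragraph:

<xsl:apply-templates select="current-group()[not(w:customXml)]"/>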
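
As a minimal sketch of how the whole stylesheet might look with that change, assuming the input has the shape of WORD2007SAMPLE_DOCUMENT.XML below (every w:p is a direct child of w:body, and the title run sits inside w:customXml). The " and " separator is only there to mimic DESIRED_OUTPUT.XML; the separator attribute of xsl:value-of needs an XSLT 2.0 processor such as Saxon.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:apply-templates select="w:document/w:body"/>
  </xsl:template>

  <xsl:template match="w:body">
    <root>
      <!-- A new group starts at each paragraph carrying a pageTitle marker -->
      <xsl:for-each-group select="w:p"
          group-starting-with="w:p[w:customXml/@w:element = 'pageTitle']">
        <pageData>
          <pageTitle>
            <!-- The context item is the first paragraph of the group -->
            <xsl:value-of select="w:customXml/w:r/w:t"/>
          </pageTitle>
          <pageContent>
            <!-- Relative to this group only; the title run is inside
                 w:customXml, so w:r/w:t does not pick it up -->
            <xsl:value-of select="current-group()/w:r/w:t"
                          separator=" and "/>
          </pageContent>
        </pageData>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Any paragraphs that come before the first pageTitle marker would form a group of their own with an empty pageTitle, so real documents may need a check for that case.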

Michael Kay
http://www.saxonica.com/



-----Original Message-----
From: Frank Hopper [mailto:frank(_dot_)hopper(_at_)gmx(_dot_)de] 
Sent: 15 January 2007 11:05
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Grouping Word 2007 content by customXml nodes

I am new to XSLT and am working with ASP.NET 2.0, trying to bulk 
upload content from Word 2007 docx files to a SQL Server 2005 
Express Edition database in order to publish the content 
through my content management system.  So far I think I will 
need to use XSLT version 2.0 and the Saxon 8.7 processor for .NET 
(since the .NET XslCompiledTransform processor only supports 
XSLT version 1.0).

I would like to split the Word 2007 documents into several 
parts via XSLT so I can publish a long Word 2007 document as 
several web pages on the internet. I added my own customXML 
to the Word 2007 document to insert information like page 
title, url, meta description and meta keywords and so on (the 
WORD2007SAMPLE_DOCUMENT.XML file below only shows the page 
title customXML to keep the sample short). Every <w:customXml 
w:element="pageTitle"> indicates the start of a new web page. 
The content in between will be converted to HTML.

The DESIRED_OUTPUT.XML shows the xml file I would like to get 
as a result.  This file will be loaded into the corresponding 
tables and columns of my SQL Server 2005 Express Edition database.

The RECEIVED_OUTPUT.XML shows the output I get so far. It 
shows that the content is not grouped correctly into separate 
web pages.

The MY_NOT_WORKING_TRANSFORM.XSL shows how I tried to 
transform the WORD2007SAMPLE_DOCUMENT.XML into 
DESIRED_OUTPUT.XML without success. The conversion of the 
content to HTML is not included to keep the sample short.

MY PROBLEM:
When I group by <w:customXml w:element="pageTitle"> using 
for-each-group I can't get to the value of the <w:t>Content 
?</w:t> nodes without destroying my grouping effort.  I 
suppose this is because the content is not at the same or a 
lower level than my <w:customXml w:element="pageTitle">.

Thanks for your help.

----------------------------
WORD2007SAMPLE_DOCUMENT.XML
----------------------------
<?xml version="1.0"?>
<w:document
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
   <w:body>
     <w:p>
       <w:customXml w:element="pageTitle">
         <w:r>
           <w:t>1. Web Page Title</w:t>
         </w:r>
       </w:customXml>
     </w:p>
     <w:p>
       <w:r>
         <w:t>Content A</w:t>
       </w:r>
     </w:p>
     <w:p>
       <w:r>
         <w:t>Content B</w:t>
       </w:r>
     </w:p>
     <w:p>
       <w:customXml w:element="pageTitle">
         <w:r>
           <w:t>2. Web Page Title</w:t>
         </w:r>
       </w:customXml>
     </w:p>
     <w:p>
       <w:r>
         <w:t>Content C</w:t>
       </w:r>
     </w:p>
     <w:p>
       <w:r>
         <w:t>Content D</w:t>
       </w:r>
     </w:p>
   </w:body>
</w:document>

----------------------------
DESIRED_OUTPUT.XML
----------------------------
<?xml version="1.0" encoding="utf-8"?>
<root
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
   <pageData>
     <pageTitle>1. Web Page Title</pageTitle>
     <pageContent>
       Content A and Content B
     </pageContent>
   </pageData>
   <pageData>
     <pageTitle>2. Web Page Title</pageTitle>
     <pageContent>
       Content C and Content D
     </pageContent>
   </pageData>
</root>

----------------------------
RECEIVED_OUTPUT.XML
----------------------------
<?xml version="1.0" encoding="utf-8"?>
<root
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
   <pageData>
     <pageTitle>1. Web Page Title</pageTitle>
     <pageContent>
       Content A and Content B Content C and Content D
     </pageContent>
   </pageData>
   <pageData>
     <pageTitle>2. Web Page Title</pageTitle>
     <pageContent>
       Content A and Content B Content C and Content D
     </pageContent>
   </pageData>
</root>

----------------------------
MY_NOT_WORKING_TRANSFORM.XSL
----------------------------
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

   <xsl:output method="xml" indent="yes" encoding="utf-8" />
   <xsl:strip-space elements="*"/>

   <xsl:template match="/">
     <xsl:apply-templates select="//w:body"/>
   </xsl:template>

   <xsl:template match="w:body">
     <root>
       <xsl:for-each-group select="*"
        group-starting-with="w:p[w:customXml/@w:element = 'pageTitle']">
         <pageData>
           <pageTitle>
             <xsl:value-of select="."/>
           </pageTitle>
           <pageContent>
             <xsl:value-of select="//w:p/w:r/w:t"/>
           </pageContent>
         </pageData>
       </xsl:for-each-group>
     </root>
   </xsl:template>
</xsl:stylesheet>

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


