xsl-list

Re: WordML to XML

2005-02-11 19:14:57
Joris, et al...

My requirement is specifically to convert WordML to
XML, i.e. to strip off the WordML-specific tags but
retain the formatting instructions.

For example:
For a Word document whose content is "I have bold and
italics and underscore", this is the source WordML
document:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no"
w:ocxPresent="no"
xml:space="preserve"><o:DocumentProperties><o:Title>I
have bold and italics and
underscore</o:Title><o:Author>vnanjang</o:Author><o:LastAuthor>vnanjang</o:LastAuthor><o:Revision>1</o:Revision><o:TotalTime>0</o:TotalTime><o:Created>2005-02-12T01:52:00Z</o:Created><o:LastSaved>2005-02-12T01:52:00Z</o:LastSaved><o:Pages>1</o:Pages><o:Words>5</o:Words><o:Characters>34</o:Characters><o:Company>Oracle
Corporation</o:Company><o:Lines>1</o:Lines><o:Paragraphs>1</o:Paragraphs><o:CharactersWithSpaces>38</o:CharactersWithSpaces><o:Version>11.5604</o:Version></o:DocumentProperties><w:fonts><w:defaultFonts
w:ascii="Times New Roman" w:fareast="SimSun"
w:h-ansi="Times New Roman" w:cs="Times New
Roman"/><w:font w:name="SimSun"><w:altName
w:val="宋体"/><w:panose-1
w:val="02010600030101010101"/><w:charset
w:val="86"/><w:family w:val="Auto"/><w:pitch
w:val="variable"/><w:sig w:usb-0="00000003"
w:usb-1="080E0000" w:usb-2="00000010"
w:usb-3="00000000" w:csb-0="00040001"
w:csb-1="00000000"/></w:font><w:font
w:name="@SimSun"><w:panose-1
w:val="02010600030101010101"/><w:charset
w:val="86"/><w:family w:val="Auto"/><w:pitch
w:val="variable"/><w:sig w:usb-0="00000003"
w:usb-1="080E0000" w:usb-2="00000010"
w:usb-3="00000000" w:csb-0="00040001"
w:csb-1="00000000"/></w:font></w:fonts><w:styles><w:versionOfBuiltInStylenames
w:val="4"/><w:latentStyles w:defLockedState="off"
w:latentStyleCount="156"/><w:style w:type="paragraph"
w:default="on" w:styleId="Normal"><w:name
w:val="Normal"/><w:rPr><wx:font wx:val="Times New
Roman"/><w:sz w:val="24"/><w:sz-cs w:val="24"/><w:lang
w:val="EN-US" w:fareast="ZH-CN"
w:bidi="AR-SA"/></w:rPr></w:style><w:style
w:type="character" w:default="on"
w:styleId="DefaultParagraphFont"><w:name
w:val="Default Paragraph
Font"/><w:semiHidden/></w:style><w:style
w:type="table" w:default="on"
w:styleId="TableNormal"><w:name w:val="Normal
Table"/><wx:uiName wx:val="Table
Normal"/><w:semiHidden/><w:rPr><wx:font wx:val="Times
New Roman"/></w:rPr><w:tblPr><w:tblInd w:w="0"
w:type="dxa"/><w:tblCellMar><w:top w:w="0"
w:type="dxa"/><w:left w:w="108"
w:type="dxa"/><w:bottom w:w="0" w:type="dxa"/><w:right
w:w="108"
w:type="dxa"/></w:tblCellMar></w:tblPr></w:style><w:style
w:type="list" w:default="on"
w:styleId="NoList"><w:name w:val="No
List"/><w:semiHidden/></w:style></w:styles><w:docPr><w:view
w:val="print"/><w:zoom
w:percent="100"/><w:doNotEmbedSystemFonts/><w:proofState
w:spelling="clean"
w:grammar="clean"/><w:attachedTemplate
w:val=""/><w:defaultTabStop
w:val="720"/><w:characterSpacingControl
w:val="DontCompress"/><w:optimizeForBrowser/><w:validateAgainstSchema/><w:saveInvalidXML
w:val="off"/><w:ignoreMixedContent
w:val="off"/><w:alwaysShowPlaceholderText
w:val="off"/><w:compat><w:dontAllowFieldEndSelect/><w:applyBreakingRules/><w:useWord2002TableStyleRules/><w:useFELayout/></w:compat></w:docPr><w:body><wx:sect><w:p><w:pPr><w:rPr><w:b/><w:b-cs/><w:i/><w:i-cs/><w:u
w:val="single"/></w:rPr></w:pPr><w:r><w:rPr><w:b/><w:b-cs/><w:i/><w:i-cs/><w:color
w:val="000000"/><w:u w:val="single"/></w:rPr><w:t>I
have bold and italics and
underscore</w:t></w:r></w:p><w:sectPr><w:pgSz
w:w="12240" w:h="15840"/><w:pgMar w:top="1440"
w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/><w:cols
w:space="720"/><w:docGrid
w:line-pitch="360"/></w:sectPr></wx:sect></w:body></w:wordDocument>
------------------------------------------------------



I need to write an XSLT stylesheet that will give me
the following output:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<b>
   <i>
     <u>
         I have bold and italics and underscore
     </u>
   </i>
</b>
------------------------------------------------------
Though this looks like HTML, HTML output is not what
I'm interested in. What I have shown here is a
simplification of my requirement. In reality, my
WordML document will contain some of my own custom
tags, and data like the above will appear inside
those custom tags.

For example, the XML output could be:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vasuarticletag>
<b>
   <i>
     <u>
         I have bold and italics and underscore
     </u>
   </i>
</b>
</vasuarticletag>
------------------------------------------------------
So, I need help writing an XSLT stylesheet that will:
1. Traverse every "w:r" block.
2. Look for "w:rPr" tags with "w:i", "w:b", or "w:u"
children.
3. If they exist, output <i>, <b>, <u> tags, then
output the contents of the corresponding "w:t" block,
and then close the <i>, <b>, <u> tags.
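
To make the three steps concrete, here is a rough sketch of the shape I imagine such a stylesheet taking. It is untested and rests on several assumptions: XSLT 1.0, the WordML 2003 namespace from the sample above, and mode names ("italic", "underline") that are simply made up for illustration. It also treats the mere presence of "w:b", "w:i" or "w:u" as "on" and does not look at their "w:val" attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Wrap the result in the custom element and visit only the runs
       inside the document body, skipping properties, fonts and styles. -->
  <xsl:template match="/">
    <vasuarticletag>
      <xsl:apply-templates select="w:wordDocument/w:body//w:r"/>
    </vasuarticletag>
  </xsl:template>

  <!-- Steps 1 and 2: for each run, check for w:b first, then hand the
       same run on to the (hypothetical) "italic" mode. -->
  <xsl:template match="w:r">
    <xsl:choose>
      <xsl:when test="w:rPr/w:b">
        <b><xsl:apply-templates select="." mode="italic"/></b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="italic"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Check for w:i, then hand the run on to the "underline" mode. -->
  <xsl:template match="w:r" mode="italic">
    <xsl:choose>
      <xsl:when test="w:rPr/w:i">
        <i><xsl:apply-templates select="." mode="underline"/></i>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="underline"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Step 3: check for w:u, then emit the text of the run. -->
  <xsl:template match="w:r" mode="underline">
    <xsl:choose>
      <xsl:when test="w:rPr/w:u">
        <u><xsl:apply-templates select="w:t"/></u>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="w:t"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Copy out the text content of each w:t. -->
  <xsl:template match="w:t">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Chaining the modes this way should always nest the tags in the order b, i, u, which matches the output I showed above, and a run with no formatting should come through as plain text. I am not sure this is the idiomatic way to do it, so corrections are welcome.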

Requesting your help...

Regards,
Vasu


                
__________________________________ 
Do you Yahoo!? 
Yahoo! Mail - Find what you need with new enhanced search.
http://info.mail.yahoo.com/mail_250

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



<Prev in Thread] Current Thread [Next in Thread>