xsl-list

[xsl] XML to XML change, handling mixed content

2011-10-18 15:41:32
Hello,

I have 2 questions:

1) I have a specific requirement and I am a bit stuck on the best way
to handle it. In a nutshell, I need to transform the source

<p>
    text text &#x2018; LINK-1 TEXT &#x2019; TEXT TEXT <url webUrl="XXX">XXX</url> TEXT
    <SOmething>TEXT</SOmething>
    AND again
    <INSIDE>SOME TEXT text &#x2018; LINK-2 TEXT &#x2019; TEXT <url webUrl="YYY">YYY</url></INSIDE>
    And can be more text with or without URL and TEXT like &#x2018; LINK-3 TEXT&#x2019;
</p>

to (THE REQUIREMENT)

<p>
    text text <a href="XXX"> LINK-1 TEXT </a> TEXT TEXT TEXT
    <SOmething>TEXT</SOmething>
    AND again <INSIDE>SOME TEXT text <a href="YYY"> LINK-2 TEXT </a> TEXT
    </INSIDE>
    And can be more text with or without URL and TEXT like &#x2018; LINK-3 TEXT&#x2019;
</p>

What is required is this: for each <url>, if the text PRECEDING it
contains a string enclosed in &#x2018; and &#x2019;, that string must be
converted into an <a href> link pointing to the url's address (in the
sample, the quote marks and the <url> element itself disappear from the
output). After narrowing the match down to p[url], I am not sure what
the best pattern is to achieve the desired result. Can you please
suggest something? NOTE that in the sample above, the last
&#x2018; LINK-3 TEXT&#x2019; is left as it is because no <url> follows
it. Although I am using XSLT 1.0, if XSLT 2.0 solves this easily, please
suggest that as well.
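One possible XSLT 2.0 approach, as a minimal sketch. It assumes (as in
the schematic sample) that the &#x2018;...&#x2019; text belonging to a
<url> sits in the text node immediately before that <url>, and that the
link target is the url's @webUrl attribute. An identity template copies
everything; a more specific template on that text node uses
xsl:analyze-string to wrap the last quoted run in <a href="...">,
dropping the quote marks; an empty template removes the consumed <url>.
Quoted text inside a preceding child element, or separated from the
<url> by other elements, is deliberately left untouched.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A text node immediately followed by a url element that contains
       at least one complete &#x2018;...&#x2019; pair -->
  <xsl:template match="text()[following-sibling::node()[1][self::url]]
                             [matches(., '&#x2018;[^&#x2018;&#x2019;]*&#x2019;')]">
    <xsl:variable name="href" select="following-sibling::node()[1]/@webUrl"/>
    <!-- The greedy group 1 pushes the match onto the LAST quoted run,
         i.e. the one nearest to the url; flag s lets . match newlines -->
    <xsl:analyze-string select="." flags="s"
        regex="^(.*)&#x2018;([^&#x2018;&#x2019;]*)&#x2019;(.*)$">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
        <a href="{$href}"><xsl:value-of select="regex-group(2)"/></a>
        <xsl:value-of select="regex-group(3)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Drop a url whose target has been moved onto the quoted text
       (same test as above, seen from the url side) -->
  <xsl:template match="url[preceding-sibling::node()[1][self::text()]
                          [matches(., '&#x2018;[^&#x2018;&#x2019;]*&#x2019;')]]"/>

</xsl:stylesheet>
```

Because the regex is anchored and the template's predicate already
guarantees a pair exists, the whole text node is always one match, so no
xsl:non-matching-substring branch is needed. When one text node holds
several quoted runs, only the last becomes the link; earlier ones are
copied unchanged.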


[SAMPLE Skeleton XML and XSL]

XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <something>
        <blah-blah>Can have many children</blah-blah>
        <nodeGroup>
            <note id="does-not-matter-1">
                <p>
                    <something><sup>1</sup></something>
                    some text here. <bidItem id="95522-1" vol="1"> Title Name, Other details,
                        &#x2018;The arms trade and corruption&#x2019;, <i>Prospect</i> Aug.2005</bidItem>.
                    
                    <!-- NOTE: NO URL IN THIS CASE, WHICH IS FINE -->
                </p>
            </note>
            <note id="does-not-matter-2">
                <p> some text &#x2018;Ex-Pentagon procurement executive gets jail time&#x2019;, text text &lt;
                    <url webUrl="http://www.aaa.xx/bbb/ddd.htm">http://www.aaa.xx/bbb/ddd.htm</url>&gt;;
                    &#x2018;Former Air Force acquisition official released from jail&#x2019;, Government in 2005, &lt;
                    <url webUrl="http://www.aaa.xx/bbb/uuu.htm">SAME AS @webUrl</url>&gt;; and
                    <bidItem id="95522-2">Author name., &#x2018;Cashing in for profit? Who cost taxpayers
                        billions in biggest Pentagon scandal in years?&#x2019;, <i>60 Minutes</i>, CBS, 5 Jan. 2005
                    </bidItem>, &lt; <url webUrl="http://www.cbsnews.com/stories/2005/01/04/60II/main664652.shtml">SAME AS @webUrl</url>&gt;.

                    <!-- HERE EACH URL HAS A MATCHING &#x2018;contents&#x2019;, WHICH IS FINE -->
                </p>
            </note>
            <note id="does-not-matter-3">
                <p><something><sup>68</sup></something> This figure is comprised of a fine of
                    &#xa3;500&#xa0;000 ($900&#xa0;000) for &#x2018;irregular accounting practices&#x2019;
                    in a Tanzanian deal for an inappropriate and overpriced air radar system that was
                    tainted by allegations of high-level corruption, with ...($405&#xa0;000) costs..
                    &#xa3;29.275 million ($52.695 million) going to Tanzania in reparations.
                    <bidItem id="996522-31" title="BAE deal with Tanzania...">Evans, R. and Lewis, P.,
                        &#x2018;BAE deal with Tanzania: military air traffic control&#x2014;for country
                        with no airforce&#x2019;, <i>The Guardian</i>, 6 Feb. 2010</bidItem>;
                    &#x2018;Military radar probe: the key suspects &#x2026; and the case against them&#x2019;,
                    <i>This Day</i> (Dar es Salaam), 15 Feb. 2010; &lt;
                    <url webUrl="http://www.judiciary.gov.uk/Resources/JCO/Documents/Judgments/r-v-bae-sentencing-remarks.pdf">SAME AS @webUrl</url>&gt;.

                    <!--
                        ONLY ONE URL, BUT MANY &#x2018; in-between texts &#x2019;
                        So, the URL belongs only to its nearest preceding "&#x2018; in-between texts &#x2019;"
                    -->
                </p>
            </note>
        </nodeGroup>
    </something>
</root>


XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <xsl:apply-templates select="*"/>
    </xsl:template>
    
    <xsl:template match="*"> 
        <xsl:copy> 
            <xsl:apply-templates select="@*|node()"/> 
        </xsl:copy> 
    </xsl:template>
    
    <xsl:variable name="href-start">&lt;a href="</xsl:variable>
    <xsl:variable name="href-mid">"></xsl:variable>
    <xsl:variable name="href-finish">&lt;/a></xsl:variable>
    
    <xsl:template match="note">
        <xsl:copy> 
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates mode="url"/>       
        </xsl:copy>             
    </xsl:template>
    
        
    <xsl:template match="p[url]" mode="url">
        <!-- HERE, FOR EACH URL, IT SHOULD FORM A HREF LINK, COVERING ANY PRECEDING TEXT
            THAT APPEARS IN-BETWEEN &#x2018; AND &#x2019;

            Ref: MAIL DESCRIPTION.
        -->
        <xsl:copy> 
            <xsl:apply-templates select="@*"/>          
            <xsl:apply-templates/>
        </xsl:copy> 
    </xsl:template>
    
    <xsl:template match="p[not(url)]" mode="url">
        <xsl:copy> 
            <xsl:apply-templates select="@*"/>          
            <xsl:apply-templates/>       
        </xsl:copy>             
    </xsl:template>
    
    <xsl:template match="@*|text()|comment()|processing-instruction()"> 
        <xsl:copy-of select="."/> 
    </xsl:template>
    
 <!-- COMMENTED... SOME ATTEMPTS ALONG THESE LINES
    <xsl:template .... mode="url">
        <xsl:copy>
            <xsl:... test="contains(., '&#x2018;')">
                <!-\-<xsl:apply-templates>
                        <xsl:sort select="substring-before(., '&#x2018;')"/>
                    </xsl:apply-templates>-\->
                <xsl:value-of select="substring-before(., '&#x2018;')"/>
                <xsl:value-of select="$href-start" disable-output-escaping="yes"/>[@<xsl:value-of select="following-sibling::url"/>]<xsl:value-of select="$href-mid" disable-output-escaping="yes"/>
                <xsl:value-of select="substring-after(., '&#x2018;')"/>
            </xsl:...>
            <xsl:... test="contains(., '&#x2019;')">
                <xsl:value-of select="substring-before(., '&#x2019;')"/>
                <xsl:value-of select="$href-finish" disable-output-escaping="yes"/>
                <xsl:value-of select="substring-after(., '&#x2019;')"/>
            </xsl:..>
            <xsl:apply-templates .... mode="url"/>
        </xsl:copy>
    </xsl:template>
-->
    
</xsl:stylesheet>
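
In XSLT 1.0 the same idea can be sketched without
disable-output-escaping: instead of writing "&lt;a href=" out as a
string, the stylesheet creates a real <a> element, so the output is
always well-formed. This sketch assumes the quoted text is in the text
node directly before the <url> and that the quote marks in that text
node are properly paired. XSLT 1.0 has no regular expressions, so a
recursive named template (link-last-quote, a name made up for this
sketch) walks past earlier &#x2018;...&#x2019; pairs and links only the
last one.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text node directly followed by a url, containing an opening quote
       with a closing quote somewhere after it -->
  <xsl:template match="text()[following-sibling::node()[1][self::url]]
                             [contains(substring-after(., '&#x2018;'), '&#x2019;')]">
    <xsl:call-template name="link-last-quote">
      <xsl:with-param name="text" select="string(.)"/>
      <xsl:with-param name="href" select="string(following-sibling::node()[1]/@webUrl)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Drop the url whose target was moved onto the quoted text
       (same test as above, seen from the url side) -->
  <xsl:template match="url[preceding-sibling::node()[1][self::text()]
                          [contains(substring-after(., '&#x2018;'), '&#x2019;')]]"/>

  <!-- Hypothetical helper: copies earlier quoted runs verbatim and turns
       only the last one into <a href="{$href}">...</a> without its quotes.
       Precondition: $text contains an opening quote followed by a closing one. -->
  <xsl:template name="link-last-quote">
    <xsl:param name="text"/>
    <xsl:param name="href"/>
    <xsl:variable name="before" select="substring-before($text, '&#x2018;')"/>
    <xsl:variable name="tail" select="substring-after($text, '&#x2018;')"/>
    <xsl:variable name="quoted" select="substring-before($tail, '&#x2019;')"/>
    <xsl:variable name="rest" select="substring-after($tail, '&#x2019;')"/>
    <xsl:choose>
      <!-- Another complete pair follows: keep this one as text and recurse -->
      <xsl:when test="contains(substring-after($rest, '&#x2018;'), '&#x2019;')">
        <xsl:value-of select="concat($before, '&#x2018;', $quoted, '&#x2019;')"/>
        <xsl:call-template name="link-last-quote">
          <xsl:with-param name="text" select="$rest"/>
          <xsl:with-param name="href" select="$href"/>
        </xsl:call-template>
      </xsl:when>
      <!-- This is the last pair: turn it into the link -->
      <xsl:otherwise>
        <xsl:value-of select="$before"/>
        <a href="{$href}"><xsl:value-of select="$quoted"/></a>
        <xsl:value-of select="$rest"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

These templates run in the default mode, which is the mode the p
templates in the skeleton already use for their children. Their
patterns carry predicates (default priority 0.5), so they win over the
skeleton's generic text() and * templates (priority -0.5). If they are
merged into the skeleton, the identity template here can be left out,
since the skeleton's * and @*|text()|... templates already copy
everything.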

2) Additionally, when dealing with such mixed content (I mean elements
containing both text and child elements), what is the best way to split
them up and handle the elements and the text separately?
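
A common XSLT approach to mixed content is not to split strings at all,
but to let xsl:apply-templates select="node()" visit the text nodes and
the child elements one by one, in document order, and to write separate
templates for text() and for elements. A small illustrative sketch (the
quote-straightening rule is only an example, not part of the
requirement):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: nodes are copied and their children are
       processed in document order, text and elements alike -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text directly inside p: string processing happens only here.
       Illustrative rule: curly single quotes become straight ones. -->
  <xsl:template match="p/text()">
    <xsl:value-of select="translate(., '&#x2018;&#x2019;', &quot;''&quot;)"/>
  </xsl:template>

  <!-- Elements directly inside p: handled as nodes, never as strings.
       Here they are copied as-is, so the text rule above does not reach
       the text inside them. -->
  <xsl:template match="p/*">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Because each text node is matched on its own, string functions such as
substring-before() only ever see plain text, and elements such as <i> or
<url> are never flattened into strings.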

Thanks and look forward to suggestions,
Karl

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--
