Re: [xsl] grouping duplicate links with xslt 1.0

2011-10-21 10:30:45
I'd approach this as an identity transform with special handling for
internalLink elements.  That way, you're less likely to lose other
content, since the default is to copy everything unless you say
otherwise.

The other issue I was concerned about with your current approach is
that it tries to combine every internalLink with the same target, no
matter where in the document they occur, whereas it seems your
requirement is more like "combine any series of internalLink elements
all sharing the same target, separated only by whitespace".  That's
the approach I went with, below.  Let me know if I misinterpreted your
goal.
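
To illustrate the distinction, here is a hypothetical input (made up for
this reply, not taken from your sample) in which two links share a target
but are separated by real text. Under the "adjacent run" interpretation, I
would expect these two links to stay separate rather than be merged:

```xml
<!-- hypothetical test paragraph: same target twice, but with
     non-whitespace text between the two links -->
<p>See <internalLink><target>Update_Link [7] [act_1]</target>the activity</internalLink>
and, later on, <internalLink><target>Update_Link [7] [act_1]</target>the same activity</internalLink>.</p>
```

An approach that groups by target across the whole paragraph would fold
the second link's text into the first, which is probably not what you want.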

Also, I made an assumption that the real content of each internalLink
is anything after the target child element, since it appeared that
anything before that is just whitespace for markup formatting
purposes.  If actual content may occur before the target, then the
mode="include" template will need to be adjusted accordingly.
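
If content can occur before the target, one possible adjustment (a sketch
only, not run against your real data) is to have the mode="include"
template select every child except the target element itself, rather than
only what follows it:

```xml
<!-- sketch: would replace the mode="include" template in the stylesheet;
     pulls in all children of the merged link except its target element -->
<xsl:template match="internalLink" mode="include">
    <xsl:apply-templates select="node()[not(self::target)]"/>
</xsl:template>
```

Note that this would also carry over any formatting whitespace that
precedes the target, so you may want to filter out whitespace-only text
nodes there as well.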

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- duplicate link processor -->

<xsl:output method="xml"/>

<!-- identity template to copy most content to result as-is -->
<xsl:template match="@* | node()">
    <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
</xsl:template>

<!-- handle specially internalLinks that are not preceded by another
     internalLink with the same target (possibly separated by text nodes
     consisting only of whitespace) -->
<xsl:template match="internalLink[not(target =
    preceding-sibling::node()[not(self::text()[normalize-space(.) = ''])][1]
        /self::internalLink/target)]">
    <xsl:variable name="target" select="target"/>
    <!-- sibs = number of following siblings that are not (a) whitespace-only
         text nodes or (b) internalLink elements with the same target -->
    <xsl:variable name="sibs"
        select="count(following-sibling::node()[not(
                    self::text()[normalize-space(.) = '']
                    | self::internalLink[target = $target])])"/>
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
        <xsl:apply-templates mode="include"
            select="following-sibling::*[count(following-sibling::node()[not(
                        self::text()[normalize-space(.) = '']
                        | self::internalLink[target = $target])]) = $sibs]"/>
    </xsl:copy>
</xsl:template>

<!-- for an internalLink included in another, output just the content
     following the "target" child element -->
<xsl:template match="internalLink" mode="include">
    <xsl:apply-templates select="target/following-sibling::node()"/>
</xsl:template>

<!-- suppress all other internalLink elements and whitespace between two
     internalLink elements sharing the same target -->
<xsl:template match="internalLink"/>
<xsl:template match="text()[normalize-space(.) = '']
    [following-sibling::node()[1]/self::internalLink/target =
     preceding-sibling::node()[1]/self::internalLink/target]"/>

</xsl:stylesheet>
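
As a check on the logic, for the first paragraph of your sample I would
expect a result along these lines (traced by hand rather than pasted from a
processor run, so exact whitespace may differ):

```xml
<p>Have students practice
   <internalLink>
   <target>Update_Link [7] [act_1]</target>the activity in 9.01</internalLink>.</p>
```

The trailing period survives because it is an ordinary text node that falls
through to the identity template.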

-Brandon :)


On Fri, Oct 21, 2011 at 10:11 AM, Terry Ofner <tdofner(_at_)gmail(_dot_)com> wrote:
I couldn't find any reference to this issue in the archive. If it has been 
addressed before, please forgive.

I have an issue with MS Word outputting duplicate links in xml, breaking up 
the text. I need to group identical links and output one link while leaving 
all other nodes/text the same. Here is an example of the input xml:

<paragraphs>

   <!-- Have students practice [the activity in 9.01].-->
<p>Have students practice
   <internalLink>
   <target>Update_Link [7] [act_1]</target>the activity</internalLink>
   <internalLink>
   <target>Update_Link [7] [act_1]</target> in 9.01</internalLink>.</p>

   <!-- Have students practice [the activity in 9.02]. -->

<p>Have students practice <internalLink>
   <target>Update_Link [7] [act_2]</target>the activity</internalLink>
<internalLink>
   <target>Update_Link [7] [act_2]</target> in 9.02</internalLink>.</p>
</paragraphs>

I am limited to xslt 1.0. The following 1.0 sheet does everything I need it 
to except it drops the final period.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <!-- duplicate link processor -->

   <xsl:output method="xml" indent="yes"/>

<xsl:key name="link_target" match="internalLink" use="target" />


<xsl:template match="paragraphs">
   <xsl:for-each select="p">
   <p><xsl:apply-templates select="./text()[1]"/><internalLink>
       <target><xsl:apply-templates select="internalLink[count(. | 
key('link_target', target)[1]) = 1]/target"/></target>
       <xsl:apply-templates 
select="./internalLink/text()"/></internalLink></p></xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Here is the output using Oxygen/Saxon 6.5.5. Everything is good except for 
the final period.

<?xml version="1.0" encoding="utf-8"?>
<p>Have students practice
   <internalLink>
     <target>Update_Link [7] [act_1]</target>
   the activity
    in 9.01</internalLink>
</p>
<p>Have students practice <internalLink>
     <target>Update_Link [7] [act_2]</target>
   the activity
    in 9.02</internalLink>
</p>

Any pointers would be most appreciated.

Terry
--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


