xsl-list

Re: [xsl] Move elements to preceding parent

2009-06-18 11:15:59
Hi Ken,

I tested the stylesheet with non-<p> elements inside body, and I see that
they break the paragraph grouping.

Input Example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
   <p dir="rtl">
      <span class="chapter">line1</span>
   </p>
 <p dir="rtl"><span class="regular">line10</span>
 <span class="regular">line11</span>
 </p>
 <p dir="rtl"><span class="regular">line12</span>
 </p>
<p dir="rtl"><span class="regular">line13.</span>
</p>
<p dir="rtl"><span class="regular">line14</span>
</p>
<p dir="rtl"><span class="regular">line15</span>
</p>
<h5>
    <img src="images/test.jpg" width="35.00" height="30.00" alt="images.jpg" />

</h5>
<p dir="rtl"><span class="regular">line16.</span>
</p>
<p dir="rtl"><span class="regular">line17"</span>
</p>

</body>
</html>

Output:
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <p dir="rtl">
         <span class="chapter">line1</span>

      </p>
      <p dir="rtl"><span class="regular">line10</span>
         <span class="regular">line11</span>
         <span class="regular">line12</span>
         <span class="regular">line13.</span>

      </p>
      <p dir="rtl"><span class="regular">line14</span>

      </p>
      <p dir="rtl"><span class="regular">line15</span>

      </p>
      <h5>
         <img src="images/test.jpg" width="35.00" height="30.00"
alt="images.jpg" />


      </h5>
      <p dir="rtl"><span class="regular">line16.</span>

      </p>
      <p dir="rtl"><span class="regular">line17"</span>

      </p>
   </body>
</html>


I thought the <h5> element would be placed in a separate group
because of the condition group-ending-with="*[not(self::p)] ...
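The behavior can be reproduced with a minimal Python model of group-ending-with (standard library only), run over a simplified copy of the input above: the XHTML namespace, the DOCTYPE and the image size attributes are dropped, and the helper names (ends_group, group_ending_with, label) are made up for this sketch. It assumes Ken's pattern as quoted below and shows that *[not(self::p)] makes the <h5> the last member of the group that line14 and line15 already belong to, rather than a group of its own.

```python
import re
import xml.etree.ElementTree as ET

# Simplified copy of the sample <body> (XHTML namespace, DOCTYPE, img sizes dropped).
BODY = '''<body>
<p dir="rtl"><span class="chapter">line1</span></p>
<p dir="rtl"><span class="regular">line10</span> <span class="regular">line11</span></p>
<p dir="rtl"><span class="regular">line12</span></p>
<p dir="rtl"><span class="regular">line13.</span></p>
<p dir="rtl"><span class="regular">line14</span></p>
<p dir="rtl"><span class="regular">line15</span></p>
<h5><img src="images/test.jpg" alt="images.jpg"/></h5>
<p dir="rtl"><span class="regular">line16.</span></p>
<p dir="rtl"><span class="regular">line17"</span></p>
</body>'''

# The regex '[.?&#x22;]$' once the character reference is resolved to a quote.
PUNCTUATION = re.compile(r'[.?"]$')

def is_chapter(el):
    # Approximates p[span/@class='chapter']
    return el.tag == "p" and any(s.get("class") == "chapter" for s in el.findall("span"))

def ends_with_punctuation(el):
    # Approximates p[matches(span[last()], '[.?&#x22;]$')]
    spans = el.findall("span")
    return el.tag == "p" and bool(spans) and bool(PUNCTUATION.search("".join(spans[-1].itertext())))

def ends_group(el):
    # Approximates *[not(self::p)] | p[span/@class='chapter'] | p[matches(...)]
    return el.tag != "p" or is_chapter(el) or ends_with_punctuation(el)

def group_ending_with(items, ends):
    """Model of group-ending-with: a matching item is the last member of its group."""
    groups, current = [], []
    for item in items:
        current.append(item)
        if ends(item):
            groups.append(current)
            current = []
    if current:  # a trailing group does not have to end with a match
        groups.append(current)
    return groups

def label(el):
    # First span's text for paragraphs, element name for anything else.
    return "".join(el.find("span").itertext()) if el.tag == "p" else el.tag

labels = [[label(el) for el in group]
          for group in group_ending_with(list(ET.fromstring(BODY)), ends_group)]
for row in labels:
    print(row)
```

The third group is ['line14', 'line15', 'h5']. It ends with the <h5> rather than with a punctuated paragraph, so the xsl:when test fails and the xsl:otherwise branch copies all three elements unchanged.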

What should I change so the output will be:
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <p dir="rtl">
         <span class="chapter">line1</span>

      </p>
      <p dir="rtl"><span class="regular">line10</span>
         <span class="regular">line11</span>
         <span class="regular">line12</span>
         <span class="regular">line13.</span>

      </p>
      <p dir="rtl"><span class="regular">line14</span>

                <span class="regular">line15</span>
                <span class="regular">line16.</span>
      </p>
      <h5>
         <img src="images/test.jpg" width="35.00" height="30.00"
alt="images.jpg" />


      </h5>

      <p dir="rtl"><span class="regular">line17"</span>

      </p>
   </body>
</html>
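A minimal Python sketch of one way to get this result, written as a model rather than as a stylesheet: the *[not(self::p)] alternative is dropped from the ending pattern so the <h5> no longer ends the run, the spans of every <p> in a punctuated run are merged into the run's first paragraph, and the run's non-paragraph members are emitted after the merged paragraph. The helper names are hypothetical and the indentation text between spans is not carried over. Carrying this back into XSLT (ending groups only on chapter or punctuated paragraphs, building the merged <p> from the group's first paragraph rather than its first item, then applying templates to current-group()[not(self::p)]) is an untested assumption.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Simplified copy of the sample <body> (XHTML namespace and DOCTYPE dropped).
BODY = '''<body>
<p dir="rtl"><span class="chapter">line1</span></p>
<p dir="rtl"><span class="regular">line10</span> <span class="regular">line11</span></p>
<p dir="rtl"><span class="regular">line12</span></p>
<p dir="rtl"><span class="regular">line13.</span></p>
<p dir="rtl"><span class="regular">line14</span></p>
<p dir="rtl"><span class="regular">line15</span></p>
<h5><img src="images/test.jpg" alt="images.jpg"/></h5>
<p dir="rtl"><span class="regular">line16.</span></p>
<p dir="rtl"><span class="regular">line17"</span></p>
</body>'''

PUNCTUATION = re.compile(r'[.?"]$')

def is_chapter(el):
    return el.tag == "p" and any(s.get("class") == "chapter" for s in el.findall("span"))

def ends_with_punctuation(el):
    spans = el.findall("span")
    return el.tag == "p" and bool(spans) and bool(PUNCTUATION.search("".join(spans[-1].itertext())))

def revised_ends_group(el):
    # Hypothetical revision: no *[not(self::p)] alternative, so <h5> no longer ends a run.
    return is_chapter(el) or ends_with_punctuation(el)

def group_ending_with(items, ends):
    groups, current = [], []
    for item in items:
        current.append(item)
        if ends(item):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def regroup(body):
    out = ET.Element(body.tag, dict(body.attrib))
    for group in group_ending_with(list(body), revised_ends_group):
        if ends_with_punctuation(group[-1]):
            paragraphs = [el for el in group if el.tag == "p"]
            merged = copy.deepcopy(paragraphs[0])  # keeps the first <p> and its attributes
            for p in paragraphs[1:]:
                merged.extend([copy.deepcopy(s) for s in p.findall("span")])
            out.append(merged)
            # Non-paragraph members of the run (the <h5>) follow the merged paragraph.
            out.extend([copy.deepcopy(el) for el in group if el.tag != "p"])
        else:
            # Chapter paragraphs and unterminated runs are copied unchanged.
            out.extend([copy.deepcopy(el) for el in group])
    return out

summary = [(el.tag, ["".join(s.itertext()) for s in el.findall("span")])
           for el in regroup(ET.fromstring(BODY))]
for row in summary:
    print(row)
```

It prints the five children of the new <body>; line14, line15 and line16. share one paragraph and the <h5> follows it, as in the desired output above.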

Thanks, Israel


On Wed, Jun 17, 2009 at 11:37 PM, G. Ken Holman
<gkholman(_at_)cranesoftwrights(_dot_)com> wrote:
At 2009-06-17 23:11 +0300, Israel Viente wrote:

I really appreciate your code and comments, but after reading them many
times, I can't get to the bottom of the logic here.
I'm a newbie so forgive my stupid questions.

As I tell my students, questions are not stupid if they are asked sincerely.
 I far more appreciate the asking of questions than the ignoring of working
code that was supplied as requested.

1. Why do we need the outermost copy element:
<xsl:template match="body">
 <xsl:copy>

In order to preserve the body element when it comes time to group the
paragraphs.

How does it work in combination with xsl:for-each-group?

By being the parent of the elements being grouped, matching on <body> gives
the stylesheet the opportunity to act on all of the children of body.  The
paragraphs you want to massage are children of the body, so the time to act
on those children is at the time the body arrives at the stylesheet.  Since
we want the body element to be part of the result, we preserve it with
<xsl:copy>.

2. Can you please explain the group-ending-with selection?

You can see by the select="*" that I have selected *all* of the children of
the body.  I want to act on those groups of adjacent <p> elements.  But
since there are other non-<p> elements that could be in the data (there
aren't any in your data, but how often is a web page made solely of
paragraphs?) I would be pulling those into the selection as well.  After
all, I want all of the children of body to be processed in child order; I
only want to engage the special handling when I'm dealing with those
children that are paragraphs.

Yes, your data sample only contained paragraphs, but I try to write my
stylesheets defensively, anticipating other conditions.

Why do we need *[not(self::p)] ? Doesn't it mean all except p elements?

Indeed it does mean all except <p> elements.  Putting non-<p> elements in
their own group keeps them from interfering with the groups that are made
up of <p> elements.
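Strictly, the pattern *ends* a group at each non-<p> element; the element only gets a group to itself when the item before it has already ended one. A minimal Python model with hypothetical tokens ("p" for a paragraph without the closing punctuation, "p." for one with it, "h5" for any non-paragraph) shows both situations:

```python
def group_ending_with(items, ends):
    """Model of xsl:for-each-group group-ending-with over a flat sequence."""
    groups, current = [], []
    for item in items:
        current.append(item)
        if ends(item):        # a matching item is the last member of its group
            groups.append(current)
            current = []
    if current:               # leftover items still form a final group
        groups.append(current)
    return groups

def ends(token):
    # Stands in for *[not(self::p)] | p[...closing punctuation...]
    return token in ("h5", "p.")

after_closed_run = group_ending_with(["p.", "h5", "p."], ends)
after_open_run = group_ending_with(["p", "p", "h5", "p."], ends)
print(after_closed_run)  # [['p.'], ['h5'], ['p.']]
print(after_open_run)    # [['p', 'p', 'h5'], ['p.']]
```

In the second sequence the <h5> closes the open run of paragraphs instead of standing alone, which is what happens to the <h5> in Israel's follow-up question.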

So, adding more narrative to the stylesheet:

 <xsl:template match="body">
 <xsl:copy>
   <xsl:copy-of select="@*"/>

The above preserves the body element and any attributes that might be
attached to it.

   <xsl:for-each-group select="*"

The above selects all of the children of the body.

                       group-ending-with="*[not(self::p)] |
                                          p[span/@class='chapter'] |
                                          p[matches(span[last()],
                                                    '[.?&#x22;]$')]">

The above ends a group at every non-paragraph and at every chapter
paragraph, and ends a run of consecutive paragraphs at the first paragraph
with the desired punctuation.

     <!--now the information is grouped by p elements that end as
 required-->
     <xsl:choose>
       <xsl:when test="current-group()[last()]
                       [self::p][matches(span[last()],'[.?&#x22;]$')]">

The above tells me when I have encountered a group of <p> elements that ends
with a paragraph with the desired punctuation.

         <!--in a group of p elements that end as required-->
         <xsl:copy>
           <xsl:copy-of select="@*"/>

The above preserves the *first* of those paragraphs, and its attributes.

           <!--preserve the content of the first of these p elements-->
           <xsl:apply-templates/>

The above preserves the content of that paragraph.

           <!--preserve only the span elements and indentation from the
 rest;
               (the indentation is needed because this is paragraph
                white-space)-->
           <xsl:apply-templates select="current-group()[position()>1]/
                                        (text()[not(normalize-space())] |
                                        span)"/>

The above preserves only the <span> elements (and the whitespace-only
indentation) of the other paragraphs in the group.  If there are no other
paragraphs in the group, nothing else is added.  If there are 15 other
paragraphs in the group, the spans of all of them are added.  This is the
generalized nature of the result:  I'm not assuming that there is only one
other paragraph.
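A minimal Python model of what that select keeps from each of the later paragraphs, assuming the xml.etree.ElementTree layout in which loose text sits in an element's text and in each child's tail. The three-paragraph group is hypothetical; the stray text and the <br/> were added only to show what gets dropped, and the picked items are returned as labels rather than nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical group of three paragraphs; "stray text" and <br/> are invented for the demo.
GROUP = '''<group>
<p dir="rtl"><span class="regular">line14</span>
</p>
<p dir="rtl">stray text <span class="regular">line15</span>
</p>
<p dir="rtl"><span class="regular">line16.</span><br/>
</p>
</group>'''

XML_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace

def is_whitespace_only(text):
    # Model of text()[not(normalize-space())]; None means "no text node here".
    return bool(text) and not text.strip(XML_WHITESPACE)

def whitespace_and_spans(p):
    """Model of p/(text()[not(normalize-space())] | span), in document order."""
    picked = []
    if is_whitespace_only(p.text):          # text before the first child
        picked.append("ws")
    for child in p:
        if child.tag == "span":             # other elements, e.g. <br/>, are skipped
            picked.append("span:" + "".join(child.itertext()))
        if is_whitespace_only(child.tail):  # text after this child
            picked.append("ws")
    return picked

paragraphs = list(ET.fromstring(GROUP))
# current-group()[position()>1]: every paragraph except the first
folded = [whitespace_and_spans(p) for p in paragraphs[1:]]
print(folded)  # [['span:line15', 'ws'], ['span:line16.', 'ws']]
```

The stray text and the <br/> do not survive; only the spans and the whitespace-only text do, matching the stylesheet comment about preserving only the span elements and indentation.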

         </xsl:copy>
       </xsl:when>
       <xsl:otherwise>
         <!--in another kind of group so just copy these using identity-->
         <xsl:apply-templates select="current-group()"/>

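The above preserves, unchanged, every group that does not end with a
paragraph carrying the desired punctuation: the children of <body> that are
not paragraphs, the chapter paragraphs, and any run of paragraphs whose
group was ended by a non-paragraph.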

       </xsl:otherwise>
     </xsl:choose>
   </xsl:for-each-group>
 </xsl:copy>
 </xsl:template>

I hope this has helped.  Working directly with the sibling axes is fraught
with problems because these axes usually reach past the point where we want
to stop looking.  By looking *down* on the data, rather than left and right,
one can see a different perspective of your requirement.  You expressed your
requirement by looking left and right from the given paragraph.  I expressed
your requirement by looking down at the paragraphs from the <body> parent.
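The difference between the two perspectives can be sketched in Python on a trimmed copy of the sample (hypothetical helper names, XHTML namespace and attributes dropped). Looking right from line14 with a model of following-sibling::p reaches every later paragraph, including line17" which belongs to a different run, so a sibling-based stylesheet needs extra conditions to know where to stop. Looking down from <body> with a group-ending-with model yields each run as a whole. The ending test here is only the punctuation test, without the *[not(self::p)] alternative, chosen so the <h5> stays inside its run.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the sample body.
BODY = '''<body>
<p><span>line13.</span></p>
<p><span>line14</span></p>
<p><span>line15</span></p>
<h5/>
<p><span>line16.</span></p>
<p><span>line17"</span></p>
</body>'''

PUNCTUATION = re.compile(r'[.?"]$')

def label(el):
    return "".join(el.itertext()) if el.tag == "p" else el.tag

def following_sibling_p(el, siblings):
    """Model of following-sibling::p: every later <p>, with no stopping point."""
    i = siblings.index(el)
    return [s for s in siblings[i + 1:] if s.tag == "p"]

def ends_run(el):
    # Only the punctuation test: p[matches(span[last()], '[.?&#x22;]$')]
    spans = el.findall("span")
    return el.tag == "p" and bool(spans) and PUNCTUATION.search("".join(spans[-1].itertext())) is not None

def group_ending_with(items, ends):
    groups, current = [], []
    for item in items:
        current.append(item)
        if ends(item):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

children = list(ET.fromstring(BODY))
line14 = children[1]
looking_right = [label(p) for p in following_sibling_p(line14, children)]
looking_down = [[label(el) for el in g] for g in group_ending_with(children, ends_run)]
print(looking_right)  # ['line15', 'line16.', 'line17"']
print(looking_down)   # [['line13.'], ['line14', 'line15', 'h5', 'line16.'], ['line17"']]
```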

Good luck in your work with XML and XSLT!  As you learn more I'm sure you'll
love it more.

. . . . . . . . . . . . Ken

--
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


