xsl-list

RE: [xsl] Dividing documents based on size of contents

2009-05-27 03:25:59
Hello,

This is a grouping problem.

Given this source document:

<document>
        <documentDivision>
                <pagebreak num="50"/>
        </documentDivision>
        <documentDivision>
                <pagebreak num="50"/>
        </documentDivision>
        <documentDivision>
                <pagebreak num="127"/>
        </documentDivision>
        <documentDivision>
                <pagebreak num="5"/>
        </documentDivision>
        <documentDivision>
                <pagebreak num="23"/>
        </documentDivision>
        <documentDivision>
                <pagebreak num="78"/>
        </documentDivision>
</document>

where the number of pagebreaks is found in @num
(whereas in reality the number of pagebreaks should
be computed by counting pagebreak elements), you
can get the output you describe for this sample with
this XSLT 2.0 template:

<xsl:template match="/document">
  <xsl:copy>
    <!-- Adjacent divisions with the same key value form one group. -->
    <xsl:for-each-group select="documentDivision"
      group-adjacent="floor(sum(pagebreak/@num) div 100) = 1">
      <doc>
        <xsl:attribute name="pagebreaks"
          select="sum(current-group()/pagebreak/@num)"/>
        <xsl:copy-of select="current-group()"/>
      </doc>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

which results in:

<document>
   <doc pagebreaks="100">
      <documentDivision>
         <pagebreak num="50"/>
      </documentDivision>
      <documentDivision>
         <pagebreak num="50"/>
      </documentDivision>
   </doc>
   <doc pagebreaks="127">
      <documentDivision>
         <pagebreak num="127"/>
      </documentDivision>
   </doc>
   <doc pagebreaks="106">
      <documentDivision>
         <pagebreak num="5"/>
      </documentDivision>
      <documentDivision>
         <pagebreak num="23"/>
      </documentDivision>
      <documentDivision>
         <pagebreak num="78"/>
      </documentDivision>
   </doc>
</document>

Of course, as stated above, you need to adjust the "group-adjacent"
attribute so that it uses a proper method to count pagebreaks according
to your actual source document.
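
As a rough sketch of that adjustment, the stylesheet below uses the same
grouping key as the template above but counts pagebreak elements instead
of reading @num. It assumes the pagebreaks may sit at any depth inside a
documentDivision (hence ".//pagebreak"), which is my reading of the
original question and not something the sample confirms. The caveats of
the key itself are unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/document">
    <xsl:copy>
      <!-- Assumption: pagebreak elements are descendants of the
           division, at any depth. -->
      <xsl:for-each-group select="documentDivision"
        group-adjacent="floor(count(.//pagebreak) div 100) = 1">
        <!-- Total number of pagebreak elements in this group. -->
        <doc pagebreaks="{count(current-group()//pagebreak)}">
          <xsl:copy-of select="current-group()"/>
        </doc>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```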

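Also, be aware that group-adjacent does not keep a running total. It
only compares the key of each element with the key of its neighbour,
and here the key is true for a division with 100 to 199 pagebreaks and
false for any other. So if you have this sequence of pagebreaks:
1
97
250

all three keys are false and you will get one doc with all those 348
pages, whereas you might have preferred to have one doc with 98 pages
and another with 250. The sample above only comes out right because its
divisions happen to line up that way. But you can tweak that, maybe
with multiple passes.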
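
For what it's worth, one way to get the 98 / 250 split is to walk the
divisions while carrying the currently open group along, and to close
that group as soon as adding the next division would move the total
further from the target than stopping would. The following is only a
minimal sketch under assumptions: the template names "split" and "emit"
and the "target" parameter are made up for illustration, the count is
read from @num as in the sample above, and it needs an XSLT 2.0
processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumed target size of each output doc, in pagebreaks. -->
  <xsl:param name="target" as="xs:integer" select="100"/>

  <xsl:template match="/document">
    <xsl:copy>
      <xsl:call-template name="split">
        <xsl:with-param name="remaining" select="documentDivision"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: $remaining holds the divisions not yet
       placed, $pending the divisions of the group still open. -->
  <xsl:template name="split">
    <xsl:param name="remaining" as="element(documentDivision)*"/>
    <xsl:param name="pending" as="element(documentDivision)*" select="()"/>
    <!-- Size of the open group without and with the next division. -->
    <xsl:variable name="before" select="sum($pending/pagebreak/@num)"/>
    <xsl:variable name="after"
      select="$before + sum($remaining[1]/pagebreak/@num)"/>
    <xsl:choose>
      <!-- Input exhausted: flush whatever is still open. -->
      <xsl:when test="empty($remaining)">
        <xsl:call-template name="emit">
          <xsl:with-param name="group" select="$pending"/>
        </xsl:call-template>
      </xsl:when>
      <!-- The next division would overshoot by at least as much as
           the open group undershoots: close the group before it. -->
      <xsl:when test="exists($pending) and $after gt $target
                      and $target - $before le $after - $target">
        <xsl:call-template name="emit">
          <xsl:with-param name="group" select="$pending"/>
        </xsl:call-template>
        <xsl:call-template name="split">
          <xsl:with-param name="remaining" select="$remaining"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Target reached with the next division: close the group after it. -->
      <xsl:when test="$after ge $target">
        <xsl:call-template name="emit">
          <xsl:with-param name="group" select="($pending, $remaining[1])"/>
        </xsl:call-template>
        <xsl:call-template name="split">
          <xsl:with-param name="remaining" select="subsequence($remaining, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Still below the target: keep the group open. -->
      <xsl:otherwise>
        <xsl:call-template name="split">
          <xsl:with-param name="remaining" select="subsequence($remaining, 2)"/>
          <xsl:with-param name="pending" select="($pending, $remaining[1])"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical helper: writes one doc element for a non-empty group. -->
  <xsl:template name="emit">
    <xsl:param name="group" as="element(documentDivision)*"/>
    <xsl:if test="exists($group)">
      <doc pagebreaks="{sum($group/pagebreak/@num)}">
        <xsl:copy-of select="$group"/>
      </doc>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Tracing this by hand, the sample document should still give docs of 100,
127 and 106 pagebreaks, and the 1 / 97 / 250 sequence should give docs
of 98 and 250. To count real pagebreak elements, the three sum(...)
expressions would have to be replaced by the corresponding count(...).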

Hope this helps,
Regards,
EB

-----Original Message-----
From: Chris von See [mailto:chris(_at_)techadapt(_dot_)com]
Sent: Wednesday, May 27, 2009 3:54 AM
To: xsl-list
Subject: [xsl] Dividing documents based on size of contents


Hi all -

I have what I think is a fairly simple problem, but I'm having trouble  
with the implementation in XSLT.  Any help you could give would be  
greatly appreciated.

I have a document which is subdivided into multiple sections, with  
each section, in turn, divided into pages as shown below:

<document>
      <documentDivision>
              ... arbitrary content ...
              <pagebreak />
              ... arbitrary content ...
              <pagebreak />
      </documentDivision>

      ... arbitrary number of <documentDivision> elements ...

</document>

Each <documentDivision> section of the document can have an arbitrary  
number of <pagebreak> elements, and an arbitrary amount of content  
between <pagebreak>s.

I'd like to be able to break the input <document> into multiple  
<document>s, each of which has the minimum number of  
<documentDivision> sections that give it a <pagebreak> count ~100  
pages.  I'd like to break the input at <documentDivision> boundaries,  
but I don't need the output documents to be equally sized or to be  
exactly 100 pages long - just as close to that size as I can  
reasonably get while maintaining the <documentDivision> boundaries.

So for example if I have an input document that looks like this:

<document>
      <documentDivision>
              ... content containing 50 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 50 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 127 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 5 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 23 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 78 <pagebreak /> elements ...
      </documentDivision>
</document>

the output documents should look like this, with each output document  
being "close" to 100 pages in length:

<!-- This doc has enough <documentDivision> elements to give exactly  
100 pages. -->
<document>
      <documentDivision>
              ... content containing 50 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 50 <pagebreak /> elements ...
      </documentDivision>
</document>

<!-- This doc has a single <documentDivision> element with 127 pages -  
close enough! -->
<document>
      <documentDivision>
              ... content containing 127 <pagebreak /> elements ...
      </documentDivision>
</document>

<!-- This doc has three <documentDivision> elements of 5, 23 and 78
pages each - close enough! -->
<document>
      <documentDivision>
              ... content containing 5 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 23 <pagebreak /> elements ...
      </documentDivision>
      <documentDivision>
              ... content containing 78 <pagebreak /> elements ...
      </documentDivision>
</document>

I've been able to figure out how to get the number of <pagebreak>s per  
<documentDivision> and how to calculate the number of <pagebreak>s in  
any given group of <documentDivision> sections, but what I'm not sure  
of is how to maintain information about the point at which I last  
created a new output document so that I can determine what group of  
<documentDivision> elements has a page count around 100 and should  
therefore be used to create a new output document.  It seems that the
best way to carry this context would be via params to
xsl:apply-templates, but I'm not clear on how to set up the XSLT code so that
the state gets maintained as I iterate through <documentDivision>  
elements.  It also seems like there should be some XPath expression  
that I can use with xsl:for-each-group, but I can't quite figure out  
how to write that such that each group has only the minimum number of  
<documentDivision> elements needed to accumulate 100-ish pages.

Do you have any guidance on ways to do this?  I think I'm just having  
a mental block, and a swift kick in the right direction should do the  
trick.


Thanks
Chris



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--