
Re: [xsl] Dividing documents based on size of contents

2009-05-27 15:10:21
Thanks to Emmanuel and Michael for the great answers. I modified my code to use Michael's solution (using one pass through the source instead of two) and it seems to be working.

Cheers
Chris

On May 27, 2009, at 1:12 AM, Michael Kay wrote:


I think this is a case for "sibling recursion". In fact, it's the example I use on training courses if I think the group is capable of tackling the problem (it tends to cause significant headaches, and people are typically amazed that after three hours of head-scratching, the answer turns out to be about ten lines of code).

It's probably easiest to do this in two phases: the first phase copies the
documentDivision elements, inserting a <documentBreak/> element where
appropriate, and the second phase uses xsl:for-each-group with
group-starting-with="documentBreak" to create the document elements.

The sibling recursion works like this:

<xsl:template match="documentDivision">
  <!-- Running pagebreak count for the current output document.
       The default of 0 covers the initial call, which passes no parameter. -->
  <xsl:param name="size-so-far" as="xs:integer" select="0"/>
  <xsl:variable name="new-size-so-far" as="xs:integer"
                select="$size-so-far + count(pagebreak)"/>
  <!-- Use "ge" rather than "gt" if a total of exactly 100 should also
       end the document. -->
  <xsl:variable name="start-new-document" as="xs:boolean"
                select="$new-size-so-far gt 100"/>
  <xsl:copy-of select="."/>
  <xsl:if test="$start-new-document">
    <documentBreak/>
  </xsl:if>
  <!-- Recurse to the next sibling only, resetting the count after a break. -->
  <xsl:apply-templates select="following-sibling::documentDivision[1]">
    <xsl:with-param name="size-so-far"
                    select="if ($start-new-document) then 0 else $new-size-so-far"/>
  </xsl:apply-templates>
</xsl:template>


and then you start the process off with

<xsl:template match="document">
  <xsl:apply-templates select="documentDivision[1]"/>
</xsl:template>
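
The second phase is only described in words above, so here is a minimal sketch of how it might look. It is an assumption about how the two phases fit together, not code from this thread: it would replace the match="document" template just shown, the variable name "marked" is made up for illustration, and the initial size-so-far of 0 is passed explicitly so the sketch does not depend on a parameter default.

```xml
<xsl:template match="document">
  <!-- Phase 1: run the sibling recursion into a temporary tree that holds
       the copied documentDivision elements plus <documentBreak/> markers.
       "marked" is a hypothetical name. -->
  <xsl:variable name="marked">
    <xsl:apply-templates select="documentDivision[1]">
      <xsl:with-param name="size-so-far" select="0"/>
    </xsl:apply-templates>
  </xsl:variable>

  <!-- Phase 2: every <documentBreak/> starts a new group. -->
  <xsl:for-each-group select="$marked/*"
                      group-starting-with="documentBreak">
    <!-- A break emitted after the very last division leaves a group with
         no divisions in it; skip that group. -->
    <xsl:if test="current-group()[self::documentDivision]">
      <document>
        <!-- Copy the divisions only, dropping the marker itself. -->
        <xsl:copy-of select="current-group()[self::documentDivision]"/>
      </document>
    </xsl:if>
  </xsl:for-each-group>
</xsl:template>
```

As written this emits the <document> elements one after another in a single result. If each one should go to its own file, the <document> element could be wrapped in xsl:result-document with an href of your choosing.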

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay


-----Original Message-----
From: Chris von See [mailto:chris(_at_)techadapt(_dot_)com]
Sent: 27 May 2009 02:54
To: xsl-list
Subject: [xsl] Dividing documents based on size of contents

Hi all -

I have what I think is a fairly simple problem, but I'm
having trouble with the implementation in XSLT.  Any help you
could give would be greatly appreciated.

I have a document which is subdivided into multiple sections,
with each section, in turn, divided into pages as shown below:

<document>
        <documentDivision>
                ... arbitrary content ...
                <pagebreak />
                ... arbitrary content ...
                <pagebreak />
        </documentDivision>

        ... arbitrary number of <documentDivision> elements ...

</document>

Each <documentDivision> section of the document can have an
arbitrary number of <pagebreak> elements, and an arbitrary
amount of content between <pagebreak>s.

I'd like to be able to break the input <document> into
multiple <document>s, each of which has the minimum number of
<documentDivision> sections that give it a <pagebreak> count
~100 pages.  I'd like to break the input at
<documentDivision> boundaries, but I don't need the output
documents to be equally sized or to be exactly 100 pages long
- just as close to that size as I can reasonably get while
maintaining the <documentDivision> boundaries.

So for example if I have an input document that looks like this:

<document>
        <documentDivision>
                ... content containing 50 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 50 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 127 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 5 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 23 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 78 <pagebreak /> elements ...
        </documentDivision>
</document>

the output documents should look like this, with each output
document being "close" to 100 pages in length:

<!-- This doc has enough <documentDivision> elements to give
exactly 100 pages. -->
<document>
        <documentDivision>
                ... content containing 50 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 50 <pagebreak /> elements ...
        </documentDivision>
</document>

<!-- This doc has a single <documentDivision> element with
127 pages - close enough! -->
<document>
        <documentDivision>
                ... content containing 127 <pagebreak /> elements ...
        </documentDivision>
</document>

<!-- This doc has three <documentDivision> elements of 5,
23 and 78 pages each - close enough! -->
<document>
        <documentDivision>
                ... content containing 5 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 23 <pagebreak /> elements ...
        </documentDivision>
        <documentDivision>
                ... content containing 78 <pagebreak /> elements ...
        </documentDivision>
</document>

I've been able to figure out how to get the number of
<pagebreak>s per <documentDivision> and how to calculate the
number of <pagebreak>s in any given group of
<documentDivision> sections, but what I'm not sure of is how
to maintain information about the point at which I last
created a new output document so that I can determine what
group of <documentDivision> elements has a page count around
100 and should therefore be used to create a new output
document.  It seems that the best way to carry this context
would be via params to xsl:apply-templates, but I'm not
clear on how to set up the XSLT code so that the state gets
maintained as I iterate through <documentDivision> elements.
It also seems like there should be some XPath expression that
I can use with xsl:for-each-group, but I can't quite figure
out how to write that such that each group has only the
minimum number of <documentDivision> elements needed to
accumulate 100-ish pages.

Do you have any guidance on ways to do this?  I think I'm
just having a mental block, and a swift kick in the right
direction should do the trick.


Thanks
Chris



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--





Chris von See
Senior Geek
TechAdapt, Inc.
2910 Heights Dr.
Bellingham, WA  98226

E: chris(_at_)techadapt(_dot_)com
P: +1 360 223 1514
F: +1 360 544 0112

Save trees.  Print only when necessary.




