Re: [xsl] breaking up XML on page break element

2014-07-07 12:55:52
Uhoh I wasn't reading ...

... compare my solution here:

https://github.com/wendellpiez/MITH_XSLT/blob/master/xslt/p-promote.xsl

plus there's an older version here:

http://piez.org/wendell/projects/Interedition2011/lib/p5o-browser-html.xsl

In Luminescent (my "hobby" LMNL processing framework) there's a fair
amount of this stuff (reducing and promoting hierarchies). The fact
that we can generalize methods to do this in XSLT 2.0 is fantastic.
:-)

Cheers, Wendell


On Mon, Jul 7, 2014 at 4:53 AM, Geert Bormans
geert(_at_)gbormans(_dot_)telenet(_dot_)be 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
wrote:
Hi Gerrit,

First my congratulations to the German team
(I admit they should have scored an extra goal...
given I made a bet at the office for 2-0, that would have brought me some
cash :-)

Thanks very much for this solution.
It is exactly what I was looking for.
It seems robust and elegant, and I love patterns with a name ;-)

Thanks a ton

Geert


At 20:20 4/07/2014, you wrote:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="* | @*" mode="#default">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

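  <xsl:template match="book" mode="#default">
    <xsl:variable name="context" select="." as="element(book)" />
    <xsl:copy>
      <!-- Group all leaf nodes (nodes without children); each pb starts a new group. -->
      <xsl:for-each-group select="descendant::node()[not(node())]"
                          group-starting-with="pb">
        <!-- Re-emit the pb that opened this group (nothing if the group does not start with one). -->
        <xsl:copy-of select="self::pb"/>
        <!-- Re-walk the book, keeping only this group's leaves and their ancestors. -->
        <xsl:apply-templates select="$context/*" mode="split">
          <xsl:with-param name="restricted-to"
                          select="current-group()/ancestor-or-self::node()" tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>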

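  <xsl:template match="node()" mode="split">
    <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
    <!-- Copy a node only if it is one of the current group's leaves or their ancestors. -->
    <xsl:if test="exists(. intersect $restricted-to)">
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:apply-templates mode="#current" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- pb is already emitted by the book template, so suppress it in split mode. -->
  <xsl:template match="pb" mode="split"/>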

</xsl:stylesheet>

On 04.07.2014 18:31, Geert Bormans 
geert(_at_)gbormans(_dot_)telenet(_dot_)be wrote:

Thanks Gerrit,
(I admit I need to read this twice to get it, but that might be caused
by the 0-1 and me trying not to miss all of the fun in Rio)
I will look into it after the match


At 17:18 4/07/2014, you wrote:

I tackle it by what I call “upward projection”:

When processing the top-level element, do a for-each-group of all
descendants that are terminal nodes (those without children), with a
group-starting-with at the splitting points.

For each group, process the book (or the HTML body, or whatever common
ancestor there is) once in another mode, with a tunneled parameter
'restricted-to' that contains, for each group, the terminal nodes and
their ancestors.

When processing each group, for each node that you encounter, test
whether the node is contained in the tunneled variable (using
intersect). If it is, reproduce the node and continue in this mode; if
it isn’t contained, do nothing.


There may be an option to discard or to reproduce the splitting
elements.
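
One way such an option might look, as a sketch only: it reuses the book/pb names from the question and the same grouping approach as the stylesheet posted in this thread, and adds a stylesheet parameter. The parameter name keep-splitters is an assumption made for this illustration; it does not come from evolve-hub.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">

  <!-- Hypothetical switch (name invented for this sketch):
       true() reproduces each pb between the pieces, false() discards it. -->
  <xsl:param name="keep-splitters" as="xs:boolean" select="true()"/>

  <xsl:template match="book">
    <xsl:variable name="context" select="." as="element(book)"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="descendant::node()[not(node())]"
                          group-starting-with="pb">
        <!-- The context item is the first node of the group, so self::pb
             is non-empty exactly when this group was opened by a pb. -->
        <xsl:if test="$keep-splitters">
          <xsl:copy-of select="self::pb"/>
        </xsl:if>
        <xsl:apply-templates select="$context/*" mode="split">
          <xsl:with-param name="restricted-to" tunnel="yes"
                          select="current-group()/ancestor-or-self::node()"/>
        </xsl:apply-templates>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Keep a node only if it is a leaf of the current group or an ancestor of one. -->
  <xsl:template match="node()" mode="split">
    <xsl:param name="restricted-to" as="node()+" tunnel="yes"/>
    <xsl:if test="exists(. intersect $restricted-to)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="#current"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- The splitter never appears inside the split pieces themselves. -->
  <xsl:template match="pb" mode="split"/>

</xsl:stylesheet>
```

With keep-splitters set to false() the pieces simply follow one another without a pb between them; everything else is unchanged.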

Examples for this technique are in
https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
hub:split-at-tab and hub:split-at-br

They are a bit more complex than your case because they split
paragraphs that may contain tables or footnotes that in turn can
contain other paragraphs. I introduced the function
hub:same-scope($splitting-element, $containing-element) to split only
at splitting elements that are contained within the paragraph that
should be split, rather than in a paragraph that is contained in a
footnote or table cell that is somehow contained in the given paragraph.
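
To make the idea of such a scope test concrete, here is a rough sketch. It is not the evolve-hub implementation: the my: namespace, the function body, and the choice of footnote and table as scope-establishing elements are all assumptions made for illustration. Only the argument order follows the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="urn:example:split"
  exclude-result-prefixes="xs my"
  version="2.0">

  <!-- Hypothetical helper: true if $splitter lies inside $container and no
       footnote or table (assumed scope-establishing elements) sits between
       the two. -->
  <xsl:function name="my:same-scope" as="xs:boolean">
    <xsl:param name="splitter" as="node()"/>
    <xsl:param name="container" as="element()"/>
    <xsl:sequence select="
      exists($splitter/ancestor::* intersect $container)
      and empty($splitter/ancestor::*[self::footnote or self::table]
                                     [ancestor::* intersect $container])"/>
  </xsl:function>

  <!-- Hypothetical usage: select only the pb elements at which this para
       itself should be split, ignoring any pb inside a nested footnote or
       table. The mode name is invented for this sketch. -->
  <xsl:template match="para" mode="find-splitters">
    <xsl:variable name="para" select="." as="element(para)"/>
    <xsl:sequence select="descendant::pb[my:same-scope(., $para)]"/>
  </xsl:template>

</xsl:stylesheet>
```

The second predicate looks for a footnote or table ancestor of the splitter that is itself a descendant of the container; if one exists, the splitter belongs to an inner scope and the outer paragraph is left alone.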

I might prepare a synthetic standalone example if anyone is
interested, and furthermore on the condition that interested parties
root for Germany instead of France today.

Gerrit

On 04.07.2014 16:43, Geert Bormans 
geert(_at_)gbormans(_dot_)telenet(_dot_)be wrote:

Hi all,

Here is a fun one I thought I could share

I have a nicely nested XML (a bit TEI like)
and markers for page breaks can happen everywhere in the document (as
empty elements)

Now I want to break the document per page, reconstructing the structure.
So as a first step, I want to promote each page break to the highest
level

<book>
<title>...</title>
<section>
<para>aaa<pb/>bbb</para>
</section>
</book>

to become

<book>
<title>...</title>
<section>
<para>aaa</para>
</section>
<pb/>
<section>
<para>bbb</para>
</section>
</book>

Bearing in mind I need a generic solution
and pagebreaks can happen at every level

Any thoughts?
I am not looking for code, just curious about how people would attack this

Thanks

Geert


--
Gerrit Imsieke
Geschäftsführer / Managing Director

le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler


-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^