Re: [xsl] Moving element up hierarchy unless text nodes

2015-03-03 18:36:45
Cool Wendell!

I've not had a chance to test this out yet. I may have to come back to you
with some questions, as I'm really not sure I understand that match
pattern.  I'll have a play with it.

Many thanks!

-James

On Tue, Mar 3, 2015 at 7:48 PM, Wendell Piez wapiez(_at_)wendellpiez(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi again James,

So in the code I posted yesterday I realized at least one more
interesting improvement is possible.

Instead of

<xsl:template match="pb">
  <!-- Only copy the pb if no ancestor considers it 'leading' or
       'trailing'. -->
  <xsl:if test="empty(ancestor::*/
        (key('leading-pb',generate-id()) |
         key('trailing-pb',generate-id())) intersect . )">
    <xsl:copy-of select="."/>
  </xsl:if>
</xsl:template>

we could have, more directly and efficiently,

  <xsl:template match="pb">
    <xsl:if test="(. is key('leading-pb',generate-id())) and
            (. is key('trailing-pb',generate-id()))">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>


Or even (if you are crazy for match patterns, and who isn't)

<xsl:template match="pb[empty(key('leading-pb',generate-id())) or
      empty(key('trailing-pb',generate-id()))]"/>

These work because the keys bind pb elements to themselves when they
are not 'leading' or 'trailing' (i.e. when they belong where they are,
inside their parent, rather than outside it).
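
One caveat on the match-pattern form, offered as an untested sketch rather than a fix from the original post: if the empty template replaces the plain `match="pb"` template outright, a pb that is neither leading nor trailing would presumably fall through to the generic `match="*"` template. Its two `key()` calls would then return that same pb (since it is bound to its own id) and copy it before and after itself. Keeping a plain pb template beside the empty one should avoid this, assuming the key declarations from the earlier post are unchanged.

```xml
<!-- Suppress a pb in place when an ancestor has claimed it as leading or
     trailing: its own id then retrieves nothing from at least one key.
     The predicate gives this template the higher default priority (0.5). -->
<xsl:template match="pb[empty(key('leading-pb',generate-id())) or
      empty(key('trailing-pb',generate-id()))]"/>

<!-- Fallback for every other pb (default priority 0, still above match="*"):
     copy it where it stands. -->
<xsl:template match="pb">
  <xsl:copy-of select="."/>
</xsl:template>
```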

Cheers, Wendell

On Mon, Mar 2, 2015 at 2:11 PM, Wendell Piez wapiez(_at_)wendellpiez(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi James,

So, try this. It works by assigning 'pb' elements to ancestors that
consider them 'leading' (start the element off) or 'trailing'. They
can be retrieved from (for) said ancestor using a key.

Lightly tested.

<xsl:template match="comment() | processing-instruction() | text() | @*">
  <xsl:copy-of select="."/>
</xsl:template>

<xsl:template match="*">
  <xsl:copy-of select="key('leading-pb',generate-id())"/>
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
  <xsl:copy-of select="key('trailing-pb',generate-id())"/>
</xsl:template>

<xsl:template match="pb">
  <!-- Only copy the pb if no ancestor considers it 'leading' or
       'trailing'. -->
  <xsl:if test="empty(
    ancestor::*/(key('leading-pb',generate-id()) |
                 key('trailing-pb',generate-id())) intersect . )">
    <xsl:copy-of select="."/>
  </xsl:if>
</xsl:template>

<xsl:key name="leading-pb" match="pb">
  <xsl:apply-templates select="." mode="leading-pb"/>
</xsl:key>

<xsl:key name="trailing-pb" match="pb">
  <xsl:apply-templates select="." mode="trailing-pb"/>
</xsl:key>

<xsl:template match="body/*" mode="leading-pb trailing-pb">
  <xsl:sequence select="generate-id()"/>
</xsl:template>

<xsl:template match="*" mode="leading-pb">
  <xsl:choose>
    <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
                          preceding-sibling::text()[matches(.,'\S')])">
      <xsl:apply-templates select=".." mode="leading-pb"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="generate-id()"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="*" mode="trailing-pb">
  <xsl:choose>
    <xsl:when test="empty(following-sibling::*/(. except self::pb) |
                          following-sibling::text()[matches(.,'\S')])">
      <xsl:apply-templates select=".." mode="trailing-pb"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="generate-id()"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Feel free to ask for any explanation needed. It *seems* to work
(although I often do not trust my lying eyes) ... :-)
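
The fragments above rely on XPath 2.0 constructs (`empty()`, `intersect`, `except`, `matches()`) and on `xsl:key` declarations with content, so they need an XSLT 2.0 (or later) processor. A minimal sketch of the enclosing stylesheet follows. It assumes the input elements are in no namespace, as in the sample; real TEI P5 files are in the TEI namespace, so the `xpath-default-namespace` attribute noted in the comment would likely be needed for the `pb` and `body/*` patterns to match.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption beyond the sample: for namespaced TEI input, add
       xpath-default-namespace="http://www.tei-c.org/ns/1.0"
       to the xsl:stylesheet element above. -->

  <!-- The two xsl:key declarations and the templates from this post
       go here, as top-level children of xsl:stylesheet. -->

</xsl:stylesheet>
```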

Cheers, Wendell

On Fri, Feb 27, 2015 at 6:51 PM, James Cummings james(_at_)blushingbunny(_dot_)net
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi there.

We've been looking at canonicalising use of <pb/> in a large collection of
TEI P5 XML texts. What we want to do is move this up the hierarchy unless
there is text before or after it, only stopping when there is a sibling
element with textual content or when it hits the body/back/front elements.
For example, someone might have encoded:


====input====
<body>
    <div>
        <lg>
            <l><pb n="1"/> some text here</l>
            <l>some text here <pb n="2"/></l>
        </lg>
        <lg>
            <l>some text <pb n="3"/> some text</l>
            <anchor xml:id="test"/>
            <l><pb n="4"/>some text here</l>
            <l>some text here <pb n="5"/></l>
            <anchor xml:id="test2"/>
        </lg>
    </div>
    <div>
        <head>Some Text</head>
        <lg>
            <!-- A comment here -->
            <l><pb n="6"/>Some text</l>
            <l>Some text<pb n="7"/></l>
        </lg>
    </div>
</body>
=====

And what we'd want to end up with is:

=====
<body>
    <pb n="1"/>
    <div>
        <lg>
            <l> some text here</l>
            <l>some text here </l>
        </lg>
        <pb n="2"/>
        <lg>
            <l>some text <pb n="3"/> some text</l>
            <pb n="4"/>
            <anchor xml:id="test"/>
            <l>some text here</l>
            <l>some text here </l>
            <anchor xml:id="test2"/>
        </lg>
    </div>
    <pb n="5"/>
    <div>
        <head>Some Text</head>
        <pb n="6"/>
        <lg>
            <!-- A comment here -->
            <l>Some text</l>
            <l>Some text</l>
        </lg>
    </div>
    <pb n="7"/>
</body>
=====

So where the <pb/> has text before/after it, it stays where it is. It should
move to the level in the hierarchy where its preceding-sibling::node()[1]
has text, passing over other empty elements or comments. Of course, as you
might expect, the markup could be any element names; I just use div/lg/l
here because it is short and nicely hierarchical as an example. My approach
so far has been, on every element, to try to test if there is text() between
where I currently am and the following::pb[1] by selecting everything
between the start and the pb and looking at its normalised string-length.
But so far these tests aren't working right, and I haven't even got my head
round how to do it in reverse for <pb/> at the end.

Has anyone done something like this before that I could look at? Any
suggestions?

Thanks for any help!

-James Cummings



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^






--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--