xsl-list

Re: [xsl] exercise in complex grouping

2020-05-12 05:53:40
I have assumed there are no complicated overlapping cases here, but this
works for the one case I tried, which includes both A before B and B before A:


<x>
 blah blah blah
   <d><e>blah</e> blah
   <B target="#A1">blort</B>
   <f>monkey</f> shines
   <A xml:id="A1">snort</A>
   blah
   <A xml:id="A2">snort</A>
   <q/>
   <l>zzz</l>
   <B target="#A2">blort</B>
   <kkkk/>
   </d>
zzz
</x>



----

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- identity template: copy every node, with its attributes, unchanged -->
 <xsl:template match="node()">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>

 <!-- index each B by the bare name in its @target (the part after '#') -->
 <xsl:key name="b" match="B" use="substring(@target,2)"/>

 <xsl:template match="d">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <!-- alternating runs: paired nodes (any B, or an A that some B points to)
        versus everything else -->
   <xsl:for-each-group select="node()"
       group-adjacent="self::B or self::A[key('b',@xml:id)]">
    <xsl:choose>
     <xsl:when test="current-grouping-key()">
      <!-- nothing here: paired elements are copied inside the C
           built for a neighbouring run -->
     </xsl:when>
     <xsl:otherwise>
      <!-- the elements immediately before and after this run -->
      <xsl:variable name="a" select="preceding-sibling::*[1]"/>
      <xsl:variable name="b"
          select="current-group()[last()]/following-sibling::*[1]"/>
      <xsl:choose>
       <!-- wrap the run if its neighbours are a B and the A it targets -->
       <xsl:when test="concat('#',$a/@xml:id)=$b/@target or
                       concat('#',$b/@xml:id)=$a/@target">
        <xsl:text>&#10;</xsl:text><C><xsl:text>&#10;</xsl:text>
        <xsl:copy-of select="$a"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy-of select="current-group()"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy-of select="$b"/>
        <xsl:text>&#10;</xsl:text></C><xsl:text>&#10;</xsl:text>
       </xsl:when>
       <xsl:otherwise>
        <xsl:copy-of select="current-group()"/>
       </xsl:otherwise>
      </xsl:choose>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>

</xsl:stylesheet>



---

produces

<x>
 blah blah blah
   <d><e>blah</e> blah

<C>
<B target="#A1">blort</B>

   <f>monkey</f> shines

<A xml:id="A1">snort</A>
</C>

   blah

<C>
<A xml:id="A2">snort</A>

   <q/>
   <l>zzz</l>

<B target="#A2">blort</B>
</C>

   <kkkk/>
   </d>
zzz
</x>
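For anyone less used to xsl:for-each-group, the logic of the stylesheet above can be mirrored outside XSLT. Python's itertools.groupby, like group-adjacent, only merges items that sit next to each other, so a rough sketch looks like this. It works on a flat list of hypothetical Node tuples standing in for the children of <d>, not on real XML, and the names Node, points_to and wrap_runs are illustrative only.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Hypothetical stand-in for one child node of <d>."""
    name: str                     # element name, or "#text" for a text node
    xml_id: Optional[str] = None
    target: Optional[str] = None


def points_to(x: Node, y: Node) -> bool:
    # Mirrors concat('#', $y/@xml:id) = $x/@target
    return y.xml_id is not None and x.target == "#" + y.xml_id


def wrap_runs(nodes: list) -> list:
    # Mirrors the xsl:key: bare names that some <B> points to.
    targeted = {n.target[1:] for n in nodes if n.name == "B" and n.target}

    # Mirrors group-adjacent="self::B or self::A[key('b',@xml:id)]".
    def is_paired(n: Node) -> bool:
        return n.name == "B" or (n.name == "A" and n.xml_id in targeted)

    groups = [(key, list(run)) for key, run in groupby(nodes, key=is_paired)]
    out = []
    for i, (paired, run) in enumerate(groups):
        if paired:
            continue  # copied, if at all, inside a neighbouring run's C
        # Keys alternate, so an unpaired run's neighbours are paired runs.
        a = groups[i - 1][1][-1] if i > 0 else None               # preceding-sibling::*[1]
        b = groups[i + 1][1][0] if i + 1 < len(groups) else None  # following-sibling::*[1]
        if a is not None and b is not None and (points_to(a, b) or points_to(b, a)):
            out.append(("C", [a, *run, b]))
        else:
            out.extend(run)
    return out


T = Node("#text")
d = [Node("e"), T, Node("B", target="#A1"), T, Node("f"), T,
     Node("A", "A1"), T, Node("A", "A2"), T, Node("q"), T, Node("l"), T,
     Node("B", target="#A2"), T, Node("kkkk"), T]
print([item[0] for item in wrap_runs(d)])
# ['e', '#text', 'C', '#text', 'C', '#text', 'kkkk', '#text']
```

As with the stylesheet, a <B> and its <A> with no node at all between them (not even whitespace) fall into the same paired run with no unpaired run between them to trigger a wrap, so under the thread's assumptions neither would be copied.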




On Tue, 12 May 2020 at 10:33, Syd Bauman s(_dot_)bauman(_at_)northeastern(_dot_)edu
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

I have a moderately sizable TEI file (~31,000 text nodes with ~100,400
"words" or ~688,000 characters; ~20,000 elements, ~15,000 attributes).
Somewhere in all that mess there are a few pairs of elements for which
I need some special processing.

Say each pair is an <A> and a <B>. I can find each <B> by XPath quite
trivially. In addition, for every pair, <B> has a @target that points
to the corresponding <A> via a bare name identifier URL. Furthermore,
every <B> in the document is part of such a pair. (Which is why it is
so trivial to find them via XPath.) The same cannot be said for <A>:
there are *lots* of <A> elements that are not part of an <A>-<B>
pair; but none of them, of course, bears the particular @xml:id that a
<B> points to, so the paired ones can be found by XPath. It's just
easy, not trivial. :-)
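The lookup described above can be sketched outside XSLT, here in Python with the standard library's xml.etree.ElementTree (an illustration, not anything from the thread): collect the bare names from every <B>'s @target, then treat an <A> as paired only if its xml:id is in that set. This is the same idea as an xsl:key on substring(@target,2). The function name paired_a_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:id with the reserved XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def paired_a_ids(root):
    """Return the xml:id values of <A> elements that some <B> points to."""
    # Every <B> is part of a pair, so its @target (minus the leading '#')
    # names exactly one <A>.
    targets = {b.get("target", "")[1:] for b in root.iter("B")
               if b.get("target", "").startswith("#")}
    # Many <A> elements are unpaired; keep only those whose xml:id is targeted.
    return [a.get(XML_ID) for a in root.iter("A") if a.get(XML_ID) in targets]


doc = ET.fromstring(
    '<d><A xml:id="A1">snort</A><A xml:id="A9">lone</A>'
    '<B target="#A1">blort</B></d>'
)
print(paired_a_ids(doc))  # ['A1']
```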

In general, there can be other nodes between <A> and <B>, and there
will be cases in which <B> precedes rather than follows the <A> it
points to. E.g.,

   blah blah blah
   <d><e>blah</e> blah
   <B target="#A1">blort</B>
   <f>monkey</f> shines
   <A xml:id="A1">snort</A>
   blah</d>

I want to be able to handle these cases, too.

For the foreseeable future, there will never be another <B> in between
a <B> and the <A> it points to, and each <B> will be a child of the
same element as the <A> it points to. (I.e., no overlap problems.) But
as soon as I say these complications will never happen, the very next
day the editors will gleefully send e-mail saying they have found such a
case. But for now, if needed, I'm willing to write code that presumes
it won't happen.
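If the code is going to presume those conditions, it may be worth checking them first, so that the editors' first counterexample fails loudly instead of silently producing wrong output. A minimal sketch in Python with xml.etree.ElementTree, assuming the same element and attribute names; check_pairs is a hypothetical helper that reports any <B> whose <A> is not a sibling, or that has another <B> between itself and its <A>.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree spells xml:id


def check_pairs(root):
    """List violations of the 'same parent, no <B> in between' assumptions."""
    problems = []
    for parent in root.iter():  # ElementTree has no parent links, so scan parents
        kids = list(parent)
        a_index = {k.get(XML_ID): i for i, k in enumerate(kids) if k.tag == "A"}
        for i, k in enumerate(kids):
            if k.tag != "B":
                continue
            ref = k.get("target", "").lstrip("#")
            if ref not in a_index:
                problems.append(f"B -> #{ref}: target is not a sibling")
                continue
            lo, hi = sorted((i, a_index[ref]))
            if any(x.tag == "B" for x in kids[lo + 1:hi]):
                problems.append(f"B -> #{ref}: another B in between")
    return problems


ok = ET.fromstring('<d><B target="#A1"/>x<f/><A xml:id="A1"/></d>')
bad = ET.fromstring('<x><d><B target="#A1"/></d><A xml:id="A1"/></x>')
print(check_pairs(ok))   # []
print(check_pairs(bad))  # ['B -> #A1: target is not a sibling']
```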

What I want for output is to be able to wrap the <B> with the <A> it
points to, *and everything in between* in a <C>.

   blah blah blah
   <d><e>blah</e> blah
   <C xml:id="A1Container">
     <B target="#A1">blort</B>
     <f>monkey</f> shines
     <A xml:id="A1">snort</A>
   </C>
   blah</d>
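Outside XSLT, this result can be sketched with Python's xml.etree.ElementTree, assuming, as stated above, that each <B> and its <A> share a parent and no other <B> sits between them. ElementTree stores the text that follows an element as that element's tail, so moving the run of siblings from the first partner to the second into a new <C> carries the intervening text along; only the last partner's tail has to be handed to <C>. This is not the XSLT solution asked for, find_pair and wrap_pairs are hypothetical names, and the xml:id value simply copies the A1Container pattern in the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree spells xml:id


def find_pair(kids):
    """Return (lo, hi, id) for the first <B> whose target <A> is a sibling."""
    a_index = {k.get(XML_ID): i for i, k in enumerate(kids) if k.tag == "A"}
    for i, k in enumerate(kids):
        ref = k.get("target", "")[1:]  # drop the leading '#'
        if k.tag == "B" and ref in a_index:
            lo, hi = sorted((i, a_index[ref]))
            return lo, hi, ref
    return None


def wrap_pairs(root):
    """Wrap each <B>, its sibling <A>, and everything between them in a <C>."""
    for parent in list(root.iter()):  # snapshot, so new <C> elements are not revisited
        while True:  # indexes shift after each wrap, so rescan the children
            kids = list(parent)
            found = find_pair(kids)
            if found is None:
                break
            lo, hi, ref = found
            c = ET.Element("C", {XML_ID: ref + "Container"})
            # Text after the last wrapped element is its tail; hand it to <C>
            # so that it stays outside the wrapper.
            c.tail, kids[hi].tail = kids[hi].tail, None
            for k in kids[lo:hi + 1]:  # intervening text travels along as tails
                parent.remove(k)
                c.append(k)
            parent.insert(lo, c)
    return root


doc = ET.fromstring('<d><e>blah</e> blah <B target="#A1">blort</B> '
                    '<f>monkey</f> shines <A xml:id="A1">snort</A> blah</d>')
print(ET.tostring(wrap_pairs(doc), encoding="unicode"))
# <d><e>blah</e> blah <C xml:id="A1Container"><B target="#A1">blort</B> <f>monkey</f> shines <A xml:id="A1">snort</A></C> blah</d>
```

The same function handles <A> before <B>, since find_pair sorts the two positions before wrapping.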

I am 90% confident I can write some messy XSLT 1.0 Muenchian grouping
code that does this. (Although I suspect it would take two passes,
one for <A> precedes <B>, another for <B> precedes <A>; but I don't
care about two passes at all, and would not even care if it took N
passes.[1]) But I am equally confident there is a much better
<xsl:for-each-group> method that, at the moment, I simply can't wrap
my head around.

Thanks for any thoughts, pointers, code, or advice.

Note
----
[1] Where N is proportional to the number of <A>-<B> pairs.

--
 Syd Bauman, NRP  (he/him/his)
 Senior XML Programmer/Analyst
 Northeastern University Women Writers Project
 s(_dot_)bauman(_at_)northeastern(_dot_)edu or
 Syd_Bauman(_at_)alumni(_dot_)Brown(_dot_)edu

