xsl-list

Re: [xsl] Finding Only Initial Following Siblings That Meet Some Criteria

2020-02-05 18:09:17


On 06.02.2020 00:42, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
In my case, I must start with the first instance of the matching phrase
anywhere in the source document (I'm pulling stuff that could be anywhere
to a specific location) and then only want to consider things that
immediately follow that specific <ph> element.

You didn’t show that you were matching the first occurrence of ph[@outputclass] in the whole document. Well, if you did and if ph[@outputclass] may occur at other places than following siblings of that first occurrence, your solution won't group them all. It won't consider ph[@outputclass] in the next <p>, for example.

If you want to match the first ph[@outputclass] in each p, though, and if the content that precedes this occurrence ought to be preserved (but not duplicated), you need to process the preceding siblings, too. The danger in looking ahead and behind, instead of processing all child nodes by grouping them in the parent's template, is twofold: you might process nodes twice if you also just do an apply-templates in p, or you might not process them at all if you start with the first occurrence and never process its preceding siblings.
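
To make the double-processing risk concrete, here is a minimal sketch of what a look-ahead approach needs when p is processed with a plain apply-templates. It is not code from this thread. It assumes the ph/@outputclass markup of the original question, borrows the ". is $this" idiom from that question, and uses two mutually exclusive templates: one for a ph that starts a run and an empty one for a ph that continues a run.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <!-- Copy everything that has no more specific template. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- A ph that starts a run: its nearest preceding sibling node is not
       a ph with the same outputclass value. -->
  <xsl:template match="ph[@outputclass]
                         [not(preceding-sibling::node()[1]
                                [self::ph/@outputclass = current()/@outputclass])]">
    <xsl:variable name="this" select="."/>
    <xsl:element name="{@outputclass}">
      <!-- Look ahead: group this ph with its following siblings and keep
           only the group that begins with this ph. -->
      <xsl:for-each-group select="$this, following-sibling::node()"
                          group-adjacent="string(self::ph/@outputclass)">
        <xsl:if test=". is $this">
          <xsl:value-of select="current-group()" separator=""/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:element>
  </xsl:template>

  <!-- A ph that continues a run: the template above has already consumed
       it, so emit nothing. Without this template its content would be
       output a second time. -->
  <xsl:template match="ph[@outputclass]
                         [preceding-sibling::node()[1]
                            [self::ph/@outputclass = current()/@outputclass]]"/>

</xsl:stylesheet>
```

The second, empty template is the part that is easy to forget. Grouping all children in the parent's template makes it unnecessary, because every child node lands in exactly one group.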


So unless I'm missing a subtlety of your solution, I don't think it would do 
quite what I want because it's too inclusive.

I'd argue that what I proposed is not so subtly different from your solution. As a principle, it's "always try to process all nodes in a given context with for-each-group, and avoid cherry-picking specific child nodes that will or will not also group some of their siblings".

But unless I know which node you matched and what else you processed in that context, I don't know whether you made sure to avoid duplicating or neglecting content.

Gerrit


Cheers,

E.
--
Eliot Kimber
http://contrext.com

On 2/5/20, 5:07 PM, "Imsieke, Gerrit, le-tex gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de" <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

     Grouping should liberate you from looking ahead or behind. So instead of
     matching the first <ph outputclass="x">, you'd match <p> (or more
     generally '*[ph[@outputclass]]') and do the group-adjacent grouping for
     the child nodes, like this:

     <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="3.0">

       <xsl:template match="*[ph[@outputclass]]">
         <xsl:copy>
           <xsl:apply-templates select="@*" mode="#current"/>
           <xsl:for-each-group select="node()"
             group-adjacent="string(self::ph/@outputclass)">
             <xsl:choose>
               <xsl:when test="current-grouping-key()">
                 <xsl:element name="{current-grouping-key()}">
                   <xsl:value-of select="current-group()"
                     separator=""/>
                 </xsl:element>
               </xsl:when>
               <xsl:otherwise>
                 <xsl:apply-templates select="current-group()"
                   mode="#current"/>
               </xsl:otherwise>
             </xsl:choose>
           </xsl:for-each-group>
         </xsl:copy>
       </xsl:template>

       <xsl:mode on-no-match="shallow-copy"/>

     </xsl:stylesheet>
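
As a worked example, this is how the grouping above would be expected to play out on the sample paragraph from the original question. The comments trace the grouping keys by hand. The result line ignores serialization details such as an XML declaration.

```xml
<!-- Input: the sample from the original question -->
<p>Text <ph outputclass="x">1</ph><ph outputclass="x">2</ph> more text <ph outputclass="x">New sequence</ph></p>

<!-- Groups formed by group-adjacent="string(self::ph/@outputclass)":
       key ""  : text node "Text "         (applied templates, copied)
       key "x" : first and second ph       (merged into one x element)
       key ""  : text node " more text "   (applied templates, copied)
       key "x" : third ph                  (a group of its own)        -->

<!-- Result -->
<p>Text <x>12</x> more text <x>New sequence</x></p>
```

Text nodes and any non-ph elements get the empty string as their key, which is why they break up runs of ph elements and fall into the xsl:otherwise branch.
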

     This is not shorter in terms of lines of code than what you suggested.
     In terms of performance, it could be a bit more efficient than your
     solution, depending on the cost of identifying the first
     ph[@outputclass] and its following siblings, compared to the cost of
     identifying a parent of ph[@outputclass] and selecting its children.

     But as I wanted to say above, in terms of idiomatic XSLT 2+ purity, I'd
     always prefer a solution that doesn't look along the preceding/following
     axes, even when it is done just once for selecting the for-each-group
     population.

     Gerrit

     On 05.02.2020 23:29, Eliot Kimber ekimber(_at_)contrext(_dot_)com wrote:
     > In my XML I can have adjacent elements that should be processed as a
     > unit, where the adjacent elements all have the same value for a given
     > attribute. Other elements with the same attribute could be following
     > siblings but separated by other elements or text nodes, i.e.:
     >
     > <p>Text <ph outputclass="x">1</ph><ph outputclass="x">2</ph> more text <ph outputclass="x">New sequence</ph></p>
     >
     > Where the rendered result should combine the first two <ph> elements
     > but not the third, i.e.:
     >
     > <p>Text <x>12</x> more text <x>New sequence</x></p>
     >
     > Processing is applied to the first element in the document with the
     > @outputclass value "x" and then I want to grab any immediately
     > following siblings with the same @outputclass value and no intervening
     > text or element nodes.
     >
     > My solution is to use for-each-group like so:
     >
     >      <xsl:variable name="this" as="element()" select="."/>
     >      <xsl:variable name="adjacent-sibs" as="element()+">
     >        <xsl:for-each-group select="($this, $this/following-sibling::node())"
     >          group-adjacent="string(@outputclass)">
     >          <xsl:if test=". is $this">
     >            <xsl:sequence select="current-group()"/>
     >          </xsl:if>
     >        </xsl:for-each-group>
     >      </xsl:variable>
     >
     > Which works, but I'm thinking there must be a more compact way to do
     > the same selection, but the formulation is escaping me.
     >
     > Is there a more compact or more efficient way to make this selection
     > of only immediately-adjacent following siblings?
     >
     > Thanks,
     >
     > E.
     > --
     > Eliot Kimber
     > http://contrext.com
     >
