Re: [xsl] mixed content grouping by whitespace

Hi,

On Gerrit's excellent explanation of group-adjacent....

At 06:52 PM 4/12/2010, he wrote:

This groups the nodes in the variable you've created by the boolean
(so the truth or falsehood of whether the pattern matches? I didn't
know you could do that in a group-* pattern) of the existence of the
segs you've created on tei:seg/text() which mark the whitespace.
There are two flavours of grouping conditions: patterns and expressions. group-starting/ending-with require patterns while group-by and group-adjacent accept any XPath expression. The latter are being applied to each item of the so-called population in order to calculate grouping keys, the former match specific nodes in the population that will lead or terminate a group.

It's really helpful to keep this distinction in mind. One sort of grouping works with a key; @group-by or @group-adjacent calculates that key. The other sort simply applies a match criterion to each node in the population to determine whether it's the particular sort of node (group-starting or group-ending) of interest for that sort of grouping.
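For anyone who finds it easier to see this outside XSLT, here is a rough Python analogy of the two sorts of grouping. It is only a sketch, not XSLT semantics in full: group_adjacent and group_starting_with are made-up helper names, and plain strings stand in for nodes.

```python
from itertools import groupby

def group_adjacent(population, key):
    """Key-based grouping: consecutive items with equal keys share a group."""
    return [(k, list(items)) for k, items in groupby(population, key=key)]

def group_starting_with(population, matches):
    """Pattern-based grouping: every matching item opens a new group.

    The first item always opens a group, even if it does not match.
    """
    groups = []
    for item in population:
        if matches(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups

# Hypothetical population: element names standing in for nodes.
nodes = ["h1", "p", "p", "h1", "p"]

by_key = group_adjacent(nodes, key=lambda n: n == "h1")
by_start = group_starting_with(nodes, matches=lambda n: n == "h1")
```

With the key, each run of like items becomes its own group and carries its key along; with the match criterion, a heading simply pulls the following items into its group and no key is involved.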

For all but the nodes marked up as WS in our example, evaluating self::tei:seg[@type='sep'] yields the empty sequence. Since the empty sequence cannot be used as a grouping key for group-adjacent [1], its boolean value is calculated, which is false for empty sequences [2]. I could have used empty() instead of boolean(), which would just flip each node's true()/false() key. In this case, I would have to swap the "when current-grouping-key" and the "otherwise" actions accordingly, or use test="not(current-grouping-key())".


Indeed; and "not(self::tei:seg[@type='sep'])" would work like empty().

Similarly, "exists(self::tei:seg[@type='sep'])" would work like boolean().
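The flip is easy to see in a small Python analogy, where itertools.groupby collects adjacent items with equal keys much as group-adjacent does. This is a sketch under assumptions: the (type, text) tuples are invented stand-ins for the tei:seg and text nodes, and is_sep, is_not_sep and keys_and_groups are hypothetical names.

```python
from itertools import groupby

# Hypothetical population: ("sep", ...) plays the role of
# tei:seg[@type='sep']; ("text", ...) is any other node.
population = [("text", "mixed"), ("sep", " "), ("text", "con"),
              ("text", "tent"), ("sep", " "), ("text", "here")]

def is_sep(node):
    # Analogue of boolean(self::tei:seg[@type='sep']) or exists(...)
    return node[0] == "sep"

def is_not_sep(node):
    # Analogue of empty(self::tei:seg[@type='sep']) or not(...)
    return not is_sep(node)

def keys_and_groups(key):
    # Pair each group's key with the concatenated text of its members.
    return [(k, "".join(text for _, text in grp))
            for k, grp in groupby(population, key=key)]

with_boolean = keys_and_groups(is_sep)
with_empty = keys_and_groups(is_not_sep)
```

Both keys cut the population at the same places; only the true/false label on each group changes, which is why the "when" and "otherwise" branches have to trade places.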

The main thing is that splitting logic is really "group-adjacent" logic in which the key is used to assign nodes to the categories for splitting. Another illustration of this principle would be group-adjacent="ceiling(position() div 5)", which splits into groups of five members (with the last group given the remainder).
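The arithmetic of that key can be checked with a short Python analogy. It is a sketch only: twelve letters stand in for the population, and enumerate(..., start=1) is assumed as the counterpart of XPath's 1-based position().

```python
from itertools import groupby
from math import ceil

items = list("abcdefghijkl")  # twelve items as a stand-in population

# Key each item by ceil(position / 5), then group adjacent equal keys.
chunks = [
    [item for _, item in grp]
    for _, grp in groupby(enumerate(items, start=1),
                          key=lambda pair: ceil(pair[0] / 5))
]
```

Positions 1 to 5 get key 1, positions 6 to 10 get key 2, and the two left over get key 3, so the last group holds the remainder.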

In the whitespace example (the most common case for splitting) those categories are two, hence the expressions returning Boolean values. Booleans are nice since we can then examine current-grouping-key() straightforwardly with a test to tell which sort of group (of the two sorts) one is in.

In the word wrap example, it's a matter of taste whether to use group-starting-with or group-adjacent. But try to tackle the group-adjacent example given in the spec [3] using group-starting-with (or group-ending-with), and you'll find yourself writing all kinds of complicated lookaheads and lookbehinds that for-each-group promised to liberate you from. The same holds for trying to solve group-starting-with problems using group-adjacent. There's a reason THey created all 4 forms of for-each-group. And THey saw it was good.

Sometimes it's a matter of taste, and sometimes it's a tough call; but group-adjacent is frequently more elegant.


Cheers,
Wendell



======================================================================
Wendell Piez                 mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

