xsl-list

[xsl] Solution: Sorting on arbitrary string values (Was: [xsl] XSLT 1.0: Problem grouping disparate unordered data)

2006-03-16 09:14:45
Greetings all,

I finally solved my problem, with help from Wendell Piez and G. Ken
Holman, and thought I'd post my solution in case it helps anybody in the
future. I'll leave the original mail at the bottom for reference by
anybody reading this in the list archives.

Wendell was the first to respond, pointing out that I needed to find a way
to create a custom sort key by mapping my arbitrary text values to the
required order. Ken then demonstrated how to do this using the document()
function, which should have been the end of my worries. However, due to a
server configuration issue, the document() function won't work for me, and
things are too busy here to start meddling with production servers, so I
needed another approach.

I finally realised that I could use a trick I first learnt when working in
the Forth programming language: arithmetic using Boolean values converted
to numbers.

In XPath 1.0, and hence in XSLT 1.0, the Boolean value true converts to
the number 1, and false to 0. I therefore used the following technique:

<xsl:for-each select="apes/ape[number(@priority) &gt; 2]">
    <xsl:sort data-type="number" order="ascending"
        select="(number(@type='Gorilla') * 1)
              + (number(@type='Chimpanzee') * 2)
              + (number(@type='Orangutan') * 3)
              + (number(@type='Bonobo') * 4)" />
    <!-- other <xsl:sort/> elements for priority and date here -->
    <!-- position() follows the sorted order, so this keeps the first three -->
    <xsl:if test="position() &lt;= 3">
        <xsl:apply-templates select="." />
    </xsl:if>
</xsl:for-each>

What is happening is that the value of the @type attribute is compared to
each possible string value. Each comparison yields a Boolean, true if
@type equals that string and false otherwise, which the number() function
converts to 1 or 0. (Strictly speaking number() isn't required, as XPath's
* operator converts its operands to numbers anyway; it's a belt-and-braces
thing which will also hopefully make this code slightly clearer to the poor
soul who has to maintain it.) Each 1 or 0 is then multiplied by a weight
between 1 and 4, and the four products are summed.

As the attribute can equal at most one of the values to which it is
compared, at most one of those products can be non-zero; all the others
are zero. (If @type matched none of the four strings, the key would be 0
and that ape would sort ahead of the gorillas.) So if we consider the case
of an orangutan, the sum will be

(0 * 1) + (0 * 2) + (1 * 3) + (0 * 4)

which equals 3. With an ascending sort, the apes come out in the desired
order, and I can start thinking about something else.
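For anyone who wants to try the whole thing end to end, here is a minimal, self-contained sketch of a complete stylesheet built around the sort key above. It assumes the sample <apes> document from the original message below, that priority and date should both sort descending (as the expected output there suggests), and that dates are zero-padded dd-mm-yyyy. The fixed-position substring() date key and the use of the species name as the output element name are illustrative choices of my own, not necessarily what the production code does.

```xml
<?xml version="1.0"?>
<!-- Sketch: one <section> holding the first three apes with priority > 2,
     ordered Gorilla, Chimpanzee, Orangutan, Bonobo, then by priority and
     date (both descending). Assumes zero-padded dd-mm-yyyy dates; the
     output element names are illustrative only. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <section>
      <xsl:for-each select="apes/ape[number(@priority) &gt; 2]">
        <!-- 1. species: at most one comparison is true (1), so the key is
                1 to 4, or 0 for an unrecognised type -->
        <xsl:sort data-type="number" order="ascending"
            select="(number(@type='Gorilla')    * 1)
                  + (number(@type='Chimpanzee') * 2)
                  + (number(@type='Orangutan')  * 3)
                  + (number(@type='Bonobo')     * 4)"/>
        <!-- 2. highest priority first -->
        <xsl:sort data-type="number" order="descending" select="@priority"/>
        <!-- 3. newest first: dd-mm-yyyy rebuilt as yyyymmdd -->
        <xsl:sort data-type="number" order="descending"
            select="concat(substring(@date, 7, 4),
                           substring(@date, 4, 2),
                           substring(@date, 1, 2))"/>
        <!-- position() reflects the sorted order: keep the first three -->
        <xsl:if test="position() &lt;= 3">
          <xsl:element name="{@type}">
            <xsl:copy-of select="@priority | @date"/>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>
    </section>
  </xsl:template>

</xsl:stylesheet>
```

Run against that sample input, it should produce a single <section> containing the gorillas of priority 4 (25-01-2006), 4 (24-01-2006) and 3 (27-01-2006), in that order.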

While Ken's solution is much more elegant, and much easier to extend to a
greater number of possible values, anybody who finds themselves in the
same bind as I did may find this technique useful. The general principle
of using the numeric equivalent of Boolean values is one that can be
applied in numerous circumstances, although it's as well to document it
for those who come after.
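As one illustration of that wider principle, here is a sketch (not production code) that uses the same Boolean-to-number conversion to choose between two numbers in a single XPath 1.0 expression, which has no if/then/else. The parameter name compact and the limits 3 and 10 are hypothetical, invented purely for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch: the Boolean-to-number trick outside xsl:sort, picking one of
     two numbers without xsl:choose. The "compact" parameter and the
     limits 3 and 10 are hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- hypothetical switch: pass 'yes' for a short listing -->
  <xsl:param name="compact" select="'no'"/>

  <!-- 3 when $compact = 'yes', otherwise 10: exactly one of the two
       Boolean factors converts to 1, the other to 0 -->
  <xsl:variable name="limit"
      select="3  * number($compact = 'yes')
            + 10 * number(not($compact = 'yes'))"/>

  <xsl:template match="/">
    <!-- list at most $limit apes, in document order -->
    <xsl:for-each select="apes/ape[position() &lt;= $limit]">
      <xsl:value-of select="concat(@type, ' ', @priority, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc, for instance, passing --stringparam compact yes should cut the listing to the first three apes in document order. As with the sort key, it's worth a comment in the stylesheet explaining the trick.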

For what it's worth, the actual data I was using were in TrafficMasterML
4.0, an XML format used by the company TrafficMaster to deliver constantly
updated traffic and travel news within the UK (and probably elsewhere). If
anybody finds this in the archives because they're looking for info about
transforming this particular XML application, feel free to contact me if
you need any help.

Thanks again to Wendell and Ken,

Nick.
-- 
Nick Fitzsimons
http://www.nickfitz.co.uk/

Original message follows:
Hi all,

Firstly, I'm using XSLT 1.0.

As my real dataset is very large and boring, I'm presenting this problem
in terms of an input document which represents the same problems in a less
noisy form.

This is the simplified example input:

<apes>
      <ape priority="5" date="25-01-2006" type="Chimpanzee" />
      <ape priority="1" date="26-01-2006" type="Gorilla"    />
      <ape priority="2" date="29-01-2006" type="Chimpanzee" />
      <ape priority="1" date="22-01-2006" type="Orangutan"  />
      <ape priority="3" date="22-01-2006" type="Bonobo"     />
      <ape priority="1" date="25-01-2006" type="Bonobo"     />
      <ape priority="4" date="24-01-2006" type="Gorilla"    />
      <ape priority="5" date="22-01-2006" type="Bonobo"     />
      <ape priority="4" date="26-01-2006" type="Chimpanzee" />
      <ape priority="4" date="25-01-2006" type="Gorilla"    />
      <ape priority="2" date="25-01-2006" type="Bonobo"     />
      <ape priority="3" date="25-01-2006" type="Orangutan"  />
      <ape priority="1" date="25-01-2006" type="Bonobo"     />
      <ape priority="3" date="27-01-2006" type="Gorilla"    />
      <ape priority="1" date="25-01-2006" type="Chimpanzee" />
      <ape priority="1" date="25-01-2006" type="Orangutan"  />
</apes>

The rules for grouping and sorting are fairly simple in the basic case:

Show Gorillas of priority greater than 2, sorted by priority descending,
then by date descending;
Show Chimpanzees of priority greater than 2, sorted by priority descending,
then by date descending;
Show Orangutans of priority greater than 2, sorted by priority descending,
then by date descending;
Show Bonobos of priority greater than 2, sorted by priority descending,
then by date descending;

and I'm done.

The current approach is basically to get all the apes of one kind:

<xsl:variable name="gorillas" select="apes/ape[@type='Gorilla' and
number(@priority) &gt; 2]"/>

and they are then sorted on priority and date; that code is
straightforward, apart from the substring-before shenanigans required to
get those annoying UK dates to sort correctly.
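The per-species step might look something like the following minimal sketch, assuming zero-padded dd-mm-yyyy dates. It swaps the substring-before() approach for fixed-position substring() calls that rebuild each date as yyyymmdd; that substitution and the literal Gorilla output element are assumptions for illustration only.

```xml
<?xml version="1.0"?>
<!-- Sketch of one species' section: keep priority > 2, then sort by
     priority and date, both descending. Assumes zero-padded dd-mm-yyyy
     dates; uses fixed-position substring() instead of substring-before(). -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="gorillas"
        select="apes/ape[@type='Gorilla' and number(@priority) &gt; 2]"/>
    <section>
      <xsl:for-each select="$gorillas">
        <xsl:sort data-type="number" order="descending" select="@priority"/>
        <!-- "25-01-2006" becomes 20060125, which sorts numerically -->
        <xsl:sort data-type="number" order="descending"
            select="concat(substring(@date, 7, 4),
                           substring(@date, 4, 2),
                           substring(@date, 1, 2))"/>
        <Gorilla priority="{@priority}" date="{@date}"/>
      </xsl:for-each>
    </section>
  </xsl:template>

</xsl:stylesheet>
```

Against the sample input this should give the gorillas of priority 4 (25-01-2006), 4 (24-01-2006) and 3 (27-01-2006), in that order.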

The above process is repeated for each species. Leaving out irrelevancies,
the results for the above document would be along the lines of:

<section>
    <Gorilla    priority="4" date="25-01-2006" />
    <Gorilla    priority="4" date="24-01-2006" />
    <Gorilla    priority="3" date="27-01-2006" />
</section>
<section>
    <Chimpanzee priority="5" date="25-01-2006" />
    <Chimpanzee priority="4" date="26-01-2006" />
</section>
<section>
    <Orangutan  priority="3" date="25-01-2006" />
</section>
<section>
    <Bonobo     priority="5" date="22-01-2006" />
    <Bonobo     priority="3" date="22-01-2006" />
</section>

However, I now have the further requirement that I return a single
<section> containing only the first three items (or fewer if fewer than
three match the "priority greater than 2" criterion). In this example that
would give just the three gorillas. If all but one of the gorillas
escaped, I would have to output that remaining gorilla followed by the two
chimpanzees; and if all the gorillas got away, with one of the chimps as
driver, I have to return the remaining chimp, the orangutan, and the
highest-priority bonobo.

Given that:

The types are arbitrary names whose required order isn't alphabetical, so
they can't simply be sorted;
The initial dataset is not sorted on any field, and cannot be as it's
coming from an external provider;
I'm not permitted to use any extension functions such as node-set(), as
my client may want to move to a different XSLT processor at a later date;

how can I achieve the necessary grouping and sorting?

I've been racking my brains over this one and I'm almost certain some
straightforward Muenchian grouping will suffice, but as we're 2 days away
from taking the system live and I'm dealing with a new bug report every
half hour or so spanning XSLT, HTML, CSS and JSP, I'm finding it hard to
get the time to really wrap my head round this one. Any help/advice would
be greatly appreciated.

For the curious: the real data has nothing to do with apes; I just thought
they'd brighten the place up. No simians were harmed in the creation of
this cry for help :-)

TIA,

Nick.
--
Nick Fitzsimons
http://www.nickfitz.co.uk/


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe@lists.mulberrytech.com>
--~--



