xsl-list

[xsl] [answered] collecting multiple tokenize() results into one sequence

2008-07-24 08:45:46
Hello,
I spent a good while writing this post, then found the answer before I
posted it!
But I think I'll go ahead and post, in case it helps somebody (e.g. me)
find the answer in the future when facing a similar problem.

I'm trying to take advantage of XSLT 2.0 features to create an index of
keywords.

The input XML has a <meta> tag containing keywords delimited by commas:

<items>
   <item>
       <rec>001</rec>
       <name>7-Zip</name>
       <meta>zip,compress,uncompress,rar,archive</meta>
   ...</item>
   <item>...</item>
</items>

There are many <item> elements, each with a <meta> child.
I want to tokenize the contents of the many //item/meta elements into
one long sequence of strings.
Then I can loop over the distinct values of the resulting sequence (in
alphabetical order) to output an index section for each keyword.

My current attempt is:
       <!-- gather all tokens into one sequence of strings, then group
by identical strings -->
       <xsl:for-each-group select="for $tags in //item/meta return
tokenize($tags, ',')" group-by=".">
           <xsl:sort select="." /> <!-- alphabetical order -->
           <h2><xsl:value-of select="."/>:</h2>
           <ul>
               <xsl:for-each select="//item[contains(meta, .)]">
                   <xsl:sort select="name"/>
                   <li>...

But this gives me (in Saxon 9B) the compile-time error
Error on line 208 of file:....xsl:
 XPTY0020: Cannot select a node here: the context item is an atomic value

Line 208 is the xsl:for-each (not the for-each-group).
I don't understand why it is a problem that the context item (which
should be a string, the first thing in current-group(), right?) is an
atomic value.
The select of the for-each on line 208 does not depend on the context
item, does it?
I tried replacing "." with "current-grouping-key()" on that line, but
it made no difference; same error.


==============================
OK, after going around several iterations, I have gotten to the root of
the matter: why does Saxon say "Cannot select a node here: the context
item is an atomic value" for the
        xsl:for-each select="//item[contains(meta, .)]"?
No doubt some of you already know this. I found the answer at
http://www.oxygenxml.com/archives/xsl-list/200510/msg00444.html

> Because "absolute paths" are not absolute at all: they select relative to
> the root of the tree containing the context node. You've got to know which
> document to look in.
>
> Michael Kay

Sure enough. You can't select "//item..." because that path is relative
to the current document, which is determined from the current node, and
is undetermined when the context item is not a node (e.g. just a string).

I guess the lesson here for me is to take Saxon's error messages more
seriously; when they don't make sense, google them and find out what
they mean.

A suggestion for improvement for Saxon: the error would be clearer if it
changed the part that says "cannot select a node here".
That wording seems misleading, even once you've figured out what
it means. From what I understand now, you CAN select nodes "here" (if
"here" means "with the context item being what it is", rather than some
syntactic consideration); you just have to be more absolute, by
specifying what document you're talking about:
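   <xsl:for-each select="$root//item[contains(meta, current-grouping-key())]">
or
   <xsl:for-each select="$all-items[contains(meta, current-grouping-key())]">
where $root and $all-items are variables bound (to "/" and "//item"
respectively) while the context item is still a node. Note that inside
the predicate "." is the <item> being tested, not the keyword, so the
keyword has to come from current-grouping-key().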
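For the archives, here is a minimal sketch of how the whole index might look once the path no longer starts from an atomic context item. It assumes the transformation is started on the source document, so that the global variable is evaluated with that document as context. The <html>/<body> wrapper, the $key variable and the <li> content are my own placeholders (the original <li> body is elided above), and I have swapped contains() for an exact token comparison so that a keyword like "zip" does not also match a longer keyword that merely contains it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Evaluated while the context item is still a node of the source
       document, so a path starting with // is safe here. -->
  <xsl:variable name="all-items" select="//item"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- One flat sequence of keyword strings, grouped by value. -->
        <xsl:for-each-group
            select="for $tags in $all-items/meta return tokenize($tags, ',')"
            group-by=".">
          <xsl:sort select="."/> <!-- alphabetical order -->
          <!-- The context item is now a string: keep the key in a
               variable and avoid any path beginning with / or //. -->
          <xsl:variable name="key" select="current-grouping-key()"/>
          <h2><xsl:value-of select="$key"/>:</h2>
          <ul>
            <!-- Exact match against the item's own tokens. -->
            <xsl:for-each select="$all-items[tokenize(meta, ',') = $key]">
              <xsl:sort select="name"/>
              <li><xsl:value-of select="name"/></li> <!-- placeholder content -->
            </xsl:for-each>
          </ul>
        </xsl:for-each-group>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Binding the items once in a variable also means the inner xsl:for-each never needs to know which document it is in.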

It may be obvious to Michael Kay that "cannot select a node here" means you
have to specify what document you're talking about, but I think most of
the time, most of us don't even remember that "/..." isn't really absolute.

In fact, the XPath 1.0 spec specifically says
"An absolute location path consists of / optionally followed by a
relative location path."
Then it explains (contradictorily, in light of what Michael Kay said
above), "A / by itself selects the root node of the document containing
the context node."

I see that "absolute" is not there in the XPath 2.0 spec. Also, the
descriptions of "/" and "//" make the error conditions crystal clear:

> A "/" at the beginning of a path expression is an abbreviation for the
> initial step fn:root(self::node()) treat as document-node()/ (however,
> if the "/" is the entire path expression, the trailing "/" is
> omitted from the expansion.) The effect of this initial step is to begin
> the path at the root node of the tree that contains the context node. If
> the context item is not a node, a type error is raised [err:XPTY0020].
> At evaluation time, if the root node above the context node is not a
> document node, a dynamic error is raised [err:XPDY0050].

(Similarly for "//".)
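To make that expansion concrete, here is a small sketch (not taken from any real stylesheet; the <counts>/<count> output elements and the $doc variable are made up for illustration). It iterates over a sequence of strings, which is exactly the situation inside my for-each-group, and shows in comments what a leading "//" would expand to and why it fails there, next to a path that starts from a node held in a variable.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Remember the document node while the context item is a node. -->
    <xsl:variable name="doc" select="."/>
    <counts>
      <xsl:for-each select="('zip', 'rar')">
        <!-- The context item is now an xs:string. Writing
               select="//item"
             here would be shorthand for
               (fn:root(self::node()) treat as document-node())
                 /descendant-or-self::node()/item
             and self::node() cannot be applied to an atomic value,
             hence XPTY0020. -->
        <!-- A path that starts from $doc does not consult the context
             item at all, so it is fine. current() is the string. -->
        <count keyword="{.}">
          <xsl:value-of
              select="count($doc//item[tokenize(meta, ',') = current()])"/>
        </count>
      </xsl:for-each>
    </counts>
  </xsl:template>

</xsl:stylesheet>
```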

But who reads the new description of "/" if they already know XPath from
1.0?? :-)


How to clarify the error message? What about (borrowing language from
the above paragraph):
"Invalid initial / or // in path step: Cannot select the root node of
the tree that contains the context node, because the context item is not
a node."

It's a little long, but given that I'm not the first one who has
been unhelped by the existing error message, wouldn't it be worth
it to make this somewhat obscure problem clearer?

Regards,
Lars




