RE: [xsl] Grammars for XPath 2.0: which to use?

I don't think there were many significant grammar changes after the book was
printed; only one or two minor ones, such as changing empty() to
empty-sequence(). There may also have been a few clarifications of lexical
rules, for example the fact that (10div 3) is illegal: there must be a
space between "10" and "div". (This question arose with (if($X)then 10else
20), where the "e" can be read as the start of the exponent of a numeric
literal such as 10e5.)
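A minimal sketch of that lexical rule in Python, assuming a simplified, ASCII-only pattern for numeric literals (NUMERIC, NAME_CHAR and scan_numeric are hypothetical names, not taken from Saxon or the spec): after scanning a numeric literal, the scanner rejects it if a name character follows immediately.

```python
import re

# Simplified Integer/Decimal/Double literal pattern (an assumption, not the
# spec's exact productions). The exponent part is optional.
NUMERIC = re.compile(r"(?:\.\d+|\d+(?:\.\d*)?)(?:[eE][+-]?\d+)?")
# ASCII approximation of a character that could start or continue a name.
NAME_CHAR = re.compile(r"[A-Za-z_]")


def scan_numeric(text: str, pos: int) -> tuple[str, int]:
    """Scan a numeric literal at pos and return (literal, end offset)."""
    m = NUMERIC.match(text, pos)
    if m is None:
        raise ValueError(f"no numeric literal at offset {pos}")
    end = m.end()
    # "10div" and "10else" are errors: the literal must not run straight
    # into a name. In "10else" the exponent group fails ("e" is followed by
    # "l", not a digit), so the literal is "10" and the "e" triggers this check.
    if NAME_CHAR.match(text, end):
        raise ValueError(
            f"numeric literal {m.group()!r} is directly followed by {text[end]!r}")
    return m.group(), end
```

For example, scan_numeric("10 div 3", 0) returns ("10", 2), while scan_numeric("10div 3", 0) raises ValueError.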

At the time I wrote the book, the draft spec was still using compound
symbols like <"cast" "as">. These were subsequently removed, as a result of a
decision to present a spec that was more a description of the legal
sentences in the language and less a recipe for writing a parser. Although
in the book I was definitely writing for users of the language rather than
parser-writers, I didn't want to depart too far from the published grammar,
so these compound symbols appear as <cast as>. I think this is actually
quite a good compromise, though you need to read the accompanying text to
see that you're allowed to have a comment in the middle of it.
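A hypothetical Python helper illustrating why <cast as> needs that accompanying text: in XPath 2.0, whitespace and comments of the form (: ... :), which may nest, can appear between "cast" and "as", so a scanner cannot simply look for the literal string "cast as". The names skip_ignorable and match_compound are made up for this sketch, and the name-character test is only an approximation.

```python
from typing import Optional, Sequence


def skip_ignorable(text: str, pos: int) -> int:
    """Skip whitespace and XPath comments "(: ... :)", which may nest."""
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        elif text.startswith("(:", pos):
            depth, pos = 1, pos + 2
            while depth > 0:
                if pos >= len(text):
                    raise ValueError("unterminated comment")
                if text.startswith("(:", pos):
                    depth, pos = depth + 1, pos + 2
                elif text.startswith(":)", pos):
                    depth, pos = depth - 1, pos + 2
                else:
                    pos += 1
        else:
            break
    return pos


def match_compound(text: str, pos: int, words: Sequence[str]) -> Optional[int]:
    """Match keywords such as ("cast", "as") at pos, allowing whitespace and
    comments between them; return the end offset, or None if no match."""
    for i, word in enumerate(words):
        if i > 0:
            pos = skip_ignorable(text, pos)
        if not text.startswith(word, pos):
            return None
        end = pos + len(word)
        # Reject e.g. "cast ascending": the keyword must not run on into a
        # longer name (approximate name-character test).
        if end < len(text) and (text[end].isalnum() or text[end] in "_-."):
            return None
        pos = end
    return pos
```

So match_compound("cast (: a (: nested :) note :) as", 0, ("cast", "as")) succeeds and returns the offset just past "as", while "cast ascending" does not match.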

Some of the complexity in the spec, especially the Note you reference (which
was at one time part of the spec), arises from XQuery, which adds quite a few
complications to the already-complicated rules for XPath. I think it's true
that in XPath, unlike XQuery, you can tokenize without knowledge of the
grammatical context. The Saxon parser does a "raw" tokenization, which for
XPath is essentially context-free, and then adds some processing between the
lexer and the syntax analyzer that classifies tokens more precisely based on
the immediately preceding and following tokens. So there's a separation
between the two traditional tasks of a lexer: splitting the text into tokens
and classifying them. But in other cases, for example the distinction
between "+" as an operator and "+" as an occurrence indicator, it's left to
the syntax analyzer to tell them apart.
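A rough Python sketch of such a two-stage scheme, under heavy simplifying assumptions (ASCII names, no comments, only a few context rules; RAW, OPERATOR_WORDS and classify are hypothetical names, not Saxon's code). Stage 1 splits the text with one regex; stage 2 relabels names and "*" by looking at the neighbouring tokens, in the spirit of the XPath 1.0 rule that a name or "*" following an operand is an operator. As described above, "+" is simply left for the parser.

```python
import re

# Stage 1: split into raw strings without deciding what a name means.
# As in XPath, '-' and '.' may occur inside names, so "$x-1" is one token.
RAW = re.compile(
    r"\$?[A-Za-z_][\w.-]*"       # names and $variables
    r"|\d+(?:\.\d*)?|\.\d+"      # numbers (no exponent, for brevity)
    r"|\"[^\"]*\"|'[^']*'"       # string literals
    r"|::|//|!=|<=|>=|\S"        # multi-character symbols, then any other char
)

# Partial list of XPath 2.0 keywords that act as binary operators.
OPERATOR_WORDS = {"and", "or", "div", "idiv", "mod", "union", "intersect",
                  "except", "eq", "ne", "lt", "le", "gt", "ge", "is", "to"}


def classify(tokens: list[str]) -> list[tuple[str, str]]:
    """Stage 2: label each raw token using the tokens around it."""
    kinds: list[str] = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # An operand has just ended after a literal, variable, name,
        # wildcard, ')' or ']'.
        after_operand = bool(kinds) and (
            kinds[-1] in ("name", "literal", "variable", "wildcard")
            or prev in (")", "]"))
        if tok[0].isalpha() or tok[0] == "_":
            if after_operand and tok in OPERATOR_WORDS:
                kind = "operator"      # e.g. the middle "div" in "div div div"
            elif nxt == "(":
                kind = "function"
            elif nxt == "::":
                kind = "axis"
            else:
                kind = "name"
        elif tok == "*":
            kind = "operator" if after_operand else "wildcard"
        elif tok[0] == "$":
            kind = "variable"
        elif tok[0].isdigit() or tok[0] in "\"'" or (tok[0] == "." and len(tok) > 1):
            kind = "literal"
        else:
            kind = "symbol"            # includes '+', left for the parser
        kinds.append(kind)
    return list(zip(tokens, kinds))
```

With these rules, classify(RAW.findall("div div div")) labels the three tokens name, operator, name, and in "count(*) * 2" the first "*" becomes a wildcard and the second a multiplication operator.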

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Dimitre Novatchev [mailto:dnovatchev(_at_)gmail(_dot_)com] 
Sent: 13 July 2007 05:12
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Grammars for XPath 2.0: which to use?

Recently I've been having fun with parsing context-free 
languages using a general parser for LR languages, written in 
XSLT 2.0.

The first and easier language was JSON, leading to the 
addition of two new functions to FXSL:
    f:json-document()
and
    f:json-file-document()

as reported in this list and in my blog.

The second language I played with was XPath. As I mentioned
earlier in this list, it was mostly straightforward and
unproblematic to create a working parser (right now
constructing just a parse tree for an XPath expression). The
reason it was so easy is that Dr. Kay's XPath 2.0 book is
an excellent reference, both for its description of the
terminal symbols (lexical tokens) of the language and for its grammar.

My question is whether the XPath 2.0 grammar as described in
the book is still equivalent to the one described in the
XPath 2.0 recommendation (http://www.w3.org/TR/xpath20/#id-grammar),
or whether there are any differences.

Certainly, I could try comparing both grammars myself, but 
why not ask and get this valuable information straight from 
the horse's mouth? I believe this is also valuable to the 
readers of xsl-list.

As the official W3C XPath 2.0 recommendation is not as easy to
read as Dr. Kay's book, I would prefer to continue
using the grammar from his book (possibly with appropriate
modifications).

The same question can be asked about the definition of the 
terminal symbols. Here we have:

  1. Dr. Kay's book.

  2. The official W3C XPath 2.0 recommendation
(http://www.w3.org/TR/xpath20/#terminal-symbols)

  3. A seemingly outdated W3C document "Building a Tokenizer 
for XPath or XQuery" (http://www.w3.org/TR/xquery-xpath-parsing/)

In implementing the lexical scanner (again in pure XSLT 2.0)
I once more relied on Dr. Kay's book (1), found (2) quite confusing,
and definitely decided not to use any of the approaches
described in (3). It might be interesting to know that
the next terminal symbol can be determined by evaluating
a single regular expression
(shall I call this a "one-pass approach"?).
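One way to read that "one-pass" idea, sketched in Python rather than XSLT 2.0 (an assumption about what is meant, not the actual FXSL code; TOKEN, next_token and tokenize are hypothetical names): every terminal becomes a named alternative in one regular expression, ordered so that longer forms are tried first, and the name of the group that matched identifies the terminal. Comments are left out, because XPath 2.0 comments nest and a single classical regular expression cannot match nesting.

```python
import re
from typing import Optional, Tuple

# One regular expression for all terminals (simplified, ASCII-oriented).
# Alternatives are tried left to right, so longer forms come first:
# DoubleLiteral before DecimalLiteral before IntegerLiteral, "//" before "/".
TOKEN = re.compile(
    r"(?P<DoubleLiteral>(?:\.\d+|\d+(?:\.\d*)?)[eE][+-]?\d+)"
    r"|(?P<DecimalLiteral>\.\d+|\d+\.\d*)"
    r"|(?P<IntegerLiteral>\d+)"
    r"|(?P<StringLiteral>\"(?:[^\"]|\"\")*\"|'(?:[^']|'')*')"
    r"|(?P<QName>[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?)"
    r"|(?P<Symbol>::|\.\.|//|!=|<=|>=|<<|>>|[()\[\],.@$/|=<>+*?-])"
)
WHITESPACE = re.compile(r"\s*")


def next_token(text: str, pos: int) -> Optional[Tuple[str, str, int]]:
    """Return (terminal, value, end) for the token at pos, or None at end of input."""
    pos = WHITESPACE.match(text, pos).end()
    if pos == len(text):
        return None
    m = TOKEN.match(text, pos)
    if m is None:
        raise ValueError(f"no terminal symbol matches at offset {pos}")
    # lastgroup names the alternative that matched, i.e. the terminal symbol.
    return m.lastgroup, m.group(), m.end()


def tokenize(text: str) -> list:
    tokens, pos = [], 0
    while (tok := next_token(text, pos)) is not None:
        kind, value, pos = tok
        tokens.append((kind, value))
    return tokens
```

For instance, tokenize("1e3 + .5 - 2") yields a DoubleLiteral, a DecimalLiteral and an IntegerLiteral separated by two Symbol tokens. The design choice here is that priority is encoded purely by the order of the alternatives, so no separate longest-match loop is needed.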

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant 
intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
You've achieved success in your field when you don't know 
whether what you're doing is work or play

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--