[Top] [All Lists]

Re: [xsl] Are XPath expressions parsed using compiler parsingtechniques?

2022-05-10 00:59:28
Hi Roger,

The second version of Fleur, my own XQuery engine written in _javascript_, is also including its own handwritten parser. This one is a _javascript_ port from the XSLT stylesheet of old XSLTForms versions which parsed XPath expressions into _javascript_ instructions. It can be tricky for a homemade parser to consider, first, "if" as an element name, then, "if(a eq b)" as a function call, then "if(a eq b) then" as the beginning of an If-then-else statement, because "if" is not a keyword.

About errors detection, it is quite helpful for developers to locate each error within the corresponding source text. This location collect has to be implemented within the parser.

Static errors are to be detected at parser/compiler step. Some are grammatical errors and parsing will detect them. Others are due to type errors: after parsing, some kind of evaluation has to be performed not with values but with types (sequence types, actually), and sometimes union of types, to avoid most type checks at runtime. It is also surely possible to evaluate static expressions, such as 3 + 5, and to normalize values, such as 7.00. For optimising the compiler output, it is interesting to consider that atomization is useless on atoms. For runtime, considering position predicates, such as [1], is a common optimisation.

Dynamic errors handling requires links to source within the result of the parser/compiler. It is also required for a step debugger.


The XPath parser used in Saxon is a hand-written hybrid of a recursive-descent parser and a precedence parser.

The very first version was actually derived from James Clark's xt parser, and over the years it evolved out of all recognition, but without ever being redesigned from scratch. I have no idea if throwing it away and starting again with a generated parser would give any benefits. I suspec that having a hand-written parser gives us more control over diagnostics and error recovery, and it also enables us to support multiple grammars (different versions of XPath and XQuery, plus XSLT patterns) within a single parsing framework.

XPath (and even more so XQuery) has a lot of ad-hoc rules for resolving ambiguities, for example the rule that in the _expression_ (/ or /*), "or" parses as an element name, not as a binary operator (I don't think this ambiguity was even discovered for many years after XPath 1.0 was published). Again, I think it's probably easier to implement such ad-hoc rules with a hand-written parser. But someone who understands a particular parser generator well could probably find a way to do it.

Michael Kay

Roger wrote:

>> Are XPath expressions parsed using
>> compiler parsing algorithms?

Michael Kay responded:

Yes, of course
Hi Michael, does that mean each time Saxon encounters a place in an XSLT program where an XPath _expression_ is expected, Saxon sends the _expression_ into an XPath parser which tokenizes the _expression_, parses it into a syntax tree, and then traverses the tree to evaluate the _expression_? Did you use a parser generator to auto-generate the parser? If yes, which parser generator did you use? If you didn't use a parser generator, why not?


<Prev in Thread] Current Thread [Next in Thread>