The construct
(?=X)
is allowed in some regex dialects, where it means "match X with a zero-width
positive lookahead": an assertion that X must match at the current position,
without causing X to be swallowed. But it's not allowed in the XPath regex
dialect.
This construct (a zero-width negative lookahead) isn't allowed either:
(?!X)
This is the inverse: it asserts that X does not match at the current
position, without swallowing X.
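To make the two assertions concrete, here is a minimal sketch in Python, whose re module does accept both constructs (the XPath dialect does not, so this cannot be shown in XSLT). The sample string and the variable names are made up for illustration.

```python
import re

text = "foobar foobaz"

# Positive lookahead: match "foo" only where "bar" follows.
# The lookahead is zero-width, so "bar" is not part of the match.
pos = [(m.start(), m.group()) for m in re.finditer(r"foo(?=bar)", text)]

# Negative lookahead: match "foo" only where "bar" does NOT follow.
neg = [(m.start(), m.group()) for m in re.finditer(r"foo(?!bar)", text)]

print(pos)  # [(0, 'foo')]
print(neg)  # [(7, 'foo')]
```

In both cases the matched text is just "foo"; the lookahead only decides whether the match is allowed at that position.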
I'm afraid I have no idea whether these constructs can be translated into
anything that the XPath regex dialect permits.
Gunther Schadow can say "told you it would be needed":
http://www.stylusstudio.com/xsllist/200412/post00810.html
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Andrew Welch [mailto:andrew(_dot_)j(_dot_)welch(_at_)gmail(_dot_)com]
Sent: 10 July 2007 11:29
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] csv to xml converter bug
The csv-to-xml solution here:
http://andrewjwelch.com/code/xslt/csv/csv-to-xml.html
...has a bug where
,,"foo,bar",,x,,
generates the tokens:
<token/>
<token/>
<token/>
<token>"foo,bar"</token>
<token/>
<token/>
<token>x</token>
<token/>
<token/>
The x should be at position 5 but is at position 7 because
the commas either side of the quoted values aren't being
included with the value itself, and are generating extra
tokens in the xsl:non-matching-substring block.
I've tried various ways to modify the solution to fix the
bug, but always ran into problems with other strings, such as:
"foo,bar",,"foo,bar",x,,,"foo,bar"
If you include leading or trailing commas with the quoted
values then the empty value at position 2 here gets consumed.
Maybe a better regex would help here, but I couldn't write
one... (Or perhaps if the non-matching-substring block had
access to some information about the matching-substring block...)
I had a dig around the net and found a regex[1] that could be
sufficient to just use with tokenize, but it causes the error:
FORX0002: Error at character 2 in regular expression
",(?=([^\"]*\"[^\"]*\")*(?![^\"...":
expected ())
It works in "The Regex Coach", but not in XSLT (with
Saxon 8.9.0.3b).
The code is:
<xsl:variable name="regex"
as="xs:string">,(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))</xsl:variable>
<xsl:function name="fn:getTokens" as="xs:string+">
  <xsl:param name="str" as="xs:string"/>
  <xsl:sequence select='for $t in tokenize($str, $regex)
    return replace($t, "^,""|"",$|("")""", "$1")'/>
</xsl:function>
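For comparison, the same pattern can be tried in a dialect that supports lookahead. The following is only a sketch in Python, not part of the XSLT solution. The name get_tokens is made up here, and the inner group has been changed to a non-capturing one so that re.split() does not return the group's text as extra items. The pattern matches a comma only when the rest of the line consists of complete pairs of quotes with no quote left over, which means an even number of quotes follow it and the comma lies outside a quoted value. This assumes the quotes in the line are balanced.

```python
import re

# A comma, followed (without consuming anything) by zero or more complete
# pairs of quotes and then no further quote up to the end of the line.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')

def get_tokens(line):
    """Split a CSV line on the commas that lie outside quoted values."""
    return COMMA_OUTSIDE_QUOTES.split(line)

tokens = get_tokens(',,"foo,bar",,x,,')
print(tokens)  # ['', '', '"foo,bar"', '', 'x', '', '']
```

With this split the x ends up as the fifth token, and the comma inside "foo,bar" is left alone because an odd number of quotes follows it.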
It's an unusual looking regex (to my novice eye) - any
explanation as to what's going on would be great.
thanks
andrew
[1] http://weblogs.asp.net/prieck/archive/2004/01/16/59457.aspx
--
http://andrewjwelch.com
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--