xsl-list

Re: [xsl] Special characters in regex expression

2014-07-23 23:47:07
On 23/07/2014, Michael Dykman mdykman(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
It is my understanding that Java's built-in regular expression engine
emulates PCRE pretty closely.

Perl 5 has, over time, added some distinctive features that aren't
available in Java. XPath's regex dialect is, roughly, a subset of Java's
(though it adds a few XML Schema constructs, such as character-class
subtraction).


To match characters that have a special meaning in a regular expression
literally, putting them in a character class (square-bracket notation)
generally works.

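For example, to match a question mark at the beginning of a line, use
"^[?].*$" (in XPath, add the "m" flag so that ^ and $ match at line
boundaries rather than only at the start and end of the whole string).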
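A quick way to check the character-class advice is with a Perl-style engine. The sketch below uses Python's re module purely as a stand-in (it is not the XPath dialect Saxon implements), but character classes behave the same way in both for inputs like these; the sample text is made up.

```python
import re

text = "plain line\n?question line\nanother ?line"

# Inside [...] the '?' is an ordinary character, so no backslash is needed.
# re.MULTILINE makes ^ and $ match at every line boundary, much like the
# "m" flag in XPath; without it, ^ matches only at the start of the string.
question_lines = re.findall(r"^[?].*$", text, re.MULTILINE)
print(question_lines)

# The same holds for '.' and '!': inside a class they lose any special
# meaning, so [.!?] matches only those three literal characters.
marks = re.findall(r"[.!?]", "a.b!c?d")
print(marks)
```

Without the multiline flag the first pattern finds nothing here, because the text as a whole does not start with '?'.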

Thus, regex="(\.|\!|\?)(?!\)|\.|\d|\w)" (ignoring, for the moment, that
look-ahead is not supported) would be better written as

     regex="[.!?](?![).\d\w])" <!-- still not valid in XPath: uses look-ahead -->
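One way to confirm that the character-class form is a faithful rewrite of the alternation is to run both in an engine that does support look-ahead. Python's re is used here as that engine (neither pattern is accepted by XPath); the sample strings and the spans() helper are hypothetical.

```python
import re

# The original alternation-based pattern and its character-class rewrite.
ALTERNATION = re.compile(r"(\.|\!|\?)(?!\)|\.|\d|\w)")
CHAR_CLASS = re.compile(r"[.!?](?![).\d\w])")

def spans(pattern, text):
    """Return the (start, end) span of every non-overlapping match."""
    return [m.span() for m in pattern.finditer(text)]

samples = ["Stop! Really? 3.5 (yes.)", "...??...", "Wait... what?!"]
for s in samples:
    # Each alternative is a single character, so the two forms must agree.
    assert spans(ALTERNATION, s) == spans(CHAR_CLASS, s)
print("same matches on all samples")
```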

You can, however, capture groups within the matching substring:

     regex="([.!?])([^).\d\w])"

In this simple case, regex-group(1) and regex-group(2) give you the two
characters individually, and you can insert nodes as required.
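The regex-group() approach can be mimicked in Python, again only as an illustration: the hypothetical wrap_punctuation() below wraps group 1 in a <seg> element and copies group 2 unchanged, the way an xsl:matching-substring could use regex-group(1) and regex-group(2). Python's \w is not identical to XPath's (Python's includes '_', XPath's includes symbols such as '+'), so results agree only on ordinary letters, digits, spaces and punctuation.

```python
import re

# XPath-compatible two-group pattern: group 1 is the punctuation mark,
# group 2 the single character that follows it.
PATTERN = re.compile(r"([.!?])([^).\d\w])")

def wrap_punctuation(text: str) -> str:
    """Wrap each matched mark in <seg>, keeping the character after it."""
    out = []
    pos = 0
    for m in PATTERN.finditer(text):
        out.append(text[pos:m.start()])         # non-matching substring
        out.append(f"<seg>{m.group(1)}</seg>")  # like regex-group(1)
        out.append(m.group(2))                  # like regex-group(2)
        pos = m.end()
    out.append(text[pos:])                      # trailing non-matching text
    return "".join(out)

print(wrap_punctuation("Hello! Is it? Yes."))
```

The final '.' in the sample is left alone: group 2 needs a following character, so a mark at the very end of a text node never matches.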

I am not sure what Gabor expects to happen with, e.g., "...??..." or
"...!!...", which are matched by this regex.
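To see why inputs like "...??..." are awkward, compare the look-ahead pattern (accepted by Python, rejected by XPath) with the two-group pattern on both ambiguous inputs. This Python sketch only reports match start positions; match_positions() is a hypothetical helper.

```python
import re

# Perl-style look-ahead version: checks the next character without consuming it.
LOOKAHEAD = re.compile(r"[.!?](?![).\d\w])")
# XPath-compatible two-group version: consumes the next character.
TWO_GROUPS = re.compile(r"([.!?])([^).\d\w])")

def match_positions(pattern, text):
    """Return the start index of every non-overlapping match."""
    return [m.start() for m in pattern.finditer(text)]

for sample in ("...??...", "...!!..."):
    print(sample, match_positions(LOOKAHEAD, sample), match_positions(TWO_GROUPS, sample))
```

The look-ahead version marks three characters, including the final '.', because it never consumes the character it inspects; the two-group version consumes that character, so it finds only one match. Which result is wanted depends on what the markup is meant to express.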

-W


On Wed, Jul 23, 2014 at 3:55 PM, mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
The exclamation mark is not a special character in XPath regular
expressions, so it does not need to be (and must not be) escaped.

Negative lookaheads are not supported in the XPath regular expression
dialect.

You can't assume that all regular expression dialects are the same.
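As an illustration of that point, Python's re module, another Perl-style dialect, accepts the original pattern from the question (escaped '!' and negative look-ahead included), while rejecting the XML Schema escape \i, which XPath accepts. The compiles() helper is hypothetical.

```python
import re

# The pattern from the question, unchanged. Saxon rejects it; Python does not.
ORIGINAL = r"(\.|\!|\?)(?!\)|\.|\d|\w)"

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(ORIGINAL))
# findall returns group 1 only, since the look-ahead group does not capture.
print(re.findall(ORIGINAL, "Stop! Really? 3.5 (yes.)"))
# \i and \c are XML name-character escapes in XPath; Python treats \i as an error.
print(compiles(r"\i\c*"))
```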

Michael Kay

Saxonica



Dear All,

I am using xsl:analyze-string to find and replace punctuation; however,
I got the following error:

 Error in regular expression: net.sf.saxon.trans.XPathException: Syntax
error at char 6 in regular expression: Escape character '!' not allowed.

How should I escape and match '?' and '!'? I am also using a negative
look-ahead; why isn't that working?

Here is a sample from my code. Thanks,

Gabor


<xsl:template match="//TEI:p//text()[not(parent::TEI:note | parent::TEI:hi | parent::TEI:date)]">
    <xsl:analyze-string select="." regex="(\.|\!|\?)(?!\)|\.|\d|\w)">
        <xsl:matching-substring>
            <xsl:element name="seg" namespace="http://www.tei-c.org/ns/1.0">
                <xsl:value-of select="."/>
            </xsl:element>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>





--
 - michael dykman
 - mdykman(_at_)gmail(_dot_)com

 May the Source be with you.


--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
