xsl-list

Re: [xsl] analyze-string gotcha/reminder

2012-11-19 09:15:47
It's a case where even in retrospect, it's hard to see how we could have
avoided this problem in the language design. Perhaps two separate
attributes, regex and regex-avt. But that feels very heavy-handed. Most
languages have a few quirks like this where people just have to learn the
hard way.


It would be helpful if an XSLT processor issued a warning message when
a single `{` or `}` is used in a regex. This would immediately explain
the issue to the user, along with the correction to be made.
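
As a rough illustration of that kind of warning, here is a minimal sketch written as a separate lint stylesheet rather than as anything built into a processor. It is only a heuristic and rests on several assumptions: it inspects only the regex attribute of xsl:analyze-string, it treats a leftover {n} or {n,m} as suspicious once doubled braces are set aside, and it skips values that begin with a single `{` (such as the {'...'} notation or a variable reference). The wording of the message is invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Run with the stylesheet to be checked as the source document. -->
  <xsl:template match="/">
    <!-- Drop doubled braces, then look for a leftover quantifier such as
         {4} or {1,5}. Values that start with a single { are skipped. -->
    <xsl:for-each select="//xsl:analyze-string
        [not(starts-with(@regex, '{'))]
        [matches(replace(@regex, '\{\{|\}\}', ''), '\{[0-9]+(,[0-9]*)?\}')]">
      <xsl:message>
        <xsl:text>Warning: undoubled curly braces in regex="</xsl:text>
        <xsl:value-of select="@regex"/>
        <xsl:text>"; in an attribute value template write {{ and }}.</xsl:text>
      </xsl:message>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the check lives in a select attribute, which is a plain XPath expression and not an attribute value template, the braces inside its own regexes stay single.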

Cheers,
Dimitre

On Mon, Nov 19, 2012 at 1:12 AM, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
I feel your pain. Many of us have lost a few hairs over this one. The good
news is that you probably won't make the same mistake again, or if you do,
you will spot it far more quickly.

It's a case where even in retrospect, it's hard to see how we could have
avoided this problem in the language design. Perhaps two separate
attributes, regex and regex-avt. But that feels very heavy-handed. Most
languages have a few quirks like this where people just have to learn the
hard way.

Michael Kay
Saxonica


On 18/11/2012 18:18, Ihe Onwuka wrote:

Below is a multiple match meant to extract 4-digit numbers from text:

    <xsl:analyze-string select="$line" regex="(\D|^)(\d{4})(\D|$)">
      <xsl:matching-substring>
        <year><xsl:value-of select="regex-group(2)"/></year>
      </xsl:matching-substring>
    </xsl:analyze-string>

It doesn't work. I tried essentially the same regex in XQuery using replace:

xquery version "1.0";
replace('Accounting Items                                Dec.31,2005
  Dec.31,2006    Dec.31,2007
Dec.31,2008','(\D|^)\d{4}(\D|$)','xxxx')

it worked and I got

Accounting Items                                Dec.31xxxx
Dec.31xxxx   Dec.31xxxx   Dec.31xxxx

I thought maybe there was special syntax for the multiple-match case, but no.
Eventually I turned to the specification and found this:

Note:
Because the regex attribute is an attribute value template, curly
brackets within the regular expression must be doubled. For example,
to match a sequence of one to five characters, write regex=".{{1,5}}".
For regular expressions containing many curly brackets it may be more
convenient to use a notation such as
regex="{'[0-9]{1,5}[a-z]{3}[0-9]{1,2}'}", or to use a variable.
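
The two alternatives mentioned at the end of the note might look like the sketch below, applied to the year-matching regex from this post. The parameter value, the variable name year-regex, the years wrapper element, and the template name main are all made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample input, standing in for $line in the original post. -->
  <xsl:param name="line" select="'Dec.31,2005   Dec.31,2006'"/>

  <!-- Option 1: hold the regex in a variable. A select attribute is an XPath
       expression, not an attribute value template, so braces stay single. -->
  <xsl:variable name="year-regex" select="'(\D|^)(\d{4})(\D|$)'"/>

  <xsl:template name="main" match="/">
    <years>
      <xsl:analyze-string select="$line" regex="{$year-regex}">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>

      <!-- Option 2: wrap the whole regex in a string literal inside the AVT. -->
      <xsl:analyze-string select="$line" regex="{'(\D|^)(\d{4})(\D|$)'}">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </years>
  </xsl:template>
</xsl:stylesheet>
```

Either form keeps the regex readable when it contains many quantifiers, at the cost of one extra level of quoting or indirection.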

So I had to double up my curly braces.
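
For reference, the reason the original attempt fails is that with single braces, {4} is evaluated as the XPath expression 4, so the processor sees the regex (\D|^)(\d4)(\D|$), which matches a digit followed by a literal 4. A minimal sketch of the corrected call, wrapped in a small stylesheet so it can be run on its own, follows. The sample value of $line, the years wrapper element, and the template name main are assumptions added for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample input, standing in for $line in the original post. -->
  <xsl:param name="line" select="'Dec.31,2005   Dec.31,2006'"/>

  <xsl:template name="main" match="/">
    <years>
      <!-- regex is an attribute value template, so {4} is written {{4}} -->
      <xsl:analyze-string select="$line" regex="(\D|^)(\d{{4}})(\D|$)">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </years>
  </xsl:template>
</xsl:stylesheet>
```

With the sample value above, this should produce year elements for 2005 and 2006.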

There's an hour of my life that I won't get back.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




-- 
Cheers,
Dimitre Novatchev
