
Re: [xsl] two regexp related questions

2011-05-19 15:24:46


On 2011-05-19 21:16, Julian Reschke wrote:
On 2011-05-19 20:51, Brandon Ibach wrote:
For 2), if you're using the regex to both validate the input (making
sure it conforms to the required syntax) and parse/extract the
name/value pairs, you might be able to make the job easier by breaking
these two tasks apart. Use the regex as you have it now to validate
the input and then, if it matches, use a shorter regex that matches
just a single name/value pair with analyze-string to do the actual
processing.

-Brandon :)

That's more or less what I do now. But as long as the regex contains a
repeating pattern, <xsl:matching-substring> will only be invoked once,
and the regex-group function will only return the contents for the last
match, right?

I think it depends on the implementation. I couldn't see anything in the spec about what regex-group(3) of
([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*
should be. In Saxon, it's ';e=f' for your example, but in principle it could also be ';c=d'.

As Brandon pointed out, using analyze-string with a repeating pattern that matches the entire string is not the best approach. There are more natural approaches that work without recursion. I have sketched two of them below, followed by the full-regex variant for comparison.

Input:
<foo>a=b;c=d;e=f</foo>

XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <variants>
      <var type="tokenize replace">
        <xsl:apply-templates mode="tok" />
      </var>
      <var type="analyze-string">
        <xsl:apply-templates mode="as" />
      </var>
      <var type="analyze-string full regex">
        <xsl:apply-templates mode="as-full" />
      </var>
    </variants>
  </xsl:template>

  <xsl:template match="foo" mode="tok">
    <xsl:copy>
      <xsl:for-each select="tokenize(., ';')">
        <item name="{replace(., '=.+', '')}" value="{replace(., '.+=', '')}" />
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="foo" mode="as">
    <xsl:copy>
      <xsl:analyze-string select="." regex="([a-z]+)=([a-z]+);?" flags="i">
        <xsl:matching-substring>
          <item name="{regex-group(1)}" value="{regex-group(2)}" />
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="foo" mode="as-full">
    <xsl:copy>
      <xsl:analyze-string select="." regex="([a-z]+)=([a-z]+)(;([a-z]+)=([a-z]+))*" flags="i">
        <xsl:matching-substring>
          <item name="{regex-group(1)}" value="{regex-group(2)}" rest="{regex-group(3)}"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Output:
<variants>
   <var type="tokenize replace">
      <foo>
         <item name="a" value="b"/>
         <item name="c" value="d"/>
         <item name="e" value="f"/>
      </foo>
   </var>
   <var type="analyze-string">
      <foo>
         <item name="a" value="b"/>
         <item name="c" value="d"/>
         <item name="e" value="f"/>
      </foo>
   </var>
   <var type="analyze-string full regex">
      <foo>
         <item name="a" value="b" rest=";e=f"/>
      </foo>
   </var>
</variants>
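
If you also want to keep the validation step that Brandon suggested, a variant along the following lines might do: check the whole string once with matches(), then extract the pairs with tokenize(). This is only an untested sketch. The mode name "validate-tok" is made up for this example, and the template would need its own <xsl:apply-templates mode="validate-tok" /> in the root template above.

XSL (sketch):
  <xsl:template match="foo" mode="validate-tok">
    <xsl:copy>
      <xsl:choose>
        <!-- matches() is not anchored by default, hence the explicit ^ and $ -->
        <xsl:when test="matches(., '^[a-z]+=[a-z]+(;[a-z]+=[a-z]+)*$', 'i')">
          <!-- the input is known to be well-formed here, so a plain split suffices -->
          <xsl:for-each select="tokenize(., ';')">
            <item name="{substring-before(., '=')}" value="{substring-after(., '=')}" />
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <!-- report input that does not follow the name=value;name=value syntax -->
          <xsl:message>Unexpected syntax: <xsl:value-of select="." /></xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

For the input above, this should produce the same three item elements as the first two variants. Because the validation happens up front, the extraction does not need a regex at all.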

-Gerrit

Best regards, Julian

--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
