Yes, meanwhile I had changed the middle part to
(«[^»¤]+»\s*|§[^§¤]+§\s*){0,255}
so we agree :)
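As a quick check of that middle part, here is a minimal sketch in Python. Python's `re` module stands in for the XSLT 2.0 regex engine, which is an assumption; the sample string and the name `MIDDLE` are made up for illustration. Because each whitespace run is tied to the token before it, a given run of tags can be consumed in only one way. The flip side is that whitespace in front of the first tag is not matched.

```python
import re

# Middle part of the regex, as quoted above: each tag-like token
# swallows the whitespace that follows it.
MIDDLE = re.compile(r"(«[^»¤]+»\s*|§[^§¤]+§\s*){0,255}")

# Hypothetical run of tags, similar to what sits between two LISTITEMs.
tags = '«/TD» «TD id="H13225"»«/TD» '

# The whole run matches, including the whitespace after each token.
matches_run = MIDDLE.fullmatch(tags) is not None

# Whitespace in front of the first token is not covered by this pattern.
matches_leading_space = MIDDLE.fullmatch(" " + tags) is not None

print(matches_run, matches_leading_space)  # prints: True False
```

The second result is presumably why the full regex quoted below has its own `\s*` right after the first LISTITEM.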
-W
On 31 January 2011 20:15, Alex Muir
<alex(_dot_)g(_dot_)muir(_at_)gmail(_dot_)com> wrote:
Okay, this one seems to work, based on your suggestion plus a little
tweak to get it to surround all the LISTITEMs:
((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)\s*(((«[^»¤]+»\s*|§[^§¤]+§\s*){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})
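For anyone wanting to try this regex outside oXygen, a rough Python sketch follows. The assumptions are: Python's `re` stands in for the XSLT engine, the sample text is a made-up and much reduced version of the posted input, and the wrapper markers are simplified (no POSITION or PLACEMENT attributes). `COMPLETE_LIST`, `SAMPLE` and `wrap_lists` are hypothetical names.

```python
import re

# The regex from the message above, copied verbatim
# (split over three raw strings only to keep the lines short).
COMPLETE_LIST = re.compile(
    r"((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)\s*"
    r"(((«[^»¤]+»\s*|§[^§¤]+§\s*){0,255})"
    r"(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})"
)

# Made-up sample: two list items separated by a short run of tags.
SAMPLE = (
    'intro ¤LISTITEM BULLET="15"¤«FONT»15«/FONT»¤/LISTITEM¤«/TD» «TD id="a"»'
    '¤LISTITEM BULLET="16"¤«FONT»16«/FONT»¤/LISTITEM¤ outro'
)


def wrap_lists(text: str) -> str:
    """Wrap every run of list items in simplified COMPLETELIST markers."""
    return COMPLETE_LIST.sub(
        lambda m: "¤COMPLETELIST¤" + m.group(1) + "¤/COMPLETELIST¤", text
    )


print(wrap_lists(SAMPLE))
```

Both items end up inside a single pair of markers, while the text before and after is passed through unchanged, much like the non-matching-substring branch in the stylesheet.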
Also, I note that the input I posted was in fact working. While trying
to reduce the input text I ended up using a project scenario rather
than a global scenario with the same name, and after restarting
oXygen I guess I switched to a different scenario that was running
different input than I wanted.
Thanks much
On Mon, Jan 31, 2011 at 6:59 PM, Wolfgang Laun
<wolfgang(_dot_)laun(_at_)gmail(_dot_)com> wrote:
The parentheses '(' and ')' do not match well in <xsl:variable
name="CompleteListIdentificationRegex" >. Please check.
But one evil subpattern is this (with spaces inserted for readability):
( ( «[^»¤]+» | \s+ | §[^§¤]+§ ){0,255})
This will try many combinations of zero to 255 repetitions of "any
number > 0 of spaces"
Cleaner is
(\s+|( «[^»¤]+»|§[^§¤]+§){0,255})
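To put a number on "many combinations": with `\s+` as one alternative inside `{0,255}`, a run of n whitespace characters can be taken as one repetition or chopped into several shorter ones, which gives 2^(n-1) possible splits (for n up to 255). A backtracking engine may have to walk through them whenever the rest of the pattern fails to match. The following minimal Python sketch only counts the splits; `count_splits` is a hypothetical helper, not part of any regex library, and no regex engine is run.

```python
from functools import lru_cache

MAX_REPEATS = 255  # upper bound of the {0,255} quantifier


@lru_cache(maxsize=None)
def count_splits(n: int, repeats_left: int = MAX_REPEATS) -> int:
    """Count the ways to cover n whitespace characters with at most
    repeats_left repetitions, each repetition taking one or more characters."""
    if n == 0:
        return 1  # nothing left to cover: exactly one way
    if repeats_left == 0:
        return 0  # characters remain but no repetitions are left
    # The next repetition takes k characters; the rest is split recursively.
    return sum(count_splits(n - k, repeats_left - 1) for k in range(1, n + 1))


for n in (1, 2, 5, 10):
    print(n, count_splits(n))
```

For a run of 10 spaces this gives 512 splits, and the count doubles with every extra space.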
-W
On 31 January 2011 19:40, Alex Muir
<alex(_dot_)g(_dot_)muir(_at_)gmail(_dot_)com> wrote:
Hi,
With the following code:
------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:saxon="http://saxon.sf.net/"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0" exclude-result-prefixes="#all">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="unknown[exists(text())]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="CompleteListAnalyze">
        <xsl:with-param name="content" select="text()"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>
  <xsl:template name="CompleteListAnalyze">
    <xsl:param name="content"/>
    <xsl:variable name="CompleteListIdentificationRegex" >
      <xsl:text>((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)(((«[^»¤]+»|\s+|§[^§¤]+§){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})</xsl:text>
    </xsl:variable>
    <xsl:analyze-string select="$content"
                        regex="{$CompleteListIdentificationRegex}">
      <xsl:matching-substring>
        <xsl:text>¤COMPLETELIST POSITION="</xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>" PLACEMENT=""¤</xsl:text>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>¤⊕/COMPLETELIST¤</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
And the following input file:
----------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<unknown>¤LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
TYPE="SNLI"¤«§HL§FONT size="2" id="H13211"»15«/§HL§FONT»«/§HL§TD»
«§HL§TD id="H13213"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13215"»«§HL§TD
id="H13216"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13218"»
«§HL§TD id="H13220"»«/§HL§TD» «§HL§TD colspan="2"
id="H13222"»«§HL§FONT size="2" id="H13223"»TEXT TEXT TEXT
TEXT«/§HL§FONT»¤/LISTITEM¤«/TD» «TD id="H13225"»«/TD»
«TD id="H13227"»«/TD» «TD id="H13229"»«/TD» «TD
id="H13231"»«/TD» «TD align="right" id="H13233"»¤LISTITEM
BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
id="H13234"»16«/§HL§FONT»«/§HL§TD» «§HL§TD
id="H13236"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13238"»«§HL§TD
id="H13239"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13241"»
«§HL§TD id="H13243"»«/§HL§TD» «§HL§TD colspan="2"
id="H13245"»«§HL§FONT size="2" id="H13246"»TEXT TEXT TEXT TEXT TEXT
«/§HL§FONT»¤/LISTITEM¤«/TD» «TD id="H13248"»«/TD» «TD
id="H13250"»«/TD» «TD id="H13252"»«/TD» «TD
id="H13254"»«/TD» «TD align="right" id="H13256"»¤LISTITEM
BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
id="H13257"»17«/§HL§FONT»«/§HL§TD» «§HL§TD
id="H13259"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13261"»«§HL§TD
id="H13262"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13264"»
«§HL§TD id="H13266"»«/§HL§TD» «§HL§TD colspan="2"
id="H13268"»«§HL§FONT size="2" id="H13269"»TEXT TEXT TEXT TEXT TEXT
«/§HL§FONT»¤/LISTITEM¤</unknown>
</doc>
The regex held in the variable CompleteListIdentificationRegex runs
fine in RegexBuddy on the same input, executing to completion in 201
steps. It essentially just identifies all the content within the
<unknown> element above.
However, the equivalent xsl:analyze-string, run in oXygen 12.1, keeps
running and never stops on the same input.
Any ideas?
I have been working on it for 4 hours without much progress, other
than reducing the number of execution steps in RegexBuddy by 40.
Thanks Much
--
Alex
-----
Currently:
Freelance Software Engineer 6+ yrs exp
Previously:
https://sites.google.com/a/utg.edu.gm/alex/
A Bafila, is two rivers flowing together as one:
http://www.facebook.com/pages/Bafila/125611807494851
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--