Thanks for the solution and comment. I will try it and take your
remarks into my maintenance considerations.
Meanwhile, I found input cases where the two rules match ambiguously.
For example if the input is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p dir="rtl">
<span class="chapter">line1</span>
</p>
<p dir="rtl"><span class="regular">line10</span>
<span class="regular">line11</span>
</p>
<p dir="rtl"><span class="regular">line12</span>
</p>
<p dir="rtl"><span class="regular">line13.</span>
</p>
</body>
</html>
The error I get is:
Recoverable error
XTRE0540: Ambiguous rule match for /html/body[1]/p[3]
Matches both "p[preceding-sibling::p[1][span[@class ne 'chapter'] and
not(matches(span[@class ne 'chapter'][last()], '[.?"!]$'))]]" on line 22 of
file:/E:/help2-orig.xsl
and "p[span[@class ne 'chapter'] and not(matches(span[@class ne
'chapter'][last()], '[.?"!]$'))]" on line 16 of file:/E:/help2-orig.xsl
It is because the line:
<p dir="rtl"><span class="regular">line12</span></p>
matches both rules: it does not end with a paragraph terminator itself, and
its preceding sibling p does not end with one either.
How can I write a rule that takes the child nodes of all the
following-sibling p elements, up to and including the first one whose last
span ends with a paragraph terminator? And, of course, change the 2nd rule
accordingly so that it removes the p elements that were merged into the
earlier sibling.
Thanks.
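
One direction that might fit the merging described above is XSLT 2.0
grouping with xsl:for-each-group and group-ending-with, instead of two
competing match patterns. The following is only a sketch under assumptions
that are not confirmed by the thread: body contains only p children, a
chapter p always stands alone, a chapter p never directly follows an
unterminated paragraph, and the terminator set is hard-coded in $re1.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Assumption: fixed terminator set; &quot; is needed inside the attribute -->
  <xsl:variable name="re1" select="'[.?&quot;!]$'"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: body has only p element children -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- A group ends at a chapter p or at a p whose last
           non-chapter span ends with a terminator character -->
      <xsl:for-each-group select="p"
          group-ending-with="p[span[@class eq 'chapter']
            or matches(span[@class ne 'chapter'][last()], $re1)]">
        <!-- The context item is the first p of the group; copy it
             and pull in the children of every p in the group -->
        <xsl:copy>
          <xsl:apply-templates select="@*, current-group()/node()"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input above, the chapter p forms a group of its own, and the
p elements containing line10/line11, line12 and line13. fall into one group,
so they are written as a single p. Because each p belongs to exactly one
group, no second rule is needed to suppress the merged paragraphs, and the
ambiguous match cannot occur.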
On Wed, Jun 17, 2009 at 1:52 PM, Martin
Honnen<Martin(_dot_)Honnen(_at_)gmx(_dot_)de> wrote:
Israel Viente wrote:
I have another question regarding this.
I want to be able to control the valid ending paragraph characters
from a config file in xml.
How can I read the set [.?"!] from an external xml? - say something
like:
<ParagraphTerminator>
<Char>.</Char>
<Char>?</Char>
<Char>"</Char>
<Char>!</Char>
</ParagraphTerminator>
or any other xml representation that will be easy to read as the set
in the regular expression.
You can do that by pulling in that document with the doc function and
building the regular expression:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Read the terminator characters from the config file (no namespace) -->
  <xsl:variable name="chars"
    select="string-join(doc('chars.xml')/ParagraphTerminator/Char, '')"
    xpath-default-namespace=""/>

  <!-- Build a regular expression such as [.?"!]$ -->
  <xsl:variable name="re1" select="concat('[', $chars, ']$')"/>

  <!-- Identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Unterminated paragraph: pull in the content of the next p -->
  <xsl:template match="p[span[@class ne 'chapter'] and
      not(matches(span[@class ne 'chapter'][last()], $re1))]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node() |
        following-sibling::p[1]/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Paragraph already merged into its predecessor: suppress it -->
  <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
      and not(matches(span[@class ne 'chapter'][last()], $re1))]]"/>

</xsl:stylesheet>
There are however certain characters like '-' that would need to be escaped.
That applies to both solutions but is easier to forget and overlook if the
characters are read in from a file.
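
A possible way to guard against that is to escape the characters before
building the character class. The sketch below is an illustration, not part
of the stylesheet above: the variable name $escaped is made up, and it
assumes chars.xml has the ParagraphTerminator/Char layout shown earlier. It
only prints the resulting regular expression so the escaping can be checked.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="text"/>

  <!-- Raw terminator characters from the config file -->
  <xsl:variable name="chars"
    select="string-join(doc('chars.xml')/ParagraphTerminator/Char, '')"
    xpath-default-namespace=""/>

  <!-- Hypothetical helper variable: put a backslash in front of the
       characters that are special inside a character class: \ [ ] ^ - -->
  <xsl:variable name="escaped"
    select="replace($chars, '([\\\[\]\^\-])', '\\$1')"/>

  <xsl:variable name="re1" select="concat('[', $escaped, ']$')"/>

  <!-- Print the regular expression for inspection -->
  <xsl:template match="/">
    <xsl:value-of select="$re1"/>
  </xsl:template>

</xsl:stylesheet>
```

The three variable declarations could be used in place of the $chars and $re1
declarations in the stylesheet above; the set . ? " ! passes through
unchanged, while a Char of - would become \- inside the brackets.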
--
Martin Honnen
http://msmvps.com/blogs/martin_honnen/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--