On 8 April 2011 09:15, Dave Pawson <davep(_at_)dpawson(_dot_)co(_dot_)uk> wrote:
Given
<property>absolute-position</property>
<property>bottom</property>
<property>left</property>
<property>right</property>
<property>top</property>
as the input... what would the keys look like?
The 'list to be marked up' is as above
The other document is XML, containing those words in other elements
Required output
<para> Blah blah blah <property>right</property>
'items' must be followed by [\s\p{P}] so left-handed doesn't get
marked up etc.
If, given "left", "left-handed" should not match, the set of stoppers must
include space and non-letters (\PL) and not punctuation characters (\pP).
If a regular expression is used, the pattern may also have to include the
anchor $.
And, possibly the symmetric pattern (using '^') should precede the pattern.
I suspect that a regular expression substitution applied to text
nodes in their entirety could well compete with any other approach.
A simple algorithm can be used to optimize the regular expression, moving away
from the "brute force" pattern that joins all words with '|'.
Example:
Given the words
bee-bonnet-bounce-bounty-burn-burst-sea-seal
the optimized and anchored regex is
(^|\s|\p{P})((?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l)))($|\s|\p{P})
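One way such an optimization might be done is with a prefix tree (trie): words that share a prefix share a branch, and each branch point becomes a non-capturing group. The following is only a Python sketch of that idea, not necessarily the algorithm meant above. The function names build_trie, trie_to_regex and anchored_regex are made up for illustration, and the result is built as a plain string, since Python's standard re module does not understand \p{P}.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Turn a trie node into a regex fragment with shared prefixes factored out."""
    alternatives = []
    for ch in sorted(node):  # "" sorts first and yields the empty alternative
        if ch == "":
            alternatives.append("")
        else:
            alternatives.append(re.escape(ch) + trie_to_regex(node[ch]))
    if len(alternatives) == 1:
        return alternatives[0]  # no branching here, so no group is needed
    return "(?:" + "|".join(alternatives) + ")"

def anchored_regex(words):
    """Wrap the factored alternation in the stopper groups used above."""
    core = trie_to_regex(build_trie(words))
    return r"(^|\s|\p{P})(" + core + r")($|\s|\p{P})"

words = "bee bonnet bounce bounty burn burst sea seal".split()
pattern = anchored_regex(words)
print(pattern)
```

For the eight words above this prints the same anchored regex as shown. The empty alternative in sea(?:|l) comes from "sea" being both a word and a prefix of "seal".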
Here is a text:
<p>Bee in my bonnet bounces from bounty. Burst on a bee-line into
the sea as a seal</p>
Applying global case-insensitive substitution with $1<x>$2</x>$3 produces:
<p><x>Bee</x> in my <x>bonnet</x> bounces from <x>bounty</x>.
<x>Burst</x> on a <x>bee</x>-line into the <x>sea</x> as a
<x>seal</x></p>
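The substitution can be tried outside XSLT as well. Below is a rough Python sketch, run on the text content of the <p> element only. It rests on one assumption that differs from the regex above: the standard re module has no \p{P}, so [^\w\s] (anything that is neither a word character nor whitespace) stands in for it, which is close but not identical (it also covers symbols and does not cover '_'). The name mark_up is invented for the example.

```python
import re

# Assumption: [^\w\s] is a stand-in for \p{P}, which Python's re lacks.
STOP = r"[^\w\s]"
CORE = r"(?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))"
PATTERN = re.compile(rf"(^|\s|{STOP})({CORE})($|\s|{STOP})", re.IGNORECASE)

def mark_up(text):
    """Wrap each listed word in <x>...</x>, keeping the stoppers around it."""
    # \1 and \3 put back the stopper characters consumed by the match.
    return PATTERN.sub(r"\1<x>\2</x>\3", text)

text = ("Bee in my bonnet bounces from bounty. "
        "Burst on a bee-line into the sea as a seal")
marked = mark_up(text)
print(marked)
```

One thing to watch: the stoppers are consumed by the match, so of two listed words separated by a single space (say "sea seal") only the first gets marked up. Lookahead and lookbehind would avoid that where the regex flavour offers them.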
Disclaimer: My XSLT skills aren't sufficient to create the optimized
regex from the word list. If someone is interested enough, I can
provide the details.
-W
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--