xsl-list

Re: [xsl] Processing two documents, which order?

2011-04-08 03:50:12
On 8 April 2011 09:15, Dave Pawson <davep(_at_)dpawson(_dot_)co(_dot_)uk> wrote:

Given
     <property>absolute-position</property>
     <property>bottom</property>
     <property>left</property>
     <property>right</property>
     <property>top</property>
as the input... what would the keys look like?


The 'list to be marked up' is as above
The other document is xml, containing, in other elements those words

Required output

<para> Blah blah blah <property>right</property>

'items' must be followed by [\s\p{P}] so left-handed doesn't get
marked up etc.

If, given "left", "left-handed" should not match, the set of stoppers must
include space and non-letters (\PL) and not punctuation characters (\pP).
If a regular expression is used, the pattern may also have to include the
anchor $.

And, possibly the symmetric pattern (using '^') should precede the pattern.
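
To make the effect of the stopper set concrete, here is a small sketch in
Python rather than XSLT. Python's re module has no \p{P}, so [^\w\s] stands
in for punctuation; that stand-in and the sample sentence are assumptions
made for illustration only.

```python
import re

# Assumption: [^\w\s] approximates \p{P}, which Python's re does not support.
PUNCT = r"[^\w\s]"

# Stoppers: start/end of text, whitespace, or punctuation (a hyphen counts).
with_punct = re.compile(rf"(^|\s|{PUNCT})(left)($|\s|{PUNCT})")
# Stoppers: start/end of text or whitespace only.
space_only = re.compile(r"(^|\s)(left)($|\s)")

text = "go left then left-handed"
replacement = r"\1<property>\2</property>\3"

marked_with_punct = with_punct.sub(replacement, text)
marked_space_only = space_only.sub(replacement, text)

print(marked_with_punct)
# go <property>left</property> then <property>left</property>-handed
print(marked_space_only)
# go <property>left</property> then left-handed
```

With punctuation among the stoppers the hyphen ends the match, so the
"left" in "left-handed" is marked up; with whitespace only it is left alone.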

I suspect that a regular expression substitution applied to text nodes in
their entirety could compete with any other approach. A simple algorithm
can be used to optimize the regular expression, away from the "brute
force" pattern joining all words with '|'.

Example:
Given the words

   bee-bonnet-bounce-bounty-burn-burst-sea-seal

the optimized and anchored regex is

   (^|\s|\p{P})((?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l)))($|\s|\p{P})
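
One way to arrive at such a pattern is to put the words into a prefix tree
and emit a nested alternation from it. The sketch below is in Python, not
XSLT, and is only a guess at the kind of algorithm meant here; the function
names build_trie, trie_to_regex and optimized_alternation are made up for
this example.

```python
import re

def build_trie(words):
    """Build a nested-dict prefix tree; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    """Turn a trie node into a regex fragment with shared prefixes factored out."""
    alternatives = []
    for ch in sorted(node):  # "" sorts first, giving an empty alternative
        if ch == "":
            alternatives.append("")
        else:
            alternatives.append(re.escape(ch) + trie_to_regex(node[ch]))
    if len(alternatives) == 1:
        return alternatives[0]  # a single branch needs no group
    return "(?:" + "|".join(alternatives) + ")"

def optimized_alternation(words):
    return trie_to_regex(build_trie(words))

words = ["bee", "bonnet", "bounce", "bounty", "burn", "burst", "sea", "seal"]
print(optimized_alternation(words))
# (?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))
```

Wrapping the result in a capturing group and adding the stopper groups on
both sides gives the anchored pattern shown above.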

Here is a text:

   <p>Bee in my bonnet bounces from bounty. Burst on a bee-line into
the sea as a seal</p>

Applying global case-insensitive substitution with $1<x>$2</x>$3 produces:

   <p><x>Bee</x> in my <x>bonnet</x> bounces from <x>bounty</x>.
<x>Burst</x> on a <x>bee</x>-line into the <x>sea</x> as a
<x>seal</x></p>
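
For anyone who wants to try this outside XSLT, the substitution can be
reproduced with a short Python sketch. Two assumptions: Python's re module
has no \p{P}, so [^\w\s] is used as a stand-in for punctuation, and the text
is taken as a single text node without the surrounding <p> tags.

```python
import re

# Assumption: [^\w\s] approximates \p{P}, which Python's re does not support.
PUNCT = r"[^\w\s]"
# The optimized alternation from the example above.
WORDS = r"(?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))"

pattern = re.compile(rf"(^|\s|{PUNCT})({WORDS})($|\s|{PUNCT})", re.IGNORECASE)

text = ("Bee in my bonnet bounces from bounty. "
        "Burst on a bee-line into the sea as a seal")

# \1 and \3 put the stoppers back; \2 is the matched word.
marked = pattern.sub(r"\1<x>\2</x>\3", text)
print(marked)
```

One thing to watch: the stoppers are consumed by the match, so when two
listed words are separated by a single space (e.g. "sea seal") this sketch
marks up only the first of them.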

Disclaimer: My XSLT skills aren't sufficient to create the optimized
regex from the word list. If someone is interested enough, I can
provide the details.

-W



regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

