Re: [xsl] text replacement with mixed content

2011-09-05 04:08:23
Heeeey, well done, on first reading that looks really good.
Converting the nested markup to empty elements seems to be the crucial
part, so you can move the start and end around.  You should write this
up and publish it somewhere...


On 5 September 2011 09:49, Geert Bormans 
<geert(_at_)gbormans(_dot_)telenet(_dot_)be> wrote:
For those interested in this thread,
here is how I resolved this...
It works like a charm on all tests,
and I am pleased with the robustness.

Let me first thank all who stepped in.
I got some inspiration from the different posts;
your contributions are highly appreciated.

Since the problem is contained in paragraphs, and I can quickly check per
paragraph whether I have to bother with a revision or not, making multiple
passes over the data does not really slow me down (too much).

The thinking was the hard work. The actual XSLT implementation was not too
bad once the algorithm was solid.

Let me show you what I did (simplified), taking test 7 as an example:

       <in original="this old foo is breaking" revision="a new bar is building">
           <p><b type="stronger">I <i>did not realize that this </i></b>old foo is breaking <i>this old foo</i></p>
       </in>

Pass 1. Take out the structure by turning each element tag into an empty
element marker (with an id), and in the meantime put offset markers at any
location where a matching pattern could start or end
(if "t" is the first character of @original, put a marker in front of every
"t"; if "g" is the last character of @original, place a marker after every
"g").
The markers are potential-start <ps/> and potential-end <pe/>.
This results in (simplified; I have namespaces, maintain attributes et al.)
          <p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize <ps/>that <ps/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<pe/> <start name="i" id="C"/><ps/>this old foo<end name="i" id="C"/></p>
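
A minimal sketch of Pass 1, written in Python rather than XSLT and not taken
from the actual stylesheet. It assumes a flat token list where text is a plain
string and markers are tuples; the helper names mark_text and flatten, and the
integer ids, are made up for illustration. It follows the description
literally (a marker before every "t", after every "g"), so it emits more <ps/>
markers than the simplified output above shows, and it ignores attributes.

```python
import xml.etree.ElementTree as ET
from itertools import count

def mark_text(text, first, last):
    """Split text, putting ("ps",) before each `first` and ("pe",) after each `last`."""
    out, buf = [], ""
    for ch in text:
        if ch == first:
            if buf:
                out.append(buf)
                buf = ""
            out.append(("ps",))
        buf += ch
        if ch == last:
            out.append(buf)
            buf = ""
            out.append(("pe",))
    if buf:
        out.append(buf)
    return out

def flatten(elem, first, last, ids):
    """Flatten the content of elem into text, start/end markers and ps/pe markers."""
    tokens = mark_text(elem.text or "", first, last)
    for child in elem:
        cid = next(ids)                              # same id on both markers of one element
        tokens.append(("start", child.tag, cid))     # attributes are ignored in this sketch
        tokens += flatten(child, first, last, ids)
        tokens.append(("end", child.tag, cid))
        tokens += mark_text(child.tail or "", first, last)
    return tokens

original = "this old foo is breaking"
p = ET.fromstring('<p><b type="stronger">I <i>did not realize that this </i></b>'
                  'old foo is breaking <i>this old foo</i></p>')
tokens = flatten(p, original[0], original[-1], count(1))
```

Joining the string tokens gives back the paragraph text unchanged, which is
what makes the later starts-with / ends-with checks possible.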

now actually the hard work is done

Pass 2.
On each <ps/>, check whether the join of all following text nodes (normalized
one way or another) starts with the normalized @original; if so, upgrade it to
a revision start <rs/>.
On each <pe/>, check whether the join of all preceding text nodes (normalized
one way or another) ends with the normalized @original; if so, upgrade it to
a revision end <re/>.
Markers that do not match are dropped.
This results in
         <p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rs/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<re/> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>
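
Pass 2 can be sketched the same way, again in Python rather than XSLT and
only as an illustration. The token list is typed in by hand from the Pass 1
output above; the whitespace-collapsing norm() is an assumption, since the
post only says "normalized one way or another", and resolve() is a made-up
name.

```python
def norm(s):
    """Collapse runs of whitespace (one possible normalization, assumed here)."""
    return " ".join(s.split())

def resolve(tokens, original):
    """Upgrade matching ps/pe markers to rs/re and drop the ones that do not match."""
    target = norm(original)
    out = []
    for i, tok in enumerate(tokens):
        if tok == ("ps",):
            following = "".join(t for t in tokens[i + 1:] if isinstance(t, str))
            if norm(following).startswith(target):
                out.append(("rs",))
        elif tok == ("pe",):
            preceding = "".join(t for t in tokens[:i] if isinstance(t, str))
            if norm(preceding).endswith(target):
                out.append(("re",))
        else:
            out.append(tok)
    return out

tokens = [
    ("start", "b", "A"), "I ", ("start", "i", "B"), "did not realize ",
    ("ps",), "that ", ("ps",), "this ", ("end", "i", "B"), ("end", "b", "A"),
    "old foo is breaking", ("pe",), " ", ("start", "i", "C"),
    ("ps",), "this old foo", ("end", "i", "C"),
]
resolved = resolve(tokens, "this old foo is breaking")
```

The sketch does not check that every <rs/> ends up paired with an <re/>; that
pairing is simply assumed to hold for the input.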

Pass 3.
structure the revisions, making them real elements
results in
         <p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rev>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>
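
A possible Python sketch of Pass 3 (not the XSLT itself): everything between
an rs and the next re marker is collected into a ("rev", [...]) group. The
tuple layout and the name group_revisions are assumptions, and the markers
are assumed to be paired and non-overlapping.

```python
def group_revisions(tokens):
    """Wrap the tokens between ("rs",) and ("re",) into a ("rev", [...]) group."""
    out, current = [], None
    for tok in tokens:
        if tok == ("rs",) and current is None:
            current = []                      # open a revision
        elif tok == ("re",) and current is not None:
            out.append(("rev", current))      # close it
            current = None
        elif current is not None:
            current.append(tok)               # inside a revision
        else:
            out.append(tok)                   # outside any revision
    if current is not None:                   # unclosed rs: put the tokens back untouched
        out += [("rs",)] + current
    return out

tokens = [
    ("start", "b", "A"), "I ", ("start", "i", "B"), "did not realize that ",
    ("rs",), "this ", ("end", "i", "B"), ("end", "b", "A"),
    "old foo is breaking", ("re",), " ", ("start", "i", "C"),
    "this old foo", ("end", "i", "C"),
]
grouped = group_revisions(tokens)
```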

Pass 4.
Move each end tag marker inside a revision whose corresponding start tag
marker (hence the id) lies outside the revision to right before the
revision.
Do something similar with start tag markers.
This results in
         <p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <end name="i" id="B"/><end name="b" id="A"/><rev>this old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>
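
Pass 4 might look like this in a Python sketch (illustrative only, not the
XSLT). End markers whose start lies outside the revision are moved in front
of it. For start markers the post only says "something similar"; moving them
to right after the revision is an assumption made here. The names is_marker
and hoist are hypothetical.

```python
def is_marker(tok, kind):
    return isinstance(tok, tuple) and tok[0] == kind

def hoist(tokens):
    """Move unmatched end markers before, and unmatched start markers after, each rev."""
    out = []
    for tok in tokens:
        if not is_marker(tok, "rev"):
            out.append(tok)
            continue
        inner = tok[1]
        started = {t[2] for t in inner if is_marker(t, "start")}
        ended = {t[2] for t in inner if is_marker(t, "end")}
        # end markers opened outside the revision go in front of it
        before = [t for t in inner if is_marker(t, "end") and t[2] not in started]
        # start markers closed outside the revision go behind it (assumption)
        after = [t for t in inner if is_marker(t, "start") and t[2] not in ended]
        kept = [t for t in inner if t not in before and t not in after]
        out += before + [("rev", kept)] + after
    return out

tokens = [
    ("start", "b", "A"), "I ", ("start", "i", "B"), "did not realize that ",
    ("rev", ["this ", ("end", "i", "B"), ("end", "b", "A"), "old foo is breaking"]),
    " ", ("start", "i", "C"), "this old foo", ("end", "i", "C"),
]
hoisted = hoist(tokens)
```

Elements that start and end inside the revision stay where they are, since
their ids show up in both sets.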

Pass 5.
Clean up: make the actual replacement in the revision and make the markers
into elements again
         <p><b>I <i>did not realize that </i></b><rev>a new bar is building</rev> <i>this old foo</i></p>
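
The clean-up pass, sketched in Python under the same assumed token layout
(not the actual XSLT): each rev group is replaced by the revision text and
the start/end markers become real tags again. Attributes and namespaces are
left out, as in the simplified example; serialize is a made-up name.

```python
from xml.sax.saxutils import escape

def serialize(tokens, revision):
    """Turn markers back into tags and substitute the revision text."""
    parts = []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(escape(tok))
        elif tok[0] == "start":
            parts.append(f"<{tok[1]}>")
        elif tok[0] == "end":
            parts.append(f"</{tok[1]}>")
        elif tok[0] == "rev":
            # the matched original text is discarded and replaced
            parts.append(f"<rev>{escape(revision)}</rev>")
    return "".join(parts)

tokens = [
    ("start", "b", "A"), "I ", ("start", "i", "B"), "did not realize that ",
    ("end", "i", "B"), ("end", "b", "A"),
    ("rev", ["this old foo is breaking"]),
    " ", ("start", "i", "C"), "this old foo", ("end", "i", "C"),
]
result = "<p>" + serialize(tokens, "a new bar is building") + "</p>"
```

Because Pass 4 moved the stray markers out of the revision, the tags nest
properly again and the result is well-formed.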

The turning point for me was adding the offset markers.
Before that I was auto-generating pretty complex regular expressions;
now I get away with a simple ends-with() and starts-with().

If anyone sees a possible improvement here or there, let me know please

Me happy now, thanks for your help

Geert





--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--





-- 
Andrew Welch
http://andrewjwelch.com
