xsl-list

Re: [xsl] text replacement with mixed content

2011-09-05 03:49:43
For those interested in this thread,
here is how I resolved this.
It works like a charm on all tests,
and I am pleased with the robustness.

Let me first thank all who stepped in.
I got some inspiration from the different posts;
your contributions are highly appreciated.

Since the problem is contained in paragraphs, and I can quickly check per paragraph whether I have to bother with a revision at all, making multiple passes through the data does not really slow me down (too much).
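The post does not say how the per-paragraph check is done. One cheap way to do it, sketched here in Python rather than XSLT, is to test whether the normalized string value of the paragraph contains the normalized @original. The names norm and needs_revision are made up for this sketch, and collapsing whitespace is only one possible normalization.

```python
import xml.etree.ElementTree as ET


def norm(s):
    """Collapse whitespace runs and trim (an assumed normalization)."""
    return " ".join(s.split())


def needs_revision(p, original):
    """True if the paragraph's text, ignoring markup, contains `original`."""
    return norm(original) in norm("".join(p.itertext()))


# The paragraph from test 7.
p = ET.fromstring(
    '<p><b type="stronger">I <i>did not realize that this </i></b>'
    'old foo is breaking <i>this old foo</i></p>'
)
print(needs_revision(p, "this old foo is breaking"))  # True
```

Paragraphs that fail this test can be copied through unchanged, so the five passes only run where a match is possible.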

The thinking was the hard work. The actual XSLT implementation was not too bad once the algorithm was solid

Let me show you what I did (simplified), taking test 7 as an example:

<in original="this old foo is breaking" revision="a new bar is building">
  <p><b type="stronger">I <i>did not realize that this </i></b>old foo is breaking <i>this old foo</i></p>
</in>

Pass 1.
Take out the structure by turning each element tag into an empty marker element (with an id), and in the meantime put offset markers at every location where a matching pattern could start or end: if "t" is the first character of @original, put a marker in front of every "t"; if "g" is the last character of @original, place a marker after every "g".
The markers are potential start <ps/> and potential end <pe/>.
This results in (simplified: I have namespaces, maintain attributes et al.)
<p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize <ps/>that <ps/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<pe/> <start name="i" id="C"/><ps/>this old foo<end name="i" id="C"/></p>
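A rough Python sketch of this pass, not the XSLT used in the thread. It assumes the flattened paragraph is held as a list of tuples: ("start", name, id), ("end", name, id), ("text", string), ("ps",) and ("pe",). The function names are made up, and integer ids stand in for A, B, C. Applied literally, the rule also marks the "t" in "not" and the final "t" in "that", which the simplified listing above leaves out.

```python
import xml.etree.ElementTree as ET
from itertools import count


def flatten(elem, ids=None):
    """Replace each child element's tags by start/end marker tuples."""
    ids = ids if ids is not None else count(1)
    tokens = []
    if elem.text:
        tokens.append(("text", elem.text))
    for child in elem:
        cid = next(ids)  # same id on both markers of one element
        tokens.append(("start", child.tag, cid))
        tokens.extend(flatten(child, ids))
        tokens.append(("end", child.tag, cid))
        if child.tail:
            tokens.append(("text", child.tail))
    return tokens


def add_offset_markers(tokens, original):
    """Put ("ps",) before every first char and ("pe",) after every last char."""
    first, last = original[0], original[-1]
    out = []
    for tok in tokens:
        if tok[0] != "text":
            out.append(tok)
            continue
        buf = ""
        for ch in tok[1]:
            if ch == first:
                if buf:
                    out.append(("text", buf))
                    buf = ""
                out.append(("ps",))
            buf += ch
            if ch == last:
                out.append(("text", buf))
                buf = ""
                out.append(("pe",))
        if buf:
            out.append(("text", buf))
    return out


p = ET.fromstring(
    '<p><b type="stronger">I <i>did not realize that this </i></b>'
    'old foo is breaking <i>this old foo</i></p>'
)
tokens = add_offset_markers(flatten(p), "this old foo is breaking")
for tok in tokens:
    print(tok)
```

The text content is untouched; only the marker tuples are new, so concatenating the text tokens still gives the paragraph's string value.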

Now the hard work is actually done.

Pass 2.
On each <ps/>, check whether the join of all following text nodes (normalized one way or another) starts with the normalized @original; if so, upgrade it to a revision start <rs/>.
On each <pe/>, check whether the join of all preceding text nodes (normalized one way or another) ends with the normalized @original; if so, upgrade it to a revision end <re/>.
Markers that fail the check are dropped.
This results in
<p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rs/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<re/> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>
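The same check as a Python sketch, again not the original XSLT. It assumes the paragraph is held as a flat list of tuples mirroring the Pass 1 listing, and that "normalized" means collapsing whitespace; upgrade_markers and norm are illustrative names only.

```python
def norm(s):
    """Collapse whitespace runs and trim (an assumed normalization)."""
    return " ".join(s.split())


def upgrade_markers(tokens, original):
    """Turn matching ("ps",)/("pe",) into ("rs",)/("re",); drop the rest."""
    target = norm(original)
    out = []
    for i, tok in enumerate(tokens):
        if tok[0] == "ps":
            following = "".join(t[1] for t in tokens[i + 1:] if t[0] == "text")
            if norm(following).startswith(target):
                out.append(("rs",))
        elif tok[0] == "pe":
            preceding = "".join(t[1] for t in tokens[:i] if t[0] == "text")
            if norm(preceding).endswith(target):
                out.append(("re",))
        else:
            out.append(tok)
    return out


# Token list mirroring the Pass 1 result shown above.
pass1 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize "),
    ("ps",), ("text", "that "), ("ps",), ("text", "this "),
    ("end", "i", "B"), ("end", "b", "A"),
    ("text", "old foo is breaking"), ("pe",), ("text", " "),
    ("start", "i", "C"), ("ps",), ("text", "this old foo"),
    ("end", "i", "C"),
]
pass2 = upgrade_markers(pass1, "this old foo is breaking")
print([t[0] for t in pass2])
```

Only the <ps/> in front of "this " and the <pe/> after "breaking" survive, as <rs/> and <re/>. The sketch does not try to handle overlapping matches.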

Pass 3.
Structure the revisions, making them real elements: everything between <rs/> and <re/> is wrapped in a <rev> element.
This results in
<p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rev>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>

Pass 4.
Move every end-tag marker that sits inside a revision, but whose corresponding start-tag marker (hence the id) lies outside it, out of the revision to right before it.
Do something similar with start-tag markers whose corresponding end-tag marker lies outside the revision.
This results in
<p><start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <end name="i" id="B"/><end name="b" id="A"/><rev>this old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/></p>
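A Python sketch of passes 3 and 4 folded into one step over a flat token list (an illustration, not the XSLT from the thread). Several things here are assumptions: every ("rs",) has a matching ("re",), start markers left open inside the revision are moved to right after it, and start/end pairs that lie wholly inside the revision are dropped because their text is replaced anyway. hoist_markers is a made-up name.

```python
def hoist_markers(tokens):
    """Wrap rs..re into one ("rev", text) token and move markers out of it."""
    out = []
    inside = False
    before, after, text = [], [], []
    for tok in tokens:
        kind = tok[0]
        if kind == "rs":
            inside, before, after, text = True, [], [], []
        elif kind == "re":
            out.extend(before)                  # hoisted end markers
            out.append(("rev", "".join(text)))  # the revision itself
            out.extend(after)                   # hoisted start markers
            inside = False
        elif not inside:
            out.append(tok)
        elif kind == "text":
            text.append(tok[1])
        elif kind == "start":
            after.append(tok)
        elif kind == "end":
            opened_here = [t for t in after if t[2] == tok[2]]
            if opened_here:
                after.remove(opened_here[0])    # pair wholly inside: drop
            else:
                before.append(tok)              # opened before the revision
    return out


# Token list mirroring the Pass 2 result shown above.
pass2 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize that "),
    ("rs",), ("text", "this "),
    ("end", "i", "B"), ("end", "b", "A"),
    ("text", "old foo is breaking"), ("re",), ("text", " "),
    ("start", "i", "C"), ("text", "this old foo"), ("end", "i", "C"),
]
for tok in hoist_markers(pass2):
    print(tok)
```

The end markers keep their relative order when hoisted, so the elements they close stay properly nested.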

Pass 5.
Clean up: make the actual replacement in the revision and turn the markers into elements again
results in
<p><b>I <i>did not realize that </i></b><rev>a new bar is building</rev> <i>this old foo</i></p>
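The clean-up can be sketched in Python as a simple serialization of the flat token list (an illustration only, with made-up names). As in the simplified listings, attributes are not carried along here; the real stylesheet maintains them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def rebuild(tokens, revision):
    """Turn markers back into tags and put `revision` into each rev."""
    parts = []
    for tok in tokens:
        kind = tok[0]
        if kind == "start":
            parts.append(f"<{tok[1]}>")
        elif kind == "end":
            parts.append(f"</{tok[1]}>")
        elif kind == "rev":
            parts.append(f"<rev>{escape(revision)}</rev>")
        else:  # plain text
            parts.append(escape(tok[1]))
    return "".join(parts)


# Token list mirroring the Pass 4 result shown above.
pass4 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize that "),
    ("end", "i", "B"), ("end", "b", "A"),
    ("rev", "this old foo is breaking"), ("text", " "),
    ("start", "i", "C"), ("text", "this old foo"), ("end", "i", "C"),
]
result = "<p>" + rebuild(pass4, "a new bar is building") + "</p>"
ET.fromstring(result)  # raises ParseError if the result is not well-formed
print(result)
```

Parsing the result again is a cheap sanity check that Pass 4 left the markers properly nested.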

The turning point for me was adding the offset markers.
Before that I was auto-generating pretty complex regular expressions;
now I get away with a simple starts-with() and ends-with().

If anyone sees a possible improvement here or there, let me know please

Me happy now, thanks for your help

Geert




