Re: [xsl] text replacement with mixed content

For those interested in this thread,
here is how I resolved this.
It works like a charm on all tests,
and I am pleased with the robustness.

Let me first thank all who stepped in.
I got some inspiration from the different posts,
your contributions are highly appreciated

Since the problem is contained in paragraphs, and I can quickly check whether I have to bother with a revision or not per paragraph, it does not really slow me down (too much) to have multiple steps through the data.

The thinking was the hard work. The actual XSLT implementation was not too bad once the algorithm was solid.


Let me show you what I did (simplified) taking test 7 as an example

<in original="this old foo is breaking" revision="a new bar is building"><b>I <i>did not realize that this </i></b>old foo is breaking <i>this old foo</i></in>

Pass 1.

Take out the structure by making empty element markers (with id) from each element tag, and in the meantime put offset markers at any location where a matching pattern could start or end
(if "t" is the first character of @original, put a marker in front of every "t";
if "g" is the last character of @original, place a marker after every "g").
The markers are potential start <ps/> and potential end <pe/>.
results in (simplified; I have namespaces, maintain attributes, et al.)

<start name="b" id="A"/>I <start name="i" id="B"/>did not realize <ps/>that <ps/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<pe/> <start name="i" id="C"/><ps/>this old foo<end name="i" id="C"/>
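
For readers who want to try the idea outside XSLT, Pass 1 can be sketched in Python. This is only an illustration under assumptions, not the actual stylesheet: the tuple-based token stream, the function names `mark_text` and `flatten`, and the numeric ids are all made up for the example, and the input markup is reconstructed from the marker listing above. Unlike the simplified listing, the sketch puts a marker at every "t" and "g", including those in the middle of words.

```python
import xml.etree.ElementTree as ET
from itertools import count

def mark_text(text, first, last):
    """Split text into tokens, with ("ps",) before each `first` character
    and ("pe",) after each `last` character."""
    tokens, buf = [], ""
    for ch in text:
        if ch == first:
            if buf:
                tokens.append(("text", buf))
                buf = ""
            tokens.append(("ps",))
        buf += ch
        if ch == last:
            tokens.append(("text", buf))
            buf = ""
            tokens.append(("pe",))
    if buf:
        tokens.append(("text", buf))
    return tokens

def flatten(elem, first, last, ids):
    """Turn the content of `elem` into a flat stream of start/end markers,
    text tokens and potential start/end markers."""
    tokens = mark_text(elem.text or "", first, last)
    for child in elem:
        cid = next(ids)  # the shared id pairs a start marker with its end marker
        tokens.append(("start", child.tag, cid))
        tokens.extend(flatten(child, first, last, ids))
        tokens.append(("end", child.tag, cid))
        tokens.extend(mark_text(child.tail or "", first, last))
    return tokens

# Input markup reconstructed from the Pass 1 listing (an assumption).
src = ('<in original="this old foo is breaking" revision="a new bar is building">'
       '<b>I <i>did not realize that this </i></b>'
       'old foo is breaking <i>this old foo</i></in>')
root = ET.fromstring(src)
original = root.get("original")
tokens = flatten(root, original[0], original[-1], count(1))

for tok in tokens:
    print(tok)
```

Attributes and namespaces are ignored here; a real implementation would have to carry them on the start markers.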


now actually the hard work is done

Pass 2.

On each <ps/>, check if the join of all following text nodes (normalized one way or another) starts with the normalized @original; if so, upgrade it to revision start <rs/>.
On each <pe/>, check if the join of all preceding text nodes (normalized one way or another) ends with the normalized @original; if so, upgrade it to revision end <re/>.

results in

<start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rs/>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking<re/> <start name="i" id="C"/>this old foo<end name="i" id="C"/>


Pass 3.
structure the revisions, making them real elements
results in

<start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <rev>this <end name="i" id="B"/><end name="b" id="A"/>old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/>
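
Pass 3 amounts to grouping everything between an <rs/> and the next <re/>. A minimal Python sketch of that grouping follows; the token tuples and the name `group_revisions` are hypothetical, and the handling of unpaired markers (a stray <re/> is dropped, an unclosed <rs/> is undone) is an assumption, since the post does not cover those cases.

```python
def group_revisions(tokens):
    """Wrap the tokens between ("rs",) and the next ("re",) in a ("rev", [...]) token."""
    out, inside = [], None
    for tok in tokens:
        kind = tok[0]
        if kind == "rs":
            if inside is None:      # a nested rs is ignored
                inside = []
        elif kind == "re":
            if inside is not None:  # a stray re is dropped
                out.append(("rev", inside))
                inside = None
        elif inside is not None:
            inside.append(tok)
        else:
            out.append(tok)
    if inside is not None:          # rs without re: put the content back
        out.extend(inside)
    return out

# The Pass 2 result from the post, written as tokens.
pass2 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize that "),
    ("rs",), ("text", "this "),
    ("end", "i", "B"), ("end", "b", "A"),
    ("text", "old foo is breaking"), ("re",), ("text", " "),
    ("start", "i", "C"), ("text", "this old foo"), ("end", "i", "C"),
]

pass3 = group_revisions(pass2)
for tok in pass3:
    print(tok)
```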


Pass 4.

Move the end tag markers that are inside a revision but whose corresponding start tag marker lies outside it (hence the id) out of the revision, to right before the revision.

Do something similar with start tag markers
results in

<start name="b" id="A"/>I <start name="i" id="B"/>did not realize that <end name="i" id="B"/><end name="b" id="A"/><rev>this old foo is breaking</rev> <start name="i" id="C"/>this old foo<end name="i" id="C"/>
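
The marker shuffle of Pass 4 can be sketched in Python as well. The token tuples and the name `hoist_markers` are hypothetical. Two points are assumptions on top of the post: start markers whose end lies outside the revision are moved to right after it (one reading of "something similar"), and an element that starts and ends inside the revision is left where it is.

```python
def hoist_markers(tokens):
    """Move unmatched end markers to just before each revision and
    unmatched start markers to just after it."""
    out = []
    for tok in tokens:
        if tok[0] != "rev":
            out.append(tok)
            continue
        inner = tok[1]
        started = {t[2] for t in inner if t[0] == "start"}
        ended = {t[2] for t in inner if t[0] == "end"}
        # end markers whose start is outside the revision
        before = [t for t in inner if t[0] == "end" and t[2] not in started]
        # start markers whose end is outside the revision
        after = [t for t in inner if t[0] == "start" and t[2] not in ended]
        kept = [t for t in inner if t not in before and t not in after]
        out.extend(before)
        out.append(("rev", kept))
        out.extend(after)
    return out

# The Pass 3 result from the post, written as tokens.
pass3 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize that "),
    ("rev", [("text", "this "), ("end", "i", "B"), ("end", "b", "A"),
             ("text", "old foo is breaking")]),
    ("text", " "),
    ("start", "i", "C"), ("text", "this old foo"), ("end", "i", "C"),
]

pass4 = hoist_markers(pass3)
for tok in pass4:
    print(tok)
```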


Pass 5.

Clean up: make the actual replacement in the revision and make the markers into elements again
results in

<b>I <i>did not realize that </i></b><rev>a new bar is building</rev> <i>this old foo</i>
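
The clean-up step, sketched in Python on the Pass 4 result. As before, the token tuples and the name `serialize` are hypothetical, attributes and namespaces are left out, and the sketch writes the result as a string rather than building a tree, which is a simplification compared with what a stylesheet would do.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def serialize(tokens, revision):
    """Turn markers back into tags and replace each revision by the new text."""
    parts = []
    for tok in tokens:
        kind = tok[0]
        if kind == "start":
            parts.append("<%s>" % tok[1])
        elif kind == "end":
            parts.append("</%s>" % tok[1])
        elif kind == "text":
            parts.append(escape(tok[1]))
        elif kind == "rev":
            parts.append("<rev>%s</rev>" % escape(revision))
    return "".join(parts)

# The Pass 4 result from the post, written as tokens.
pass4 = [
    ("start", "b", "A"), ("text", "I "),
    ("start", "i", "B"), ("text", "did not realize that "),
    ("end", "i", "B"), ("end", "b", "A"),
    ("rev", [("text", "this "), ("text", "old foo is breaking")]),
    ("text", " "),
    ("start", "i", "C"), ("text", "this old foo"), ("end", "i", "C"),
]

result = serialize(pass4, "a new bar is building")
print(result)

# Parsing the result inside a wrapper element checks that the tags nest properly.
ET.fromstring("<in>" + result + "</in>")
```

The parse at the end only succeeds because Pass 4 moved the end markers out of the revision first; serializing the Pass 3 stream directly would drop them together with the old text.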


The turning point for me was adding the offset markers,
before I was auto-generating pretty complex regular expressions,
now I got away with a simple ends-with() and starts-with()

If anyone sees a possible improvement here or there, let me know please

Me happy now, thanks for your help

Geert




