I don't think it's straightforward at all - people have spent years
perfecting algorithms for finding diffs between two sequences. I'm no
expert in this area, but if I had the problem I would start by searching
for appropriate algorithms before even thinking about writing an XSLT
implementation. Presumably there's a trade-off between the time spent
and the perfection of the result.
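As a rough illustration of the kind of algorithm a search would turn up,
here is a minimal sketch of a longest-common-subsequence (LCS) alignment,
the textbook basis of many diff tools. It is written in Python rather than
XSLT purely to show the logic; the function name `align` and the
parameters `stamped` and `full` are hypothetical, and the input is assumed
to have been flattened already into (n, word) pairs for input1 and plain
words for input2. It is not tuned for long transcripts: the table costs
O(len(stamped) * len(full)) time and memory.

```python
from typing import List, Tuple


def align(stamped: List[Tuple[str, str]], full: List[str]) -> List[Tuple[str, str]]:
    """Align (n, word) pairs from input1 against the words of input2.

    Returns one (n, word) pair per word of `full`; words of `full` with no
    counterpart in `stamped` get n == "skipped". Hypothetical helper, not
    part of any library.
    """
    m, k = len(stamped), len(full)

    # lcs[i][j] = length of the longest common subsequence of
    # stamped[i:] and full[j:], filled in from the end backwards.
    lcs = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(k - 1, -1, -1):
            if stamped[i][1] == full[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk both sequences from the start, following the table.
    out: List[Tuple[str, str]] = []
    i = j = 0
    while j < k:
        if i < m and stamped[i][1] == full[j]:
            # Same word on both sides: carry the @n value over.
            out.append((stamped[i][0], full[j]))
            i += 1
            j += 1
        elif i < m and lcs[i + 1][j] > lcs[i][j + 1]:
            # The input1 word has no counterpart in input2; drop it.
            i += 1
        else:
            # The input2 word has no counterpart in input1; flag it.
            out.append(("skipped", full[j]))
            j += 1
    return out


if __name__ == "__main__":
    stamped = [("1", "I"), ("2", "am"), ("3", "a"), ("4", "sequence")]
    full = ["I", "am", "a", "longer", "longer", "sequence"]
    for n, word in align(stamped, full):
        print(f'<w n="{n}">{word}</w>')
```

Unlike a simple "next matching word" scan, the LCS table also copes with
words that appear in input1 but not in input2, which seems likely for a
machine-generated transcription. That is one point on the trade-off: more
work up front for a more robust alignment.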
Michael Kay
Saxonica
On 30/09/2010 5:51 PM, Markus Flatscher wrote:
I'm banging my head against a sequence alignment problem. I have a
feeling that this is straightforward, but I can't put my finger on
what's missing from my attempts.
Suppose I have two inputs like so, where input1//w is always a subset
of input2//w:
<input1>
<w n="1">I</w>
<w n="2">am</w>
<w n="3">a</w>
<w n="4">sequence</w>
</input1>
<input2>
<w>I</w>
<w>am</w>
<w>a</w>
<w>longer</w>
<w>longer</w>
<w>sequence</w>
</input2>
I'd like to get output like so:
<output>
<w n="1">I</w>
<w n="2">am</w>
<w n="3">a</w>
<w n="skipped">longer</w>
<w n="skipped">longer</w>
<w n="4">sequence</w>
</output>
I.e., for each input1//w, @n should be copied to the nearest following
<w> in input2 that has the same string value; <w>s in input2 that have
no counterpart in input1 should be flagged as "skipped".
P.S.: The use case is aligning an imperfect but timestamped
transcription of an audio file (input1, machine-generated) with a
perfect but not-timestamped one (input2, human-generated).
Thanks much for any help,
Markus