xsl-list

Re: [xsl] Finding first difference between 2 text strings

2009-09-14 14:22:30
> ... I don't understand all the details of the function, but that's one
> advantage of reusable code! ...

Inserting debugging statements in David's stylesheet helps:
 a: abcdefghijklmnopqrstuvwxyz
 b: abcdefghijklmnopqrstuvw1y0
aa: abcdefghijklmnopqrstuvwxyz
bb: abcdefghijklmnopqrstuvw1y0
 r: ^:(a(b(c(d(e(f(g(h(i(j(k(l(m(n(o(p(q(r(s(t(u(v(w(1(y
(0)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?.*


It is probably a good idea to understand (or at least verify) code one
wants to rely on ...?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?.
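The following is a minimal Python sketch of what the pattern in the trace above captures. It is a transliteration under stated assumptions, not David's code. `build_pattern`, `common_prefix` and `mismatch` are hypothetical names. Python's `re.escape` stands in for the XSLT's `translate()` to private-use characters as the way of neutralising regex metacharacters. Every group is optional and greedy, so group 1 ends up holding the longest common prefix, and the first mismatch is at 1 + its length.

```python
import re


def build_pattern(b: str) -> str:
    """Build the regex that f:mismatch2 stores in $r.

    For b = 'abc' this gives '^:(a(b(c)?)?)?.*': one nested, optional
    group per character of b. re.escape() replaces the XSLT's translate()
    to private-use characters (an assumption of this sketch), so any
    metacharacters in b are matched literally.
    """
    opens = "".join("(" + re.escape(ch) for ch in b)
    closes = ")?" * len(b)
    return "^:" + opens + closes + ".*"


def common_prefix(a: str, b: str) -> str:
    """Return what group 1 captures: the longest common prefix of a and b."""
    if not b:
        return ""  # the pattern would have no group 1 at all
    # The leading ':' mirrors concat(':', $aa). In XPath it also keeps the
    # pattern from matching a zero-length string, which fn:replace rejects.
    m = re.match(build_pattern(b), ":" + a)
    # Greedy optional groups consume as many characters of b as possible;
    # group(1) is None when even the first character differs.
    return m.group(1) or ""


def mismatch(a: str, b: str) -> int:
    """1-based position of the first difference, like f:mismatch2."""
    return 1 + len(common_prefix(a, b))
```

For the strings in the trace, group 1 is `abcdefghijklmnopqrstuvw`, so the first difference is at position 24. For very long strings, the deeply nested groups may also hit limits in Python's own regex parser, much as David warns for XSLT.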


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler
WebSphere DataPower SOA Appliances
IBM Deutschland Research & Development GmbH


                                                                           
From: mlcook(_at_)Wabtec(_dot_)com
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Date: 09/14/2009 07:57 PM
Subject: Re: [xsl] Finding first difference between 2 text strings
Please respond to: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com

What a clever/impressive/compact solution!

David's solution is the one I decided to use because it avoids potential
problems with stack overflow during recursion.  I don't understand all
the details of the function, but that's one advantage of reusable code!
With our data, the regexp processing didn't seem to be stressed too much
since I got reasonable results for strings up to 1000 characters in
length.

Our text strings also contain '(', ')', and '?', so they had to be added
to the list of special characters to be processed.

I suppose the use of the ')' in the function could be replaced by a
character not occurring in the text data.

Since we're also processing only ASCII text, not arbitrary Unicode, I
replaced the hex codes in the translation with a space for each special
character.  The ordering of special characters doesn't matter (to me),
so a blank seemed to work fine.  The hex codes also seemed to throw off
the resulting position of the mismatch, although I didn't investigate
thoroughly.
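Replacing every special character with the same blank has one side effect worth checking: two different special characters, or a special character and a real space, then compare as equal. The small Python sketch below compares the two translation strategies. `mismatch_after`, `TO_BLANK` and `TO_PRIVATE` are hypothetical names. `os.path.commonprefix` is used in place of the regex purely to keep the comparison short. The private-use assignment starting at U+E001 is an assumption modelled on David's version.

```python
import os.path

# The thirteen characters from Mike's $aa-pattern: . , + * \ { } [ ] ( ) ? '
SPECIALS = ".,+*\\{}[]()?'"

# Mike's variant: every special character becomes a blank.
TO_BLANK = str.maketrans({ch: " " for ch in SPECIALS})

# David-style variant (assumed assignment): each special character gets its
# own private-use code point, starting at U+E001.
TO_PRIVATE = str.maketrans({ch: chr(0xE001 + i) for i, ch in enumerate(SPECIALS)})


def mismatch_after(a: str, b: str, table: dict) -> int:
    """1-based position of the first difference after translating a and b.

    os.path.commonprefix compares character by character, so it stands in
    for the regex here; only the translation table differs between runs.
    """
    prefix = os.path.commonprefix([a.translate(table), b.translate(table)])
    return len(prefix) + 1


print(mismatch_after("x.y", "x(y", TO_BLANK))
print(mismatch_after("x.y", "x(y", TO_PRIVATE))
```

With blanks, `x.y` and `x(y` become identical, so the reported position moves past that difference (to 4). With distinct private-use code points, the difference is found at position 2. Whether this matters depends on the data.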

My changes to the function amount to the following (with similar changes
for $b):

<xsl:variable name="single-quote">'</xsl:variable>

<xsl:param name="a" as="xs:string" />
<xsl:variable name="aa-pattern" select="concat('.,+*\{}[]()?',
$single-quote)" />
<xsl:variable name="aa" select="translate($a,  $aa-pattern,  '
')"/>

Then invoke the function as:
<xsl:variable name="pos1" select="f:mismatch2($a, $b)" />

I also went ahead and reversed the strings so that I could find the last
character in the string difference, and then extract the whole section
that was different:

<xsl:variable name="rev-a"
select="codepoints-to-string(reverse(string-to-codepoints($a)))" />
<xsl:variable name="rev-b"
select="codepoints-to-string(reverse(string-to-codepoints($b)))" />
<xsl:variable name="pos2" select=" f:mismatch2 ($rev-a, $rev-b)" />

Then output this string:
substring($a, $pos1, string-length($a) - $pos2 - $pos1 + 2)

or this string, depending on which sub-section is desired for the user
(and, actually, I output both for a "from"/"to" comparison):
substring($b, $pos1, string-length($b) - $pos2 - $pos1 + 2)
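The reverse-and-substring arithmetic can be checked with a short Python sketch. `first_mismatch` is a hypothetical stand-in for f:mismatch2, built on `os.path.commonprefix`. `differing_sections` (also hypothetical) mirrors the two `substring()` calls, with positions kept 1-based as in XPath.

```python
import os.path


def first_mismatch(a: str, b: str) -> int:
    """1-based position of the first difference (len + 1 if none).

    Hypothetical stand-in for f:mismatch2; any correct version would do.
    """
    return len(os.path.commonprefix([a, b])) + 1


def differing_sections(a: str, b: str):
    """Return the differing middle sections of a and b (a "from"/"to" pair)."""
    pos1 = first_mismatch(a, b)               # f:mismatch2($a, $b)
    pos2 = first_mismatch(a[::-1], b[::-1])   # f:mismatch2($rev-a, $rev-b)

    def section(s: str) -> str:
        # substring($s, $pos1, string-length($s) - $pos2 - $pos1 + 2)
        length = len(s) - pos2 - pos1 + 2
        # XPath substring() returns '' when the length is not positive.
        return s[pos1 - 1 : pos1 - 1 + max(length, 0)]

    return section(a), section(b)


print(differing_sections("abcXYZdef", "abcQdef"))
```

One case to watch: when the edit falls inside a run of repeated characters, the forward and backward matches overlap and the computed length goes negative. Both sections then come out empty even though the strings differ (for example `aa` versus `aaa`).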

Processing time was not excessive, and I got some useful output from our
data.

Thanks again to David and the others who supplied working solutions!

-- Mike Cook



An alternative definition, which appears to give the same results, is:

  <xsl:function name="f:mismatch2" as="xs:integer?">
    <xsl:param name="a" as="xs:string" />
    <xsl:param name="b" as="xs:string" />
    <xsl:variable name="aa"

select="translate($a,'.+*\{}[]','&#xe001;&#xe002;&#xe003;&#xe004;&#xe005
;&#xe006;&#xe
007;&#xe008;')"/>
    <xsl:variable name="bb"

select="translate($b,'.+*\{}[]','&#xe001;&#xe002;&#xe003;&#xe004;&#xe005
;&#xe006;&#xe
007;&#xe008;')"/>
    <xsl:variable name="r"
select="concat('^:',replace($bb,'.','($0'),replace($bb,'.',')?'),'.*')"/

    <xsl:sequence
select="1+string-length(replace(concat(':',$aa),$r,'$1'))"/>
  </xsl:function>

If $b is long, this might stretch the capabilities of the regexp engine,
though....

David
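If both recursion depth and regex size are a concern, one different technique is a binary search on the length of the common prefix. This technique was not proposed in the thread. The test "are the first k characters equal?" is true for every k before the first mismatch and false from then on, so about log2(n) prefix comparisons suffice. The sketch below is a minimal Python version, with a hypothetical function name. In XSLT, the same idea could be written as a recursive function whose depth grows only logarithmically with the string length.

```python
def first_mismatch_bisect(a: str, b: str) -> int:
    """1-based position of the first difference (min length + 1 if none).

    Binary search for the largest k with a[:k] == b[:k]; that predicate is
    monotone (true, then false), so the search converges on the first
    mismatch without any regex and with no deep recursion.
    """
    lo, hi = 0, min(len(a), len(b))
    # Invariant: a[:lo] == b[:lo], and no k > hi can have equal prefixes.
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if a[:mid] == b[:mid]:
            lo = mid
        else:
            hi = mid - 1
    return lo + 1
```

Each prefix comparison costs up to O(n), so the total work is about O(n log n) character comparisons.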





--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



