xsl-list

Re: [xsl] Similarity metric in XSLT 2?

2012-03-31 12:27:58
Imsieke,
all,

Commons Lang [1] has an implementation of the Levenshtein distance [2],
and calling it from XSLT with Saxon-PE seems to work nicely.
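For readers unfamiliar with the metric: the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal, dependency-free Java sketch of the standard dynamic-programming computation, for illustration only. It is not the Commons Lang source (which, as far as I know, is exposed as `StringUtils.getLevenshteinDistance`), and the class name `LevenshteinSketch` is hypothetical.

```java
// Illustrative sketch of Levenshtein (edit) distance using the classic
// dynamic-programming recurrence with two rolling rows.
// NOT the Commons Lang implementation; the class name is hypothetical.
final class LevenshteinSketch {

    /** Minimum number of single-character insertions, deletions and
     *  substitutions that turn s into t. Compares UTF-16 chars, so a
     *  character outside the BMP counts as two. */
    static int distance(String s, String t) {
        int n = s.length(), m = t.length();
        int[] prev = new int[m + 1]; // row i-1 of the DP table
        int[] curr = new int[m + 1]; // row i, being filled in
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; // roll the rows: the finished row becomes prev
            prev = curr;
            curr = tmp;
        }
        return prev[m]; // after the last swap, prev holds row n
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Keeping only two rows of the table uses memory proportional to the second string's length; the running time is still proportional to the product of both lengths.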

Secondstring (http://secondstring.sourceforge.net/) is another Java
library that implements many, many more approximate string matching
algorithms, and is part of Simile Vicino
(http://code.google.com/p/simile-vicino/), which in turn is part of
Google's Freebase/Gridworks code base
(https://github.com/lbjay/gridworks).
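Libraries of this kind offer metrics beyond character edit distance. As one illustration (this is not SecondString's API; the class and method names are hypothetical), here is a minimal Java sketch of token-based Jaccard similarity: the number of distinct words the two strings share, divided by the number of distinct words in either. Because it ignores word order and compares whole words, it may be more forgiving than edit distance when comparing longer passages whose content is similar but not identical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch of token-based Jaccard similarity.
// NOT SecondString's API; all names here are hypothetical.
final class TokenJaccardSketch {

    /** Lower-cases the text and splits it on runs of characters that are
     *  neither letters nor digits, returning the set of distinct tokens. */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) { // split yields a leading "" if text starts with a delimiter
                result.add(tok);
            }
        }
        return result;
    }

    /** Size of the token intersection divided by size of the token union;
     *  defined as 1.0 when neither string contains any tokens. */
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // 3 shared tokens (the, quick, fox) out of 5 distinct ones
        System.out.println(similarity("The quick brown fox", "the quick red fox")); // prints 0.6
    }
}
```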

I haven't had any luck calling any Secondstring or Vicino methods
using Saxon yet. I'd love to hear from anyone who has.

[1] http://commons.apache.org/lang/
[2] http://commons.apache.org/lang/api-release/src-html/org/apache/commons/lang3/StringUtils.html#line.6061

On Fri, Mar 30, 2012 at 4:24 PM, Imsieke, Gerrit, le-tex
<gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de> wrote:
I can only affirm that I'd be interested in such a library, too.

The last time that I needed string similarity metrics (4 yrs ago), I used
Perl with XML::LibXML and String::Similarity.

If there were such a module / extension function for XPath / XSLT, I'd
probably have used it more often. If you find a Java library that is easy to
interface with from Java-based XSLT processors, please let me know. I think
that Levenshtein or more advanced algorithms would be too slow if
implemented in XSLT, but they may be readily available as extension functions.

Gerrit


On 2012-03-30 20:18, Martin Holmes wrote:

Hi all,

I'm faced with a situation in which I have to match an input string
against a set of possible candidates, and I need to find the match which
is most similar to it (I'm trying to identify correspondences between
two sets of files which have similar, but not identical, content).
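The task described here (choosing the candidate most similar to an input string) can be sketched as a linear scan that scores every candidate and keeps the best. This minimal Java sketch assumes a normalized similarity of 1 minus the Levenshtein distance divided by the length of the longer string; that normalization, the tie-breaking rule, the sample strings and all names are illustrative assumptions, not something proposed in the thread.

```java
import java.util.List;

// Illustrative best-match sketch. Assumes similarity = 1 - distance / longer length.
// All names and sample data are hypothetical.
final class BestMatchSketch {

    /** Levenshtein distance via the two-row dynamic-programming recurrence. */
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.length()];
    }

    /** Normalized similarity in [0, 1]; 1.0 means the strings are identical. */
    static double similarity(String s, String t) {
        int longer = Math.max(s.length(), t.length());
        return longer == 0 ? 1.0 : 1.0 - (double) distance(s, t) / longer;
    }

    /** Returns the candidate with the highest similarity to input
     *  (the earliest one on ties), or null if there are no candidates. */
    static String bestMatch(String input, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double score = similarity(input, c);
            if (score > bestScore) { // strict '>' keeps the first of tied candidates
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "Chapter 1: Introduction", "Chapter 2 - Methods", "Appendix A: Methods");
        System.out.println(bestMatch("Chapter 2: Methods", candidates)); // prints Chapter 2 - Methods
    }
}
```

Every candidate costs one distance computation, so this brute-force scan is simplest when the candidate set is modest.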

Has anyone done anything like measuring string similarity in XSLT 2.0?
If so, how did you approach it?

All help appreciated,
Martin


--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler



