xsl-list

Re: [xsl] Spell Check Type Matching in XPath?

2022-04-21 15:11:40
On Thu, 2022-04-21 at 19:01 +0000, Eliot Kimber
eliot(_dot_)kimber(_at_)servicenow(_dot_)com wrote:
I’m looking at Jeni’s code now. I’ll see what I can do with it.
 
The fact that this is the best there is (a MarkMail search basically
brought me to Mike’s response below) suggests that there’s not
something more obvious that I simply failed to see.

Non-obvious (at least to me), but possibly faster, given that you know
already one of the strings to be matched, may be the symmetric-deletion
approach to edit distance described by Wolf Garbe [1]. It allows a
fairly quick detection of whether the candidate string is within edit
distance 1 of the string you're looking to match -- if you adjust the
way you do it, you can detect strings within distance 2.

[1]
https://wolfgarbe.medium.com/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f
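
A minimal sketch of the symmetric-deletion idea, in Python rather than
XPath so that it can be run as-is. The function names (deletes,
maybe_close) and the constant names are hypothetical, not taken from [1].
The deletion variants of the known string are computed once; each
candidate is then reduced the same way, and a shared variant marks it
as a likely near-match. A shared variant is only a filter: two strings
can share one and still be slightly further apart than the deletion
count, so a real edit-distance check on the survivors is still advisable.

```python
def deletes(word: str, max_deletes: int = 1) -> set:
    """Return word plus every string obtained by deleting up to
    max_deletes characters from it."""
    results = {word}
    frontier = {word}
    for _ in range(max_deletes):
        # Remove one more character from every string produced so far.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def maybe_close(candidate: str, target_deletes: set, max_deletes: int = 1) -> bool:
    """True if candidate shares a deletion variant with the target.

    target_deletes must have been built with the same max_deletes.
    """
    return not deletes(candidate, max_deletes).isdisjoint(target_deletes)


# The string to match is known in advance, so its variants are built once.
TARGET = "servicenow"
TARGET_DELETES_1 = deletes(TARGET, 1)
TARGET_DELETES_2 = deletes(TARGET, 2)
```

With one deletion per side, maybe_close("servcienow", TARGET_DELETES_1)
is true, because dropping the "c" from either string gives "servienow".
The "servcinow" example from the original question needs the two-deletion
variant: maybe_close("servcinow", TARGET_DELETES_2, 2). "amazon" shares
no variant with "servicenow" even at two deletions, since the lengths
are too far apart.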

Michael Sperberg-McQueen

 
However, Jeni’s comments in her post about recursion suggest there’s
a way to improve the code in XSLT 3/XPath 3, maybe something using
iterate….
 
Cheers,
 
E.
 
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com
 
From: Michael Kay mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Date: Thursday, April 21, 2022 at 1:35 PM
To: xsl-list <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Subject: Re: [xsl] Spell Check Type Matching in XPath?
 
Jeni Tennison's work on computing Levenshtein distance in XSLT may be
relevant:
 
http://www.jenitennison.com/2007/05/03/levenshtein-distance-in-xslt-2-0.html
 
(It would be interesting to see it reworked for XSLT 3.0...)
 
Search also for "Levenshtein distance XSLT" on Markmail.
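
For orientation, a sketch of the Levenshtein computation applied to the
host-name check from the original question. This is the standard
two-row dynamic-programming formulation written in Python, not Jeni
Tennison's XSLT code. The function names, the default expected host, and
the threshold of 2 are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def looks_misspelled(host: str,
                     expected: str = "docs.servicenow.com",
                     max_distance: int = 2) -> bool:
    """True if host is close to, but not exactly, the expected host."""
    distance = levenshtein(host.lower(), expected)
    return 0 < distance <= max_distance
```

Under these assumptions, "docs.seivcenow.com" and "docs.servcinow.com"
are both at distance 2 from the expected host and are flagged, while
"docs.servicenow.com" (distance 0) and "docs.amazon.com" (well beyond 2)
are not. Each row depends only on the row before it, which is the kind
of accumulation that a fold or an iterate construct expresses without
explicit recursion.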
 
Michael Kay
Saxonica


On 21 Apr 2022, at 18:57, Eliot Kimber
eliot(_dot_)kimber(_at_)servicenow(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
 
I’m writing a Schematron rule that tries to identify URLs where the
server component is close to, but not quite, “docs.servicenow.com”,
i.e., “seivcenow” or “servcinow” or whatever. I also need to
eliminate servers that are not like servicenow, such as
“docs.amazon.com”.
 
Basically I want the kind of fuzzy match on “servicenow” that
you’d get with normal spell checking.
 
I’m not seeing an easy way to do this in XSLT/XPath (in the context
of the XSLT Schematron engine in Oxygen XML).
 
But I feel like I’m missing some more-or-less obvious way to do
this with a regular expression or maybe a fold or something (I can
use XPath 3).
 
What am I missing?
 
Thanks,
 
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com
XSL-List info and archive 
EasyUnsubscribe (by email)
 
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
