Re: [xsl] Using XSLT to build an index

Hi,

That is cool! However, it addresses a slightly different problem than mine. I will bookmark it because I can see a future use for the idea.

Thanks,
Mark

-----Original Message-----
From: Dimitre Novatchev
Sent: Sunday, October 30, 2011 4:54 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Using XSLT to build an index

On Sun, Oct 30, 2011 at 2:47 PM, Mark <mark(_at_)knihtisk(_dot_)org> wrote:

The list archives did not seem to contain an XSLT stylesheet that could
index an XML file, but I may have missed it.


Perhaps my post from 2005 in this list on Concordance Building can help?

http://www.stylusstudio.com/xsllist/200511/post00190.html


--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.


On Sun, Oct 30, 2011 at 2:47 PM, Mark <mark(_at_)knihtisk(_dot_)org> wrote:

The list archives did not seem to contain an XSLT stylesheet that could
index an XML file, but I may have missed it. Is it practical to write my own
XSLT 2 indexing stylesheet? If so, I have a bilingual XML file that I want
to index. My assumptions are that I must get rid of the punctuation
properly, then isolate the words, sort them, remove stop words, and so on.
To get started, I need a bit of help. All of the phrases are found in two
attributes: @czech and @eng.

Three questions:

(1) I am aware from Michael’s book that regex expressions may be used in the
replace() function, but I do not know how to write that regex expression. I
would like to remove all the punctuation from a phrase as follows: for
everything except a hyphen [-], replacement should be with an empty string;
the hyphen should be replaced with a single space.

(2) I assume that to get rid of extra spaces (if any), I can use a construct
like: normalize-space(replace(@czech, 'some regex expression')).

(3) I assume that tokenize(normalize-space(replace(@czech, 'some regex
expression'))) will permit me to write out a list of the words found in
those attributes to an XML document. I am not completely clear as to what
tokenize() returns, or how to access that return.

I would appreciate any comments, and especially the construction of the
regex expression needed.
Thanks,
Mark
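
For reference, the three steps asked about above can be sketched in XSLT 2.0
roughly as follows. This is only a minimal sketch under stated assumptions,
not a reply taken from the thread: it assumes the phrases sit in @czech
attributes anywhere in the input, and the <words> and <word> output elements
are hypothetical names chosen for illustration. Note that replace() takes
three arguments and that tokenize() takes two in XPath 2.0. Also, \p{P}
covers Unicode punctuation only, so symbols such as + or $ (category \p{S})
would survive and would need to be added to the pattern if they matter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- <words> and <word> are hypothetical element names -->
    <words>
      <xsl:for-each select="//@czech">
        <!-- (1) Turn each hyphen into a space first, then delete all
                 remaining punctuation (\p{P} = Unicode punctuation).
             (2) normalize-space() trims the ends and collapses runs of
                 whitespace into single spaces. -->
        <xsl:variable name="cleaned"
            select="normalize-space(
                      replace(replace(., '-', ' '), '\p{P}', ''))"/>
        <!-- (3) tokenize() returns a sequence of strings, one per word;
                 xsl:for-each visits each string as the context item. -->
        <xsl:for-each select="tokenize($cleaned, ' ')">
          <word><xsl:value-of select="."/></word>
        </xsl:for-each>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

The hyphen is replaced before the general punctuation pass because the
hyphen itself belongs to \p{P} and would otherwise be deleted rather than
turned into a space. The same template body should work for @eng by changing
the outer select; sorting and stop-word removal would be further steps on
the resulting sequence.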


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

