xsl-list

Re: Concordance with XSLT

2005-11-03 14:06:33
Rick Quatro wrote:
I am in the investigation stage of a project where the client wants a
concordance of a Bible. The concordance would be exhaustive, except for
words like "a", "the", "and", etc. We would supply an exclusion list. My
main question is this: given an XML version of the Bible, could this be done
practically with XSLT?

I don't think XSL is the best way to handle this type of thing. You might want to ask the same question on the Apache Lucene mailing list (the main site is at http://lucene.apache.org/) or on the list for some other search/indexing software. This sounds more like a job for a search engine.

You would write a ContentHandler to index the XML into a Lucene search index, creating fields for the passage identifier, the passage content, and the passage's book ancestor. Another ContentHandler could create a list of all words not in the "stop word list". That list can then be sorted and de-duplicated, and each word in it searched against the index. The results for each word could be returned as XML, and XSL could be used to write them to a file.
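
To make the shape of that pipeline concrete, here is a rough sketch in Python using the standard library's SAX ContentHandler. It is not the Lucene setup described above: a plain in-memory dictionary stands in for the search index, and the two handler passes are collapsed into one. The element and attribute names (book/@name, verse/@id), the sample text, and the stop word list are all made up for illustration; the real markup and exclusion list would come from the client's data.

```python
import re
import xml.sax
from collections import defaultdict

# Hypothetical exclusion list; the real one would be supplied separately.
STOP_WORDS = {"a", "the", "and", "in", "of"}


class ConcordanceHandler(xml.sax.ContentHandler):
    """Collects word -> [(book, passage id), ...] while streaming the XML.

    Assumes a made-up layout: <book name="..."> containing <verse id="...">.
    """

    def __init__(self, stop_words):
        super().__init__()
        self.stop_words = stop_words
        self.entries = defaultdict(list)  # stands in for the search index
        self.book = None
        self.passage_id = None
        self.text = []

    def startElement(self, name, attrs):
        if name == "book":
            self.book = attrs.get("name")
        elif name == "verse":
            self.passage_id = attrs.get("id")
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it per passage.
        if self.passage_id is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "verse":
            # Lowercase, split into words, and drop duplicates within a passage.
            words = set(re.findall(r"[a-z]+", "".join(self.text).lower()))
            for word in words - self.stop_words:
                self.entries[word].append((self.book, self.passage_id))
            self.passage_id = None


def build_concordance(xml_text, stop_words=STOP_WORDS):
    """Parse the XML and return the entries sorted alphabetically by word."""
    handler = ConcordanceHandler(stop_words)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return dict(sorted(handler.entries.items()))


SAMPLE = """<bible>
  <book name="Genesis">
    <verse id="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
    <verse id="Gen.1.3">And God said, Let there be light: and there was light.</verse>
  </book>
</bible>"""

concordance = build_concordance(SAMPLE)
for word, passages in concordance.items():
    print(word, [pid for _book, pid in passages])
```

Because the handler streams the document rather than loading it whole, the same approach should scale to a full Bible; with a real index in place of the dictionary, the per-word lookup would become a query against the passage content field.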

best,
-Rob

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


