Rick Quatro wrote:
I am in the investigation stage of a project where the client wants a
concordance of a Bible. The concordance would be exhaustive, except for
words like "a", "the", "and", etc. We would supply an exclusion list. My
main question is this: given an XML version of the Bible, could this be done
practically with XSLT?
I don't think XSL is the best way to handle this type of thing. You
might want to ask the same question on the Apache Lucene mailing list (the
main site is at http://lucene.apache.org/) or some other search/indexing
software list. This sounds more like a job for a search
engine.
You would write a ContentHandler to index the XML into a Lucene search
index. You would create fields for the passage identifier, the passage
content and the passage's book ancestor. Another ContentHandler could
create a list of all words not in the "stop word list". The
list can then be sorted and de-duplicated, and each word searched
against the index. The results for each word could be returned as
XML and XSL could be used to write them to a file.
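
To make the shape of this concrete, here is a rough sketch in Python using
the standard xml.sax ContentHandler. It is not Lucene: a plain in-memory
dictionary stands in for the search index, and the two handlers are folded
into one pass. The element and attribute names (book/@name, verse/@id) and
the tiny stop word set are assumptions for illustration only; your XML and
exclusion list will differ.

```python
import re
import xml.sax
from collections import defaultdict

# Assumption: a stand-in for the exclusion list the client would supply.
STOP_WORDS = {"a", "the", "and"}


class ConcordanceHandler(xml.sax.ContentHandler):
    """Collects word -> [(book, passage id), ...] while streaming the XML."""

    def __init__(self, stop_words):
        super().__init__()
        self.stop_words = stop_words
        self.index = defaultdict(list)
        self.book = None
        self.passage_id = None
        self.text = []

    def startElement(self, name, attrs):
        # Hypothetical schema: <book name="..."> containing <verse id="...">.
        if name == "book":
            self.book = attrs.get("name")
        elif name == "verse":
            self.passage_id = attrs.get("id")
            self.text = []

    def characters(self, content):
        # SAX may deliver a passage's text in several chunks, so buffer it.
        if self.passage_id is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "verse":
            words = re.findall(r"[a-z']+", "".join(self.text).lower())
            # One entry per word per passage, skipping excluded words.
            for word in sorted(set(words) - self.stop_words):
                self.index[word].append((self.book, self.passage_id))
            self.passage_id = None


def build_concordance(xml_text, stop_words=STOP_WORDS):
    """Return a dict of word -> passages, with the words in sorted order."""
    handler = ConcordanceHandler(stop_words)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return dict(sorted(handler.index.items()))
```

Calling build_concordance() on the XML text gives each non-excluded word
with the book and identifier of every passage it occurs in. A real search
index would take over from the dictionary once you need stemming, phrase
queries or anything beyond exact word lookup.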
best,
-Rob
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list