This isn't difficult; there's no need to contemplate doing it in Java. You can
tokenize the text using the tokenize() function in XSLT 2.0, or the
str:tokenize() function/template in EXSLT (www.exslt.org). Then look up each
token in your list of place names, using a key for efficiency.
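Below is a minimal XSLT 2.0 sketch of that approach. It assumes the word list is stored in a separate file called places.xml, shaped like <places><place>London</place>...</places>. That file name, its element names, and the <matches> output format are illustrative assumptions, not part of the original question. The key indexes the place names, and key() is called with the sequence of title tokens, using its third argument to search the places document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: hypothetical word list file places.xml,
       one <place> element per country or city name -->
  <xsl:variable name="places" select="doc('places.xml')"/>

  <!-- Index every <place> element by its string value -->
  <xsl:key name="place-by-name" match="place" use="."/>

  <xsl:template match="/bookshelf">
    <matches>
      <xsl:for-each select="book">
        <!-- Split the title on runs of non-word characters, so
             "London's" yields the tokens "London" and "s", then look
             up all tokens at once in the key built over $places -->
        <xsl:variable name="hits"
            select="key('place-by-name', tokenize(title, '\W+'), $places)"/>
        <!-- Report only books whose title contains a known place name -->
        <xsl:if test="exists($hits)">
          <book title="{title}" places="{distinct-values($hits)}"/>
        </xsl:if>
      </xsl:for-each>
    </matches>
  </xsl:template>

</xsl:stylesheet>
```

With "London" in places.xml, only the first book in the example would be listed. The matching is case-sensitive and token-based, so multi-word names such as "New York" would need extra handling. In XSLT 1.0 with EXSLT, str:tokenize() returns token elements instead of strings, and key() has no third argument, so you would switch context into the places document (for example with xsl:for-each over it) before calling key().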
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Karl Koch [mailto:TheRanger(_at_)gmx(_dot_)net]
Sent: 21 July 2005 14:56
To: Mulberry list
Subject: [xsl] Identifying place names in text...
Hello group,
I would like to find a way of automatically identifying references to places in XML text. I have a very large body of content, and it sometimes contains references to particular places that I want to know about.
This is my XML structure (simplified for illustration):
<bookshelf>
<book>
<title>1000 years of London's history</title>
...
</book>
<book>
<title>1984</title>
...
</book>
</bookshelf>
Can I use XSLT to search for place names in the titles of all the books? I would like to use a word list of geographical place names (which I already have), containing country and city names. The stylesheet would match occurrences of these words in the <title> element, and the output would be a list of all books whose titles refer to a location.
In this example, the result would be only the first book, because its title contains "London".
Perhaps this is the point where XSLT gets too complicated and I should consider Java instead. However, I am continually impressed by the power of XSLT, so I am asking here in case there is an XSLT solution to this problem.
A side note: the output of this stylesheet would serve as a helper and an additional check on a mainly manual process, letting me discover books that I overlooked during the manual pass.
Any help would be greatly appreciated.
Kind Regards,
Karl
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--