xsl-list

RE: Identifying place names in text...

2005-07-21 09:38:07
This isn't difficult; there is no need to contemplate doing it in Java. You
can tokenize the text using the tokenize() function in XSLT 2.0, or the
str:tokenize() function/template in EXSLT (www.exslt.org). Then look up each
token in your list of place names, using a key for efficiency.

Michael Kay
http://www.saxonica.com/
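
The approach described above (tokenize each title, then look up every token
through a key) could be sketched roughly as follows in XSLT 2.0. This is only
an illustration under stated assumptions, not code from the original reply:
the file name places.xml, its <places>/<place> layout, the key name
place-by-name, and the <books-with-places> output wrapper are all
hypothetical. The input is assumed to be the <bookshelf> structure from the
question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed word list, resolved relative to the stylesheet:
       <places><place>London</place><place>Paris</place>...</places> -->
  <xsl:variable name="places" select="document('places.xml')"/>

  <!-- Index every <place> by its lower-cased, whitespace-normalized text -->
  <xsl:key name="place-by-name" match="place"
           use="lower-case(normalize-space(.))"/>

  <xsl:template match="/bookshelf">
    <books-with-places>
      <xsl:for-each select="book">
        <!-- Split the title on runs of non-letter characters and
             discard the empty tokens that leading digits can produce -->
        <xsl:variable name="tokens"
            select="tokenize(lower-case(title), '\P{L}+')[. ne '']"/>
        <!-- In XSLT 2.0, key() accepts a sequence of values and a third
             argument naming the document to search -->
        <xsl:if test="exists(key('place-by-name', $tokens, $places))">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </books-with-places>
  </xsl:template>

</xsl:stylesheet>
```

Because the lookup is done token by token, this sketch only finds
single-word names; a multi-word name such as "New York" would need a
different matching step. An XSLT 1.0 variant would replace tokenize() with
EXSLT's str:tokenize() and switch context to the word-list document (for
example with xsl:for-each) before calling key().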

-----Original Message-----
From: Karl Koch [mailto:TheRanger(_at_)gmx(_dot_)net] 
Sent: 21 July 2005 14:56
To: Mulberry list
Subject: [xsl] Identifying place names in text...

Hello group,

I would like to find a way of automatically identifying references to places
in XML text. The thing is that I have a very large set of content. In this
content there are sometimes references to particular places, which I want to
know about.

This is my xml structure (made up for simplification):

<bookshelf>
  <book>
    <title>1000 years of London's history</title>
    ...
  </book>
  <book>
    <title>1984</title>
    ...
  </book>
</bookshelf>

Can I use XSLT to search for place names in the titles of all the books? I
would like to use a word list of geographical place names (which I already
have). This would contain country and city names. The stylesheet would match
occurrences of these words in the <title> XML element. The output here would
be a list of all books which have references to locations in the title.
In this example, the result would only be the first book, because it has
"London" in the title.

Perhaps this is the point where XSLT is getting too complicated and I should
consider Java as a solution. However, I am continuously impressed by the
power of XSLT, and therefore I ask here because I think there might even be a
solution for that problem using XSLT.

A note on the side: The output of this stylesheet would be a helper and an
additional check for a mainly handcrafted process. I could discover books
which I have overlooked in the manual process.

Any help would be greatly appreciated.

Kind Regards,
Karl


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--







