xsl-list

[xsl] word list and count from the text in an xml document

2010-06-12 15:51:34
Hi,
I have been floundering around in the xsl-list archives for a while, looking for a way to list and count all the words in every instance of a specific element. My searches return thousands of hits, so I suspect I am not using the right search terms. What I want must be in the archive, but I can't narrow my search enough to find it.

Given a fragment like the following (concentrating for the moment on the lang="en" elements):
<Description>
   <Data lang="cz"> bílá skvrnka na spodní části písmene L ve SLOV</Data>
   <Data lang="en">white splotch on the lower bar on the L in SLOV</Data>
</Description>
<Description>
   <Data lang="cz">barevný bod pod dolním rámem vlevo od VHB</Data>
   <Data lang="en">dot on the lower frame to the left of VHB</Data>
</Description>

I would like to create a list like the one below. It would also be nice to be able to use a "stop word" list, so that words like "on", "the", etc. are not counted:
bar      1
dot      1
frame    1
in       1
L        1
left     1
lower    2
of       1
on       3
SLOV     1
splotch  1
the      4
to       1
VHB      1
white    1
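
To make the request concrete, here is a minimal sketch of one way such a list might be produced. It assumes an XSLT 2.0 processor (for tokenize() and xsl:for-each-group), assumes the Description elements are wrapped in a single root element so the input is well formed, and treats a "word" as any run of Unicode letters. The stop-words parameter and its two values are hypothetical, taken only from the examples mentioned above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical stop-word list (lower case); an assumption, not a standard list -->
  <xsl:param name="stop-words" select="('on', 'the')"/>

  <xsl:template match="/">
    <!-- Split the text of every English Data element on runs of
         non-letter characters and discard empty tokens -->
    <xsl:variable name="words"
        select="for $d in //Data[@lang = 'en']
                return tokenize($d, '\P{L}+')[. ne '']"/>

    <!-- Group identical words (case-sensitive), skipping stop words -->
    <xsl:for-each-group
        select="$words[not(lower-case(.) = $stop-words)]"
        group-by=".">
      <!-- Order the groups without regard to case -->
      <xsl:sort select="lower-case(current-grouping-key())"/>
      <!-- One line per word: the word, a tab, then its count -->
      <xsl:value-of
          select="current-grouping-key(), count(current-group())"
          separator="&#9;"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch, grouping is case-sensitive, so "L" and "SLOV" are reported as written. Passing an empty sequence for stop-words should count every word, including "on" and "the".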

Mark

