xsl-list

Re: [xsl] word list and count from the text in an xml document

2010-06-12 15:56:34
http://www.xsltfunctions.com/xsl/functx_word-count.html

If it's XSLT 2.0, then you have all the useful regex facilities available;
otherwise, at worst, there is http://www.exslt.org/regexp/
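
A minimal sketch of the XSLT 2.0 route, assuming the fragment is wrapped in a single root element so that it is well-formed, and that a "word" is any run of letters or digits. The stop-words parameter and its values are hypothetical, made up for illustration; tokenize(), lower-case() and xsl:for-each-group are standard XSLT 2.0:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical stop-word list (an assumption, not from the original post);
       pass an empty sequence to count every word. -->
  <xsl:param name="stop-words" as="xs:string*"
             select="('on', 'the', 'of', 'to', 'in')"/>

  <xsl:template match="/">
    <!-- Split the text of every English Data element on runs of characters
         that are neither letters nor digits, dropping empty tokens. -->
    <xsl:variable name="words" as="xs:string*"
        select="for $d in //Data[@lang = 'en']
                return tokenize($d, '[^\p{L}\p{N}]+')[. != '']"/>

    <!-- Group case-insensitively, skip stop words, sort by the grouping key. -->
    <xsl:for-each-group select="$words[not(lower-case(.) = $stop-words)]"
                        group-by="lower-case(.)">
      <xsl:sort select="current-grouping-key()"/>
      <!-- Emit the first spelling seen, a tab, the count, and a newline. -->
      <xsl:value-of select="concat(., '&#9;', count(current-group()), '&#10;')"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Grouping on lower-case(.) treats "The" and "the" as the same word; group on "." instead if case should matter.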

hth, James Fuller

On Sat, Jun 12, 2010 at 10:51 PM, Mark <mark(_at_)knihtisk(_dot_)org> wrote:
Hi,
I have been floundering around in the xsl-list archives for a while looking
for a way to get a listing and count of all the words in every instance of a
specific element. Thousands of hits, but I think I am not using the correct
search terms. I know what I want must be in the archive, but I just can't
seem to narrow my search enough to find it.

Given a fragment like (and concentrating for the moment on the lang="en"
element):
<Description>
  <Data lang="cz"> bílá skvrnka na spodní cásti písmene L ve SLOV</Data>
  <Data lang="en">white splotch on the lower bar on the L in SLOV</Data>
</Description>
<Description>
  <Data lang="cz">barevný bod pod dolním rámem vlevo od VHB</Data>
  <Data lang="en">dot on the lower frame to the left of VHB</Data>
</Description>

I would like to create a list like the one below (it would be nice to be
able to use a "stop word" list also, so as to not count stuff like "on",
"the", etc.):
bar      1
dot      1
frame    1
in       1
L        1
left     1
lower    2
of       1
on       3
SLOV     1
splotch  1
the      3
to       1
white    1

Mark


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


