xsl-list

RE: [xsl] How to parse text into words, phrases, clauses, sentences, and paragraphs

2007-06-07 07:25:23
Michael,

That was all I needed. Thanks for your help.
This list is great.

Cheers,

Mark, Getty Trust.

--- Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:

This is my first problem: how to apply a template match using the
tokenize() function, and in which order to apply it
(from paragraph -> word or word -> paragraph).

It's generally easiest to do it top-down, I think.

Something like this:

<xsl:for-each select="tokenize(., $sentence-delimiter)">
  <sentence id="{position()}">
    <xsl:for-each select="tokenize(., $phrase-delimiter)">
      <phrase id="{position()}">
        <xsl:for-each select="tokenize(., $word-delimiter)">
          <word id="{position()}">
            <xsl:value-of select="."/>
          </word>
        </xsl:for-each>
      </phrase>
    </xsl:for-each>
  </sentence>
</xsl:for-each>

(d) doing the output numbering.


I think you just need position() as shown above.
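
The thread never defines the three delimiter variables, so here is a minimal sketch of how the top-down version might look as a complete XSLT 2.0 stylesheet. The regular expressions, the <para> input element, and the <paragraph> wrapper are assumptions made for illustration, not something taken from the original question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumed delimiters (regular expressions); tune them to your text. -->
  <xsl:variable name="sentence-delimiter" select="'[.!?]+\s*'"/>
  <xsl:variable name="phrase-delimiter" select="'[,;:]\s*'"/>
  <xsl:variable name="word-delimiter" select="'\s+'"/>

  <!-- Assumed input: <para> elements holding plain text. -->
  <xsl:template match="para">
    <paragraph>
      <!-- [. ne ''] drops the empty token that tokenize() returns
           when the text starts or ends with a delimiter. -->
      <xsl:for-each select="tokenize(normalize-space(.), $sentence-delimiter)[. ne '']">
        <sentence id="{position()}">
          <xsl:for-each select="tokenize(., $phrase-delimiter)[. ne '']">
            <phrase id="{position()}">
              <xsl:for-each select="tokenize(., $word-delimiter)[. ne '']">
                <word id="{position()}">
                  <xsl:value-of select="."/>
                </word>
              </xsl:for-each>
            </phrase>
          </xsl:for-each>
        </sentence>
      </xsl:for-each>
    </paragraph>
  </xsl:template>

</xsl:stylesheet>
```

Because position() is evaluated against the filtered sequence, the ids stay consecutive, and they restart at 1 inside each enclosing sentence or phrase.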

Sometimes you need to work bottom-up if the "sentences" can't be
recognized until you've identified the "words", for example if you want
to avoid treating "." as ending a sentence if it appears in a number.
You're then sometimes in the domain of positional grouping: create a
long flat list of words, and then group it into sentences using some
kind of test applied to the individual words.
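
As a rough sketch of that bottom-up approach in XSLT 2.0, one option is xsl:for-each-group with group-ending-with. The <para> input element, the temporary <w> elements, and the "word ends in . ! or ?" test are all assumptions chosen for illustration; a real test would need to be smarter (this one still ends a sentence after an abbreviation such as "Dr.").

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumed input: <para> elements holding plain text. -->
  <xsl:template match="para">
    <!-- Step 1: a long flat list of words. Splitting on spaces only
         keeps a number such as 3.14 in one piece. The words are
         wrapped in <w> elements because group-ending-with matches
         nodes, not strings, in XSLT 2.0. -->
    <xsl:variable name="words">
      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
        <w><xsl:value-of select="."/></w>
      </xsl:for-each>
    </xsl:variable>

    <paragraph>
      <!-- Step 2: a word closes the current sentence when it ends
           in . ! or ? (the dot inside 3.14 is not at the end). -->
      <xsl:for-each-group select="$words/w"
                          group-ending-with="w[matches(., '[.!?]$')]">
        <sentence id="{position()}">
          <xsl:for-each select="current-group()">
            <word id="{position()}">
              <xsl:value-of select="."/>
            </word>
          </xsl:for-each>
        </sentence>
      </xsl:for-each-group>
    </paragraph>
  </xsl:template>

</xsl:stylesheet>
```

Here the sentence boundary is decided per word, so the closing punctuation stays attached to the last word of each sentence; strip it in the inner loop if you don't want it in the output.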

Michael Kay
http://www.saxonica.com/



--~------------------------------------------------------------------
XSL-List info and archive: 
http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to:
http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--





       
