
Re: Muenchian method on nodes with two or more items for indexing

2002-09-20 10:56:59
Sorry about the mismatch between input and output; I oversimplified. Here is
a file that I actually ran through the XSL, with its output and what I would
LIKE to see. As for my source document, it has approximately 6000 entries
with another 6000 index items.

Thanks for any help!

Larry


Input:

<?xml-stylesheet type="text/xsl"
href="C:\LL2XML\TransXML2HTML\xml2ReverseIndex2.xsl"?>
<LexicalDatabase>
  <minor>
    <base>'wah 'nabuuysk</base>
    <sense num=" 1">
      <index enc="ENG">unexpected</index>
    </sense>
  </minor>
  <minor>
    <base>'wah wil&#226;ontk</base>
  </minor>
  <major>
    <base>'w&#224;hamaniits'&#224;</base>
    <sense num=" 1">
      <pos>v</pos>
      <def enc="ENG">careless</def>
      <index enc="ENG">careless</index>
    </sense>
  </major>
  <major>
    <base>xbimooksk</base>
    <sense num=" 1">
      <pos>n</pos>
      <def enc="ENG">half-white </def>
      <index enc="ENG">metis</index>
      <index enc="ENG">half-white</index>
      <sense num="1.1">
        <pos>n</pos>
        <def enc="ENG">test</def>
        <index enc="ENG">test</index>
      </sense>
    </sense>
  </major>
  <major>
    <base>xbismsg&#232;&#232;</base>
    <sense num=" 1">
      <pos>v</pos>
      <index enc="ENG">bow your head</index>
      <index enc="ENG">bend down</index>
    </sense>
  </major>
</LexicalDatabase>


XSL:

<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:key name="BaseForm" match="LexicalDatabase/*"
use="concat(base,baseHom)"/>
  <xsl:key name="entries-by-index" match="//LexicalDatabase/*"
use=".//index"/>
  <xsl:template match="/">
    <ReverseEntries>
      <xsl:apply-templates/>
    </ReverseEntries>
  </xsl:template>
  <xsl:template match="LexicalDatabase">
    <xsl:for-each
select="//LexicalDatabase/*[generate-id(.)=generate-id(key('entries-by-index',
.//index))]">
      <xsl:sort select=".//index" order="ascending" />
      <IndexItem>
        <xsl:attribute name="value"><xsl:value-of
select=".//index"/></xsl:attribute>
        <xsl:for-each select="key('entries-by-index', .//index)">
          <!--xsl:sort select="base"/ Should be presorted coming from
LinguaLinks to account for multigraphs-->
          <entry>
            <xsl:attribute name="base"><xsl:value-of
select="base"/></xsl:attribute>
          </entry>
        </xsl:for-each>
      </IndexItem>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>

Actual Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>
</ReverseEntries>

Desired Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bend down">  <-- This is missing above.
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="half-white">  <-- This is missing above.
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="test">  <-- This is missing above. Comes from a sense within another sense.
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>

</ReverseEntries>
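
A likely reason for the missing items: the outer xsl:for-each walks entries rather than index elements, so each entry can produce at most one IndexItem, and xsl:value-of on .//index emits only the first index it finds. The following is a minimal sketch of one way around that, not a tested fix for the full 6000-entry file. It assumes XSLT 1.0 and that every major or minor entry is a direct child of LexicalDatabase; the key name index-by-value is made up for illustration. It keys on the index elements themselves and climbs back up to the enclosing entry, which also picks up an index inside a nested sense.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Key the index elements themselves by their string value
       (hypothetical key name, not from the original stylesheet) -->
  <xsl:key name="index-by-value" match="index" use="."/>

  <xsl:template match="/LexicalDatabase">
    <ReverseEntries>
      <!-- Muenchian step: keep only the first index element for each
           distinct value, at any sense depth -->
      <xsl:for-each
          select=".//index[generate-id() =
                           generate-id(key('index-by-value', .)[1])]">
        <xsl:sort select="."/>
        <IndexItem value="{.}">
          <!-- From every index with this value, climb to the top-level
               entry (assumed to be a child of LexicalDatabase) -->
          <xsl:for-each
              select="key('index-by-value', .)/ancestor::*[parent::LexicalDatabase]">
            <entry base="{base}"/>
          </xsl:for-each>
        </IndexItem>
      </xsl:for-each>
    </ReverseEntries>
  </xsl:template>
</xsl:stylesheet>
```

Because the inner select is a path expression, an entry that carries the same index value in two senses is listed only once, and entries come out in document order, so the presorted order from LinguaLinks is kept.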


----- Original Message -----
From: <Jarno(_dot_)Elovirta(_at_)nokia(_dot_)com>
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Friday, September 20, 2002 12:20 AM
Subject: RE: [xsl] Muenchian method on nodes with two or more items for
indexing


Hi,

I just tried using an axes method with this problem and it took more than 15
minutes to crunch through on a 2 GHz Pentium with lots of RAM. I need to

How big was your source document?

I have data of the following sort. You will note that minor or major
elements and their senses can have one or more index elements.

<LexicalDatabase>
<minor>
<base>'wah 'nabuuysk</base>
<sense num=" 1">
<index enc="ENG">unexpected</index>
</sense>
</minor>
<minor>
<base>'wah wil&#226;ontk</base>
</minor>
<major>
<base>'w&#224;hamaniits'&#224;</base>
<sense num=" 1">
<pos>v</pos>
<def enc="ENG">careless</def>
<index enc="ENG">careless</index>
</sense>
</major>
<major>
<base>xbimooksk</base>
<sense num=" 1">
<pos>n</pos>
<def enc="ENG">half-white </def>
<index enc="ENG">metis</index>
<index enc="ENG">half-white</index>
</sense>
</major>
<major>
<base>xbismsg&#232;&#232;</base>
<sense num=" 1">
<pos>v</pos>
<index enc="ENG">bow your head</index>
<index enc="ENG">bend down</index>
</sense>
</major>
</LexicalDatabase>

What I would like to do is output a file that has index elements containing
their major or minor entries. It is similar to grouping by last name or city
except that each person could have one, two or more of these. Perhaps
"Schools attended" would be a good example. Anyhow, here is a sample of what
I would like to output.

<IndexList>
<IndexItem value="metis">
<entry base="xbimooksk" baseHom="" />
</IndexItem>
<IndexItem value="microwave">
<entry base="âànuut" baseHom="2"/>
</IndexItem>
<IndexItem value="midday">
<entry base="nsèèlga sah" baseHom=""/>
<entry base="sèèlgyàxsk" baseHom=""/>
</IndexItem>
<IndexItem value="middle (in the _)">
<entry base="lusèèlk" baseHom=""/>
<entry base="xts'a" baseHom=""/>
</IndexItem>
</IndexList>

Your source and desired output don't match (e.g. no "microwave" in
source), so it's a bit hard to see how it should work.

<xsl:key name="entries-by-index" match="index" use="."/>

<xsl:template match="LexicalDatabase">
  <IndexList>
    <xsl:for-each select="*/sense/index[generate-id() =
generate-id(key('entries-by-index', .))]">
      <xsl:sort select="." data-type="text"/>
      <IndexItem value="{.}">
        <xsl:for-each select="key('entries-by-index', .)/../../base">
          <entry base="{.}" baseHom=""/>
        </xsl:for-each>
      </IndexItem>
    </xsl:for-each>
  </IndexList>
</xsl:template>

Will get you somewhere, but I didn't understand where the value of baseHom
comes from.

J - Wumpscut: Deliverance (Alternative Club Mix)

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

