xsl-list

Re: [xsl] faster complicated counting

2012-03-01 03:02:31
Can't you run a three-level for-each so that you can compute all three
numbers in one go?
-W

2012/3/1 Emmanuel Bégué <medusis(_at_)gmail(_dot_)com>

One way is to compute the positions up front in variables, and then
look them up with keys, so that each position is computed only once.

For example, for the global position, you can add to the root of the
stylesheet:

<xsl:key name="l" match="l" use="@id"/>

<xsl:variable name="global">
  <xsl:for-each select="//l">
    <l pos="{position()}" id="{generate-id(.)}"/>
  </xsl:for-each>
</xsl:variable>

and then, in each l element, look up the value of wwp:num-global like
this:

<xsl:attribute name="wwp:num-global"
  select="key('l', generate-id(.), $global)/@pos"/>
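
A fuller sketch of the same idea, extended to the regional count, might
look like the following. It assumes the namespace declarations of the toy
stylesheet quoted further down, restricts the lookup tables to the lines
that are actually counted (no @part, or @part='I'), and takes the
definition of a "region" from the original post. The variable name
$regional is made up for illustration. Lines with any other @part value
find no entry and therefore get an empty attribute value, which differs
from the original code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Indexes the <l> entries of the lookup tables by the id they carry. -->
  <xsl:key name="l" match="l" use="@id"/>

  <!-- Global table: one entry per counted line, in document order. -->
  <xsl:variable name="global">
    <xsl:for-each select="//l[not(@part) or @part = 'I']">
      <l pos="{position()}" id="{generate-id(.)}"/>
    </xsl:for-each>
  </xsl:variable>

  <!-- Regional table (hypothetical name): walk each region once and
       record a line only under its innermost enclosing region. -->
  <xsl:variable name="regional">
    <xsl:for-each
      select="//lg[contains(@type, 'poem') or descendant::l[@n eq '1']]">
      <xsl:variable name="region" select="."/>
      <xsl:for-each select="descendant::l[not(@part) or @part = 'I']">
        <xsl:if test="ancestor::lg[contains(@type, 'poem')
                      or descendant::l[@n eq '1']][1] is $region">
          <!-- position() is still the position within the inner for-each -->
          <l pos="{position()}" id="{generate-id(.)}"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <!-- Identity copy for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:copy>
      <xsl:attribute name="wwp:num-global"
        select="key('l', generate-id(.), $global)/@pos"/>
      <xsl:attribute name="wwp:num-regional"
        select="key('l', generate-id(.), $regional)/@pos"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The global table is built in one pass over the document and the regional
table in one pass per region, after which every <l> costs a key lookup
rather than a walk along the preceding:: axis. A local table could
presumably be built the same way from the "> 4 metrical lines" test.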

Regards,
EB

2012/2/29 Syd Bauman <Syd_Bauman(_at_)brown(_dot_)edu>:
I am working with a relatively small dataset (~ 1 MiB) which uses a
TEI encoding. In TEI, a line of verse is encoded with an <l> element
(of which I have just about 306,000); these are collected into groups
(like poems or stanzas) using <lg> (for "line group").

In the output of the particular process I am working on now, I'd like
to adorn each <l> element with three new attributes that indicate the
count of the current <l> element in various contexts:
 wwp:num-global   = with respect to the entire document
 wwp:num-local    = with respect to the current stanza or other
                    small unit of poetry
 wwp:num-regional = with respect to the current poem or other
                    large unit of poetry

So, as a toy example, see tiny.in.xml and tiny.out.xml, below.

I have worked out code that gets me the desired counts. My problem is
that all the tree-walking it does slows down my process by well over
an order of magnitude. I am betting there is a much better way to do
this, probably using keys or <xsl:number>, but have not been able to
wrap my mind around it.

The English-like pseudo-code for @num-local is "the count in the
context of the closest ancestor <lg> that itself has > 4 metrical
lines".

The English-like pseudo-code for @num-regional is "the count in the
context of the closest ancestor <lg> that has a @type that contains
"poem" or whose first descendant <l> has n='1'".
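
For what it is worth, both definitions can be approximated with the from
attribute of <xsl:number>, which restarts the count at the most recent
node matching the given pattern. A minimal sketch, assuming the
namespaces of the toy stylesheet below:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Identity copy for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:copy>
      <!-- Regional: restart at the latest <lg> that looks like a poem. -->
      <xsl:attribute name="wwp:num-regional">
        <xsl:number level="any"
          count="l[not(@part) or @part = 'I']"
          from="lg[contains(@type, 'poem') or descendant::l[@n eq '1']]"/>
      </xsl:attribute>
      <!-- Local: restart at the latest <lg> with more than 4 counted lines. -->
      <xsl:attribute name="wwp:num-local">
        <xsl:number level="any"
          count="l[not(@part) or @part = 'I']"
          from="lg[count(descendant::l[not(@part) or @part = 'I']) &gt; 4]"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

On the toy input this should give the same numbers, but it is not an
exact equivalent: with level="any", from also considers preceding nodes,
not just ancestors, so a loose line that follows a qualifying nested <lg>
would be numbered from that nested group rather than from the enclosing
one. Whether it is any faster depends on the processor, so it is worth
measuring before relying on it.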

Here's what I have (note that we are only counting those <l> elements
that have an @part of 'I' or do not have a @part attribute at all):

 <xsl:attribute name="wwp:num-global">
   <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
 </xsl:attribute>
 <xsl:attribute name="wwp:num-regional">
   <xsl:variable name="region"
     select="( ancestor::lg[ contains( @type, 'poem' ) ]
             | ancestor::lg[ descendant::l[ @n eq '1' ] ] )[last()]"/>
   <xsl:value-of
     select="count( ( preceding::l[not(@part)] | preceding::l[@part='I'] )
                    [ ancestor::lg/generate-id() = $region/generate-id() ] ) + 1"/>
 </xsl:attribute>
 <xsl:attribute name="wwp:num-local">
   <xsl:variable name="region"
     select="ancestor::lg[ count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
   <xsl:value-of
     select="count( ( preceding::l[not(@part)] | preceding::l[@part='I'] )
                    [ ancestor::lg/generate-id() = $region/generate-id() ] ) + 1"/>
 </xsl:attribute>
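
Much of the cost here probably comes from preceding::, which visits every
earlier <l> in the whole document for each line. One lower-risk variation,
sketched below under the assumption that <l> elements are never nested
inside one another, counts only the region's own descendants that come
before the current line in document order (the << operator, written
&lt;&lt; inside an attribute). The variable names $this, $poem and $stanza
are invented for the sketch; the namespaces are those of the toy
stylesheet below.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Identity copy for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:variable name="this" select="."/>
    <!-- Innermost ancestor <lg> that qualifies as a "region". -->
    <xsl:variable name="poem"
      select="ancestor::lg[contains(@type, 'poem')
              or descendant::l[@n eq '1']][1]"/>
    <!-- Innermost ancestor <lg> with more than 4 counted lines. -->
    <xsl:variable name="stanza"
      select="ancestor::lg[count(descendant::l[not(@part) or @part = 'I'])
              &gt; 4][1]"/>
    <xsl:copy>
      <xsl:attribute name="wwp:num-global">
        <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
      </xsl:attribute>
      <!-- Count only counted lines inside the region that precede this one. -->
      <xsl:attribute name="wwp:num-regional"
        select="count($poem/descendant::l[not(@part) or @part = 'I']
                [. &lt;&lt; $this]) + 1"/>
      <xsl:attribute name="wwp:num-local"
        select="count($stanza/descendant::l[not(@part) or @part = 'I']
                [. &lt;&lt; $this]) + 1"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This is still quadratic within each region, but the work per line is
bounded by the size of the poem or stanza rather than by the size of the
document.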

Thoughts appreciated.

Notes
-----
* Yes, I realize that the test above is for *any* descendant <l> with
 n='1', not the first. We simply don't have any that aren't the
 first, so I didn't worry about it.

* It's pretty likely we'll change the definition of what is
 "regional" in the near future, but it probably won't affect the
 basic problem I'm having. I.e., I'm hoping that if someone shows me
 how to do this "regional" better, I'll be able to do any future
 version on my own. Cross your fingers :-)


toy input
--- -----
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
 <teiHeader>
   <!-- blah, blah, blah -->
 </teiHeader>
 <text>
   <body>
     <lg type="superStructure">
       <lg type="poem.duck">
         <l>one</l>
         <l>two</l>
         <l>three</l>
         <l>four</l>
         <l>five</l>
         <l>six</l>
         <l>seven</l>
         <l>eight</l>
         <l>nine</l>
         <l>ten</l>
       </lg>
       <lg type="poem.duck">
         <l>one</l>
         <l>two</l>
         <l>three</l>
         <l>four</l>
         <lg type="tercet">
           <l>five</l>
           <l>six</l>
           <l>seven</l>
         </lg>
         <l>eight</l>
         <l>nine</l>
         <l>ten</l>
       </lg>
       <lg type="poem.duck">
         <lg type="stanza">
           <l>one</l>
           <l>two</l>
           <l>three</l>
           <l>four</l>
           <l>five</l>
           <l>six</l>
           <l>seven</l>
           <l>eight</l>
         </lg>
         <lg type="stanza">
           <l>nine</l>
           <l>ten</l>
           <l>eleven</l>
           <l>twelve</l>
           <l>thirteen</l>
           <l>fourteen</l>
           <l>fifteen</l>
           <l>sixteen</l>
         </lg>
         <lg type="stanza">
           <l>seventeen</l>
           <l>eighteen</l>
           <l>nineteen</l>
           <l>twenty</l>
           <l>twentyone</l>
           <l>twentytwo</l>
           <l>twentythree</l>
           <l>twentyfour</l>
         </lg>
       </lg>
     </lg>
   </body>
 </text>
</TEI>

toy code
--- ----
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">

 <xsl:template match="/">
   <xsl:text>&#x0A;</xsl:text>
   <xsl:apply-templates/>
 </xsl:template>
 <xsl:template match="@*|text()|processing-instruction()|comment()">
   <xsl:copy/>
 </xsl:template>
 <xsl:template match="*">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="l">
   <xsl:copy>
     <xsl:attribute name="wwp:num-global">
       <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
     </xsl:attribute>
     <xsl:attribute name="wwp:num-regional">
       <xsl:variable name="region"
         select="( ancestor::lg[ contains( @type, 'poem' ) ]
                 | ancestor::lg[ descendant::l[ @n eq '1' ] ] )[last()]"/>
       <xsl:value-of
         select="count( ( preceding::l[not(@part)] | preceding::l[@part='I'] )
                        [ ancestor::lg/generate-id() = $region/generate-id() ] ) + 1"
       />
     </xsl:attribute>
     <xsl:attribute name="wwp:num-local">
       <xsl:variable name="region"
         select="ancestor::lg[ count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
       <xsl:value-of
         select="count( ( preceding::l[not(@part)] | preceding::l[@part='I'] )
                        [ ancestor::lg/generate-id() = $region/generate-id() ] ) + 1"
       />
     </xsl:attribute>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
 </xsl:template>

</xsl:stylesheet>

toy output
--- ------
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
 <teiHeader>
   <!-- blah, blah, blah -->
 </teiHeader>
 <text>
   <body>
     <lg type="superStructure">
       <lg type="poem.duck">
         <l wwp:num-global="1" wwp:num-regional="1"
wwp:num-local="1">one</l>
         <l wwp:num-global="2" wwp:num-regional="2"
wwp:num-local="2">two</l>
         <l wwp:num-global="3" wwp:num-regional="3"
wwp:num-local="3">three</l>
         <l wwp:num-global="4" wwp:num-regional="4"
wwp:num-local="4">four</l>
         <l wwp:num-global="5" wwp:num-regional="5"
wwp:num-local="5">five</l>
         <l wwp:num-global="6" wwp:num-regional="6"
wwp:num-local="6">six</l>
         <l wwp:num-global="7" wwp:num-regional="7"
wwp:num-local="7">seven</l>
         <l wwp:num-global="8" wwp:num-regional="8"
wwp:num-local="8">eight</l>
         <l wwp:num-global="9" wwp:num-regional="9"
wwp:num-local="9">nine</l>
         <l wwp:num-global="10" wwp:num-regional="10"
wwp:num-local="10">ten</l>
       </lg>
       <lg type="poem.duck">
         <l wwp:num-global="11" wwp:num-regional="1"
wwp:num-local="1">one</l>
         <l wwp:num-global="12" wwp:num-regional="2"
wwp:num-local="2">two</l>
         <l wwp:num-global="13" wwp:num-regional="3"
wwp:num-local="3">three</l>
         <l wwp:num-global="14" wwp:num-regional="4"
wwp:num-local="4">four</l>
         <lg type="tercet">
           <l wwp:num-global="15" wwp:num-regional="5"
wwp:num-local="5">five</l>
           <l wwp:num-global="16" wwp:num-regional="6"
wwp:num-local="6">six</l>
           <l wwp:num-global="17" wwp:num-regional="7"
wwp:num-local="7">seven</l>
         </lg>
         <l wwp:num-global="18" wwp:num-regional="8"
wwp:num-local="8">eight</l>
         <l wwp:num-global="19" wwp:num-regional="9"
wwp:num-local="9">nine</l>
         <l wwp:num-global="20" wwp:num-regional="10"
wwp:num-local="10">ten</l>
       </lg>
       <lg type="poem.duck">
         <lg type="stanza">
           <l wwp:num-global="21" wwp:num-regional="1"
wwp:num-local="1">one</l>
           <l wwp:num-global="22" wwp:num-regional="2"
wwp:num-local="2">two</l>
           <l wwp:num-global="23" wwp:num-regional="3"
wwp:num-local="3">three</l>
           <l wwp:num-global="24" wwp:num-regional="4"
wwp:num-local="4">four</l>
           <l wwp:num-global="25" wwp:num-regional="5"
wwp:num-local="5">five</l>
           <l wwp:num-global="26" wwp:num-regional="6"
wwp:num-local="6">six</l>
           <l wwp:num-global="27" wwp:num-regional="7"
wwp:num-local="7">seven</l>
           <l wwp:num-global="28" wwp:num-regional="8"
wwp:num-local="8">eight</l>
         </lg>
         <lg type="stanza">
           <l wwp:num-global="29" wwp:num-regional="9"
wwp:num-local="1">nine</l>
           <l wwp:num-global="30" wwp:num-regional="10"
wwp:num-local="2">ten</l>
           <l wwp:num-global="31" wwp:num-regional="11"
wwp:num-local="3">eleven</l>
           <l wwp:num-global="32" wwp:num-regional="12"
wwp:num-local="4">twelve</l>
           <l wwp:num-global="33" wwp:num-regional="13"
wwp:num-local="5">thirteen</l>
           <l wwp:num-global="34" wwp:num-regional="14"
wwp:num-local="6">fourteen</l>
           <l wwp:num-global="35" wwp:num-regional="15"
wwp:num-local="7">fifteen</l>
           <l wwp:num-global="36" wwp:num-regional="16"
wwp:num-local="8">sixteen</l>
         </lg>
         <lg type="stanza">
           <l wwp:num-global="37" wwp:num-regional="17"
wwp:num-local="1">seventeen</l>
           <l wwp:num-global="38" wwp:num-regional="18"
wwp:num-local="2">eighteen</l>
           <l wwp:num-global="39" wwp:num-regional="19"
wwp:num-local="3">nineteen</l>
           <l wwp:num-global="40" wwp:num-regional="20"
wwp:num-local="4">twenty</l>
           <l wwp:num-global="41" wwp:num-regional="21"
wwp:num-local="5">twentyone</l>
           <l wwp:num-global="42" wwp:num-regional="22"
wwp:num-local="6">twentytwo</l>
           <l wwp:num-global="43" wwp:num-regional="23"
wwp:num-local="7">twentythree</l>
           <l wwp:num-global="44" wwp:num-regional="24"
wwp:num-local="8">twentyfour</l>
         </lg>
       </lg>
     </lg>
   </body>
 </text>
</TEI>

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

