xsl-list

Merging lines of 3 words or less

2005-09-08 02:32:19
In doing a transcription of some psalms, someone marked up as separate
lines instances where the editor of the print version had wrapped (and
indented) a line.
 What I want to do is pass the file through a stylesheet and merge any
lines with 3 or fewer words into the line before.
 
 The source file looks something along the lines of:
 ----
 <?xml version="1.0" encoding="UTF-8"?>
 <div type="psalm" n="5">
     <lg n="5:1">
         <l n="1"><w>Myne</w> <w>wordes</w>, <w>lauerd</w>,
<w>with</w> <w>eres</w></l>
         <l n="2"><w>byse;</w></l>
         <l n="3"><w>Vnderstande</w> <w><c
type="thorn">&#x00FE;</c>e</w> <w>crie</w> <w>ofe</w> <w>me</w>.</l>
     </lg>
     <lg n="5:2">
         <l n="1"><w>Bihald</w> <w>vnto</w> <w>my</w> <w>bede</w>
<w>steuene</w>,</l>
         <l n="2"><w>Mi</w> <w>kynge</w> <w>and</w> <w>my</w>
<w>god</w> <w>ofe</w> <w>heuene</w>.</l>
     </lg>
     <lg n="5:3">
         <l n="1"><w>For</w> <w>to</w> <w><c
type="thorn">&#x00FE;</c>e</w>, <w>lauerd</w>, <w>bidde</w> <w>sal</w>
.<w>I</w>.<w>;</w></l>
         <l n="2"><w>Mi</w> <w>steuene</w> <w>sal</w> <w>tou</w>
<w>here</w> <w>erli</w>.</l>
     </lg>
     <lg n="5:4">
         <l n="1"><w>Erli</w> <w>sal</w> .<w>I</w>. <w>to</w> <w><c
type="thorn">&#x00FE;</c>e</w> <w>se</w> <w>and</w> <w>stande;</w></l>
         <l n="2"><w>For</w> <w>noght</w> <w>god</w> <w>artou</w>
<w>wiknes</w> <w>willande</w>,</l>
     </lg>
     <lg n="5:5">
         <l n="1"><w>Ne</w> <w>wone</w> <w>sal</w> <w>lither</w>
<w>biside</w> <w><c type="thorn">&#x00FE;</c>e</w>
,</l>
         <l n="2"><w>Ne</w> <w>vnrightwise</w> <w>bifor</w> <w><c
type="thorn">&#x00FE;</c>in</w> <w>eyhen</w> <w>be</w>.</l>
     </lg>
     <lg n="5:6">
         <l n="1"><w><c type="THORN">&#x00DE;</c>ou</w> <w>hated</w>
<w>al</w> <w><c type="thorn">&#x00FE;</c>at</w> <w>wirkes</w>
<w>wiknesse;</w></l>
         <l n="2"><w><c type="THORN">&#x00DE;</c>at</w> <w>lighe</w>
<w>spekes</w> <w>leses</w> <w>tou</w> <w>mare</w> <w>and</w></l>
         <l n="3"><w>lesse</w>,</l>
     </lg>
 (etc.)
 </div>    
 -----
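 
 For the first stanza above, the merged result I am after would look
roughly like this. It is written out by hand rather than produced by a
stylesheet, and it assumes a single space between the joined lines and
that n is renumbered within each lg:
 -----
 <lg n="5:1">
     <l n="1"><w>Myne</w> <w>wordes</w>, <w>lauerd</w>, <w>with</w> <w>eres</w> <w>byse;</w></l>
     <l n="2"><w>Vnderstande</w> <w><c type="thorn">&#x00FE;</c>e</w> <w>crie</w> <w>ofe</w> <w>me</w>.</l>
 </lg>
 -----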
 
 Now, the way I'm doing it, which *seems* to work, is:
 
 -----
 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
     <xsl:template match="/"><xsl:apply-templates/></xsl:template>
     <xsl:template match="node()|@*" priority="-1">
         <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
     </xsl:template>
     <xsl:template match="lg">
         <lg n="{@n}">
         <xsl:for-each select="l">
         <xsl:choose>
             <xsl:when test="count(w) > 3">
                 <xsl:variable name="lineNum"><xsl:number
count="l[count(w) > 3]" from="lg"/></xsl:variable>
                 <l n="{$lineNum}">
                     <xsl:apply-templates />
                 <xsl:if test="following-sibling::l[1][count(w) &lt; 4]">
                     <xsl:apply-templates select="following-sibling::l[1]"/>
                 </xsl:if>
                 </l>
             </xsl:when>
             <xsl:otherwise/>
         </xsl:choose>
         </xsl:for-each>
             </lg>
     </xsl:template>
     
     <xsl:template match="l[count(w) >3]">
         <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
     </xsl:template>
     
     <xsl:template match="l[count(w) &lt; 4]">
         <xsl:apply-templates />
     </xsl:template>
 </xsl:stylesheet>
 -----
 
 I'm just wondering whether this is having any unforeseen side-effects
that I'm not noticing.
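 
 One way I could probe for side-effects is a small test stanza covering
the awkward cases. This is a made-up fixture, not real data: the n value
"test" and the line breaks are hypothetical, and the words are simply
borrowed from the psalm above.
 -----
 <lg n="test">
     <!-- short line with no line before it to merge into -->
     <l n="1"><w>Mi</w> <w>steuene</w></l>
     <l n="2"><w>Bihald</w> <w>vnto</w> <w>my</w> <w>bede</w></l>
     <!-- two short lines in a row -->
     <l n="3"><w>steuene</w>,</l>
     <l n="4"><w>Mi</w> <w>kynge</w></l>
     <l n="5"><w>For</w> <w>noght</w> <w>god</w> <w>artou</w></l>
     <!-- exactly three words: the boundary of the count(w) tests -->
     <l n="6"><w>wiknes</w> <w>willande</w> <w>be</w>.</l>
 </lg>
 -----
 Things to check in the output: whether every <w> from the input
survives, whether a space separates the joined lines, and whether any
<l> ends up nested inside another <l>.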
 
 In 150 psalms there are only about 20 instances of lg/l elements
containing fewer than 4 words which are in fact real lines.  The rest
should be merged.  I figured it was easier to go and correct these 20
after automatically fixing the hundreds (a few per psalm) which are wrong.
 
 Is this the best way to do it?
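 
 For comparison, one alternative might be XSLT 2.0 grouping: start a new
group at every line of more than 3 words and let the short lines that
follow fall into that group. This is only a rough sketch that I have not
run against the full file. It assumes the 3-word threshold, a single
space between joined lines, and n simply renumbered within each lg:
 -----
 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
     <!-- Identity template: copy anything not handled below. -->
     <xsl:template match="node()|@*">
         <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
     </xsl:template>

     <xsl:template match="lg">
         <xsl:copy>
             <xsl:apply-templates select="@*"/>
             <!-- Each l with more than 3 words starts a new group; the
                  first l always starts one, whatever its length. -->
             <xsl:for-each-group select="l"
                                 group-starting-with="l[count(w) &gt; 3]">
                 <!-- position() here is the number of the current group. -->
                 <l n="{position()}">
                     <xsl:for-each select="current-group()">
                         <!-- Assumed separator between joined lines. -->
                         <xsl:if test="position() &gt; 1"><xsl:text> </xsl:text></xsl:if>
                         <xsl:apply-templates select="node()"/>
                     </xsl:for-each>
                 </l>
             </xsl:for-each-group>
         </xsl:copy>
     </xsl:template>
 </xsl:stylesheet>
 -----
 With this approach several short lines in a row would all join the
preceding long line, and a short line that opens an lg would start its
own group rather than disappear. The 20 or so genuine short lines would
still need correcting by hand afterwards.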
 
 -James

-- 
James Cummings, Cummings dot James at GMail dot com