xsl-list

[xsl] converting Word dictionary to FLEx

2018-09-18 16:47:42
I am using Saxon 9.8.0.12 in Oxygen, with a stylesheet at version="2.0".
The problem domain is converting a dictionary created in Word. The HTML
contains only <p>, <span>, <b>, and <i> elements, along with some color added
to some spans.
In plain text it looks like:

#-a (dem. adj. of proximity)
variant of -ad

#a-1(+a.f./i.a. verb)
1. so, in order that perhaps <D2>
2. (particle introducing a.f., indicating `near future' or `future 
possibility') <Asp1.19> <D2>
Variant Forms:
ad-(+a.f./i.a. verb) (in 1st person singular and third person plural)
1. so, in order that perhaps
2. (particle introducing a.f., indicating `near future' or `future possibility')
riɣ ad-ftuɣ I want to go.
ira a-t-iẓr He wants to see it.
a-ka-(+a.f.) if only
a(d)-ur-(+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
a-ur-imil-(+a.f./i.a.)
perhaps, in order that, in the hope that; lest, maybe it would happen that
ad-ukʷan- (+a.f./i.a.)
1. when, as soon as <Asp1.24> <Na3.10.6>
2. just, repeatedly <Na3.16.2>
ad-ur- (+a.f./i.a.)
so, lest, in order that perhaps not
(also introduces neg. imp.: "Do not...")
Variant Forms:
ad-


I need to turn this into a flat file suitable for import into a dictionary
processing program called FLEx.
Something like:
\lx -a 
\gi (dem. adj. of proximity)
\vao -ad

\lx a-
\hm 1 
\co (+a.f./i.a. verb)
\sn 1 
\de so, in order that perhaps \so <D2>
\sn 2 
\gi (particle introducing a.f., indicating `near future' or `future 
possibility') 
\so <Asp1.19> 
\so <D2>
\sh Variant Forms:
\va ad-
\co (+a.f./i.a. verb) 
\gi (in 1st person singular and third person plural)
\sn 1
\de so, in order that perhaps
\sn 2 
\gi (particle introducing a.f., indicating `near future' or `future 
possibility')
\xv riɣ ad-ftuɣ 
\xe I want to go.
\xv ira a-t-iẓr 
\xe He wants to see it.
\va a-ka-
\co (+a.f.) if only
\va a(d)-ur-
\co (+a.f./i.a.)
\de so, lest, in order that perhaps not
\gid (also introduces neg. imp.: "Do not...")
\va a-ur-imil-
\co (+a.f./i.a.)
\de perhaps, in order that, in the hope that; lest, maybe it would happen that
\va ad-ukʷan- 
\co (+a.f./i.a.)
\sn 1
\de when, as soon as 
\so <Asp1.24> 
\so <Na3.10.6>
\sn 2 
\de just, repeatedly 
\so <Na3.16.2>
I have processed the HTML output from Word into the following snippet:

\entry_number 00001
\lx -a
\vernacular FALSE
\grammatical_info dem. adj. of proximity)
\variant_of -ad

\entry_number 00002
\lx a-
\hm 1
\vernacular FALSE
\co (+|ga a.f.|r |ga i.a.|r  verb)
\senseStart 1  
\definition  so, in order that perhaps 
\source D2
\senseStart 2  
\grammatical_info particle introducing |ga a.f.|r , indicating `near future' or 
`future possibility')
\source Asp1.19
\source D2
\sectionHead Variant Forms:
\variant ad-
\co (+|ga a.f.|r |ga i.a.|r verb
\grammatical_info in 1|sup st|r person singular and third person plural)
\senseStart 1  
\definition  so, in order that perhaps 
\senseStart 2  
\grammatical_info particle introducing |ga a.f.|r , indicating `near future' or 
`future possibility')
<<<<<< above is correct

\example riɣI want to go.                       <<<<<< what I get
\example iraHe wants to see it.

\example riɣ ad-ftuɣ                            <<<<< what I am looking for. I need two more words here: ad-ftuɣ
\translation I want to go.              
\example ira a-t-iẓr  
\translation He wants to see it.

The exact slash codes are not important; getting ALL the data across is.
So far the only class I have added is Arial: instead of
<span style="font-family:&#34;Arial&#34;,sans-serif" lang="EN-GB"> the input
now has <span class="Arial">.
I am starting with this snippet of HTML:
<p> ...      
   
            <span class="Arial">verb) (in 1<sup>st</sup>person singular and 
third person plural)
        <br />1. so, in order that perhaps
        <br />2. (<i>particle introducing a.f., indicating `near future' or 
`future possibility'</i>)
               <br />
            </span> 
            <span class="MsoHyperlink">
               <b>
                  <span lang="EN-GB">riɣ</span>
               </b>
            </span>
            <b>
               <span lang="EN-GB">ad-</span>
               <span class="MsoHyperlink">
                  <span lang="EN-GB">ftuɣ</span>
               </span>
            </b>
            
            <span class="Arial">I want to go.<br />            
   .....
</p>     

My guess so far is to match the <br/> and then collect the <b> words that
follow it, but not any <b> that comes after the next <span class="Arial">,
which turns into \translation.

    <xsl:template match="html:br">
        <xsl:element name="span">
            <xsl:attribute name="class">example</xsl:attribute>
            <!-- this gives too many -->
            <xsl:value-of select="following::html:b"/>
        </xsl:element>
    </xsl:template>

I hold the slash code in the class attribute until the last step. That way I 
can continue working on the file in XML.      
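
For illustration, that last step could look roughly like the sketch below. It
assumes that by then every remaining <span> is in no namespace, that its class
attribute holds exactly the slash-code name, and that each span should become
one output line; none of this has been checked against the real intermediate
file.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Assumption: each classed span becomes one "\code value" line,
         with the class name used as the slash code. -->
    <xsl:template match="span[@class]">
        <xsl:value-of
            select="concat('\', @class, ' ', normalize-space(.), '&#10;')"/>
    </xsl:template>

    <!-- Text outside classed spans is dropped in this sketch. -->
    <xsl:template match="text()"/>

</xsl:stylesheet>
```

With this, <span class="example">riɣ ad-ftuɣ</span> would come out as the line
"\example riɣ ad-ftuɣ". Nested spans are flattened into their parent's line
because the template does not apply templates to its children.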

How do I restrict the <xsl:value-of select="following::html:b"/> to just the
<b> elements that come before the next Arial span, which here is
<span class="Arial">I want to go.<br />
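
One direction that might work, as a sketch only: find the first following
<span class="Arial"> and keep only the <b> elements that precede it in document
order, using the XPath 2.0 "<<" operator. The variable names $next and $words
are made up for this sketch, and the html prefix is assumed to be bound to the
XHTML namespace. It also assumes that whitespace between the spans inside one
<b> is only indentation, so the pieces of a word are joined without a space.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">

    <!-- Identity template: copy everything not handled below. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Last <br/> child of an Arial span: the bold words that follow it,
         up to the next Arial span, are treated as one example. -->
    <xsl:template match="html:span[@class = 'Arial']/html:br[last()]">
        <!-- First Arial span after this <br/>; empty at the end of the data. -->
        <xsl:variable name="next"
            select="following::html:span[@class = 'Arial'][1]"/>
        <!-- Only the <b> elements that come before $next in document order. -->
        <xsl:variable name="words"
            select="following::html:b[not($next) or . &lt;&lt; $next]"/>
        <xsl:if test="$words">
            <span class="example">
                <!-- Join the text inside each <b> without spaces,
                     then separate the <b> elements with one space. -->
                <xsl:value-of separator=" "
                    select="for $b in $words
                            return string-join($b//text()/normalize-space(.), '')"/>
            </span>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

For the snippet above this should give "riɣ ad-ftuɣ" in the example span. The
original <b> elements are still copied by the identity template, so they would
need their own template to be suppressed. After the last Arial span in the
file, $next is empty and every remaining <b> is collected, which may need a
tighter limit such as staying inside the current <p>.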

Thank you

Jim Albright
704-562-1529 unlimited cell
Wycliffe Bible Translators
