xsl-list

Re: [xsl] Breaking paragraphs on linebreaks

2019-05-09 09:25:26
The DITA Community org.dita-community.i18n project provides general-purpose Saxon 
extensions for locale-aware word and line breaking. Using them requires either 
Saxon PE/EE, or custom Java code that registers the extension functions with 
Saxon HE (DITA Open Toolkit does this automatically starting with version 
3.3.1). 

https://github.com/dita-community/org.dita-community.i18n

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com
 

On 5/9/19, 9:01 AM, "Imsieke, Gerrit, le-tex 
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

    Hi Manuel,
    
    You can use XSLT. It will be easier if
    
    a) you can use at least XSLT 2.0 and
    
    b) the text nodes with the escaped breaks are direct children of the 
    <seg> elements, with no highlighting or other inline elements around them.
    
    Are these two conditions satisfied?
    
    Gerrit
    
    On 09.05.2019 15:44, Manuel Souto Pico terminolator(_at_)gmail(_dot_)com wrote:
    > Dear all,
    > 
    > I have a bilingual TMX file containing many tu elements like this one, 
    > each holding a full paragraph:
    > 
    > <?xml version="1.0" encoding="UTF-8"?>
    > <tmx version="1.4">
    >     <header segtype="paragraph" adminlang="en"/>
    >     <body>
    >        <tu tuid="1">
    >           <tuv xml:lang="es">
    >              <seg>El PSOE ganaría en 10 de las 12 comunidades donde 
    > habrá elecciones autonómicas el 26 de mayo, según el último barómetro 
    > del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el 
    > partido de Miguel Ángel Revilla, sería primera fuerza. 
    > &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN, 
    > sería primera fuerza en la comunidad foral.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    >              <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda 
    > bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib 
    > chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel 
    > Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra 
    > Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning 
    > birinchi kuchi bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >     </body>
    > </tmx>
    > 
    > As you can see, there are a few (escaped) line break tags between sentences.
    > 
    > I would like to transform that into something like this, where each tu 
    > element contains only one sentence per language:
    > 
    > <?xml version="1.0" encoding="UTF-8"?>
    > <tmx version="1.4">
    >     <header segtype="paragraph" adminlang="en"/>
    >     <body>
    >        <tu tuid="1">
    >           <tuv xml:lang="es">
    > <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones 
    > autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib 
    > o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib 
    > chiqadi.</seg>
    >           </tuv>
    >        </tu>
    >        <tu tuid="2">
    >           <tuv xml:lang="es">
    > <seg>Las excepciones serían Cantabria, donde el PRC, el partido de 
    > Miguel Ángel Revilla, sería primera fuerza. </seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi 
    > birinchi kuch bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >        <tu tuid="3">
    >           <tuv xml:lang="es">
    > <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera 
    > fuerza en la comunidad foral.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy 
    > hamjamiyatning birinchi kuchi bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >     </body>
    > </tmx>
    > 
    > Do you think I can use XSLT to do this more or less easily?
    > 
    > I wrote a few XSLT stylesheets years ago, but I'm far from being a savvy 
    > user.
    > 
    > Thanks in advance for any tips.
    > 
    > Cheers, Manuel
    > XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
    > EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/225679> 
    > (by email <>)
    
    -- 
    Gerrit Imsieke
    Geschäftsführer / Managing Director
    le-tex publishing services GmbH
    Weissenfelser Str. 84, 04229 Leipzig, Germany
    Phone +49 341 355356 110, Fax +49 341 355356 510
    gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de
    
    Registergericht / Commercial Register: Amtsgericht Leipzig
    Registernummer / Registration Number: HRB 24930
    
    Geschäftsführer / Managing Directors:
    Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
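
A minimal XSLT 2.0 sketch of the transformation Manuel describes follows. It is not taken from the thread, and it assumes both of Gerrit's conditions hold: an XSLT 2.0 processor (for example Saxon HE) is available, and each seg contains only text, so the escaped breaks are the literal characters "<br>" after parsing. It further assumes that every tuv in a tu yields the same number of pieces; a tu that doesn't is copied unchanged and a warning is emitted. tokenize() splits on a regular expression that treats a run of one or more breaks (also <br/> and <br />) as a single separator, normalize-space() trims each piece and collapses internal whitespace, and a second pass renumbers tuid consecutively. The file and variable names are illustrative.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="tmx body tu tuv"/>

  <!-- Separator regex: one or more breaks such as <br>, <br/> or <br />.
       After parsing, "&lt;br&gt;" in the file is the literal text "<br>". -->
  <xsl:variable name="br" as="xs:string" select="'(&lt;br\s*/?&gt;)+'"/>

  <!-- Identity template: copy everything not matched below unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Pass 1: turn each tu into one tu per sentence -->
      <xsl:variable name="split" as="element(tu)*">
        <xsl:apply-templates select="tu" mode="split"/>
      </xsl:variable>
      <!-- Pass 2: renumber the new tu elements 1, 2, 3, ... -->
      <xsl:for-each select="$split">
        <tu tuid="{position()}">
          <xsl:copy-of select="@* except @tuid, node()"/>
        </tu>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tu" mode="split">
    <xsl:variable name="tu" select="."/>
    <!-- Number of non-blank pieces in each language's seg -->
    <xsl:variable name="counts" as="xs:integer*"
        select="for $s in tuv/seg
                return count(tokenize($s, $br)[normalize-space(.)])"/>
    <xsl:choose>
      <!-- Split only if all languages have the same (non-zero) count -->
      <xsl:when test="count(distinct-values($counts)) eq 1 and $counts[1] gt 0">
        <xsl:for-each select="1 to $counts[1]">
          <xsl:variable name="i" as="xs:integer" select="."/>
          <tu>
            <xsl:copy-of select="$tu/@*"/>
            <xsl:for-each select="$tu/tuv">
              <tuv>
                <xsl:copy-of select="@*"/>
                <!-- i-th piece, trimmed and whitespace-normalized -->
                <seg>
                  <xsl:value-of
                      select="normalize-space(tokenize(seg, $br)[normalize-space(.)][$i])"/>
                </seg>
              </tuv>
            </xsl:for-each>
          </tu>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message select="concat('tu ', @tuid, ': piece counts differ; copied unchanged')"/>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

To run it with Saxon HE from the command line (jar and file names are placeholders): java -cp saxon9he.jar net.sf.saxon.Transform -s:input.tmx -xsl:split-tmx.xsl -o:output.tmx. Because the whole string value of each seg is tokenized, any inline markup inside seg would be flattened to plain text, which is why condition (b) matters. Other tuv children, such as prop or note, are not carried into the new units.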
    
    
    
