xsl-list
Re: [xsl] Breaking paragraphs on linebreaks

2019-05-09 09:01:16
Hi Manuel,

You can use XSLT. It will be easier if

a) you can use at least XSLT 2.0 and

b) the text nodes with the escaped breaks are direct children of the <seg> elements, with no highlighting or other inline elements wrapped around them.

Are these two conditions satisfied?
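
If they are, something along the following lines might serve as a starting point. It is only a minimal sketch, assuming XSLT 2.0, a TMX file without namespaces, exactly one <seg> per <tuv>, and plain text inside each <seg>. The local: prefix, its urn:local namespace, the function name local:parts and the mode name "split" are made up for illustration. Consecutive escaped breaks are treated as a single split point, and the tuid values are renumbered from 1 across the whole body.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:local"
    exclude-result-prefixes="xs local">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="seg"/>

  <!-- Hypothetical helper: split the text of a seg at runs of the literal
       string "<br>", swallowing whitespace around the breaks and dropping
       pieces that are empty or whitespace only. -->
  <xsl:function name="local:parts" as="xs:string*">
    <xsl:param name="seg" as="element(seg)"/>
    <xsl:sequence
        select="tokenize(string($seg), '\s*(&lt;br&gt;\s*)+')[normalize-space(.) ne '']"/>
  </xsl:function>

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1 builds the split tu elements in a variable,
       pass 2 writes them out with consecutive tuid values. -->
  <xsl:template match="body">
    <xsl:variable name="split" as="element(tu)*">
      <xsl:apply-templates select="tu" mode="split"/>
    </xsl:variable>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="$split">
        <tu tuid="{position()}">
          <xsl:copy-of select="@*, node()"/>
        </tu>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- One output tu per piece; the i-th piece of every language ends up
       in the same tu. -->
  <xsl:template match="tu" mode="split">
    <xsl:variable name="tu" select="."/>
    <xsl:variable name="n" as="xs:integer"
        select="max((1, for $s in tuv/seg return count(local:parts($s))))"/>
    <xsl:for-each select="1 to $n">
      <xsl:variable name="i" select="."/>
      <tu>
        <xsl:copy-of select="$tu/(@* except @tuid)"/>
        <!-- Non-tuv children (notes, props), if any, are repeated as is. -->
        <xsl:copy-of select="$tu/(* except tuv)"/>
        <xsl:for-each select="$tu/tuv">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <seg><xsl:value-of select="local:parts(seg)[$i]"/></seg>
          </xsl:copy>
        </xsl:for-each>
      </tu>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Note that the pieces are aligned purely by position. If one language has fewer breaks than the other, the missing pieces come out as empty <seg> elements, so the result would need checking. And if condition b) does not hold, string() will silently flatten any inline markup inside <seg>.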

Gerrit

On 09.05.2019 15:44, Manuel Souto Pico terminolator(_at_)gmail(_dot_)com wrote:
Dear all,

I have a bilingual TMX file containing many tu elements like this, containing full paragraphs:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
    <header segtype="paragraph" adminlang="en"/>
    <body>
       <tu tuid="1">
          <tuv xml:lang="es">
            <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones autonómicas el 26 de mayo, según el último barómetro del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el partido de Miguel Ángel Revilla, sería primera fuerza. &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera fuerza en la comunidad foral.</seg>
          </tuv>
          <tuv xml:lang="uz">
            <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning birinchi kuchi bo'ladi.</seg>
          </tuv>
       </tu>
    </body>
</tmx>

As you can see there are a few (escaped) line break tags between sentences.

I would like to transform that into something like this, where every tu element contains only sentences:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
    <header segtype="paragraph" adminlang="en"/>
    <body>
       <tu tuid="1">
          <tuv xml:lang="es">
            <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
          </tuv>
          <tuv xml:lang="uz">
            <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib chiqadi.</seg>
          </tuv>
       </tu>
       <tu tuid="2">
          <tuv xml:lang="es">
            <seg>Las excepciones serían Cantabria, donde el PRC, el partido de Miguel Ángel Revilla, sería primera fuerza. </seg>
          </tuv>
          <tuv xml:lang="uz">
            <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi birinchi kuch bo'ladi.</seg>
          </tuv>
       </tu>
       <tu tuid="3">
          <tuv xml:lang="es">
            <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera fuerza en la comunidad foral.</seg>
          </tuv>
          <tuv xml:lang="uz">
            <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning birinchi kuchi bo'ladi.</seg>
          </tuv>
       </tu>
    </body>
</tmx>

Do you think I can use XSLT to do this more or less easily?

I wrote a few XSLT stylesheets years ago but I'm far from being a savvy user.

Thanks in advance for any tips.

Cheers, Manuel
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/225679> (by email <>)

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt