The DITA Community org.dita-community.i18n project provides general-purpose Saxon
extension functions for locale-aware word and line breaking. They require either
Saxon PE/EE, or custom Java code that registers the extension functions for use
with Saxon HE (DITA Open Toolkit does this automatically starting with version
3.3.1).
https://github.com/dita-community/org.dita-community.i18n
Cheers,
Eliot
--
Eliot Kimber
http://contrext.com
On 5/9/19, 9:01 AM, "Imsieke, Gerrit, le-tex gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de" <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi Manuel,
You can use XSLT. It will be easier if
a) you can use XSLT 2.0 or later, and
b) the text nodes with the escaped breaks are direct children of the
<seg> elements, not wrapped in highlighting or other inline elements.
Are these two conditions satisfied?
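If both conditions hold, the job reduces to splitting each seg's text on runs of the escaped break and re-pairing the pieces by position across the tuv elements; in XSLT 2.0 that is essentially tokenize() on the seg text plus normalize-space() on each piece. The minimal sketch below shows the same logic in Python (standard-library ElementTree) rather than XSLT, purely for illustration. It assumes the file stores the breaks as &lt;br&gt; (so the parsed text contains the literal string "<br>"), that each seg holds plain text only, and that both languages split into the same number of pieces. The names split_segments and split_tmx are hypothetical, and tu elements are renumbered sequentially as in Manuel's example.

```python
import re
import xml.etree.ElementTree as ET

# Runs of one or more escaped breaks. The file holds &lt;br&gt;, which the
# parser turns into the literal text "<br>" (assumption: breaks are escaped).
BREAK = re.compile(r"(?:<br>)+")


def split_segments(text):
    """Split seg text on break runs; collapse whitespace; drop empty pieces."""
    pieces = (" ".join(p.split()) for p in BREAK.split(text or ""))
    return [p for p in pieces if p]


def split_tmx(root):
    """Return a new <tmx> tree with one sentence-level <tu> per split piece.

    Hypothetical helper. Assumes every <seg> contains plain text only and
    that all <tuv> elements of a <tu> split into the same number of pieces.
    """
    new_root = ET.Element("tmx", root.attrib)
    header = root.find("header")
    if header is not None:
        new_root.append(header)  # header copied unchanged, as in the example
    new_body = ET.SubElement(new_root, "body")
    next_id = 1
    for tu in root.iter("tu"):
        # One (attributes incl. xml:lang, list of pieces) pair per language
        tuvs = [(tuv.attrib, split_segments(tuv.findtext("seg")))
                for tuv in tu.findall("tuv")]
        if not tuvs:
            continue
        counts = {len(pieces) for _, pieces in tuvs}
        if len(counts) != 1:
            raise ValueError(f"tu {tu.get('tuid')}: piece counts differ {counts}")
        for i in range(counts.pop()):
            new_tu = ET.SubElement(new_body, "tu", tuid=str(next_id))
            next_id += 1
            for attrib, pieces in tuvs:
                new_tuv = ET.SubElement(new_tu, "tuv", attrib)
                ET.SubElement(new_tuv, "seg").text = pieces[i]
    return new_root


sample = """<tmx version="1.4"><header segtype="paragraph" adminlang="en"/>
<body><tu tuid="1">
<tuv xml:lang="es"><seg>Uno. &lt;br&gt;Dos.&lt;br&gt;&lt;br&gt;Tres.</seg></tuv>
<tuv xml:lang="uz"><seg>Bir.&lt;br&gt;Ikki.&lt;br&gt;&lt;br&gt;Uch.</seg></tuv>
</tu></body></tmx>"""

print(ET.tostring(split_tmx(ET.fromstring(sample)), encoding="unicode"))
```

For a real file, something like ET.ElementTree(split_tmx(ET.parse("in.tmx").getroot())).write("out.tmx", encoding="UTF-8", xml_declaration=True) should do (the file names are placeholders). Raising an error when the piece counts differ is deliberate: silently misaligning sentences would corrupt the translation memory.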
Gerrit
On 09.05.2019 15:44, Manuel Souto Pico terminolator(_at_)gmail(_dot_)com wrote:
> Dear all,
>
> I have a bilingual TMX file containing many tu elements like this one,
> each holding a full paragraph:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <tmx version="1.4">
> <header segtype="paragraph" adminlang="en"/>
> <body>
> <tu tuid="1">
> <tuv xml:lang="es">
> <seg>El PSOE ganaría en 10 de las 12 comunidades donde
> habrá elecciones autonómicas el 26 de mayo, según el último barómetro
> del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el
> partido de Miguel Ángel Revilla, sería primera fuerza.
> &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN,
> sería primera fuerza en la comunidad foral.</seg>
> </tuv>
> <tuv xml:lang="uz">
> <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda
> bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
> chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel
> Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra
> Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning
> birinchi kuchi bo'ladi.</seg>
> </tuv>
> </tu>
> </body>
> </tmx>
>
> As you can see, there are a few (escaped) line break tags between
> sentences.
>
> I would like to transform that into something like this, where each tu
> element holds a single sentence:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <tmx version="1.4">
> <header segtype="paragraph" adminlang="en"/>
> <body>
> <tu tuid="1">
> <tuv xml:lang="es">
> <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones
> autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
> </tuv>
> <tuv xml:lang="uz">
> <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib
> o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
> chiqadi.</seg>
> </tuv>
> </tu>
> <tu tuid="2">
> <tuv xml:lang="es">
> <seg>Las excepciones serían Cantabria, donde el PRC, el partido de
> Miguel Ángel Revilla, sería primera fuerza.</seg>
> </tuv>
> <tuv xml:lang="uz">
> <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi
> birinchi kuch bo'ladi.</seg>
> </tuv>
> </tu>
> <tu tuid="3">
> <tuv xml:lang="es">
> <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera
> fuerza en la comunidad foral.</seg>
> </tuv>
> <tuv xml:lang="uz">
> <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy
> hamjamiyatning birinchi kuchi bo'ladi.</seg>
> </tuv>
> </tu>
> </body>
> </tmx>
>
> Do you think I can use XSLT to do this more or less easily?
>
> I wrote a few XSLT stylesheets years ago, but I'm far from being a savvy
> user.
>
> Thanks in advance for any tips.
>
> Cheers, Manuel
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/225679>
> (by email <>)
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt