Hi Folks,
Over the weekend I’ve been writing a whitespace normalization stylesheet
that transforms input like
<emphasis>Nested<glossterm> phrases </glossterm>with whitespace </emphasis>
into
<emphasis>Nested <glossterm>phrases</glossterm> with whitespace</emphasis>
This is often useful when converting content that was created in word
processors or DTP applications. Probably for typographic reasons, these
tools tend to include a trailing space in the wrapping element (or rather,
they encourage you to include the trailing space when you select a word).
When converting such a styled phrase to a keyword, a glossary term, or
another semantically significant element, this extra whitespace should
be moved away from its original location and placed after the inline
element.
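
To make the rule concrete, here is a minimal sketch in Python. It is not taken from the stylesheet and does not use XSLT; it only shows the basic idea of moving leading and trailing whitespace out of inline elements, innermost first. The names INLINE_TAGS, XML_WS, pull_out_whitespace and normalize are made up for this illustration, and the sketch handles plain XML whitespace only (no punctuation, no footnote scoping). The root element is left alone because it has no parent to receive the whitespace, so the sample is wrapped in a para.

```python
import xml.etree.ElementTree as ET

# Assumption: which elements count as inline; chosen for this example only.
INLINE_TAGS = {"emphasis", "glossterm"}
XML_WS = " \t\r\n"  # XML whitespace only, no space-like characters


def pull_out_whitespace(parent):
    """Move leading/trailing whitespace of inline children outside of them."""
    previous = None
    for child in list(parent):
        pull_out_whitespace(child)  # handle nested elements first
        if child.tag in INLINE_TAGS:
            # Leading whitespace goes to the text just before the element.
            text = child.text or ""
            stripped = text.lstrip(XML_WS)
            lead = text[:len(text) - len(stripped)]
            if lead:
                child.text = stripped
                if previous is None:
                    parent.text = (parent.text or "") + lead
                else:
                    previous.tail = (previous.tail or "") + lead
            # Trailing whitespace sits in the last child's tail, or in the
            # element's own text if it has no child elements.
            if len(child):
                last = child[-1]
                text = last.tail or ""
                stripped = text.rstrip(XML_WS)
                last.tail = stripped
            else:
                text = child.text or ""
                stripped = text.rstrip(XML_WS)
                child.text = stripped
            trail = text[len(stripped):]
            if trail:
                child.tail = trail + (child.tail or "")
        previous = child


def normalize(xml):
    root = ET.fromstring(xml)
    pull_out_whitespace(root)
    return ET.tostring(root, encoding="unicode")


sample = ("<para><emphasis>Nested<glossterm> phrases </glossterm>"
          "with whitespace </emphasis></para>")
print(normalize(sample))
```

For the sample above, the space that used to end the emphasis element ends up directly after it, inside the para.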
The challenges have been:
– Dealing with nested elements.
– Dealing with whitespace not only on the right-hand side, but also on
the left-hand side and on both sides of an inline element.
– Also considering punctuation and space-like characters in addition to
whitespace.
– Making sure that trailing punctuation is not extracted from a
footnote paragraph (and placed into the surrounding paragraph) if the
footnote is wrapped in a styling phrase. DTP applications often put
footnote markers – and with them the whole footnote – in styled phrases.
– Making it customizable for different vocabularies (it currently
supports DocBook, TEI, and JATS).
The XSLT has some features that may be of general interest, in
particular passing the relevant text nodes as tunneled parameters and
the footnote scoping.
But see for yourselves:
https://github.com/gimsieke/emphasis-normalize-space
Gerrit
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--