A specific problem in this category of "intra-para WS normalization", in
particular when normalizing nested emphasis elements, is embedded notes.
Suppose your XSLT already pulls out whitespace at the beginning or end
of potentially nested emphasis elements like this:
<p>text<emphasis> text <link>text </link></emphasis>text</p>
->
<p>text <emphasis>text <link>text</link></emphasis> text</p>
When dealing with nested inline elements, it is important not to pull
the trailing whitespace out of this embedded footnote:
<p><phrase>text<fn><p>text. </p></fn></phrase>.</p>
should *not* be normalized to:
<p><phrase>text<fn><p>text.</p></fn></phrase> .</p>
because it will move the previously layout-neutral space at the end of
the footnote text to a place between the footnote marker and the period.
So there should be some rule that doesn't pull out intra-para whitespace
beyond the boundaries of the closest-ancestor para.
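A minimal sketch of such a rule in XSLT 2.0, not the code of the library
mentioned below. It handles trailing whitespace only, and it assumes
that paragraphs are called p and that every other element inside a p is
inline. The my: namespace and the my:pullable function are hypothetical
names made up for this illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="urn:x-example:ws"
    exclude-result-prefixes="my">

  <!-- Hypothetical helper: the last text node inside $e, but only if it
       ends in whitespace AND has the same closest p ancestor as $e.
       The second condition is the "don't cross the para" rule. -->
  <xsl:function name="my:pullable" as="text()?">
    <xsl:param name="e" as="element()"/>
    <xsl:variable name="t" select="($e//text())[last()]"/>
    <xsl:sequence select="$t[matches(., '\s$')]
                            [ancestor::p[1] is $e/ancestor::p[1]]"/>
  </xsl:function>

  <!-- Identity transform for everything else -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inline elements (assumption: any non-p element inside a p) -->
  <xsl:template match="p//*[not(self::p)]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    <!-- Re-emit the space after the outermost inline element that ends
         in the pullable text node, i.e. when the parent does not end
         in the same node. -->
    <xsl:if test="exists(my:pullable(.))
                  and not(my:pullable(..) is my:pullable(.))">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Strip trailing whitespace from text that some inline ancestor in
       the same para is allowed to pull out -->
  <xsl:template match="p//text()[some $e in ancestor::*[not(self::p)]
                                 satisfies my:pullable($e) is .]">
    <xsl:value-of select="replace(., '\s+$', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch, the space at the end of the footnote text stays where
it is, because the text node's closest p is the footnote's own p, not
the p that contains phrase and fn. Leading whitespace would need a
mirror-image rule.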
I once wrote an XSLT library that does this nested-emphasis whitespace
normalization for DocBook, TEI, and JATS:
https://github.com/gimsieke/emphasis-normalize-space
Rick, your example didn't suggest that the task at hand included
normalizing this particular type of "emphasis-fringe WS", but if it
does, then the library might be useful.
Gerrit
On 05.11.2019 16:29, Peter Flynn peter(_at_)silmaril(_dot_)ie wrote:
On 05/11/2019 01:00, Rick Quatro rick(_at_)rickquatro(_dot_)com wrote:
Hi All,
I have inherited some "interesting" xml that has mixed content and I
am trying to figure out some strategies for getting "cleaner" output
in my XSLT workflow without removing any needed whitespace.
This is a very common problem in handling document XML with significant
amounts of mixed content, especially with nested subelements. It's made
unnecessarily harder by the dropping of white-space-only nodes (we
should have paid more attention to this at the time).
Michael has explained the dropping of insignificant white-space. I don't
need that because I'm typically outputting to LaTeX which does that by
itself, but I do need the reverse: to reinsert the deleted
white-space-only nodes which fall between subelements, eg
...use <tag>foo</tag>, <emph>never</emph> <tag>bar</tag>...
To avoid this, in the template for each element which may occur in mixed
content, you could first call a short named template which tests whether
the immediately preceding node is an element rather than a text node
and, if so, inserts a single space character.
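A minimal XSLT 1.0 sketch of that idea. The template name space-before
and the LaTeX commands chosen for tag and emph are assumptions made for
illustration only; they are not taken from an actual stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical named template: emit one space if the node right
       before the current element is itself an element, i.e. the
       white-space-only text node between the two has been dropped. -->
  <xsl:template name="space-before">
    <xsl:if test="preceding-sibling::node()[1][self::*]">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Each mixed-content element calls it first (LaTeX mapping assumed) -->
  <xsl:template match="emph">
    <xsl:call-template name="space-before"/>
    <xsl:text>\emph{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}</xsl:text>
  </xsl:template>

  <xsl:template match="tag">
    <xsl:call-template name="space-before"/>
    <xsl:text>\texttt{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Note that a test like this cannot tell a dropped space apart from two
elements that really were adjacent in the source, so it will also put a
space between those.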
Peter