xsl-list

Re: [xsl] White space strategies for mixed content

2019-11-05 10:08:05
A specific problem in this category of "intra-para WS normalization", in particular when normalizing nested emphasis elements, is embedded notes.

Suppose your XSLT already pulls out whitespace at the beginning or end of potentially nested emphasis elements like this:

<p>text<emphasis> text <link>text </link></emphasis>text</p>
->
<p>text <emphasis>text <link>text</link></emphasis> text</p>
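For illustration, here is a minimal sketch of that kind of fringe-whitespace pulling. It uses Python's standard-library ElementTree rather than XSLT so it can be run standalone, and it is not taken from any particular library; the function names and the set of inline element names (`INLINE`) are assumptions. It works bottom-up, so whitespace at the edge of a nested `<link>` first moves to the edge of its `<emphasis>` parent and from there out of the emphasis.

```python
import xml.etree.ElementTree as ET

# Assumption: names of inline elements whose edge whitespace may be moved.
# Paragraphs (p/para) and footnotes (fn) are deliberately NOT in this set.
INLINE = {"emphasis", "link", "phrase"}
XML_WS = " \t\r\n"  # XML whitespace only; no-break spaces are left alone


def _move_fringe_ws(parent, children, i):
    """Move leading/trailing whitespace of children[i] just outside it."""
    child = children[i]
    # Leading whitespace: child.text -> the text right before child.
    if child.text:
        rest = child.text.lstrip(XML_WS)
        lead = child.text[: len(child.text) - len(rest)]
        if lead:
            child.text = rest
            if i == 0:
                parent.text = (parent.text or "") + lead
            else:
                prev = children[i - 1]
                prev.tail = (prev.tail or "") + lead
    # Trailing whitespace: last text inside child -> start of child.tail.
    src = (child[-1].tail if len(child) else child.text) or ""
    rest = src.rstrip(XML_WS)
    trail = src[len(rest):]
    if trail:
        if len(child):
            child[-1].tail = rest
        else:
            child.text = rest
        child.tail = trail + (child.tail or "")


def normalize(elem):
    """Normalize descendants first, then pull whitespace out of inline children."""
    children = list(elem)
    for i, child in enumerate(children):
        normalize(child)
        if child.tag in INLINE:
            _move_fringe_ws(elem, children, i)


doc = ET.fromstring("<p>text<emphasis> text <link>text </link></emphasis>text</p>")
normalize(doc)
print(ET.tostring(doc, encoding="unicode"))
# <p>text <emphasis>text <link>text</link></emphasis> text</p>
```

Because `p`/`para` are not in `INLINE`, whitespace is never moved out of a paragraph, and `fn` is left out of the set as well. The sketch does not collapse the double spaces that can result when moved whitespace lands next to existing whitespace.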

When dealing with nested inline elements, it is important not to pull the trailing whitespace out of this embedded footnote:

<p><phrase>text<fn><p>text. </p></fn></phrase>.</p>

should *not* be normalized to:

<p><phrase>text<fn><p>text.</p></fn></phrase> .</p>

because that would move the previously layout-neutral space at the end of the footnote text to a position between the footnote marker and the period.

So there should be a rule that never pulls intra-para whitespace out beyond the boundaries of its closest ancestor para.
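One way to see the problem: a check like "does this inline element end in whitespace?" that simply looks at the element's last descendant text node will find the footnote's `text. ` and answer yes for the `<phrase>`. The Python sketch below (standard-library ElementTree, hypothetical function names, and an assumed set of paragraph element names `PARA`) compares that naive check with one that skips text inside nested paragraphs, so only text whose closest ancestor para is the phrase's own para is considered.

```python
import xml.etree.ElementTree as ET

# Assumption: element names that open a new paragraph context.
PARA = {"p", "para"}
XML_WS = " \t\r\n"


def text_pieces(elem, stop_at_para):
    """Yield elem's text content in document order (excluding elem.tail).

    With stop_at_para=True, text inside nested paragraphs (such as the <p>
    inside a footnote) is skipped, so only text whose closest ancestor
    paragraph is the same as elem's is considered.
    """
    if elem.text:
        yield elem.text
    for child in elem:
        if not (stop_at_para and child.tag in PARA):
            yield from text_pieces(child, stop_at_para)
        if child.tail:
            yield child.tail


def ends_with_ws(elem, stop_at_para):
    """True if the last non-empty piece of text in elem ends with whitespace."""
    last = ""
    for piece in text_pieces(elem, stop_at_para):
        last = piece
    return last != "" and last[-1] in XML_WS


doc = ET.fromstring("<p><phrase>text<fn><p>text. </p></fn></phrase>.</p>")
phrase = doc.find("phrase")
print(ends_with_ws(phrase, stop_at_para=False))  # True: sees the footnote's "text. "
print(ends_with_ws(phrase, stop_at_para=True))   # False: the footnote's para is skipped
```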

I once wrote an XSLT library that does this nested-emphasis whitespace normalization for DocBook, TEI, and JATS: https://github.com/gimsieke/emphasis-normalize-space

Rick, your example didn't suggest that the task at hand included normalizing this particular type of "emphasis-fringe WS", but if it does, then the library might be useful.

Gerrit



On 05.11.2019 16:29, Peter Flynn peter(_at_)silmaril(_dot_)ie wrote:
On 05/11/2019 01:00, Rick Quatro rick(_at_)rickquatro(_dot_)com wrote:
> Hi All,
>
> I have inherited some "interesting" xml that has mixed content and I am trying to figure out some strategies for getting "cleaner" output in my XSLT workflow without removing any needed whitespace.

This is a very common problem in handling document XML with significant amounts of mixed content, especially with nested subelements. It's made unnecessarily harder by the dropping of white-space-only nodes (we should have paid more attention to this at the time).

Michael has explained the dropping of insignificant white-space. I don't need that, because I'm typically outputting to LaTeX, which does that by itself; but I do need the reverse: to reinsert the deleted white-space-only nodes that fall between subelements, e.g.:

...use <tag>foo</tag>, <emph>never</emph> <tag>bar</tag>...

To restore them, in the template for each element that may occur in mixed content, you could first call a short named template that tests whether the immediately preceding sibling node is an element rather than a text node and, if so, inserts a single space character.
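In XSLT that test would be something like `preceding-sibling::node()[1][self::*]`, i.e. the immediately preceding sibling node exists and is an element. The following sketch shows the same idea in Python's standard-library ElementTree, with a hypothetical `render` function producing plain text as a stand-in for LaTeX output. In ElementTree the text between two sibling elements is stored as the first one's `tail`, so an empty tail means the whitespace-only node between them is gone. The sample input already has that node removed, as it would be after whitespace stripping.

```python
import xml.etree.ElementTree as ET


def render(elem, restore_spaces=True):
    """Flatten elem to plain text.

    If restore_spaces is True, emit one space before any child element whose
    immediately preceding sibling node is also an element (no text between).
    """
    out = [elem.text or ""]
    prev = None
    for child in elem:
        if restore_spaces and prev is not None and not prev.tail:
            out.append(" ")  # the dropped whitespace-only node goes here
        out.append(render(child, restore_spaces))
        out.append(child.tail or "")
        prev = child
    return "".join(out)


doc = ET.fromstring("<p>use <tag>foo</tag>, <emph>never</emph><tag>bar</tag></p>")
print(render(doc, restore_spaces=False))  # use foo, neverbar
print(render(doc))                        # use foo, never bar
```

The rule cannot tell a dropped space from elements that were genuinely adjacent in the source (for example `<emph>x</emph><sup>2</sup>`), so it will insert a space there too.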

Peter
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--
