Fairly often I have to process documentation XML: lots of mixed content,
and for various reasons someone wants the white space in that content
regularized. Typically the pretty-print indentation should all be stripped
out and eventually added back in by some consistent means. (Or the
pretty-print indentation is making it harder to find indexed phrases, or...)
Going through and applying normalize-space() to all the text nodes is
obviously a bad idea; it trims each text node, so the spaces next to
elements in mixed content are lost.
    <p>These are some <i>words</i></p>
turns into
    <p>These are some<i>words</i></p>
and that's no help to anyone.
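
To make the failure concrete outside of XSLT, here is a minimal Python sketch. The normalize_space() helper is my own stand-in for the XPath function (trim, then collapse runs of the four XML white space characters), and ElementTree's text/tail attributes stand in for text nodes; neither is anyone's actual stylesheet.

```python
import re
import xml.etree.ElementTree as ET

XML_WS = " \t\n\r"  # the four characters XPath treats as white space


def normalize_space(s):
    """Stand-in for XPath normalize-space(): trim, then collapse runs."""
    return re.sub(r"[ \t\n\r]+", " ", s.strip(XML_WS))


p = ET.fromstring("<p>These are some <i>words</i></p>")

# Naively normalize every text node (ElementTree keeps them in .text/.tail).
for el in p.iter():
    if el.text is not None:
        el.text = normalize_space(el.text)
    if el.tail is not None:
        el.tail = normalize_space(el.tail)

result = ET.tostring(p, encoding="unicode")
print(result)  # <p>These are some<i>words</i></p>
```

The space before <i> is gone because the text node "These are some " was trimmed on its own, with no knowledge of the element that follows it.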
Relatedly, normalize-space() has a narrow definition of white space:
U+000A, U+000D, U+0009, and U+0020 (linefeed, carriage return, tab, and
space). That is not always helpful; the content may have non-breaking
spaces, ideographic spaces, or other fancy spaces in it.
Would it be possible to get a normalize-mixed() that takes a sequence of text
and element nodes plus a sequence of characters to replace, and returns a
sequence of text and element nodes in which every run of those characters has
been replaced with a single space, without deleting the leading or trailing
spaces on the text nodes?
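
As a rough illustration of the behaviour I mean, here is a Python sketch rather than XSLT. The name normalize_mixed(), the in-place ElementTree approach, and the default character set (XML white space plus U+00A0 and U+3000) are all assumptions made for the example, not a proposed signature.

```python
import re
import xml.etree.ElementTree as ET

# Assumed default: XML white space plus no-break and ideographic space.
DEFAULT_CHARS = " \t\n\r\u00a0\u3000"


def normalize_mixed(elem, chars=DEFAULT_CHARS):
    """Collapse each run of `chars` to one space in every text node
    under elem, in place. Leading and trailing spaces are kept."""
    run = re.compile("[" + re.escape(chars) + "]+")
    for el in elem.iter():
        if el.text:
            el.text = run.sub(" ", el.text)
        # The tail of the top element lies outside it, so leave it alone.
        if el is not elem and el.tail:
            el.tail = run.sub(" ", el.tail)
    return elem


src = "<p>These   are\u00a0\u00a0some\n  <i>mixed\u3000 content</i>\n  words.</p>"
p = normalize_mixed(ET.fromstring(src))
result = ET.tostring(p, encoding="unicode")
print(result)  # <p>These are some <i>mixed content</i> words.</p>
```

Because nothing is trimmed, the single space on either side of <i> survives, and the caller decides which characters count as space.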
I realize that there's no reason not to write this as a user-defined function;
it's how often I wind up wanting it that makes me think it might be something
to consider as a language function.
--
Graydon Saunders | graydonish(_at_)gmail(_dot_)com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")