xsl-list

[xsl] XSLT 4: normalize-mixed()

2020-05-24 22:51:39
So fairly often I have to do things to documentation XML documents: lots 
of mixed content, and for various reasons someone wants the content 
regularized. The pretty-print indentation should all be stripped out and 
eventually added back in by some consistent means. (Or the pretty-print 
indentation is making indexed phrases harder to find, or...)

Going through and applying normalize-space() to every text node is an 
obviously bad idea; it strips the spaces before (and after) mixed-content 
elements.

<p>These are some <i>words</i></p>

turns into

<p>These are some<i>words</i></p>

and that's no help to anyone.
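
For anyone who wants to poke at the failure outside an XSLT processor, here is a rough Python/ElementTree sketch (Python is only a stand-in; `xpath_normalize_space` and `naive_normalize` are hypothetical helper names). It approximates normalize-space() and applies it to every text node, including the `tail` text ElementTree stores after each element, which reproduces the lost space.

```python
import re
import xml.etree.ElementTree as ET

def xpath_normalize_space(s):
    """Approximate XPath normalize-space(): collapse runs of the four XPath
    white-space characters to one space, then strip leading/trailing space."""
    return re.sub(r"[ \t\n\r]+", " ", s).strip(" ")

def naive_normalize(elem):
    """Apply normalize-space() to every text node in place (the bad idea)."""
    for node in elem.iter():
        if node.text is not None:
            node.text = xpath_normalize_space(node.text)
        if node.tail is not None:
            node.tail = xpath_normalize_space(node.tail)
    return elem

p = ET.fromstring("<p>These are some <i>words</i></p>")
# The space before <i> is gone after normalization.
print(ET.tostring(naive_normalize(p), encoding="unicode"))
```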

In a related way, normalize-space() has a narrow definition of white space: 
U+000A, U+000D, U+0009, and U+0020 (linefeed, carriage return, tab, and space). 
That isn't always helpful; the content may contain non-breaking spaces, 
ideographic spaces, or other fancy spaces.
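
To make the character-set point concrete, a small Python sketch (again just an illustration; `collapse`, `XPATH_WS` and `WIDE_WS` are hypothetical names, and which extra spaces belong in the wider set is a judgment call). With only the XPath set, the no-break and ideographic spaces pass through untouched; with a caller-supplied wider set they are collapsed too.

```python
import re

# The four characters XPath normalize-space() treats as white space.
XPATH_WS = "\u0020\u0009\u000A\u000D"
# Hypothetical wider set: adds no-break, en, em, thin, narrow no-break,
# and ideographic spaces.
WIDE_WS = XPATH_WS + "\u00A0\u2002\u2003\u2009\u202F\u3000"

def collapse(s, chars):
    """Replace each run of characters from `chars` with one U+0020.
    Leading and trailing runs become one space rather than being removed."""
    return re.sub("[" + re.escape(chars) + "]+", " ", s)

text = "words\u00A0and\u3000more \n  text"
print(repr(collapse(text, XPATH_WS)))  # NBSP and U+3000 survive
print(repr(collapse(text, WIDE_WS)))   # every gap is now a single space
```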

Would it be possible to get a normalize-mixed() that takes a sequence of text 
and element nodes plus a sequence of characters, and returns a sequence of 
text and element nodes in which every run of those characters has been 
replaced with a single space, without deleting the leading or trailing spaces 
on the text nodes?
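
A rough sketch of the proposed behaviour, in Python/ElementTree rather than XSLT so it can be run standalone. `normalize_mixed` is a hypothetical name, and two choices here are assumptions rather than part of the proposal: it recurses into descendant elements, and it leaves the tail of the top element alone because that text lies outside the content being normalized.

```python
import re
import xml.etree.ElementTree as ET

def normalize_mixed(elem, chars=" \t\n\r"):
    """Hypothetical normalize-mixed(): in every text node of elem's subtree,
    replace each run of characters from `chars` with a single U+0020.
    Leading/trailing runs are kept as one space, so the gaps around inline
    elements survive. Modifies the tree in place and returns elem."""
    if not chars:
        return elem  # nothing to replace (and "[]+" is not a valid regex)
    run = re.compile("[" + re.escape(chars) + "]+")
    for node in elem.iter():
        if node.text:
            node.text = run.sub(" ", node.text)
        for child in node:
            if child.tail:
                child.tail = run.sub(" ", child.tail)
    return elem

doc = ET.fromstring("<p>\n    These are\u00A0some\n    <i>words</i>\n</p>")
normalize_mixed(doc, " \t\n\r\u00A0\u3000")
print(ET.tostring(doc, encoding="unicode"))
```

One limitation of the semantics as stated: a run that straddles a node boundary, as in `some <i> words</i>`, still comes out as two spaces, one on each side of the tag. Collapsing those would need look-ahead across sibling nodes. In XSLT the per-node part maps naturally onto an identity transform whose text() template calls replace().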

I realize that there's no reason not to write this as a user-defined function; 
it's how often I wind up wanting it that makes me think it might be something 
to consider as a language function.

-- 
Graydon Saunders  | graydonish(_at_)gmail(_dot_)com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
