I have tended to resort to things like
<xsl:variable name="NL" select="codepoints-to-string(10)"/>
to prevent this kind of thing happening.
Michael Kay
Saxonica
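For illustration, a minimal sketch of how such a variable might be used, assuming an XSLT 2.0 (or later) processor and a hypothetical input containing item elements (the element name is made up for this example). Because the newline is produced by a function call rather than by whitespace inside the attribute, there is nothing for attribute value normalization or a reformatting tool to alter:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- U+000A built at run time; no whitespace character in the attribute. -->
  <xsl:variable name="NL" select="codepoints-to-string(10)"/>

  <!-- Hypothetical input: writes each item on its own line. -->
  <xsl:template match="/">
    <xsl:for-each select="//item">
      <xsl:value-of select="concat(., $NL)"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```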
On 19 Aug 2020, at 14:02, Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi all,
I'm having a little trouble with converting whitespace entities.
This line:
<xsl:variable name="nl" select="'&#10;'"/>
Gets converted to:
<xsl:variable name="nl" select="
"/>
Which causes the variable to be output as a space when I do <xsl:value-of
select="$nl"/> (I suppose because it gets normalized again during parsing).
I wonder what the best way is to solve this problem? It's not important what
it looks like, but the stylesheet behavior should obviously not change. The
formatting is done by an XSL stylesheet (which removes whitespace-only
nodes), which uses the aforementioned patched Xerces parser and then uses a
custom Saxon serializer to control output indentation and new line settings.
Is this something I could configure in the serializer?
Thanks,
Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com
On Thu, Jul 30, 2020 at 10:07 AM Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Thanks everyone.
What I did is monkey-patch Xerces to skip the normalization for attributes. I
still end up with character references instead of actual carriage returns in
the output, but it seems I can fix that in XSLT by using a character map.
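A minimal sketch of the character-map approach mentioned above, assuming an XSLT 2.0 (or later) formatter stylesheet whose serializer honours character maps, and a parser that has already preserved the line breaks in attribute values. The map name keep-newlines is made up for illustration. Mapping the newline character to itself can work because strings substituted by a character map are not escaped again, so the serializer writes a literal line break inside attribute values instead of a character reference:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map name. Both attributes hold U+000A: the character
       references survive parsing of this stylesheet as real newlines. -->
  <xsl:character-map name="keep-newlines">
    <xsl:output-character character="&#10;" string="&#10;"/>
  </xsl:character-map>

  <xsl:output method="xml" use-character-maps="keep-newlines"/>

  <!-- Identity template: copy the input unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

One caveat with this design: the map applies to every newline in every attribute, so a newline that was deliberately written as a character reference in the source also comes out as a literal line break, and a conforming parser will later read it as a space.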
In my research I found out that the .NET XmlTextReader class allows disabling
normalization:
https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmltextreader.normalization?view=netcore-3.1
Perhaps it's useful to somebody...
Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com
On Wed, Jul 29, 2020 at 1:20 PM Michael Kay mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Agreed, the attribute normalization spec is an absolute pain, and being able
to switch it off would have many benefits and no adverse consequences I can
foresee.
Michael Kay
Saxonica
On 29 Jul 2020, at 09:27, Pieter Lamers
pieter(_dot_)lamers(_at_)benjamins(_dot_)nl
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
I have the same problem when storing XSLT documents in eXist-db. Attribute
normalization also kills whitespace there because the spec says whitespace
should be ignored in attribute values (and eXist-db normalizes against the
spec). Isn't it about time to change the spec in this respect?
Pieter
On 28/07/2020 23:18, Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com wrote:
Hi list,
Like many of you, I assume, I use a version control system when working on
XSLT projects. I'm working together with multiple people, and we run the
code through an XML formatter before checking it in to avoid formatting
differences showing up in the diffs.
The problem is that, due to attribute value normalization, carriage returns
are removed from attribute nodes during XML parsing. When using long XPath
expressions (and this has become very common in XSLT 3, especially with
higher-order functions) that are split across multiple lines, this results in
huge single line outputs which are impossible to read.
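To illustrate the effect described above, here is a sketch with made-up names (the variable total and the item, price and qty names are not from any real project). The two declarations are a before-and-after pair, not parts of one stylesheet. Under XML attribute value normalization, each literal line break in an attribute is reported to the application as a single space, so the original layout cannot be recovered once the file has been parsed:

```xml
<!-- As written by hand: the XPath is spread over several lines. -->
<xsl:variable name="total" select="
    sum(
      for $i in //item
      return $i/@price * $i/@qty
    )"/>

<!-- After a parse and re-serialize round trip: every line break in
     @select has become a space, leaving one long line. -->
<xsl:variable name="total" select="     sum(       for $i in //item       return $i/@price * $i/@qty     )"/>
```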
It seems any sort of XML processing will irreversibly transform the
whitespace, therefore I have to choose between:
- No formatting
- Formatting using non-XML tools?
- Finding a parser that bends the rules...
Have any of you experienced the same problem and did you find a solution?
Thanks.
Willem Van Lishout
willemvanlishout(_at_)gmail(_dot_)com
XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/2854576> (by
email <>)
--
Pieter Lamers
John Benjamins Publishing Company
Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
tel: +31 20 630 4747
web: www.benjamins.com <http://www.benjamins.com/>
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--