xsl-list

Re: [xsl] Why does the tokenize() function behave strangely when I use ENTITIES and variables?

2016-04-07 09:02:23
It's called attribute value normalization, and it is described in the XML 
specification. It's one of the bizarre consequences of XML not being able to 
define consistently whether and when whitespace is significant. If you write a 
newline explicitly as a character reference in an attribute value, then the 
parser decides you probably intended it, but if a newline gets in there by 
expanding an entity reference, it decides that you probably didn't, and 
replaces it with a space.
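
The effect can be observed outside XSLT, since it happens in the XML parser 
before the stylesheet is ever compiled. Here is a minimal sketch using 
Python's standard-library parser; the element name "r" and the attribute 
names "direct" and "via-entity" are made up for the illustration, and only 
the entity declaration is taken from the question below. Assuming the parser 
follows the normalization rules in the XML specification, the first 
attribute should come out as two newlines and the second as two spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the same two newlines written two ways,
# once as character references and once via an internal entity.
DOC = """<!DOCTYPE r [
  <!ENTITY line-separator '&#x0A;'>
]>
<r direct="&#x0A;&#x0A;" via-entity="&line-separator;&line-separator;"/>"""

root = ET.fromstring(DOC)

# Character references written directly in the attribute value.
direct = root.get("direct")

# The same characters arriving through entity expansion.
via_entity = root.get("via-entity")

# Print the code points so the whitespace characters are visible.
print("direct:    ", [hex(ord(c)) for c in direct])
print("via entity:", [hex(ord(c)) for c in via_entity])
```
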

When I do this kind of thing I'm increasingly inclined to use 
codepoints-to-string():

<xsl:variable name="rule-separator" select="codepoints-to-string((10, 10))"/>

That's much more robust against entity-expansion and transcoding glitches.

Michael Kay
Saxonica



On 7 Apr 2016, at 14:40, Costello, Roger L. costello(_at_)mitre(_dot_)org 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi Folks,

I have a stylesheet which reads a text file and tokenizes it. The token 
delimiter is two consecutive newline characters (hex 0A, hex 0A).

If I use the tokenize() function like this:

      tokenize($text-file, '&#x0A;&#x0A;')

then the text file is correctly tokenized.

But if I create an entity:

<!DOCTYPE xsl:stylesheet [
   <!ENTITY line-separator     '&#x0A;'>
]>

and a variable whose value is two line-separators:

<xsl:variable name="rule-separator" 
select="'&line-separator;&line-separator;'"/>

and then use the variable with the tokenize() function:

      tokenize($text-file, $rule-separator)

then the text file is not tokenized correctly. Specifically, the XSLT 
processor uses two consecutive space characters (hex 20, hex 20) as the token 
delimiter rather than two consecutive newline characters (hex 0A, hex 0A) as 
the token delimiter.

Do you know why this is happening? How do I fix it?

/Roger
