xsl-list

[xsl] Safe-guarding codepoints-to-string() from wrong input

2006-12-20 07:35:03
Hi all,

In a translation stylesheet, I take user input (an arbitrary string) and replace a set of numbers with a set of characters, like this:

$input = "some [34]quoted[34] string"
output --> some "quoted" string

<xsl:analyze-string select="$input" regex="\[(\d+)\]">
   <xsl:matching-substring>
       <xsl:value-of select="codepoints-to-string(xs:integer(regex-group(1)))" />
   </xsl:matching-substring>
   <xsl:non-matching-substring>
       <xsl:value-of select="." />
   </xsl:non-matching-substring>
</xsl:analyze-string>

Because we are talking about tons of data containing strings like the above (in text files), I'd like to make the codepoints-to-string() call a bit more rock-solid. By default it fails hard on bad input. I'd like it to degrade gracefully instead: be liberal in what you accept.
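A minimal sketch of one way to degrade gracefully, assuming an XSLT 2.0 processor (for example Saxon, started at the initial template main) and XML 1.0 character rules: test the number against the ranges of the XML 1.0 Char production before calling codepoints-to-string(), and copy the original "[n]" text through unchanged when the test fails. The parameter name input, the template name main and the "leave it as typed" fallback are assumptions for illustration. The regex uses [0-9] rather than \d, because \d also matches non-ASCII decimal digits that the xs:integer cast would reject.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical sample input, as in the question -->
  <xsl:param name="input" as="xs:string"
             select="'some [34]quoted[34] string'"/>

  <xsl:template name="main">
    <xsl:analyze-string select="$input" regex="\[([0-9]+)\]">
      <xsl:matching-substring>
        <xsl:variable name="digits" select="regex-group(1)"/>
        <!-- At most 7 digits: 1114111 (#x10FFFF) is the highest code point,
             and this also keeps the xs:integer cast small -->
        <xsl:variable name="cp" as="xs:integer?"
            select="if (string-length($digits) le 7)
                    then xs:integer($digits) else ()"/>
        <xsl:choose>
          <!-- XML 1.0 Char production:
               #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] -->
          <xsl:when test="exists($cp) and
                          ($cp = (9, 10, 13)
                           or ($cp ge 32 and $cp le 55295)
                           or ($cp ge 57344 and $cp le 65533)
                           or ($cp ge 65536 and $cp le 1114111))">
            <xsl:value-of select="codepoints-to-string($cp)"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Degrade gracefully: keep the original "[n]" text -->
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input this should produce some "quoted" string, while something like [7] or [1234567] would simply be left in the output as typed instead of stopping the transformation.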

I know that control characters are not allowed and throw an "Invalid XML character" error. Also, very large numbers (like "1234567") give a plural form of the same error (I'm not sure why). Some characters (I believe these are the ones that are not assigned in Unicode) result in an empty string (like "12345").
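The range test below may explain the errors described above. codepoints-to-string() only accepts code points that are legal XML characters; under XML 1.0 rules that is the Char production #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Control characters fall outside it, and 1234567 is larger than 1114111 (#x10FFFF), the highest Unicode code point, so it is not a character at all. 12345 (#x3039) lies inside the allowed ranges; if the result looks empty, checking string-length() of the result may show whether it is really empty or just not displayed by the current font. This is a sketch for an XSLT 2.0 processor; the function name f:is-xml-char and the namespace urn:example:f are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:f"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: true if $cp matches the XML 1.0 Char production,
       i.e. if codepoints-to-string($cp) will not raise an error -->
  <xsl:function name="f:is-xml-char" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="$cp = (9, 10, 13)
                          or ($cp ge 32 and $cp le 55295)        (: #x20-#xD7FF :)
                          or ($cp ge 57344 and $cp le 65533)     (: #xE000-#xFFFD :)
                          or ($cp ge 65536 and $cp le 1114111)   (: #x10000-#x10FFFF :)"/>
  </xsl:function>

  <!-- Prints each sample code point followed by the result of the test -->
  <xsl:template name="main">
    <xsl:for-each select="34, 7, 1234567, 12345">
      <xsl:value-of select="., f:is-xml-char(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run from the initial template main, this should report true for 34 and 12345 and false for 7 and 1234567.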

Is there a robust way of allowing/disallowing a set of codepoints (other than making one huge lookup list)?
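One possible alternative to a lookup list, sketched for XSLT 2.0: after the range test, classify the character with a regular expression that uses Unicode category escapes such as \p{L} (letters) or \p{Cn} (unassigned), and keep the whole allow policy in a single parameter. The parameter $allowed, the helper f:is-allowed, the namespace urn:example:f and the chosen categories are all hypothetical. Category data comes from whatever Unicode version the processor implements, so results for recently assigned characters may differ between processors.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:f"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical policy: letters, marks, numbers, punctuation, symbols and
       ordinary spaces. Unassigned (\p{Cn}), private-use (\p{Co}) and control
       (\p{Cc}) characters are not in this class, so they are rejected. -->
  <xsl:param name="allowed" as="xs:string"
             select="'^[\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]$'"/>

  <!-- Hypothetical helper: true if $cp may be substituted.
       The XML 1.0 range test must come first, because
       codepoints-to-string() raises an error for anything outside it. -->
  <xsl:function name="f:is-allowed" as="xs:boolean">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
        if ($cp = (9, 10, 13)
            or ($cp ge 32 and $cp le 55295)
            or ($cp ge 57344 and $cp le 65533)
            or ($cp ge 65536 and $cp le 1114111))
        then matches(codepoints-to-string($cp), $allowed)
        else false()"/>
  </xsl:function>

  <!-- Prints each sample code point followed by the policy decision -->
  <xsl:template name="main">
    <xsl:for-each select="34, 65, 7, 57344, 12345, 1234567">
      <xsl:value-of select="., f:is-allowed(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Changing the policy then means editing one regular expression (or passing a different $allowed at run time) rather than maintaining a list of individual code points; f:is-allowed could replace the inline range test inside xsl:matching-substring.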

Cheers,
Abel

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--