Re: [xsl] 0x19 is not a legal XML character

Andrew Welch wrote:

On 6/28/07, Abel Braaksma <abel(_dot_)online(_at_)xs4all(_dot_)nl> wrote:

this may work and will remove all offending U+0019 chars.


The "offending" u+0019 characters could well be good content that's
being written/read in the wrong encoding.

True, but if I remember correctly, all ISO-646 characters (the ancient ASCII ones, below 0x80) are written as-is in UTF-8, all ISO-8859-x encodings, the CPxxx Windows/DOS encodings, TIS-620, Shift-JIS, GB2312, etc. The only notable exceptions are, I believe, the IBM EBCDIC encodings (but IBM500, the one most often used, has End Of Medium right at 0x19 as well). None of these encodings, not even the EBCDIC ones, use 0x19 for a diacritic.
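You can check the 0x19 part of this directly against Python's bundled codecs. This is a minimal sketch. The codec names are Python's own (cp1252 for Windows, cp437 for DOS, cp500 for IBM EBCDIC 500), and the list is only a sample of the encodings named above.

```python
# Decode the single byte 0x19 under several of the encodings mentioned above.
# Codec names are Python's; the list is a sample, not exhaustive.
ENCODINGS = [
    "ascii", "utf-8", "iso-8859-1", "iso-8859-15", "cp1252", "cp437",
    "tis-620", "shift_jis", "gb2312", "cp500",
]


def decode_0x19(encodings):
    """Map each encoding name to the character that byte 0x19 decodes to."""
    return {enc: b"\x19".decode(enc) for enc in encodings}


if __name__ == "__main__":
    for enc, ch in decode_0x19(ENCODINGS).items():
        print(f"{enc:12} 0x19 -> U+{ord(ch):04X}")
```

Every codec in the list maps the byte to U+0019 (END OF MEDIUM), including the EBCDIC one. A mis-declared encoding therefore won't turn some other character into U+0019, or U+0019 into something else.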

Just trying to say: I think it is very unlikely that encoding alone (on read or write) is the culprit here, although it often is the culprit for higher characters.

Of course, it can be valid content, in which case the XML documents should be opened as XML 1.1 documents.
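One caveat: XML 1.1 does not make a literal U+0019 legal. It lists U+0019 under production [2a] RestrictedChar, and such characters may appear only as a character reference like &#x19; in a document that declares version="1.1". The rules from the two specs can be sketched as a few Python predicates. The function names are hypothetical; only the code point ranges come from the XML 1.0 and 1.1 specifications.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 production [2] Char (allowed at all, possibly only as &#x..;)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 production [2a] RestrictedChar: must be written as a char reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def escape_restricted_for_xml11(text: str) -> str:
    """Replace XML 1.1 restricted chars with &#xNN; references.

    Only restricted chars are handled; '&', '<' etc. are assumed to be
    escaped elsewhere (hypothetical helper, not a library function).
    """
    return "".join(
        f"&#x{ord(c):X};" if is_xml11_restricted(ord(c)) else c for c in text
    )
```

For example, `escape_restricted_for_xml11("a\x19b")` gives `"a&#x19;b"`, which an XML 1.1 parser reads back as U+0019. Under XML 1.0, `is_xml10_char(0x19)` is false, and even the reference `&#x19;` is a well-formedness error.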


Simply stripping them out probably isn't the best approach - you need
to work out why they're there, what put them there and then fix that.
Patching it up afterwards is never a good idea.

Agreed, I just wanted to show how it can be done in XSLT, in case you (the OP) felt a need for it.
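The XSLT from the earlier message isn't quoted here. If the clean-up has to happen before an XML 1.0 parser ever sees the text, the same idea can be sketched in Python instead. This is a swapped-in technique, not the XSLT from the thread, and the function names are hypothetical. It also reports where each offending character sits, which helps with finding out what put it there.

```python
from __future__ import annotations

import re

# Any character NOT matched by XML 1.0 production [2] Char.
_ILLEGAL_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml10(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for every character XML 1.0 forbids."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _ILLEGAL_XML10.finditer(text)
    ]


def strip_illegal_xml10(text: str) -> str:
    """Drop every character XML 1.0 forbids, such as U+0019."""
    return _ILLEGAL_XML10.sub("", text)
```

Run `find_illegal_xml10` on the raw input first. The offsets it returns point back at whatever produced the file, and that is where the real fix belongs. Stripping should only be a stopgap.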


Imagine explaining your process to someone else in a year's time -
"this step is where we remove the u+0019 characters".

:D :D
Good design starts at the sources.

cheers,
Abel


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--