Marc Lambrichs wrote:
I'm reading in an xml-feed from Adobe InDesign and in some nodes there
are three characters that can't be interpreted by my xsl-translation
using utf-8. The codepoints of these 3 are (octal) 226, 128, 169.
First of all, I would like to know what these characters should
represent. And secondly, could I filter these characters out using
something like translate?
This is not possible. If 226, 128 and 169 are meant to be octal, you
mistyped at least the digits '8' and '9', which do not exist in octal.
Assuming you meant decimal, and you are indeed talking about codepoints,
then there should not be any problem in reading them. The codepoints 226,
128 and 169 represent the string "â<U+0080>©" (not sure whether the
mailer messes this up), which consists of:
U+00E2, LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0080, control
U+00A9, COPYRIGHT SIGN
See http://www.unicode.org/Public/UNIDATA/UnicodeData.txt for a full
list of codepoints.
In UTF-8, these are encoded as the following octets (view your input in
a hex viewer and you can see whether this is indeed what it contains):
U+00E2 >>> C3A2
U+0080 >>> C280
U+00A9 >>> C2A9
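
The octets above can be double-checked outside of XSLT. The following is
a minimal Python sketch, assuming the three numbers are decimal
codepoints; the helper name utf8_hex is made up for this illustration.

```python
# The three numbers from the question, read as decimal codepoints (an assumption).
codepoints = [226, 128, 169]


def utf8_hex(cp: int) -> str:
    """Return the UTF-8 encoding of a single codepoint as uppercase hex."""
    return chr(cp).encode("utf-8").hex().upper()


for cp in codepoints:
    # Prints one line per codepoint in the same "U+XXXX >>> octets" layout as above.
    print(f"U+{cp:04X} >>> {utf8_hex(cp)}")
```

Comparing this output with a hex dump of the actual feed shows whether
the input really contains these codepoints encoded as UTF-8.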
I am not sure what you mean by "can't be interpreted by my
xsl-translation using utf-8", because any conforming XSLT processor
understands at least UTF-8 and UTF-16. However, if what you mean is that
these characters are there and should be removed, you can indeed use
translate() to remove them:
translate($yourinput, '&#xE2;&#x80;&#xA9;', '')
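
To see what such a translate() call does without running a stylesheet,
here is a rough Python equivalent. It is only a sketch: strip_unwanted
and UNWANTED are hypothetical names, and the sample string is invented.

```python
# Characters to drop: U+00E2, U+0080 and U+00A9 (decimal 226, 128, 169).
UNWANTED = "\u00e2\u0080\u00a9"


def strip_unwanted(text: str, unwanted: str = UNWANTED) -> str:
    """Delete every occurrence of each character in `unwanted`,
    mirroring translate(text, unwanted, '') with an empty third argument."""
    return text.translate({ord(ch): None for ch in unwanted})


# Invented sample input containing the three characters in a row.
sample = "first line\u00e2\u0080\u00a9second line"
print(strip_unwanted(sample))
```

Like translate(), this removes each of the three characters wherever it
occurs, not only when they appear together as a sequence, so a legitimate
"â" or "©" elsewhere in the text would be removed as well.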
But if what you mean is that the input somehow has these three values
encoded in a way that is not UTF-8, then you will have to fix your
input, because it is not possible to process non-UTF-8 data (meaning:
data containing illegal UTF-8 sequences) as if it were UTF-8.
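
Whether the input contains illegal UTF-8 sequences can be tested before
it reaches the XSLT processor. A minimal Python sketch follows; the
function name is_valid_utf8 and the sample byte strings are assumptions
made for illustration only.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The three codepoints correctly encoded as UTF-8 (C3A2 C280 C2A9).
good = b"\xc3\xa2\xc2\x80\xc2\xa9"
# A lone E2 octet followed by ASCII, as a single-byte encoding would store U+00E2.
bad = b"caf\xe2 menu"

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

If the check fails, the input has to be re-exported or transcoded from
its real encoding before processing.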
Cheers,
-- Abel Braaksma
http://www.nuntia.nl