Re: [xsl] unreadable characters from indesign

2007-01-17 15:49:11
Marc Lambrichs wrote:
I'm reading in an xml-feed from Adobe InDesign and in some nodes there are three characters that can't be interpreted by my xsl-translation using utf-8. The codepoints of these 3 are (octal) 226, 128, 169. First of all, I would like to know what these characters should represent. And secondly, could I filter these characters out using something like translate?


This is not possible. If 226, 128 and 169 are supposed to be octal, you mistyped at least the digits '8' and '9', which do not exist in octal.
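To see why the octal reading fails, here is a small Python sketch (Python is just my choice for illustration, and the helper name `parse_octal` is made up). Only 226 parses as octal; the other two are rejected because of the digits 8 and 9.

```python
def parse_octal(text):
    """Return the integer value of an octal string, or None if it is not valid octal."""
    try:
        return int(text, 8)
    except ValueError:
        # Raised when the string contains a digit outside 0-7
        return None


# The three numbers from the question, read as octal
values = {s: parse_octal(s) for s in ("226", "128", "169")}

for text, value in values.items():
    print(text, "->", value)
```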

Assuming you meant decimal, and that you are indeed talking about codepoints, there should be no problem reading them. The codepoints 226, 128 and 169 represent a three-character string (I am not sure whether the mailer garbles it, so I list the characters instead):

U+00E2, LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0080, control
U+00A9, COPYRIGHT SIGN

See http://www.unicode.org/Public/UNIDATA/UnicodeData.txt for a full list of codepoints.

In UTF-8, these are encoded as the following octets (view your input in hexadecimal to check whether this is indeed what it contains):
U+00E2  >>> C3A2
U+0080  >>> C280
U+00A9  >>> C2A9
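If you want to verify these octets yourself, a short Python sketch along these lines should do (the helper name `utf8_hex` is made up, and "control" is only a fallback label for characters that have no Unicode name):

```python
import unicodedata

# Decimal values from the question, interpreted as Unicode codepoints
codepoints = [226, 128, 169]


def utf8_hex(cp):
    """Return the UTF-8 encoding of a codepoint as an uppercase hex string."""
    return chr(cp).encode("utf-8").hex().upper()


# (codepoint label, character name, UTF-8 octets) for each value
table = [
    (f"U+{cp:04X}", unicodedata.name(chr(cp), "control"), utf8_hex(cp))
    for cp in codepoints
]

for label, name, octets in table:
    print(label, name, ">>>", octets)
```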

I am not sure what you mean by "can't be interpreted by my xsl-translation using utf-8", because any conforming XSLT processor understands at least UTF-8 and UTF-16. However, if you mean that these characters are present and should be removed, you can indeed use translate() to remove them:

translate($yourinput, '&#226;&#128;&#169;', '')

But if you mean that these three values are encoded in the input in a way that is not valid UTF-8, then you will have to fix your input, because it is not possible to process non-UTF-8 data (meaning: data containing illegal UTF-8 sequences) as if it were UTF-8.
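One way to check the input outside of XSLT is to try decoding the raw bytes, as in this Python sketch (the helper name `is_valid_utf8` is made up). It also covers a reading I can only guess at: that 226, 128 and 169 are the decimal values of three octets rather than of three codepoints. In that case they form one valid UTF-8 sequence, E2 80 A9, which decodes to the single character U+2029, PARAGRAPH SEPARATOR.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the three decimal values are raw octets, not codepoints
raw = bytes([226, 128, 169])
decoded = raw.decode("utf-8")

print(is_valid_utf8(raw))
print(f"U+{ord(decoded):04X}", unicodedata.name(decoded))

# A truncated sequence is an example of illegal UTF-8
print(is_valid_utf8(bytes([226, 128])))
```

If the feed really contains U+2029, the XML itself is fine and the character can be removed or replaced like any other.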

Cheers,
-- Abel Braaksma
  http://www.nuntia.nl
