Niklas,
Three things to note here:
1. The Unicode codepoint for "LATIN SMALL LETTER A WITH DIAERESIS"
(which appears to be the offending character here) is 00 E4.
2. ISO/IEC 8859-1 (Latin-1) assigns the same character to codepoint E4
in its table.
3. When Unicode is encoded as UTF-8 (which means: all 7-bit chars are
the same as in ISO-8859-1 and are one byte long, the rest is encoded by
some smart algorithm as sequences of two to four bytes, and the result
is independent of byte order), Unicode codepoint 00 E4 is encoded as
the hexadecimal byte sequence C3 A4.
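For illustration, the three points above can be checked with a short JavaScript sketch. It assumes a runtime that provides the standard TextEncoder (current browsers or Node.js); the helper name toHex is made up for this example.

```javascript
// Hypothetical helper: format byte values as upper-case hex pairs.
function toHex(bytes) {
  return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

const aUmlaut = "\u00E4"; // LATIN SMALL LETTER A WITH DIAERESIS

// 1. The Unicode codepoint.
const codepoint = aUmlaut.codePointAt(0); // 0xE4

// 2. ISO-8859-1 stores codepoints 00..FF as the identical single byte.
const latin1Hex = toHex([codepoint]); // "E4"

// 3. UTF-8 needs two bytes for this character.
const utf8Hex = toHex(new TextEncoder().encode(aUmlaut)); // "C3 A4"

console.log(codepoint.toString(16), latin1Hex, utf8Hex);
```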
To test for this, you can do the following in Windows: create a text
file containing only the letter ä. Save it as ANSI, view it in a hex
viewer, and you will see a one-byte document: E4. Save the same document
as UTF-8 (*not* UTF-16 or other multibyte encodings for Unicode!) and
the hex view shows the byte sequence C3 A4, possibly preceded by the
byte order mark EF BB BF that some Windows editors add.
Now for your problem. It is logical to assume that the part of your code
that generates the link text correctly finds that the entity for "LATIN
SMALL LETTER A WITH DIAERESIS" is needed and encodes the text with
"&auml;". Which is very nice.
But the code that builds the URL does not do the same trick. I don't
have your ASP code here, so I can only assume that something goes wrong
there. At the very least, the code seems to treat the input as
ISO-8859-1 and to escape the two-byte UTF-8 sequence as if it were two
ISO-8859-1 characters, which, no doubt, goes wrong.
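What such a mix-up looks like can be sketched in JavaScript. This only illustrates the general failure mode and is not a reconstruction of the ASP code, which is not shown in this thread. It assumes a runtime with TextEncoder, and misreadAsLatin1 is a made-up name.

```javascript
// Hypothetical helper: interpret each byte as one ISO-8859-1 character,
// which is what happens when UTF-8 data is treated as Latin-1.
function misreadAsLatin1(bytes) {
  return Array.from(bytes, b => String.fromCharCode(b)).join("");
}

// In UTF-8 the single letter ä becomes the two bytes C3 A4.
const utf8Bytes = new TextEncoder().encode("Avh\u00E4mtning");

// Reading those bytes as Latin-1 turns one letter into two characters.
const garbled = misreadAsLatin1(utf8Bytes);

console.log(garbled, garbled.length);
```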
I would suggest you do the following: use the same encoding for your
link (if a link is encoded with "&auml;", this will be correctly
translated by the browser to the right HTTP escapes). Another option is
changing your code in a way that it understands Unicode. One thing comes
to mind: suppose you also use JScript or JavaScript, the escape() and
unescape() functions do not work correctly with Unicode (they are
infamous for that fact). Use the newer encodeURI() / decodeURI() instead.
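The difference between the two families of functions can be seen with the word from this thread. This is a minimal sketch, assuming a JavaScript engine that still ships the legacy escape() function (browsers and Node.js do).

```javascript
const word = "Avh\u00E4mtning"; // "Avhämtning"

// Legacy escape(): writes the codepoint itself for characters below U+0100.
const legacy = escape(word); // "Avh%E4mtning"

// encodeURI(): encodes to UTF-8 first, then escapes each byte.
const modern = encodeURI(word); // "Avh%C3%A4mtning"

// decodeURI() reverses encodeURI().
const roundTrip = decodeURI(modern);

console.log(legacy, modern, roundTrip === word);
```

For a single path segment or query value, encodeURIComponent() is the variant that also escapes reserved characters such as "/" and "?".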
Hope this brings you a bit in the right direction. I haven't read
everything in this thread, so I hope I haven't repeated others too much.
If I did, I apologize in advance.
Cheers,
Abel
The URL comes out as "Avh%C3%A4mtning" and the link text as
"Avhämtning".
The encoding in the URL is wrong. It should be Avh%E4mtning
No, the encoding in the URL is correct. The correct procedure for escaping
non-ASCII characters in a URL is to first encode the character in UTF-8, then
represent each octet of the UTF-8 sequence in hexadecimal as %HH.
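The two-step procedure described above can be sketched in JavaScript as follows. This is an illustration only: percentEncodeNonAscii is a made-up name, the code assumes a runtime with TextEncoder, and it deliberately leaves all ASCII characters alone, which a real URL encoder must not do for reserved or unsafe characters.

```javascript
// Hypothetical helper: percent-encode every non-ASCII character of a string.
function percentEncodeNonAscii(text) {
  const encoder = new TextEncoder();
  let out = "";
  for (const ch of text) { // iterates codepoint by codepoint
    if (ch.codePointAt(0) < 0x80) {
      out += ch; // ASCII is copied unchanged in this sketch
      continue;
    }
    // Step 1: encode the character in UTF-8.
    for (const octet of encoder.encode(ch)) {
      // Step 2: write each octet as %HH.
      out += "%" + octet.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const encoded = percentEncodeNonAscii("Avh\u00E4mtning");
console.log(encoded);
```

For this input the result matches what the built-in encodeURI() produces.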
The important question is, does the link actually work? If it doesn't work,
which browser are you using?
Michael Kay
http://www.saxonica.com/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list