ietf

Question about the character set in HTTP-URLs

2000-08-01 05:10:02
Hey,

Maybe I'm off topic, but I have a question about RFC 2616 (HTTP/1.1).
URLs used in requests, e.g. with the PUT or GET methods, may need
to include non-US-ASCII characters. RFC 2616 defers the problem to
RFC 2396 (URIs), which states that only certain characters may be
written as is and all others must be escaped as
"%" HEXNIBBLE1 HEXNIBBLE2.

RFC 2396, in turn, hands the question of the default character set
back to the protocol that uses it, which is RFC 2616 in this
case.

Does anybody know the default character set for URIs in HTTP?

To illustrate the problem:
Imagine some files whose names contain non-ASCII characters, e.g.
the German umlaut a (HTML auml, Unicode 228, present in Latin1) and
the Euro sign (HTML euro, Unicode 8364, not present in Latin1).
What happens if a URL includes a file name containing one or both
of these characters?
An RFC 2396 conforming name requires "%" escapes. But in which
character set? Raw ISO 10646 (Unicode) code points can't be used
because a "%" escape holds only two hex digits, i.e. one octet.
Latin1 can't be used because it doesn't contain the Euro sign.
UTF8 or another MBCS can convert all characters to RFC 2396
conforming characters, but this isn't mentioned in RFC 2616.
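
To make the ambiguity concrete, here is a minimal sketch in Python
of how the same two characters escape differently depending on the
character set chosen. The helper name escape_name is made up for
this illustration, and the choice of Latin1 and UTF8 as candidates
is an assumption, not something either RFC prescribes.

```python
from urllib.parse import quote


def escape_name(name: str, charset: str) -> str:
    """Encode name in the given charset, then "%"-escape every octet
    that is not an unreserved character (hypothetical helper)."""
    return quote(name.encode(charset), safe="")


A_UMLAUT = "\u00e4"  # German umlaut a, Unicode 228
EURO = "\u20ac"      # Euro sign, Unicode 8364

print(escape_name(A_UMLAUT, "latin-1"))  # %E4
print(escape_name(A_UMLAUT, "utf-8"))    # %C3%A4
print(escape_name(EURO, "utf-8"))        # %E2%82%AC

# Latin1 has no code point for the Euro sign, so encoding fails.
try:
    escape_name(EURO, "latin-1")
except UnicodeEncodeError:
    print("Euro sign cannot be encoded in Latin1")
```

A server that receives %E4 or %C3%A4 cannot tell from the escape
alone which character set the client meant, which is exactly the
question above.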

What is the appropriate way of handling such special characters,
and what do people do whose languages use very different
characters, such as Chinese or Thai?

Thanks, Florian


