Re: Question about the character set in HTTP-URLs

Hello Florian,

You are right that there is currently no well-defined way to
include arbitrary characters in URIs, or to interpret URIs
and find out which characters they contain. So if you have
a file with an a-umlaut and a Euro sign in its name, then to
construct an http URI for it, you have to know the
encoding that the server exposes. This may be the same
encoding as the one actually used in the file
system itself (in many cases), or it may not be.
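
To illustrate why the server's encoding matters, here is a minimal
Python sketch. The file name is made up for this example, and the
two encodings are just the ones discussed in this thread. The same
name gives different escapes under different encodings, and decoding
with the wrong assumption does not give the original name back:

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing an a-umlaut (U+00E4).
name = "B\u00e4r.txt"

# The same name yields different escapes under different encodings.
as_latin1 = quote(name, encoding="latin-1")  # one octet for a-umlaut
as_utf8 = quote(name, encoding="utf-8")      # two octets for a-umlaut
print(as_latin1)
print(as_utf8)

# Decoding with the wrong assumption does not recover the name.
latin1_read_as_utf8 = unquote(as_latin1, encoding="utf-8", errors="replace")
utf8_read_as_latin1 = unquote(as_utf8, encoding="latin-1")
print(latin1_read_as_utf8 == name, utf8_read_as_latin1 == name)
```

Nothing in the URI itself says which of the two readings is meant,
which is why the client has to know what the server expects.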

Efforts are under way to improve on the
current state. You can find an overview, including the
document James already mentioned in another mail, at
http://www.w3.org/International/O-URL-and-ident.html.
If you want to stay in sync with this work
and be able to benefit from it,
you should set up your server so that it
exposes file names as UTF-8.
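
As a rough sketch of what this means for a client (again in Python,
with a made-up file name), escaping the UTF-8 octets of a name
covers both the a-umlaut and the Euro sign, whereas Latin-1 cannot
encode the Euro sign at all:

```python
from urllib.parse import quote, unquote

# Hypothetical file name: a-umlaut (U+00E4) followed by Euro sign (U+20AC).
name = "\u00e4\u20ac.txt"

# Escape the UTF-8 octets of the name, one "%XX" per octet.
path = quote(name, encoding="utf-8")
print(path)

# A server that also assumes UTF-8 recovers the original name.
round_trip = unquote(path, encoding="utf-8")
print(round_trip == name)

# Latin-1 has no Euro sign, so strict encoding fails.
try:
    quote(name, encoding="latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)
```

This only works if the server maps the escaped octets back to file
names using UTF-8 as well; the sketch assumes that it does.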

Regards,   Martin.

At 00/08/01 14:06 +0200, Florian Große-Coosmann wrote:

Hey,

Maybe I'm off topic, but I have a question about RFC 2616 (HTTP/1.1).
URLs used in, e.g., PUT or GET requests may include non-US-ASCII
characters. RFC 2616 refers the problem to RFC 2396 (URIs), which
states that only some characters may be written as is and that
others must be escaped as "%" HEXNIBBLE1 HEXNIBBLE2.

Furthermore, RFC 2396 refers the question of the default target
code page back to the application that uses RFC 2396, which is
RFC 2616 in this case.

Does anybody know the default code page for URIs in HTTP?

To illustrate the problem:
Imagine some files with different foreign characters, e.g.
the German a-umlaut (HTML auml, Unicode 228, representable in Latin-1)
and the Euro sign (HTML euro, Unicode 8364, not representable in
Latin-1). What happens if a file name includes one or both of these
characters?
A name conforming to RFC 2396 requires the use of "%". But in
which character set? ISO 10646 (Unicode) can't be used directly
because of the length restriction in "%": each escape holds only
two hex digits. Latin-1 can't be used because
it doesn't contain the Euro sign. UTF-8 or another MBCS can
convert all characters to RFC 2396 conforming characters, but
this isn't mentioned in RFC 2616.

What is the appropriate way of handling special characters,
and what do people in other countries do whose scripts are
very different, such as Chinese, Thai, etc.?

Thanks, Florian

-
This message was passed through ietf+censored(_at_)alvestrand(_dot_)no, which
is a sublist of ietf(_at_)ietf(_dot_)org(_dot_) Not all messages are passed.
Decisions on what to pass are made solely by Harald Alvestrand.