Re: Character set registration

Maybe I misunderstood the discussion, but it was my belief from
comments on the mailing list that there were browsers and servers
today that supported accept-charset: unicode-1-1 and would transmit
documents in 16-bit Unicode using two octets for each character, and
where octets 10 and 13 had no special significance.


So what happens when you combine this with HTTP implementations that use CR,
LF, and CRLF interchangeably? It's called "chaos".

In effect what you're asking for is for MIME-based agents to change their
generic canonicalization behavior based on a parameter's value. This is exactly
what we've been trying to avoid. It should be possible to determine what sort
of generic canonicalization is allowed by looking at the top-level content type
only.
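
To make the failure concrete, here is a minimal sketch in Python. It assumes
UTF-16 big-endian as a stand-in for unicode-1-1, and canonicalize_line_breaks
is a hypothetical octet-level routine of the kind a generic text/* agent might
apply; neither is taken from any real implementation.

```python
def canonicalize_line_breaks(data: bytes) -> bytes:
    """Hypothetical octet-level canonicalizer: map CR, LF, and CRLF to CRLF."""
    out = bytearray()
    i = 0
    while i < len(data):
        octet = data[i]
        if octet == 0x0D:
            out += b"\r\n"
            # Swallow the LF of an existing CRLF pair.
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 1
        elif octet == 0x0A:
            out += b"\r\n"
        else:
            out.append(octet)
        i += 1
    return bytes(out)


def survives_canonicalization(text: str) -> bool:
    """True if 16-bit text still decodes to itself after canonicalization."""
    mangled = canonicalize_line_breaks(text.encode("utf-16-be"))
    try:
        return mangled.decode("utf-16-be") == text
    except UnicodeDecodeError:
        return False


# U+010A encodes as octets 0x01 0x0A, so its low octet looks like a bare LF.
encoded = "\u010a".encode("utf-16-be")
mangled = canonicalize_line_breaks(encoded)
```

The agent sees octet 10, inserts octet 13 in front of it, and the entity is no
longer a whole number of 16-bit units. Nothing in the routine looked at the
charset parameter, which is the point.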

The HTML parser requires a front end to translate the sequence of
octets used in the character encoding (iso-2022-jp, shift-jis, etc.)
into a sequence of characters. Even if it _is_ 'difficult' to
implement an agent that can accept such character encodings, it
shouldn't mean that text/html; charset=unicode-1-1 should be
disallowed even in negotiated situations.
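
A rough sketch of such a front end, in Python, might look like the following.
The function name octets_to_characters and the label table are assumptions
made for illustration; only the two charset labels come from the text above,
and the codec names are those of Python's standard library.

```python
# Hypothetical table from charset labels to Python codec names.
CODEC_FOR_CHARSET = {
    "iso-2022-jp": "iso2022_jp",
    "shift-jis": "shift_jis",
}


def octets_to_characters(body: bytes, charset: str) -> str:
    """Translate an entity body into characters before the HTML parser sees it."""
    codec = CODEC_FOR_CHARSET[charset.lower()]
    return body.decode(codec)


# The same three characters arrive as different octet sequences per charset.
sample = "\u65e5\u672c\u8a9e"
as_jis = sample.encode("iso2022_jp")
as_sjis = sample.encode("shift_jis")
```

Either octet sequence yields the same characters once the front end has been
told which charset applies.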


Your own examples in your previous messages show quite nicely why it should be
disallowed. If you want to use HTML with unicode-1-1, you need a new top-level
type for it.

Registering html as a subtype of application is a possibility, but this is one
case where I think a new top-level type actually is warranted.

As it stands, the MIME proposal would not only make such an indication
unwanted in situations where the charset was not previously
negotiated, it would also make the negotiation itself syntactically
illegal. I don't think this is a requirement either for mail or for
the web.


I disagree. I think it's a requirement for both as long as you're using
the text top-level type.

It may be that the distinction between text/* and application/* is
artificial, and that we should move toward automatically
cross-registering media types (or at least register anything that is
seen as text/foo to also be application/foo) in order to get around
this dilemma.


If anything, this discussion demonstrates that the distinction between the two
is anything but artificial.

                                Ned