Re: Charset compromise (Was Re: Character-set header)

Neil Katin writes:

> First of all, a "point of information": Ned already pointed out that
> we are arguing about a concept (do types other than text have a
> content charset) rather than the packaging.  I'm assuming that
> your message means that you're willing to accept the concept.


This is, of course, correct. We have been discussing the concept mostly, with
relatively few excursions into syntax.

> About the packaging:
>
> Personally, I find the idea of chained attributes on the content-type
> line a complete "dual" of putting those attributes on separate header
> lines.  I made this point in writing to the list when the first draft
> that had this came out, but no one ever commented on it.


I agree with this. I don't have a problem with either syntax as long
as we limit syntactic expansion in both cases. Specifically, I already
have a parser for content-type headers. I already have a parser for
header lines that contain a single keyword as their value. I can add
a parser for whatever distinguishing syntax we choose for embedding
character set information in the content-type header, should we choose
to go that way. I can also add a parser for the name=value stuff in content-type
headers, should we choose to keep this syntax.
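To illustrate that the two packagings carry the same information, here is a minimal Python sketch of the two kinds of parser mentioned above: one for a content-type header with name=value parameters, and one that reads a separate single-keyword header line. The header name Content-Charset and the HEADER_ATTRIBUTES table are hypothetical, chosen only for illustration; neither parser handles quoting or comments.

```python
# Hypothetical table mapping separate header lines to attribute names.
# "Content-Charset" is an illustrative name, not taken from any draft.
HEADER_ATTRIBUTES = {"content-charset": "charset"}


def params_from_content_type(value):
    """Parse 'type/subtype; name=value; ...' into (type/subtype, attrs)."""
    parts = [p.strip() for p in value.split(";")]
    ctype = parts[0].lower()
    attrs = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip empty or malformed parameters in this sketch
        attrs[name.strip().lower()] = val.strip()
    return ctype, attrs


def params_from_headers(headers):
    """Read the type from Content-Type and attributes from separate header lines."""
    ctype = headers["Content-Type"].strip().lower()
    attrs = {}
    for name, val in headers.items():
        attr = HEADER_ATTRIBUTES.get(name.lower())
        if attr is not None:
            attrs[attr] = val.strip()
    return ctype, attrs
```

For the same message, params_from_content_type("text/plain; charset=us-ascii") and params_from_headers({"Content-Type": "text/plain", "Content-Charset": "us-ascii"}) both return ("text/plain", {"charset": "us-ascii"}); only the parsing code differs.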

But I object, STRONGLY, to continued expansion of syntax. If we keep all this
stuff, I don't want to see another type or subtype come along and define, for
example, that the fourth parameter is an RFC1148bis-style attribute-value
pair list:

   /attribute1=value1/attribute2=value2/ ...

I don't want to write a parser for this! (Actually, I already have one, but
most gateways and UAs do not.)
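For a sense of what each such addition costs, a naive Python parser for the slash-separated list above might look like the following. It is a sketch only: it ignores whatever quoting or escaping rules RFC 1148bis may define, and the function name is hypothetical. The point is that it is a second grammar, separate from the content-type parameter syntax, that every gateway and UA would have to carry.

```python
def parse_slash_pairs(value):
    """Naive parser for '/attr1=val1/attr2=val2/' (no quoting or escapes)."""
    pairs = {}
    # Drop the leading and trailing '/' and split on the rest.
    for item in value.strip().strip("/").split("/"):
        if not item:
            continue  # tolerate empty segments such as '//'
        name, sep, val = item.partition("=")
        if not sep:
            raise ValueError("expected attribute=value, got %r" % item)
        pairs[name] = val
    return pairs
```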

The reason I don't want to keep defining things and adding parsers is
simple -- if the syntax is constrained I can write a parser/analyzer for it 
that uses a table to figure out what to do with what it finds. As the
number of types and subtypes grows I just add to the table as needed. I don't
have to rewrite any code. But if the syntax expands (or changes) I have to
write new code. This is unacceptable, and a SHOW-STOPPER for me.
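The table-driven approach described above might look something like this Python sketch. The handler functions and the HANDLERS table are hypothetical; what matters is that the parser handles one fixed syntax, so supporting a new type or subtype is a table edit, while any change to the header syntax would mean rewriting parse_content_type itself.

```python
# Hypothetical handlers; a real UA or gateway would do real work here.
def show_text(body, attrs):
    return "display as text (charset=%s)" % attrs.get("charset", "us-ascii")


def save_binary(body, attrs):
    return "save %d bytes" % len(body)


# New types and subtypes are supported by adding rows, not code.
HANDLERS = {
    ("text", "plain"): show_text,
    ("application", "octet-stream"): save_binary,
}


def parse_content_type(value):
    """Fixed syntax: type/subtype followed by ';'-separated name=value pairs."""
    head, *params = value.split(";")
    ctype, _, subtype = head.strip().lower().partition("/")
    attrs = {}
    for p in params:
        name, sep, val = p.partition("=")
        if sep:
            attrs[name.strip().lower()] = val.strip()
    return ctype, subtype, attrs


def dispatch(value, body):
    ctype, subtype, attrs = parse_content_type(value)
    # Unknown types fall back to treating the body as opaque data.
    handler = HANDLERS.get((ctype, subtype), save_binary)
    return handler(body, attrs)
```

Here dispatch("text/plain; charset=us-ascii", b"hi") returns "display as text (charset=us-ascii)", and an unlisted type such as "image/gif" falls through to save_binary. Adding, say, HANDLERS[("text", "richtext")] = show_text requires no change to the parser.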

> At Atlanta I brought this up again, and a compromise came out:
> optional headers are put in separate header lines; those headers that
> are required are put within the content-type line.


The advantage of this approach is that we get rid of an extra parser. There is
no semantic difference aside from that.

> This, of course, has those aspects of a compromise: you need all
> the code of both proposals, but each side got to hold on to something
> that they felt important: mandatory headers are represented all together
> on the content-type line; optional headers appear just like any other
> optional header in today's world (as a separate header line).


Exactly right.

> This isn't my preferred choice, but it is an alternative I can live
> with.


I agree; I certainly can live with a distinguished syntax for
character set information as part of the content-type. But I think it is
very important to realize that this is just syntactic sugar -- what we're
talking about here is basically a blessing of the concepts Neil and I have
been wanting all along.

That having been said, I have a suggestion for a syntax that I'd like to
propose. I'm a little leery of putting magic cookies into the subtype
itself to distinguish it. Instead, how about simply repeating the / before
a character set specification? You'd then have something like:

    Content-Type: text//us-ascii
    Content-Type: text-plus/TeX//us-ascii

Simply convert the double // to / to treat the character set as part of the
type/subtype string (if that's the view you have of the world, which may
indeed be valid for some applications). Strip out the //whatever if you
want to remove the character set information entirely.
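A reader for the proposed // syntax could be as small as the following Python sketch, assuming at most one // per value with the character set after it. Both views described above fall out of simple string operations. The function names are illustrative only, and this is not a settled grammar.

```python
def split_charset(value):
    """Split 'type[/subtype]//charset' into (type[/subtype], charset or None)."""
    base, sep, charset = value.strip().partition("//")
    return base, (charset if sep else None)


def as_single_string(value):
    """View the charset as part of the type/subtype string: '//' becomes '/'."""
    return value.strip().replace("//", "/")


def without_charset(value):
    """Drop the //charset part entirely."""
    return split_charset(value)[0]
```

With the examples above, split_charset("text-plus/TeX//us-ascii") gives ("text-plus/TeX", "us-ascii"), as_single_string gives "text-plus/TeX/us-ascii", and without_charset gives "text-plus/TeX". A value with no // comes back with a charset of None.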

We could also insist that the character set appear last, although this is
by no means mandatory.

Comments?

                                Ned