Re: Character-set header (was Re: Minutes of the Atlanta 822ext meeting)

Ned,

Your explanation of the reasons for the separate character set header 
was very helpful.  I think I now understand enough of what is going on 
to have an opinion (which is always dangerous).  It seems that there are 
two positions:

* Build a separate header so that you absolutely and for certain know 
where to find the character set info because, when it is meaningful, you 
need to pull it out without understanding how to parse subtypes.

* Keep the information in the subtypes to avoid a number of odd and/or 
nasty and/or meaningless situations.

Now, this may be naive, but couldn't *both* goals be accomplished by 
being a tad more clever about the content-type syntax?  What we have 
now, more or less, is
  type/subtype-and-stuff
where the "and stuff" consists of zero or more parameters whose content, 
meaning, and order are dependent on the subtype.

What would happen if we invented some peculiar charset-delimiting syntax
and a small restriction such that, for example:
  (i) If a character set designation appeared, it would be the first 
parameter for the subtype or the subtype itself.
  (ii) Some special syntax or designation were used for that
parameter/subtype which would permit one to lexically determine whether
the thing was a character set designator, even if the subtype itself
were not recognized.

I don't have RFC-XXXX in front of me and don't recall the current syntax
proposal, much less the Atlanta conclusions, but notions like
    text/=ISO_8859-1
    text-plus/troff =ISO_8859-6;...
come to mind.  In that model,
    foo/bar =abcdef      Identifies a character set, without requiring
                         you to know the "meaning" of either "foo" or
                         "bar"
and
    xxx/yyy zzz          Does not identify a character set, no matter
                         what.
Whether or not the character set designation was permitted would then be 
a function of type and subtype definitions in the same header.
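
To make the "lexically determine" part concrete, here is a rough sketch in
Python of how a reader might pull the character set out without knowing
anything about the type or subtype.  Everything in it is an assumption
layered on the suggestion above: the "=" introducer is only the strawman
from this message, and the function name find_charset and the tokenizing
rules (split on whitespace and ";") are invented for illustration, not
taken from any draft.

```python
from typing import Optional


def find_charset(content_type: str) -> Optional[str]:
    """Return the character set named in a Content-type value, or None.

    Assumes the hypothetical "=" introducer: the designator is either
    the subtype itself or the first parameter after the subtype.
    """
    _type, slash, rest = content_type.strip().partition("/")
    if not slash:
        return None  # no "/" at all, so there is no subtype to inspect

    # Treat ";" like whitespace so that "=ISO_8859-6;..." splits cleanly.
    tokens = rest.replace(";", " ").split()

    # Rule (i): only the subtype or the first parameter may carry the
    # designator, so only the first two tokens are ever examined.
    for token in tokens[:2]:
        # Rule (ii): a leading "=" marks a designator, whatever the
        # type and subtype happen to be.
        if token.startswith("=") and len(token) > 1:
            return token[1:]
    return None


for value in ("text/=ISO_8859-1",
              "text-plus/troff =ISO_8859-6;...",
              "foo/bar =abcdef",
              "xxx/yyy zzz"):
    print(value, "->", find_charset(value))
```

Note that the function never looks up "text", "troff", "foo", or "bar"
anywhere; whether a designator is *permitted* for a given type/subtype
would still be a separate check against the definitions.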

It is, granted, a little odd, and an "=" as an introducer is probably 
not the best model, but wouldn't this permit having one's cake and 
eating it too?  And it would avoid a lot of robustness problems, e.g., 
if some idiot provided two Content-type fields, you might have to guess 
which one to ignore, but you at least wouldn't have to guess which one 
was bound to the one (or two) character set fields.

    john