more content-charset


`Any problem is trivial given the right data structures.'

If, as you say, people will not write high-quality, robust parsers for
RFC-XXXX, then I will withdraw my support for RFC-XXXX and say now that we are
all wasting our time.

I do not give a goddam about cheap parsers written by lazy programmers.  The
purpose of standards is to have something to point at when something doesn't
work and to identify what needs to be fixed.

Right now there aren't that many RFC-XXXX parsers in the world at all.  Of the
ones that do exist, how many will ever do anything useful with the 5-25 lines
(at 80 characters per line) of out-of-band information that you store alongside
a special subtype of application?  Isn't this something that is essentially
private to your software?  If it isn't, then why isn't it in RFC-XXXX?

I care about a simple BNF that expresses the syntax in a straightforward way
without complexity or special cases.  The BNF of RFC-XXXX as it stands is far
too complex with too many special cases.  The replacement BNF I proposed boils
down to:
        Content-Type    := type ["/" subtype] 1*[";" attribute "=" value]
It is clear, it is consistent, and it consolidates the information in one
place.  I cannot emphasize enough how important clarity, consistency, and
consolidation are.  The current syntax is unclear, inconsistent, and scatters
data.
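
To make the point concrete, here is a rough sketch of how little code the
consolidated syntax needs.  It is an illustration only, not anybody's actual
parser.  The function name parse_content_type is hypothetical, the token
definition is a simplification, quoted values are an assumption, and the sketch
reads the parameter part of the BNF as "zero or more" attribute/value pairs.

```python
import re

# Simplified token: anything except whitespace and the separators used here.
# (Assumption: the real token rule would come from the RFC text.)
_TOKEN = r'[^\s;/="]+'
_TYPE_RE = re.compile(rf'\s*({_TOKEN})(?:\s*/\s*({_TOKEN}))?\s*')
_PARAM_RE = re.compile(rf';\s*({_TOKEN})\s*=\s*(?:"([^"]*)"|({_TOKEN}))\s*')


def parse_content_type(value):
    """Return (type, subtype or None, {attribute: value})."""
    m = _TYPE_RE.match(value)
    if m is None:
        raise ValueError(f"no type in {value!r}")
    ctype = m.group(1).lower()
    subtype = m.group(2).lower() if m.group(2) else None

    params = {}
    pos = m.end()
    while pos < len(value):
        p = _PARAM_RE.match(value, pos)
        if p is None:
            raise ValueError(f"bad parameter at offset {pos} in {value!r}")
        # Group 2 is a quoted value, group 3 a bare token.
        val = p.group(2) if p.group(2) is not None else p.group(3)
        params[p.group(1).lower()] = val
        pos = p.end()
    return ctype, subtype, params
```

Note that nothing in the loop knows what "charset" is.  Every attribute is
handled by the same two lines, which is the whole argument for keeping the
information in one place.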

I don't understand why you are being so obstreperous on this.  By your own
admission it doesn't make that much difference to you.  It does make a big
difference to me; I have no clear idea how to deal with a Content-Charset
header.  I don't even know what it means in most cases.

Please remember that my code is low-level parsing code and I don't necessarily
have any control at all over UAs or MTAs.  I can't believe that you are
suggesting that I convert the data into the right format prior to delivery (as
if I control the MTA).  Why can't we get the data in the right format the
first time?  It isn't as if we're trying to preserve an infrastructure here as
we are for 7-bits; we're *defining* the format, dammit, and have the
opportunity to get it defined right.

I translate incoming RFC-822/XXXX mail into a set of abstract objects.  I can
see, very plainly, that the character set is part of the basic attributes of
certain types and not something that globally applies to all types.  All of
the other headers apply globally to all types -- Type, TransferEncoding, ID,
Description.

If the charset is an attribute, then it is one of a set of named
attribute/value pairs passed in the object.  If on the other hand it is a
separate header, then my code *must* (1) recognize the header and (2) insert
it in the object.

With a separate header it is no longer enough to insert all attribute/value
pairs without caring what they are or what they mean.  I have to *know* what
Content-Charset: means; I don't have to *know* what ;CHARSET=US-ASCII means.
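
As a sketch of what I mean by an abstract object (the class BodyPart, its field
names, and the helper attach_parameters are all hypothetical, made up for this
message), the globally applicable headers get fixed members and everything
type-specific rides along as opaque pairs:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class BodyPart:
    # Members that apply to every type (names are illustrative only).
    type: str
    subtype: Optional[str] = None
    transfer_encoding: Optional[str] = None
    content_id: Optional[str] = None
    description: Optional[str] = None
    # Type-specific attribute/value pairs, carried without interpretation.
    attributes: Dict[str, str] = field(default_factory=dict)


def attach_parameters(part: BodyPart,
                      pairs: Iterable[Tuple[str, str]]) -> BodyPart:
    """Copy attribute/value pairs into the object; the parser never looks at them."""
    for name, value in pairs:
        part.attributes[name.lower()] = value
    return part
```

A text part ends up with attributes == {"charset": "US-ASCII"}; an audio part
simply has an empty dictionary.  There is no charset member to explain away.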

You also give me a terrible problem.  What does an audio or video object look
like?  Does it contain a charset member?  Why should it?  Why shouldn't it?
If it should, then what does it mean?  If it shouldn't, what do I do when I
get one?  If I have a place for a charset member for audio or video, what
default do I use?

These decisions don't belong to the low level parser.  They belong to the UA.
The UA looks at the parameters and decides what they mean (or don't mean).
Don't assume that the RFC-822/XXXX parser is the UA.
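
To show where I think the decision lives, here is a sketch of the UA side.  The
function ua_charset is hypothetical, and both the rule "charset only matters
for text" and the US-ASCII default are assumptions chosen for illustration, not
something RFC-XXXX says.

```python
def ua_charset(content_type, attributes, default="US-ASCII"):
    """UA-side policy: decide what, if anything, a charset attribute means."""
    if content_type != "text":
        # The UA, not the parser, decides the attribute does not apply here.
        return None
    # Assumed default; a real UA would take this from the standard or config.
    return attributes.get("charset", default)
```

The parser hands over the type and the attribute pairs untouched; a different
UA is free to apply a different policy without any change to the parser.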

If you want Content-Charset, don't you also want Content-Language?  Don't you
also want Content-Color-Palette?  Content-TV-System?  Content-Dolby-System?
Content-Filename?  Content-Audio-Rate?  Content-troff-macro-package?  If we
have Content-Multipart-Delimiter then we can get rid of parameters
altogether.

Making the headers open-ended invites this abuse, and ultimately makes it
impossible for a parser to decide what in this mess needs to be passed to the
UA and what is `Favorite-Beer' bullshit.