Re: more content-charset stuff

Mark Crispin writes:

`Any problem is trivial given the right data structures.'

If as you say people will not write high-quality, robust parsers for RFC-XXXX
then I will withdraw my support for RFC-XXXX and will say now that we are all
wasting our time.


Please read what I said. Don't take a small chunk of a long message out of
context and blow it up all out of proportion.

As for what you do with your support of anything, I for one don't give a shit
what you do with it. If you like I can provide the exact scatology of something
you might possibly do with it if your anatomy allows for it.

What I SAID was that high quality RFC822 parsers exist currently. I think we
should take advantage of this _fact_, rather than depending on an evolutionary
process to produce yet another generation of high quality parsers, especially
since the gain we get by going for another round of parser development is
nonexistent.

I do not give a goddam about cheap parsers written by lazy programmers.  The
purpose for standards is to have something to point at when something doesn't
work and identify what needs to be fixed.


I don't care about them either, except that I recognize that they exist and I
think it is downright stupid to antagonize them simply for the sake of being
extra-special-clever.

Right now there aren't that many RFC-XXXX parsers in the world at all.


Precisely my point.

Of the
ones that do exist, how many of them will ever do anything useful with your 5-
25 lines of 80 char/line out-of-band information that you store alongside a
special subtype of application?  Isn't this something that is essentially
private to your software?


No, it is not private to my application. It is private to one operating system
I support (VMS). I expect that once we standardize this you will see it used a
lot, if the extensions to FTP VMS uses are any indication (and they are not
even standardized, yet several vendors support them).

If it isn't, then why isn't it in RFC-XXXX?


I don't put stuff in RFC-XXXX that belongs in a follow-on RFC. I have been
chastised enough for trying to do this in the past. We have designed a
framework. I hope to put the framework to use.

I care about a simple BNF that expresses the syntax in a straightforward way
without complexity or special cases.  The BNF of RFC-XXXX as it stands is far
too complex with too many special cases.  The replacement BNF I proposed boils
down to:
      Content-Type    := type ["/" subtype] 1*[";" attribute "=" value]
It is clear, it is consistent, and it consolidates the information in one
place.  I can not emphasize how important clarity, consistency, and
consolidation are.  The current syntax is unclear, inconsistent, and scatters
data.
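
For illustration only, here is a minimal Python sketch of a parser for the
one-line syntax proposed above. The function name parse_content_type is
hypothetical, parameters are treated as zero or more, and quoted values
containing ";" or "=" are assumed not to occur. It is not taken from any
existing implementation.

```python
def parse_content_type(value):
    """Split 'type/subtype; attr=value; ...' into (type, subtype, params).

    Assumption: values are plain tokens, so splitting on ';' is safe.
    """
    head, *rest = value.split(";")
    type_, _, subtype = head.strip().partition("/")
    params = {}
    for item in rest:
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"parameter without '=': {item!r}")
        # Attribute names are folded to lower case; values are kept as given.
        params[name.strip().lower()] = val.strip()
    return type_.strip().lower(), (subtype.strip().lower() or None), params


print(parse_content_type("text/plain; charset=US-ASCII"))
```

The parser hands back every attribute/value pair without interpreting any of
them; deciding what "charset" means is left to the caller.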


I also care about clear, consistent BNF. But this is just another one of the
innumerable red herrings you keep tossing out -- the amount of BNF is not a good
measure of anything at all. And besides, I'm proposing that the things you call
parameters can simply be on separate headers.  Since the BNF for headers is all
already in place, I don't need _any_ additional BNF. How's that for simplicity?

I don't understand why you are being so obstreperous on this.  Your own
admission is that it doesn't make that much difference to you.  It does make a
big difference to me; I have no clear idea how to deal with a Content-Charset
header.  I don't even know what it means in most cases.


Well, if you want the honest truth, I think your idea is poor. That's why I
don't like it.

Your idea is not completely unworkable. That's why I can live with it. I simply
think it is poor enough to criticize, and I'm doing just that. And your
arguments have thus far impressed me less and less and have appeared to get
increasingly slipshod as this debate has continued.

Please remember that my code is low-level parsing code and I don't necessarily
have any control at all over UA's or MTA's.  I can't believe that you are
suggesting that I convert the data into the right format prior to delivery (as
if I control the MTA).  Why can't we get the data in the right format the
first time?  It isn't as if we're trying to preserve an infrastructure here as
we are for 7-bits; we're *defining* the format, damnit, and have the
opportunity to get it defined right.


Let's see. You don't have control over the UA. You don't have control over
the MTA. What, precisely, is it that you do have? ;-)

This is another red herring. I was making a transformational argument only. I
simply pointed out that I could transform one scheme into another quite
trivially and that you could not then tell the difference. I _never_ proposed
that you should actually do this!
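
To make the transformational argument concrete, here is a rough Python sketch
of the two mappings. The function names are hypothetical, headers are assumed
to be held in a dict keyed by exact header name, and parameter values are
assumed not to contain ";". It is an illustration of the equivalence, not a
proposal that anyone rewrite mail in transit.

```python
def header_to_parameter(headers):
    """Fold a Content-Charset header into Content-Type as '; charset=...'."""
    out = dict(headers)
    if "Content-Type" in out and "Content-Charset" in out:
        charset = out.pop("Content-Charset")
        out["Content-Type"] = out["Content-Type"] + "; charset=" + charset
    return out


def parameter_to_header(headers):
    """Lift a charset parameter out of Content-Type into Content-Charset."""
    out = dict(headers)
    ctype = out.get("Content-Type")
    if ctype is None:
        return out
    parts = [p.strip() for p in ctype.split(";")]
    kept = [parts[0]]  # the type/subtype portion stays where it is
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep and name.strip().lower() == "charset":
            out["Content-Charset"] = val.strip()
        else:
            kept.append(part)
    out["Content-Type"] = "; ".join(kept)
    return out


original = {"Content-Type": "text/plain", "Content-Charset": "ISO-8859-1"}
folded = header_to_parameter(original)
print(folded)
print(parameter_to_header(folded) == original)
```

Under these assumptions the round trip gives back the original headers, which
is all the argument claims.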

I translate incoming RFC-822/XXXX mail into a set of abstract objects.  I can
see, very plainly, that the character set is part of the basic attributes of
certain types and not something that globally applies to all types.  All of
the other headers apply globally to all types -- Type, TransferEncoding, ID,
Description.

If the charset is an attribute, then it is one of a set of named
attribute/value pairs passed in the object.  If on the other hand it is a
separate header, then my code *must* (1) recognize the header, (2) insert it
in the object.


And what is the difference between this and identifying a set of headers to
pass as parameters? Suppose we say that any content- header is a potential
parameter and you should pass it on for possible use. Please explain to me why
this is any different.
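
A small Python sketch of what that could look like, assuming the parser
already has the headers as a list of (name, value) pairs. The function name
collect_content_attributes is made up for this example.

```python
def collect_content_attributes(headers):
    """Return {attribute: value} for every header named 'Content-<attribute>'.

    Nothing is interpreted here; the pairs are simply passed along.
    """
    prefix = "content-"
    attrs = {}
    for name, value in headers:
        lowered = name.lower()
        if lowered.startswith(prefix):
            attrs[lowered[len(prefix):]] = value.strip()
    return attrs


message_headers = [
    ("Subject", "more content-charset stuff"),
    ("Content-Type", "text/plain"),
    ("Content-Charset", "US-ASCII"),
]
print(collect_content_attributes(message_headers))
```

The result is the same kind of named attribute/value set that a parameter
list would yield, with no knowledge of what any individual header means.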

It isn't merely enough to insert all attribute/value pairs without caring what
they are or what they mean.  I have to *know* what Content-Charset: means; I
don't have to *know* what ;CHARSET=US-ASCII means.


According to your description, you don't have to know what either one of them
means! You simply extract the information and pass it on to the viewer or
whatever.

You also give me a terrible problem.  What does an audio or video object look
like?  Does it contain a charset member?  Why should it?  Why shouldn't it?
If it should, then what does it mean?  If it shouldn't, what do I do when I
get one?  If I have a place for a charset member for audio or video, what
default do I use?


And this problem goes away when you see a ; charset=whatever; parameter
instead? I think not.

These decisions don't belong to the low level parser.  They belong to the UA.
The UA looks at the parameters and decides what they mean (or don't mean).
Don't assume that the RFC-822/XXXX parser is the UA.


I did not make any such assumption. A parser simply extracts the information.
What you have not done is tell me why parsing a content-type line is
fundamentally any different than parsing a bunch of headers (excepting that the
former is harder, requires more code, and thus increases the chances of getting
it wrong).

If you want Content-Charset, don't you also want Content-Language?  Don't you
also want Content-Color-Palette?  Content-TV-System?  Content-Dolby-System?
Content-Filename?  Content-Audio-Rate?  Content-troff-macro-package?  If we
have Content-Multipart-Delimiter then we can get rid of parameters all
together.

Making the headers open-ended invites this abuse, and ultimately makes it
impossible for a parser to decide what in this mess needs to be passed to the
UA and what is `Favorite-Beer' bullshit.


The decision that was reached in Atlanta was to use the positions after the
type/subtype for required parameters, and to use additional Content- headers
for optional parameters. I liked this idea then. I like it now.

However, we have now reached a point where we're caught between the two forms.
And I do agree with you that we need to go one way or another; it is the
mixture that really messes things up.

So I'm going to give up at this point. I need a standard more than I need for
this point to be settled in my favor, and frankly, your ability to toss out the
reddest of red herrings is truly amazing.

So, I have now retreated to the position that either optional parameter=value
or extra headers is fine. If we go with the parameter=value format I want the
content-charset header to be removed. In either case I want the charset subtype
of text removed.

I don't care whether we change multipart or not. I don't see any real problem
with having mandatory parameters without the name= form (plenty of languages
support this sort of trick for passing parameters to subroutines so I don't
see why we cannot use it too). On the other hand, requiring the name=
in there is not a problem either.

I hope this is satisfactory to you.

                                        Ned