Re: Charset compromise (Was Re: Character-set) header


While I sympathise with Bob's concern about issues being changed
at an IETF meeting by those people who cannot attend, the content
charset issue *was* raised before the meeting.  Not hearing an adequate
response, I raised it again.

It is not my intention to try to ram a solution down people's throats,
but I don't want the compromise reached at Atlanta to be thrown out because
of a "rules violation" when we did raise the issue beforehand.


You are right. I am wrong. When I reviewed the archive I remembered your 
comments. I'm afraid I failed to remember them before because I failed
to understand them. Still, sadly, the case.

As far as I can see your comments do not mention Ned's "gateway does
not have to look at the Content-type" argument. In fact I don't see any
concrete use suggested for the separate character set header. I have
a lot of trouble with purely theoretical arguments with no examples, 
but I can't blame you because I've done it myself.

As long ago as April you wrote:

With that said, I'd like to state my opinion:

I think that the "character set encoding" information should
be split to a different field.  Why?  I have a model where there
are pieces of information that the "mail system" wants to understand,
independent of the data type -- things like encoding method(s),
compression method(s), character encodings, etc.  This is all
information on the body part "envelope"; things that are carried
around in addition to the actual data.


And then on 3-Jun:

----------------------

Page 19, 5.1, the Text content-type

#9 (show-stopper)

Character sets are currently identified for the text class of content-types.
The character set is represented as the sub-class of the "text" class,
but nowhere else are character sets identified.  There are several
problems with this approach:

Other classes than text have the need to identify character sets.
For example, troff docs need this indication.  The example of a
binary "tar" file also could use the indication.  You might even
need it for a "binary" compound doc file that has embedded strings.

While it is true that many text document systems do embody a character
set identification within the document, not all do.  Since this is
reality, I think that a separate Content-character-set field should
be allowed for all body-parts; this field would default to US-ASCII
if not present.

This proposal costs nothing, and allows a broader set of applications
to use RFC-XXXX for non-US languages.
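
A minimal sketch of the defaulting rule proposed above, assuming the
field is spelled "Content-character-set" as in the text and that it
carries a single character set name.  Neither the field name nor the
helper `body_part_charset` is taken from any published spec; both are
illustrative only.

```python
from email import message_from_string

# Default named in the proposal above when the field is absent.
DEFAULT_CHARSET = "US-ASCII"

def body_part_charset(raw_part):
    """Return the character set declared for one body part.

    Reads the proposed (hypothetical) Content-character-set field,
    whatever the content type is, and falls back to US-ASCII.
    """
    part = message_from_string(raw_part)
    value = part.get("Content-character-set")  # header lookup is case-insensitive
    if value is None or not value.strip():
        return DEFAULT_CHARSET
    return value.strip().upper()
```

The point of the sketch is that the lookup never consults Content-type,
so it works the same for a troff document, a tar file, or plain text.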
----------------------

Page 19, 5.1, the Text content-type

#10 (argument)

This is another character set comment.  It refers to both ISO-10646
and ISO-2022.

Identifying a character set is a "good thing".  But then you have to ask
what the UA (either end-user or gateway) is going to do with this
information.

Alas, ISO-10646 and ISO-2022 are less "character-sets" in the traditional
sense, as they are a character-set-registry-and-encoding standard.
What this means is that these two ISO standards say how to switch
the character sets (sometimes called pages).  Alas, to find out which
character sets a document uses you typically have to scan the entire
document.  It would be very useful to indicate,
at the top level of a body part, which actual character sets are
needed in order to display the document; this way the software can
tell if it can successfully translate (in the case of a gateway) or
display the document.
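
As a rough illustration of that check, here is a sketch of what a
gateway or UA might do with a top-level list of character sets.  It
assumes, purely for illustration, that the field value is a
comma-separated list of names; the function `missing_charsets` and the
example names are hypothetical.

```python
def missing_charsets(declared, supported):
    """Return the declared character sets the software cannot handle.

    `declared` is the raw field value (assumed comma-separated);
    `supported` is an iterable of names the UA or gateway can
    display or translate.  An empty result means the body part can
    be handled without scanning it.
    """
    needed = {name.strip().upper() for name in declared.split(",") if name.strip()}
    if not needed:
        # No declaration: fall back to the US-ASCII default.
        needed = {"US-ASCII"}
    known = {name.upper() for name in supported}
    return sorted(needed - known)
```

A gateway could refuse or downgrade the part when the list is
non-empty, and a UA could warn the user before attempting display.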

With that said, I'd like to state my opinion:

I think that the "character set encoding" information should
be split to a different field.  Why?  I have a model where there
are pieces of information that the "mail system" wants to understand,
independent of the data type -- things like encoding method(s),
compression method(s), character encodings, etc.  This is all
information on the body part "envelope"; things that are carried
around in addition to the actual data.


So I agree: no decorum was ignored. If I didn't respond to you before
that is my fault. I just didn't know it was on the agenda until it
won the vote. Point-of-order retracted. As I said previously there
does not seem to be much support for my position anyway given that Mark's
concern is with header-orthogonality.

So, you've won your separate Content-Charset header. How about a nice
detailed example of what some User or other agent might actually do
with it.

Bob Smart