ietf-822

Re: content-charset & checksums

1991-10-29 22:15:08
draft (it would be nice) but I want to hear some additional voices on
the list saying that this is OK.

Using name=value for parameters suits me fine. Some other matters:

Good.

1. I don't think the checksum header ideas floating around are well 
enough thought out to go in rfc-xxxx without a lot of danger of 
slowing things down. Since it can/should be optional can't it wait 
for another rfc? A simple checksum at the end of base64 (introduced
by yet another safe character) seems like a harmless idea if
everyone is happy with it.

Adding a checksum to the data portion is not something we can do later.
Old compliant decoders will be munged by such a thing -- there are not enough
characters left to reasonably encode a checksum without using characters that
currently are valid and used for something else.
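The compatibility worry can be sketched in a few lines. This is only an illustration, with assumptions not taken from the thread: the "*" marker, the CRC-32 checksum, and the hex trailer are hypothetical, and Python's present-day base64 alphabet stands in for the draft encoding.

```python
import base64
import string
import zlib

# Characters a base64 decoder treats as data (plus "=" padding).
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def encode_with_trailer(data, marker="*"):
    """Base64-encode data and append a hypothetical checksum trailer line."""
    body = base64.b64encode(data).decode("ascii")
    checksum = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    return body + "\n" + marker + checksum


def strict_decode(text):
    """Decode as a decoder that rejects any character outside the alphabet."""
    return base64.b64decode("".join(text.split()), validate=True)


def trailer_chars_look_like_data(text, marker="*"):
    """True if every checksum character is also a valid base64 data character."""
    trailer = text.split(marker, 1)[1]
    return all(ch in B64_ALPHABET for ch in trailer)
```

A decoder written before the trailer existed either rejects the unknown marker outright, or skips it and is then left with trailer characters it cannot tell apart from data.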

2. If ISO-2022 defines an algorithm from a sequence of octets to a 
sequence of glyphs then it is a Character set not noticeably different 
in that regard from many others. If it doesn't then it's nothing. I 
think the problem is that ISO-2022 as used in Japan only supports JIS 
and ASCII: why not call it ISO-2022J? As far as I can tell no one has 
any hopes for a wider application of ISO-2022, but supporting the 
Japanese subset seems necessary [it would be nasty to punish them when 
they have been good little 7-bitters unlike some].

I see Mark's point that ISO-2022 is a mechanism for selecting character
sets rather than a character set itself. As such, it probably should be a
subtype of text, with a set of attributes unique to it. I don't know enough
about the mechanics of 2022 (and I've even written a JIS to DEC Kanji converter
at one time) to know what the attributes should be, exactly.
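To make the "mechanism for selecting character sets" point concrete, here is a small sketch. It assumes Python's iso2022_jp codec, a later standardization, is a fair stand-in for the JIS-plus-ASCII subset discussed here; the function name describe is made up for the example.

```python
# Illustrative only: Python's "iso2022_jp" codec stands in for the
# JIS-plus-ASCII subset of ISO-2022 discussed in this thread.
ESC = b"\x1b"


def describe(text):
    """Encode text and report a few properties of the resulting octets."""
    encoded = text.encode("iso2022_jp")
    return {
        "bytes": encoded,
        # Every octet stays below 0x80, so the stream is 7-bit clean.
        "seven_bit": all(b < 0x80 for b in encoded),
        # Each escape sequence switches the active character set.
        "escape_count": encoded.count(ESC),
    }


# Pure ASCII needs no escape sequences: the stream starts in ASCII mode.
ascii_only = describe("hello")

# Mixed text switches into JIS for the two kanji, then back to ASCII.
mixed = describe("abc \u65e5\u672c xyz")
```

The octets between the two escape sequences mean nothing without the preceding escape, which is why the starting state has to be pinned down by whatever name gets registered.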

Now a problem with ISO-2022 (which 10646/ATM/AUC seems determined
to share) is that the default meaning of octets before any escape
sequence is undefined. We should NOT use the Charset parameter for
this. We should NOT allow the concept of a character set which has
to have extra external information before it is meaningful. If we
have to deal with things like this then we have to register a 
different name for every combination of parameters that people want
to allow. For example we would register ISO-2022-J to mean "ISO-2022
with support for JIS and US-ASCII only, and starting in US-ASCII 
mode". I don't know whether it is possible to have parameterized
character sets in any manageable way. I do know that it is far
too late in the process to be thinking about this. Please let's drop
that possibility and accept one unparameterized name per Character
set: you really can do everything you want in this form because the
parameters only have a small number of useful values in real life.

If you're right, and you may well be, I don't have any problem with this
approach either. However, if you are wrong, and there turn out to be a fair
number of character sets people want to be able to specify under the 2022
umbrella, where's the harm in doing this in parameters? You already have the
parser, might as well use it!
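The two approaches look like this to a name=value parameter parser. This is a rough sketch, not the draft's grammar: it splits naively on ";" and does not handle quoted semicolons, and the "text/iso-2022" subtype with its "sets" and "initial" parameters is invented purely for illustration.

```python
def parse_content_type(value):
    """Split a Content-type value into (type, {parameter: value})."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


# One registered, unparameterized name per character set.
by_name = parse_content_type("text/plain; charset=ISO-2022-J")

# Hypothetical parameterized form: a subtype carrying its own attributes.
by_params = parse_content_type(
    'text/iso-2022; sets="jis,us-ascii"; initial=us-ascii'
)
```

Either form goes through the same parser; the difference is whether the combinations are enumerated in a registry or spelled out in each message.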

3. All the headers are Content-something. So why not Content-version
instead of Body-version?

Yes!! I keep forgetting this one! I really like the idea of changing from
body-version to content-version. (Another approach would be to change all the
content- to body-, but on reflection I like content- better -- it sounds 
nicer.)

The other question is why have this parameter at all. I gather the
reason is to ensure that we don't get mixed up with the previous
simpler use of Content-type. The original idea was to be compatible 
with that previous use. If that has been abandoned we need some
way to reliably distinguish the rfc-xxxx use. Of the following 3
options I like the version header the least:

  (i)   Change "Content-type" (e.g. to "Content-format")

Or body-type ;-)

  (ii)  If we always had a subtype there would be no confusion since
        the old usage always lacked a subtype. This eliminates the
        default subtype: not a great loss.

  (iii) Yet another header [Body-version or better Content-version].

I prefer (iii) rather than the implicit mechanisms in (i) and (ii) since it
allows son-of-RFC-XXXX to make changes and indicate them without having to
pick yet another header. I don't know if we'll ever have another progeny
out of all this, but I don't know that we will not, either.
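As a rough sketch of option (iii), a reader could check for the explicit header before interpreting Content-type under the new rules. The function name and the sample headers are made up for the example; only the Content-version header name comes from this discussion.

```python
def uses_new_format(header_text):
    """Return True if the header block carries a Content-version field."""
    for line in header_text.splitlines():
        # Folded continuation lines start with whitespace; skip them.
        if line[:1] in (" ", "\t"):
            continue
        name, sep, _ = line.partition(":")
        if sep and name.strip().lower() == "content-version":
            return True
    return False


# Older usage: a bare type with no version indicator.
old_style = "From: a@example.org\nContent-type: text\n"

# Newer usage: the same field name, disambiguated by the extra header.
new_style = (
    "From: a@example.org\n"
    "Content-version: 1.0\n"
    "Content-type: text/plain\n"
)
```

A later revision could change the version value and reuse the same check, which is the advantage claimed for (iii) over (i) and (ii).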

4. On the header question. Keith's proposal is the front runner
at the moment. We haven't heard any strong (let alone show-stopping)
objections, and that was not the case for any of the other proposals.
So let's either hear the objections or get it out with rfc-xxxx.
I would like to see better alignment of the encodings but I guess we
can live without that. I'd like to see a more complete draft.

I still think a separate RFC is not a problem, but it can also be a parallel
one. I also am holding out for better encoding alignment here and I think there
won't be too much problem doing it.

5. How about some optimistic soul arranging for a couple of consecutive
RFC numbers to be allocated for these RFCs so that we can start 
using the real rfc numbers instead of xxxx? Or does this have to
wait for Santa Fe?

Sounds good to me. Mr. Chair?

                                        Ned

