Re: SHOW STOPPERS in the new RFC-XXXX draft

Mark Crispin writes:

     I do not like the way character sets are handled in the new draft.  The
more I look at it, the less I like it.  We now have two mechanisms: a subtype
of TEXT and the new Content-charset header.  Both stick out as glaringly bad
design and need to be hit with bricks.

     First brick is at the TEXT subtype.  This precludes the use of the
subtype syntax for anything other than character sets.


This is precisely why some of us still support the content-charset header.

     What's worse, it mixes usages.  It uses character set names such as
US-ASCII or ISO-8859-xxx alongside means of specifying character sets, such as
ISO-2022 and MNEMONIC.


I strongly recommend that if you want this changed, you come up with some text
to deal with it. We have been asking people who feel religious about this stuff
to ante up for some time now.

     Second brick is at Content-charset.  This stands out forthrightly as a
poorly thought-out bad idea that is there only because some set of people
demanded it.


Frankly, I did not expect to find content-charset in this draft, and I was
delighted to see it appear. I am on record as disagreeing with you totally on
this one. I see no reason to hash this out all over again.

However, I for one can live without content-charset. If it helps us reach a
consensus, by all means let it be eliminated. I do think this is a mistake that
is going to come back and haunt us, but it is not the only one.

     It's meaningless for most of the types, which by itself is
justification enough to axe it.

     Here is my proposal: character sets are specified as a parameter.  The
syntax used for this is the `attribute = value' syntax used with types BINARY,
APPLICATION, AUDIO, IMAGE and VIDEO.  This attribute will be called `charset'.
Entities that modify character sets but otherwise are not character sets, such
as MNEMONIC or ISO-2022, will continue to be subtypes of TEXT.  This opens up
the possibility of folding TEXT-PLUS into TEXT, by making its subtypes be
subtypes of TEXT.

     Here are some examples:
      Content-Type: TEXT;charset=US-ASCII
      Content-Type: TEXT;charset=ISO-8859-1
      Content-Type: TEXT/ISO-2022
      Content-Type: TEXT/RICHTEXT;charset=US-ASCII
      Content-Type: APPLICATION/ATOMICMAIL;charset=ISO-8859-1


Apart from the fact that you are now making me write yet another parser to deal
with this stuff, what's the difference between this and having this information
on a separate header line? I'll answer this one myself -- apart from the
syntax, there is NO difference. If you can explain to me what the difference
is I'd be delighted to hear it.
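The extra parsing the parameter form calls for is small. A minimal sketch in C of pulling one parameter, such as `charset', out of a Content-Type value; get_param and name_eq are hypothetical names, and the sketch assumes `; attribute=value' pairs with optional whitespace, matches attribute names case-insensitively, and strips surrounding double quotes without handling backslash escapes, comments, or semicolons inside quoted values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if the n bytes at a equal the NUL-terminated b, ignoring case. */
static int name_eq(const char *a, size_t n, const char *b)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (b[i] == '\0' ||
            tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return b[n] == '\0';
}

/* Hypothetical helper: find attribute attr among the "; attr=value"
   parameters of a Content-Type value and copy its value into out.
   Returns 1 if found (and it fits in outlen bytes), 0 otherwise. */
int get_param(const char *ctype, const char *attr, char *out, size_t outlen)
{
    const char *p = strchr(ctype, ';');

    while (p != NULL) {
        const char *name, *val;
        size_t nlen, vlen;

        p++;                                        /* skip ';' */
        while (*p == ' ' || *p == '\t') p++;
        name = p;
        while (*p && *p != '=' && *p != ';' && *p != ' ' && *p != '\t') p++;
        nlen = (size_t)(p - name);
        while (*p == ' ' || *p == '\t') p++;
        if (*p != '=') {                            /* malformed: skip it */
            p = strchr(p, ';');
            continue;
        }
        p++;                                        /* skip '=' */
        while (*p == ' ' || *p == '\t') p++;
        if (*p == '"') {                            /* quoted value */
            val = ++p;
            while (*p && *p != '"') p++;
        } else {                                    /* token value */
            val = p;
            while (*p && *p != ';' && *p != ' ' && *p != '\t') p++;
        }
        vlen = (size_t)(p - val);
        if (name_eq(name, nlen, attr)) {
            if (vlen >= outlen)
                return 0;                           /* does not fit */
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = strchr(p, ';');
    }
    return 0;
}
```

For example, get_param("TEXT/RICHTEXT; charset=US-ASCII", "charset", buf, sizeof buf) returns 1 and leaves US-ASCII in buf, while a bare TEXT/ISO-2022 yields 0.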

     I object to the characterization of ISO-2022 as `not recommended' and
incompletely specified.  ISO-2022 is quite well-specified.  Granted, its usage
is significantly narrower than its scope, but that's for the implementors and
users of ISO-2022 to worry about, not RFC-XXXX.


Put your text where your mouth is on this one. I am tired of listening to
people go on about how we characterize this or that approach to character sets
without providing substance to support their position. (Side note: Kudos to the
recent posting on 10646 that actually suggested, in concrete terms, how the
standard should be adapted to deal with the changes. Regardless of whether or
not this material is actually used, it was certainly presented in a nice way.)

It is my understanding that the bindings of the various character sets to the
various 2022 escape sequences are somewhat up in the air. Maybe I'm wildly
wrong on this. If so, please correct me by providing text that explains this
stuff for the RFC, rather than continuing to rant and rave about it.

     Barring any future documents which define scope-limiting mechanisms, it
should be assumed that there are no scope-limitations.  Such a scope-limiting
mechanism might be:
      Content-Type: TEXT/ISO-2022;charset=JIS-X0208.1983
which specifies that the only recognized shift-in code is <ESC>$B [the shift-out
codes back to a single-byte set are <ESC>(J (JIS-Roman) and <ESC>(B (ASCII)].
However, this does not need to be documented here, and people should be able to
interoperate as long as they are allowed to use TEXT/ISO-2022.


It is precisely because we were unable to determine what in hell the scope of
2022 is that we do not recommend its use. If you have made this determination
please enlighten us.
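To make the scope-limiting idea concrete: under a limitation like the hypothetical TEXT/ISO-2022;charset=JIS-X0208.1983 quoted above, a receiver could check that a body uses no escape sequences beyond the permitted ones. A minimal sketch in C; only_allowed_escapes is a hypothetical name, and the sketch checks three-byte escape sequences only, ignoring SI/SO and the other control functions ISO 2022 defines.

```c
#include <stddef.h>
#include <string.h>

/* Escape sequences admitted by the hypothetical limitation above. */
static const char *const allowed[] = {
    "\x1b$B",   /* designate JIS X 0208-1983 */
    "\x1b(J",   /* designate JIS X 0201 Roman */
    "\x1b(B",   /* designate ASCII */
};

/* Returns 1 if every ESC in buf[0..len) starts one of the allowed
   three-byte sequences, 0 otherwise. */
int only_allowed_escapes(const unsigned char *buf, size_t len)
{
    const size_t nallowed = sizeof allowed / sizeof allowed[0];
    size_t i, k;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0x1b)
            continue;
        for (k = 0; k < nallowed; k++) {
            if (len - i >= 3 && memcmp(buf + i, allowed[k], 3) == 0)
                break;
        }
        if (k == nallowed)
            return 0;   /* unknown or truncated escape sequence */
        i += 2;         /* skip the rest of the matched sequence */
    }
    return 1;
}
```

Whether a given body is acceptable then depends entirely on the list, which is exactly the per-charset binding at issue here.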

     I object vehemently to the requirement that MULTIPART cookies (called
boundary-spec) be restricted to alphanum.  I carefully build my cookies to be
impossible in BASE64 and QUOTED-PRINTABLE segments.  Cookies should be defined
as 1*70<any CHAR except SPACE or CTLs>.

     This is a major SHOW STOPPER for me.


Although I can certainly live with the current specification, which limits
MULTIPART boundaries to alphanumerics, I would also like to open them up a bit.

However, I object, strongly, to the notion that they should be opened up to
include all characters except spaces or CTLs. This will not work. At the very
least the characters available should be restricted to the minimal invariant
subset of ASCII. Furthermore, they should also be restricted to the set of
characters that will not confuse simple parsers of the Content-Type: headers.
This leaves you with the intersection of {CHAR - tspecials} and the minimal
invariant subset.
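The two boundary rules can be compared directly as predicates. A minimal sketch in C: boundary_ok_crispin follows the 1*70<any CHAR except SPACE or CTLs> rule proposed above, and boundary_ok_freed narrows it to the intersection described here, assuming (1) the tspecials list ()<>@,;:\"/[]?.= and (2) that the minimal invariant subset means the ISO 646 invariant characters, so # $ @ [ \ ] ^ ` { | } ~ are excluded. Both names are hypothetical, and carrying the 70-character limit over to the second rule is also an assumption.

```c
#include <string.h>

/* Proposed rule: 1*70<any CHAR except SPACE or CTLs>, i.e. every byte
   in the printable range 0x21..0x7E. */
int boundary_ok_crispin(const char *b)
{
    size_t n = strlen(b), i;

    if (n < 1 || n > 70)
        return 0;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)b[i];
        if (c < 0x21 || c > 0x7E)
            return 0;
    }
    return 1;
}

/* Counter-proposal: additionally exclude tspecials and characters that
   are not in the ISO 646 invariant set (both lists are assumptions). */
int boundary_ok_freed(const char *b)
{
    static const char tspecials[] = "()<>@,;:\\\"/[]?.=";
    static const char variant[] = "#$@[\\]^`{|}~";
    size_t i;

    if (!boundary_ok_crispin(b))
        return 0;
    for (i = 0; b[i] != '\0'; i++) {
        if (strchr(tspecials, b[i]) != NULL || strchr(variant, b[i]) != NULL)
            return 0;
    }
    return 1;
}
```

Under these assumptions the second rule leaves letters, digits, and ! % & ' * + - _. A cookie containing =_ passes the first rule but not the second; that pair cannot occur in quoted-printable output, where = must be followed by two hex digits or a line break, and _ is not in the base64 alphabet.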

Note also that the use of dashes at the beginning of a boundary line eliminates
any possibility of confusion with BASE64, so this objection of yours is at
least in part a red herring. Quoted-printable material is easily cast in a
form that cannot be confused with a boundary -- simply quote any leading
dashes. That just about covers the conflicts with encoded forms, I believe.
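Both halves of that argument are mechanical. A minimal sketch in C: is_base64_char shows that '-' falls outside the base64 alphabet, so no base64 line can begin with the two dashes of a boundary line, and qp_protect_leading_dash, a hypothetical helper working on one already-encoded quoted-printable line, rewrites a leading '-' as its encoded form =2D.

```c
#include <string.h>

/* True for the characters that may appear in base64 data,
   A-Z a-z 0-9 + / and the '=' pad; '-' is not among them. */
int is_base64_char(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
}

/* Copy one quoted-printable line into out, encoding a leading '-' as
   "=2D" so the line cannot start with a boundary's "--".
   out must hold at least strlen(line) + 3 bytes. */
void qp_protect_leading_dash(const char *line, char *out)
{
    if (line[0] == '-') {
        memcpy(out, "=2D", 3);
        strcpy(out + 3, line + 1);
    } else {
        strcpy(out, line);
    }
}
```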

     I object to making the ALTERNATIVE subtype of MULTIPART be REQUIRED for
conformance.  It is a cute functionality, but it is extra baggage for
implementors to worry about.


I am also a little uncomfortable with mandates for this level of compliance,
but for somewhat different reasons. I think recognition of alternative as a
valid multipart subtype should be required. However, specification of how it
should be dealt with by a UA is a local issue, and I don't think this is
territory for us to make mandates in. I object to the way that HR does similar
things, and I don't want to play favorites in my own work.

We can certainly recommend that alternative be implemented in this way. I
don't think we can require it without changing the scope of the problem
area we are dealing with.
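As an illustration of one local policy a UA might adopt, and not something proposed as a mandate: assuming the alternatives are ordered from least to most preferred, a reader can take the last part whose type it can display. A minimal sketch in C; pick_alternative and ieq are hypothetical names, and parts are represented by bare type strings with parameters already stripped.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality (strcasecmp is POSIX, not ISO C). */
static int ieq(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Scan from the last alternative backwards and return the index of the
   first one whose type is displayable locally, or -1 if none is. */
int pick_alternative(const char *const parts[], size_t nparts,
                     const char *const displayable[], size_t ndisp)
{
    size_t i = nparts;

    while (i-- > 0) {
        size_t k;

        for (k = 0; k < ndisp; k++)
            if (ieq(parts[i], displayable[k]))
                return (int)i;
    }
    return -1;
}
```

A UA that recognizes ALTERNATIVE but prefers a different policy, such as offering the user a choice, would simply replace this function.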

     I have given long and hard thought to this, and I think that the DIGEST
subtype of MULTIPART should be excised from the standard.  This is a band-aid
to help RFC-934.  However, the exact same function is obtained with a regular
MULTIPART type with each of its parts being `Content-Type: MESSAGE'.  What's
more, the latter is more flexible, since it allows a TEXT header for the
digest.


The clever thing about digest (as you probably know) is that all you need to
implement it is a state variable that says what the default content-type is
when no content-type specification is present. I really like this feature.
In the worst case it is pretty harmless. I don't want to lose it.
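The state variable in question amounts to very little code. A minimal sketch in C with hypothetical names; the two default type strings are assumptions chosen to match the forms used in this thread.

```c
#include <ctype.h>
#include <string.h>   /* NULL */

/* Parser state for one multipart body: the type given to any body part
   that has no Content-Type header of its own. */
struct multipart_state {
    const char *default_type;
};

/* Case-insensitive string equality (strcasecmp is POSIX, not ISO C). */
static int ieq(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Called on entering a multipart body; only DIGEST changes the default. */
void multipart_begin(struct multipart_state *st, const char *subtype)
{
    st->default_type = ieq(subtype, "DIGEST") ? "MESSAGE/RFC822"
                                              : "TEXT; charset=US-ASCII";
}

/* Type to use for a body part; declared is NULL when the part has no
   Content-Type header. */
const char *part_type(const struct multipart_state *st, const char *declared)
{
    return declared != NULL ? declared : st->default_type;
}
```

Everything else about a digest goes through the ordinary multipart code.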

     The BNF for type MESSAGE on page 28 is broken, as it requires that either
the RFC822 or PARTIAL subtypes be given.


I think this reflects a desire to eliminate the default state. I
for one don't want to eliminate it, so I would like to see the BNF change to
reflect the text, rather than the other way around.

     The optionality of the second partnum of MESSAGE/PARTIAL is a bad idea.
Not many people are going to implement PARTIAL anyway, so you might as well be
tough on specifying an explicit syntax.


I disagree strongly with this. First of all, I think you will see a lot of
implementation of message/partial. It will be especially popular in file
servers of various sorts. I don't think message size limitations are going to
be going away soon, and I do think messages are going to increase in size.

Second, a single-pass implementation may not know how many parts are
going to be generated until they have all been created. This is one major
reason for making this number optional on all but the last one.

Third, making the part count optional on all but the last part makes it
possible to deal with inconsistent part counts in a consistent way. I don't
think this is very important, but it is an advantage.
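The reassembly side shows why an optional count is easy to live with. A minimal sketch in C, assuming each part reports its number and, optionally, the total (0 here meaning absent); MAX_PARTS, struct partial_set and the function names are hypothetical, and rejecting a second, conflicting total is just one possible policy.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PARTS 64            /* hypothetical limit for this sketch */

struct partial_set {
    bool have[MAX_PARTS + 1];   /* have[n]: part n has arrived */
    int total;                  /* 0 until a part carrying the total arrives */
};

void partial_init(struct partial_set *s)
{
    memset(s, 0, sizeof *s);
}

/* Record part number; total is 0 when that part omitted it.
   Returns -1 for an out-of-range number or a conflicting total. */
int partial_add(struct partial_set *s, int number, int total)
{
    if (number < 1 || number > MAX_PARTS)
        return -1;
    if (total != 0) {
        if (s->total != 0 && s->total != total)
            return -1;
        s->total = total;
    }
    s->have[number] = true;
    return 0;
}

/* Complete once the total is known and parts 1..total have all arrived,
   in whatever order they came. */
bool partial_complete(const struct partial_set *s)
{
    int n;

    if (s->total == 0 || s->total > MAX_PARTS)
        return false;
    for (n = 1; n <= s->total; n++)
        if (!s->have[n])
            return false;
    return true;
}
```

The receiver never needs the count before the last part arrives, which is what lets a single-pass sender omit it from the earlier parts.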

     The BNF for type APPLICATION on page 31 neglects to include the ODA
subtype.  It also omits the ATOMICMAIL stuff that Andrew is using.


Noted.

     The audio subtypes should include X-SUN and X-NEXT as well-known (and
equivalent) subtypes that are essentially U-LAW with a header as follows:
typedef struct {
    int magic;        /* must be equal to 0x2e736e64 */
    int dataLocation; /* Offset or pointer to the raw data after header */
    int dataSize;     /* Number of bytes of data in the raw data */
    int dataFormat;   /* The data format code (1 = MULAW-8) */
    int samplingRate; /* The sampling rate (generally 8012) */
    int channelCount; /* The number of channels (generally 1) */
    char info[4];     /* Textual information relating to the sound. */
} SNDSoundStruct;
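A minimal sketch in C of reading such a header portably, assuming (as in the Sun and NeXT sound file format) that the fields are stored as 32-bit big-endian integers. Copying the bytes straight into SNDSoundStruct would give wrong values on a little-endian host or on one where int is not 32 bits. parse_snd_header, be32 and struct snd_header are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit big-endian field. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decoded copy of the six integer fields of SNDSoundStruct. */
struct snd_header {
    uint32_t data_location, data_size, data_format;
    uint32_t sampling_rate, channel_count;
};

/* Returns 0 and fills h if buf starts with a header carrying the
   0x2e736e64 (".snd") magic number, -1 otherwise. */
int parse_snd_header(const unsigned char *buf, size_t len, struct snd_header *h)
{
    if (len < 24 || be32(buf) != 0x2e736e64u)
        return -1;
    h->data_location = be32(buf + 4);
    h->data_size     = be32(buf + 8);
    h->data_format   = be32(buf + 12);   /* 1 = 8-bit mu-law */
    h->sampling_rate = be32(buf + 16);
    h->channel_count = be32(buf + 20);
    return 0;
}
```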


I will leave this for people who are using audio to debate. I will note that
this format is not sufficiently general for all situations. For one thing,
sampling is often done in something other than an integral number of samples
per second; 7990.9 is one common rate. In real-time applications such
discrepancies DO make a difference.
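The size of the effect is easy to work out. An integer samplingRate field must store 7990.9 as 7991 (or 7990 if truncated). That is an error of 0.1 (or 0.9) samples per second, which accumulates to 6 (or 54) samples per minute and 360 (or 3240) per hour. A trivial C helper for the same arithmetic; drift_samples is a hypothetical name.

```c
/* Samples by which a stream labelled with stored_rate drifts from the
   true rate over the given number of seconds (positive means the
   stored rate is too low). */
double drift_samples(double true_rate, long stored_rate, double seconds)
{
    return (true_rate - (double)stored_rate) * seconds;
}
```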

     The restriction that text mail must have a Content-Type header with a
specific character set can never be enforced.  I would prefer not to have
unenforceable requirements in RFC-XXXX, particularly ones that add extra header
lines for what users will feel is no good reason.


What are you referring to here?

     As noted above, I object to the ALTERNATIVE subtype of MULTIPART being
mandatory.


See above.

     The richtext-to-text translator should be a routine to copy a source
richtext string to a target text string.


Minutiae.
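A routine in the shape suggested above is short. A minimal sketch in C, assuming that <lt> stands for a literal < and <nl> for a line break, while every other richtext command is simply dropped. Matching command names case-insensitively is a choice of this sketch. richtext_to_text and cmd_is are hypothetical names, and the caller must supply a target buffer at least as long as the source, since the output never grows.

```c
#include <ctype.h>
#include <string.h>

/* True if the command text in [start, end) equals name, ignoring case. */
static int cmd_is(const char *start, const char *end, const char *name)
{
    size_t n = strlen(name), i;

    if ((size_t)(end - start) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)start[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Copy richtext from src to plain text at dst.  dst must hold at least
   strlen(src) + 1 bytes. */
void richtext_to_text(const char *src, char *dst)
{
    while (*src != '\0') {
        if (*src == '<') {
            const char *end = strchr(src + 1, '>');

            if (end == NULL) {       /* no closing '>': copy the rest as-is */
                strcpy(dst, src);
                return;
            }
            if (cmd_is(src + 1, end, "lt"))
                *dst++ = '<';
            else if (cmd_is(src + 1, end, "nl"))
                *dst++ = '\n';
            /* any other command, e.g. <bold> or </bold>, produces no output */
            src = end + 1;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```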

     In the BNF on page 43, attribute has = instead of :=


Noted.

     As noted above, the definition of boundary-spec needs to be fixed.  It
even conflicts with examples in the text!

     Body-Version is misplaced.


?

                                Ned