PLEASE READ -- Open issues list for RFC-XXXX

OPEN ISSUES IN RFC-XXXX
N. Borenstein & N. Freed
November 6, 1991

With the IETF meeting in Santa Fe less than two weeks away, we would like to
crystallize and clarify the outstanding issues in RFC-XXXX, in the hope that
this will increase the likelihood of finalizing the document shortly after that
meeting.

We are very grateful for the large number of suggestions we have received on
this document.  Time does not permit us to produce an entirely new draft before
the Santa Fe meeting.  We have, therefore, divided the suggestions into two
categories:

1.  Comments that affect the substance of the document are, we hope, ALL
mentioned in what follows.  If you made a substantive comment that you consider
important and that you do not see reflected in this document, please tell us. 
There is a good chance it got lost in the shuffle, as we did not deliberately
omit ANY open issues regardless of our position. (This applies particularly
strongly to the comments of Dave Crocker and Olle Jarnefors.  We are extremely
grateful to the two of you for your voluminous and detailed comments.  We have
tried to extract the crucial substantive comments from these, deferring the
more minor ones for the copy editing stage, but we might have missed something
that is in fact crucial at this stage.)

2.  Other comments that we believed to be non-controversial have been noted,
and many of them have been saved and will be incorporated into the next draft. 
In general, we tend to accept ALL corrections of grammatical errors, MOST
clarifications of wording, SOME stylistic suggestions, and FEW aesthetic or
typographical suggestions.  If you have comments that are in this category, but
which you feel more strongly about than most such comments, please be sure to
check for them in the next draft.  It is possible that they got lost in the
shuffle rather than explicitly rejected, as the volume of such comments has
been quite large.

The open issues (category #1, above) are further subdivided into two lists. 
The first list contains the issues on which consensus is not yet apparent.  The
second list contains the issues for which we believe there is an emerging solution that is
not a show-stopper for anybody.  We have supplemented the list of issues with
our own current positions.

THE MOST IMPORTANT THING TO DO NOW IS TO MAKE SURE THERE ARE NO OPEN ISSUES
THAT ARE NOT ON THESE LISTS.  IF WE HAVE OMITTED ANY, PLEASE TELL US AS SOON AS
POSSIBLE.

The second thing to do between now and Santa Fe is to make as much progress as
possible on the list of open issues.  In particular, if any of the stated
"AUTHORS' POSITION" clauses are show-stoppers for anyone, we want to know NOW.  

It is our hope that these lists will clarify the discussions and the set of
remaining work that must be done in order to submit this document to the IAB as
a proposed standard.

The good news is that there are only around 20 items on these lists, most of
them on the second (emerging solution) list.

Without further ado, here are the two lists of open issues.

LIST A:  OPEN ISSUES WITHOUT APPARENT CONSENSUS

A. 1:  Audio formats.  There are currently two proposals on the table, and they
are actually pretty close -- the key open issue is whether the audio data is
preceded by header data, or whether all non-audio information is included on
the content-type line.

AUTHORS' POSITION:  Put it on the content-type line.

A. 2:  Checksums:  There seems to be a consensus that we should have checksums,
but no consensus on the mechanism, applicability, or level of requirement.

AUTHORS' POSITION:

Modify RFC-XXXX to allow a checksum algorithm and value to be specified either
as a suffix on the Content-Transfer-Encoding line or as a new Content-Checksum
line. Putting this information on the Content-Transfer-Encoding line is neater,
but it does violate some sense of sequence, since a checksum is applied before
the encoding is done. We could go either way on this subissue; see A.5 for
a closely related issue.

Although an algorithm is specified as part of the header information, we need to
pick a single algorithm that must be supported and used. Currently we're
leaning towards MD4, since (1) it seems more than adequate, (2) it is already
specified in an RFC, (3) there exists sufficient operational experience to be
reasonably confident that it works, and (4) an implementation is readily
available.

The checksum itself will always be encoded using the BASE64 encoding.
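A minimal sketch of how a checksum value might be produced under this proposal, assuming MD4 is the chosen algorithm: MD4 over the canonical bytes, then BASE64 on the 16-byte digest. MD4 is written out in pure Python here because MD4 support in Python's hashlib depends on the underlying OpenSSL build. The names `md4` and `checksum_value` are hypothetical, and the per-encoding canonicalization step is not shown.

```python
import base64
import struct

MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (x is reduced mod 2**32 first)."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def _f(x, y, z):  # round 1: bitwise conditional
    return (x & y) | (~x & z)


def _g(x, y, z):  # round 2: bitwise majority
    return (x & y) | (x & z) | (y & z)


def _h(x, y, z):  # round 3: parity
    return x ^ y ^ z


# Per round: (function, additive constant, message-word order, shift amounts).
_ROUNDS = (
    (_f, 0x00000000, tuple(range(16)), (3, 7, 11, 19)),
    (_g, 0x5A827999, (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (_h, 0x6ED9EBA1, (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def md4(data: bytes) -> bytes:
    """Return the 16-byte MD4 digest of data."""
    # Pad: one 0x80 byte, zeros up to 56 mod 64, then the bit length (64-bit LE).
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        words = struct.unpack("<16I", msg[off:off + 64])
        v = state[:]  # working copy of A, B, C, D
        for func, const, order, shifts in _ROUNDS:
            for i, k in enumerate(order):
                t = (-i) % 4  # steps update A, D, C, B, A, D, ...
                mixed = func(v[(t + 1) % 4], v[(t + 2) % 4], v[(t + 3) % 4])
                v[t] = _rotl(v[t] + mixed + words[k] + const, shifts[i % 4])
        state = [(s + w) & MASK for s, w in zip(state, v)]
    return struct.pack("<4I", *state)


def checksum_value(canonical: bytes) -> str:
    """MD4 of the canonical form, BASE64-encoded for use in a header."""
    return base64.b64encode(md4(canonical)).decode("ascii")
```

Since the digest is always 16 bytes, the BASE64 form is always 24 characters, which keeps the header line short whichever header (Content-Transfer-Encoding suffix or Content-Checksum) ends up carrying it.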

Checksums can be specified for ANY encoding. The argument has been made that
"text is often modified or corrupted and thus should not have checksums applied
to it". Isn't the detection of such problems the whole point of having
checksums? A checksum may provide a reliable indication that a particular
encoding must not be used along a particular path for a particular type of
data.

The canonical representation that is checksummed must be defined for each of
the encodings. Definition of the canonical form is fairly simple for all
encodings except for base64. (In base64, record boundaries are out of band --
any attempt to put them in band creates an ambiguity.) There are several ways
to deal with this problem. One that is totally unambiguous is to have two
checksums, one for the data and one for the boundaries -- compute a checksum
for the data without the boundaries and then compute a checksum of the sequence
of distances between the boundaries, expressed in, say, bytes modulo 2^32.
Another alternative is to eliminate record boundaries completely. Since
boundaries are demonstrably useful, this proposal is somewhat less attractive.
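The two-checksum idea can be sketched by treating a decoded base64 body as a list of records. The sketch shows why putting boundaries in band is ambiguous and how a separate boundary sequence avoids it. It assumes that "distances between the boundaries" means the record lengths and that each length is written as a 4-byte big-endian integer; both choices, and the function names, are hypothetical. Each of the two byte strings would then be checksummed separately.

```python
import struct
from typing import List, Tuple


def naive_inband(records: List[bytes]) -> bytes:
    """Ambiguous: mark boundaries in band with CRLF between records."""
    return b"\r\n".join(records)


def canonical_inputs(records: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (data, boundaries), to be checksummed separately.

    data       -- the records concatenated, with no boundary markers
    boundaries -- each record length modulo 2**32, as 4-byte big-endian
    """
    data = b"".join(records)
    boundaries = b"".join(struct.pack(">I", len(r) % 2**32) for r in records)
    return data, boundaries
```

A single record containing CRLF and two records separated at that point produce the same in-band form, but different boundary inputs. Likewise `[b"ab", b"c"]` and `[b"a", b"bc"]` share the data input `b"abc"` yet differ in the boundary input, so damage that only moves a record boundary is still detected.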

The use of checksums should be optional. Validation is mandatory whenever a
checksum is specified using the standard algorithm. The action to be taken if
a checksum does not match is left up to the implementation, but ignoring the
checksum failure is not allowed. If a nonstandard or unsupported checksum
algorithm is used, an implementation should note that the checksum was not
validated and provide this information to the message recipient.
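These validation rules amount to a three-way outcome: valid, invalid, or not validated. A minimal sketch, assuming a hypothetical registry that maps algorithm names to checksum functions; SHA-256 from Python's hashlib is registered purely as a stand-in so the example runs, not as a proposal to replace MD4. What an implementation does on a mismatch remains its own choice; the sketch only ensures the outcome is always surfaced.

```python
import base64
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical registry type: algorithm name -> function producing a BASE64 value.
Checksummer = Callable[[bytes], str]


def verify_checksum(canonical: bytes, algorithm: str, value: str,
                    algorithms: Dict[str, Checksummer]) -> str:
    """Classify a checksum as 'valid', 'invalid', or 'not-validated'."""
    compute = algorithms.get(algorithm.lower())
    if compute is None:
        # Nonstandard or unsupported algorithm: nothing was actually checked.
        return "not-validated"
    return "valid" if compute(canonical) == value else "invalid"


def recipient_note(status: str, algorithm: str) -> Optional[str]:
    """Text for the recipient; a mismatch is never silently dropped."""
    if status == "invalid":
        return f"WARNING: {algorithm} checksum mismatch; content may be corrupted."
    if status == "not-validated":
        return f"Note: {algorithm} checksum was not validated (unsupported algorithm)."
    return None


# Stand-in registry for demonstration only; a real one would register MD4.
DEMO_ALGORITHMS: Dict[str, Checksummer] = {
    "sha256": lambda data: base64.b64encode(hashlib.sha256(data).digest()).decode("ascii"),
}
```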

I (Ned) should have a draft proposal detailing this approach to checksums ready
by the time of the meeting.

A. 3:  Non-ASCII Headers.  The Keith Moore proposal, in particular. 
Should it be part of RFC-XXXX?

AUTHORS' POSITION:  We endorse the Moore proposal.  We endorse putting it in
RFC-XXXX if all outstanding issues are resolved. These include the use of
numbers rather than character set names, alignment with quoted-printable, the
use of underscores, and the use of ? as an introductory character. We feel that
numbers are problematic, alignment with quoted-printable is desirable, the
existing use of _ should be retained, and ? should be retained as the
introductory character.

A. 4:  Quoted-printable encoding:  Changes to ease use in non-ASCII
headers (A.3)

AUTHORS' POSITION:   Change ":" to "=" as the q-p special character. Using
underscore as a substitute for spaces in quoted-printable itself is somewhat
more questionable, and we feel that this variance from the Moore proposal is
tolerable.
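To make the A.3 and A.4 positions concrete, here is a minimal sketch of a quoted-printable-style encoding for header text: = as the special character, _ for space, ? as the introductory character, and a character set name rather than a number. The `=?charset?Q?text?=` wrapper, the safe-character set, and the function names are assumptions for illustration; the exact syntax is still among the open points above.

```python
import string

# Hypothetical set of characters passed through unencoded; everything else,
# including "=", "?" and "_", is written as "=" plus two hex digits.
_SAFE = frozenset(string.ascii_letters + string.digits + "!*+-/")


def q_encode(text: str, charset: str) -> str:
    """Encode text as an assumed =?charset?Q?...?= header word."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")  # underscore stands for space in headers
        elif chr(byte) in _SAFE:
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)  # "=" as the special character
    return "=?%s?Q?%s?=" % (charset, "".join(out))


def q_decode(word: str) -> str:
    """Invert q_encode (the payload never contains "?" because it is escaped)."""
    _, charset, _, payload, _ = word.split("?")
    raw = bytearray()
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch == "_":
            raw.append(0x20)
            i += 1
        elif ch == "=":
            raw.append(int(payload[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(ch))
            i += 1
    return raw.decode(charset)
```

For example, "café au lait" in ISO-8859-1 becomes `=?iso-8859-1?Q?caf=E9_au_lait?=`. Escaping a literal "_" as `=5F` is what lets the header form keep the underscore convention while quoted-printable bodies do without it.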

A. 5:  Modification of the encoding model to treat it as a pipe.  In
       particular, a mechanism for specifying compression should be added.

AUTHORS' POSITION: This should be deferred to a future document. We were unable
to reach anything even close to closure on this. The Body-Version header
provides a facility for adding such extensions that are not backwards
compatible.

LIST B:  OPEN ISSUES WITH APPARENTLY ACCEPTABLE (NON-SHOWSTOPPER) SOLUTIONS

B. 1:  Where are PostScript, TeX, and Troff?  

AUTHORS' POSITION:  Remove text-plus/postscript in favor of image/postscript. 
Remove text-plus/tex and text-plus/troff entirely in favor of a later document
that spells out their use in a way that both handles macro packages and takes a
position on security issues (e.g. shell escapes).

B. 2:  Where do body parts start & end?

AUTHORS' POSITION:  A boundary-spec INCLUDES the preceding newline, thus
permitting body parts that are not newline-terminated.  A body part thus starts
AFTER the CRLF that delimits the part header from the part body, and ends
BEFORE the CRLF that initiates the boundary-spec.  BNF to be updated
accordingly.
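The B.2 rule is easiest to see in a parser. A minimal sketch that splits a multipart body on CRLF + "--" + boundary, so the CRLF is consumed as part of the boundary-spec; it assumes the boundary string never appears inside part content and ignores transfer encodings. The function name and its return shape are hypothetical.

```python
from typing import List, Tuple


def split_multipart(body: bytes, boundary: bytes) -> List[Tuple[bytes, bytes]]:
    """Return (headers, body) pairs for each part of a multipart body.

    The CRLF before each "--boundary" belongs to the boundary, so a part's
    body ends BEFORE that CRLF and need not be newline-terminated.
    """
    delimiter = b"\r\n--" + boundary
    # Prefix CRLF so a boundary on the very first line is matched too.
    pieces = (b"\r\n" + body).split(delimiter)
    parts = []
    for piece in pieces[1:]:  # pieces[0] is the preamble
        if piece.startswith(b"--"):
            break  # final boundary; anything after it is the epilogue
        line_end = piece.find(b"\r\n")
        if line_end < 0:
            continue  # malformed: boundary line with no terminating CRLF
        part = piece[line_end + 2:]
        # The part body starts AFTER the CRLF ending the (possibly empty) header.
        if part.startswith(b"\r\n"):
            parts.append((b"", part[2:]))
        else:
            sep = part.find(b"\r\n\r\n")
            if sep < 0:
                parts.append((part, b""))  # header only, empty body
            else:
                parts.append((part[:sep], part[sep + 4:]))
    return parts
```

Under this rule a part whose body ends in "hello" is distinct from one ending in "hello" plus CRLF: the latter needs a second CRLF before the boundary line.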

B. 3:  Multipart subtypes.  Various changes suggested.

AUTHORS' POSITION:  No change.  Keep /digest, /parallel.  Defer /archive
to another document.  Keep /alternative mandatory.

B. 4:  Alphabet for boundary markers:

AUTHORS' POSITION:  Alphanumerics only is too restrictive.  We should add back
the symbols that are EBCDIC invariant and not tspecials.  The EBCDIC invariants
aside from alphanumerics are:

    .<(+&*);-/,%_>?:'="

Intersecting this with (not tspecials) gives:

    .+&*-%_?'=

B.5 will eliminate = as well, leaving alphanumerics plus .+&*-%_?'
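The set arithmetic above can be checked mechanically. A minimal sketch; this list does not spell out the draft's tspecials, so `TSPECIALS` below is an assumption chosen to reproduce the result above (only its overlap with the invariant list matters), and the helper names are hypothetical.

```python
import string

# EBCDIC-invariant characters other than alphanumerics, as listed above.
EBCDIC_INVARIANT_PUNCT = frozenset(".<(+&*);-/,%_>?:'=\"")

# Assumption: the draft's tspecials, inferred from the subtraction above.
TSPECIALS = frozenset('()<>@,;:\\"/[]')

ALNUM = frozenset(string.ascii_letters + string.digits)


def boundary_alphabet(after_b5: bool = True) -> frozenset:
    """Characters allowed in a boundary marker."""
    extra = EBCDIC_INVARIANT_PUNCT - TSPECIALS
    if after_b5:
        extra = extra - {"="}  # B.5 removes "=" as well
    return ALNUM | extra


def is_valid_boundary(boundary: str, after_b5: bool = True) -> bool:
    """True if boundary is non-empty and uses only allowed characters."""
    allowed = boundary_alphabet(after_b5)
    return bool(boundary) and all(ch in allowed for ch in boundary)
```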

B. 5:  Regularizing the syntax

AUTHORS' POSITION:  All parameters will be attribute=value.  The only things
affected are "multipart" and "message/partial".  Multipart will have a
"boundary" parameter, with default value of "--------" (so the default
interpart boundary is 10 hyphens and the final boundary 12 hyphens).  message/partial
will have the parameters "id", "number" and "total".  Default values are the
empty string, 0, and 0, which implies that the defaults are useless for
message/partial.  This syntactic regularization will of course require
significant rewriting of the BNF, and will probably also require some
restructuring of the document so that parameters like "charset" which may apply
to multiple content types will appear in their own section.
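A minimal sketch of the regularized syntax: every parameter is attribute=value, and the B.5 defaults fill in omitted parameters. The quoting rule, the case folding, and the function names are assumptions; the rewritten BNF will be what actually governs this.

```python
from typing import Dict, Tuple

# Defaults stated in B.5 (kept as header strings).
DEFAULTS: Dict[str, Dict[str, str]] = {
    "multipart": {"boundary": "--------"},
    "message/partial": {"id": "", "number": "0", "total": "0"},
}


def parse_content_type(line: str) -> Tuple[str, Dict[str, str]]:
    """Split "type/subtype; attr=value; ..." into the type and its parameters.

    Assumes no ";" inside parameter values; a value wrapped in double
    quotes has the quotes removed.
    """
    fields = line.split(";")
    ctype = fields[0].strip().lower()
    params: Dict[str, str] = {}
    for field in fields[1:]:
        field = field.strip()
        if not field:
            continue
        attr, sep, value = field.partition("=")
        if not sep:
            raise ValueError("parameter without '=': %r" % field)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        params[attr.strip().lower()] = value
    return ctype, params


def effective_params(ctype: str, params: Dict[str, str]) -> Dict[str, str]:
    """Apply defaults by full type, else by top-level type."""
    defaults = DEFAULTS.get(ctype, DEFAULTS.get(ctype.split("/")[0], {}))
    return {**defaults, **params}


def delimiters(boundary: str) -> Tuple[str, str]:
    """Inter-part and final delimiter lines for a multipart boundary."""
    return "--" + boundary, "--" + boundary + "--"
```

With no boundary parameter, `delimiters` yields the 10-hyphen inter-part line and the 12-hyphen final line described above. The same parser would also handle the charset parameter proposed in B.6.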

B. 6:  Character sets.  

AUTHORS' POSITION:  Get rid of text/charset subtypes, in favor of a parameter
syntax "charset=whatever".  Eliminate Content-charset entirely.

B. 7:  Text-plus

AUTHORS' POSITION:  Merge text-plus into text, which now has no subtypes.  The
default subtype will be called "PLAIN".

B. 8:  Incomplete "character sets" -- How to handle 2022, 10646, MNEMONIC

AUTHORS' POSITION:   Do NOT try to define these in the document.  Don't even
give placeholder names for them.  Suggest that future documents may define a
way to express the use of 10646 or MNEMONIC as a character set, or something
like "ISO-2022-jpn-7" as a text subtype. (Fallback:  if an expert (MRC?) can
provide the right prose, we could define the 2022 stuff here.)

B. 9:  Binary type is not yet adequate for bitstreams.

AUTHORS' POSITION:  We are resistant to the "binary/bitstream" suggestion.
Possible compromise:  Optional "padding=n" parameter on content-type to specify
the number of bits used to pad a bitstream to an 8-bit boundary.
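The proposed compromise is small in practice. A minimal sketch, counting padding in bits (padding to an 8-bit boundary adds at most seven bits) and assuming bits are packed most significant bit first with zero bits appended at the end; neither packing choice is specified yet, and the function names are hypothetical.

```python
from typing import List, Tuple


def pack_bits(bits: List[int]) -> Tuple[bytes, int]:
    """Pack bits (most significant bit first) into bytes.

    Returns the bytes and the number of zero bits appended to reach an
    8-bit boundary, i.e. the value for a padding=n parameter.
    """
    padding = (-len(bits)) % 8
    padded = list(bits) + [0] * padding
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out), padding


def unpack_bits(data: bytes, padding: int) -> List[int]:
    """Recover the original bits, dropping the trailing padding bits."""
    if not 0 <= padding <= 7:
        raise ValueError("padding must be between 0 and 7")
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    return bits[:len(bits) - padding]
```

For example, the three bits 1, 0, 1 travel as the single byte 0xA0 with padding=5.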

B. 10:  message/rejected and/or message/forwarded subtypes

AUTHORS' POSITION:  Defer to a future document.

B. 11: Change "binary" to "file"

AUTHORS' POSITION:  No.  There are binary objects other than files.

B. 12:  Body-Version header

AUTHORS' POSITION:  Retain.  No change of name to "content-version".
Clarification:  Body-Version is NOT required for body parts, only at top
level.  (It is required in content-type message, IF the encapsulated message
is itself XXXX-compliant.)  Do NOT extend the syntax to describe subtypes of
XXXX or the sets of RFCs complied with.

B. 13:  What is ASCII?  Replace Appendix D with the words "ANSI X3.4-1986"?

AUTHORS' POSITION:  If our appendix in fact corresponds precisely to ANSI
X3.4-1986, we should nuke the appendix and say so.  Does it? Experts on
character sets please comment further.

B. 14:  Trojan horses in mail

AUTHORS' POSITION:  We should not define/endorse any content-types that
are known to have Trojan horses.  (This is one reason why we removed
troff & tex.)  We should add a Security Appendix explaining the dangers
of certain conceivable content-types, but we should ban nothing.

B. 15:  OID's (Kille)

AUTHORS' POSITION: OID issues should be deferred to future documents. Various
other ISO-related matters, such as X.400 interoperability and the content-types
it needs, have also been deferred in a similar way.