transport-related issues in RFC-XXXX

A few isolated comments, on the theory that the large debates are raging
adequately and don't need additional help from me.

TRANSPORT strawmen:
  I can see three possible outcomes from the extended transport formats
approach in RFC-ZZZZ:
  (i) People decide it is just too much trouble, that encoding everything
for some 7-bit transport is better, and drop the whole business.  IMHO, if
that is the outcome, people should study the PRIME draft RFC, because that
is what you will end up with de facto.  Now, that is important, because,
if one assumes that 8bit transport will be available in all relevant
implementations (i.e., implementations that don't support 8bit transport
are not relevant), RFC-XXXX can be made *much* simpler in several places.
Indeed, one might argue that it should be rejected on the grounds of
excess complexity.
  (ii) The various factions who argued, on this list, that, if one were to
support 8bit transport, why not 16 or 32 (or other numbers), will quietly
disappear and RFC-ZZZZ will either be cut back to 8bits only or the
provisions for extensions to wider transport will just never be specified
and implemented.  Neither would upset me particularly: those provisions
are in the RFC-ZZZZ draft because I (with a lot of help from Ran and
others) wanted to explore what it might mean to support them.
   (iii) Those factions remain active.  If they prevent a consensus from
forming around some child of RFC-ZZZZ (i.e., an extended-SMTP with
extended-negotiation), then we will be back to the first option.  On the
other hand, if a consensus forms around their position, then Ran is quite
correct: the material in RFC-XXXX that explicitly assumes either 7 or 8
needs to be reviewed.

To my tastes, only the first of these options is really, seriously,
undesirable.  Not in itself, but because it leads inexorably to one or
more "just send 8bits" models with whatever (real or imagined) potential
that has for breaking existing implementations and with the certainty of
information distortion and loss even if no implementations roll over and
die.   And it is avoiding that first option, not great personal conviction
about the absolute necessity of 8bit transport, that started me working on
what turned out to be ZZZZ.

  But the current draft of XXXX does seem to me to have a metaproblem in
this area.  It may have been latent in the previous draft, but it seems to
be very much present in this one.  What I discovered with ZZZZ was that,
as soon as one moves past either the assumption of "7 or 8" or the
assumption of "no character sets whose first 128 positions were not more
or less ASCII", the existing relationship between 821 and 822 required the
transport agent to move uncomfortably far into what one would like to
consider transport-transparent message body issues.  I think Nathaniel has
discovered that, if one is going to try to deal with a lot of issues that
are conceptually transport-related in a message body format standard, one
is going to have to push rather far into the transport domain.  At least
the document shows symptoms that I can construe only that way.  The "no
nested encoding"/"no encoding except for certain body parts" issues, of
course, complicate these issues conceptually from both sides, and
complicate the interactions between transport mechanisms and UAs, while
making them much easier in practice.  I don't intend to reopen that issue,
I just want to make the context clear.
  Let me raise an example from the transport side of the fence.  There has
been a recent flurry of comments (on the other list and privately) that
ZZZZ essentially says "if you are using any of this extended stuff, then
your message body is RFC-XXXX-conforming".  That is a nasty thing for a
transport protocol to have to say; transport protocols are supposed to be
transparent.   But look again.  Say we have an MTA that receives "wide"
(e.g., >7 bits) and faces a 7bit MTA at the next hop.  Logically, it has
only a limited range of options--RFCs can only control which of these
options are permitted and what is specified for each: 

  -- It can reject or bounce the message as not further transportable. 
  But it was determined in Atlanta that intra-Internet conversion gateways
between "wide" and "7bit" should be explicitly permitted, authorized, and
supported, so this can't be the preferred behavior.

  -- It can encapsulate the message, with no attempt to pull out internal
headers, and assume that the receiving transport agent will reverse and 
untangle things.
  But this assumption isn't realistic, since that receiving transport
agent may be an "old" one that doesn't know anything about this business.
Moreover, especially if "old" transport agents are involved, encapsulation
is the worst form of nested encoding, which we agreed here is a poor idea.

  -- It can convert from one canonical form to another canonical form.
  But this requires that it understand what canonical form it is
converting from.  If it is going to take, e.g., valid RFC-XXXX in 8 bit
form and turn it into valid RFC-XXXX in 7 bit, then it must know that it
is dealing with RFC-XXXX and not RFC-2000.
  Now I know of only two ways to give it that knowledge.  One is to
specify that everything will be RFC-XXXX, period.  This is global out-of-
band information, and such things work fairly well, even though it
involves being globally specific about what is being transported.  And the
other is to further modify the transport envelope so that the format being
transported is announced there.
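
  The options above can be put as a rough Python sketch of the decision a
relay faces.  Everything named here (Action, choose_action, the
declared_format argument) is hypothetical and is not taken from either
draft.  The one point it tries to capture is that conversion is only
available when the relay has been told, by global rule or by the
envelope, what format it is converting from.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FORWARD = "forward unchanged"
    CONVERT = "convert to a 7-bit canonical form"
    ENCAPSULATE = "wrap opaquely and hope the far end unwraps"
    BOUNCE = "reject or bounce as not further transportable"


def choose_action(body: bytes,
                  next_hop_8bit: bool,
                  declared_format: Optional[str],
                  allow_encapsulation: bool = False) -> Action:
    """Pick a relay action for one message (illustrative policy only)."""
    needs_8bit = any(octet > 0x7F for octet in body)
    if next_hop_8bit or not needs_8bit:
        return Action.FORWARD
    # Conversion requires knowing which canonical form is being converted.
    # "RFC-XXXX" stands for whatever label the envelope, or a global rule,
    # would supply; the label itself is an assumption of this sketch.
    if declared_format == "RFC-XXXX":
        return Action.CONVERT
    # Without that knowledge only the two less attractive options remain.
    if allow_encapsulation:
        return Action.ENCAPSULATE
    return Action.BOUNCE
```

With no declared format, a body containing octets above 0x7F can only be
bounced or encapsulated in this sketch, which is the corner the text
above describes.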
  RFC-XXXX's "Body-version" is an example of trying to do this essentially
transport job in the message format.  You need a transport announcement to
guarantee that the message contains a Body-version field and that it needs
to be paid attention to.  After that, perhaps one should just move the
value of that field to the transport level as well.  But, without the
transport announcement, "Body-version" is as vulnerable to miscellaneous
semi-private 822 extensions as anything else.  Want to interpret
  Body-version: Somewhat like Marilyn Monroe
anyone?
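
  To make that concrete, here is a small sketch using the Python standard
library's generic 822-style parser.  It accepts the field above without
complaint, since at that level it is just another header.  The
KNOWN_VERSIONS set and the is_recognized helper are hypothetical; the
values RFC-XXXX actually defines are not assumed here.

```python
from email import message_from_string

raw = (
    "From: someone@example.com\n"
    "Body-version: Somewhat like Marilyn Monroe\n"
    "\n"
    "Hello.\n"
)

# A generic parser has no idea what the field is supposed to mean.
msg = message_from_string(raw)
body_version = msg["Body-version"]

# Hypothetical set of values a conforming reader would understand.
KNOWN_VERSIONS = {"1.0"}


def is_recognized(value):
    """True only if the field is present and carries a known value."""
    return value is not None and value.strip() in KNOWN_VERSIONS
```

The parse succeeds and is_recognized(body_version) is false, so the
reader is left to guess whether this is a private extension or a damaged
message.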
   Again, I'm not suggesting that what Nathaniel has done is wrong or that
it should be changed, only that both of us are working within the
constraints of a slightly muddled situation and trying to make do.

But I think it is the case that RFC-XXXX should avoid constraining
anything that might be incorporated into some final RFC-ZZZZ until those
options are eliminated at the transport level.  Otherwise, if nothing
else, one begins to build a case that certain features in ZZZZ are
unnecessary simply because RFC-XXXX does not support them.  And that is a
very good position to take iff you want to see option (i): no 8bit-
negotiated model, leading, I predict, to all of the nice 8bit-senders just
sending.

I think that means that the "7 or 8" assumptions ought to come out.  Ran's
suggestions are a reasonable start.


CHARACTER SETS:
 (1) I am strongly opposed to defining the use, definition, or
subtype/parameterness of ISO 10646 until there is an ISO 10646.  Again,
Ran's outline of some weasel-words is reasonable.  But I think one must
defer that for future work.
 (2) Erik and the 2022 sub-working-group: Yes, the 2022 assertion should
be cleared out until they report.  Note that a definition of 2022
identical to the Japanese use isn't adequate without significant
out-of-band information.  In particular, as I understand it, the Japanese
approach to 2022 is significantly Eurocentric (!  :-) ): There is always an
initial state of ASCII, which is switched out of.  2022 does not specify
any initial states; in the absence of out-of-band information, even ASCII
would have to be explicitly designated onto GL before one could send
anything. 
   Ran also suggests:
    >   "These future subtypes should not be used until RFCs
    > describing their complete specification and use, including the values
Not strong enough.  MUST NOT.
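
The initial-state point in (2) can be seen in a present-day
implementation.  This sketch uses Python's "iso2022_jp" codec as a
stand-in for the Japanese usage discussed above; that the codec matches
what the sub-working-group has in mind is an assumption.  ASCII text goes
out with no designation at all, while Japanese text is bracketed by
escape sequences that switch out of ASCII and back.

```python
ESC = b"\x1b"

# Pure ASCII: no escape sequence, because ASCII is the assumed start state.
ascii_only = "hello".encode("iso2022_jp")

# Three kanji (U+65E5 U+672C U+8A9E), written as escapes to keep this
# listing 7-bit clean.
mixed = "\u65e5\u672c\u8a9e".encode("iso2022_jp")

starts_with_designation = mixed.startswith(ESC + b"$B")  # switch to JIS X 0208
returns_to_ascii = mixed.endswith(ESC + b"(B")           # switch back to ASCII
```

A receiver that did not share the "starts in ASCII" convention would have
no way to interpret the first example.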

(3) Overkill ASCII definitions.  Ultimately the very best definition is to
point to X3.4-1986 and leave it at that.  Unfortunately, there is a
history, which we have seen again on this list recently, of reading
"ASCII" and "X3.4" as "well, just about ISO 646 International Reference
Version" (which is true) and then getting to "well, just about ISO 646
National Language Versions".  And on the second, we get into meaning-level
(as distinct from transport-level) interoperability problems.  So I think
it is (and remains) very important to stress that, when we say "ASCII" we
mean precisely what X3.4 says, and not any analogous standards or creative
interpretations.
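
A small sketch of the meaning-level problem: the same 7-bit octets read
as X3.4 ASCII and as one ISO 646 national version.  The table holds the
positions where the German version (DIN 66003) differs from ASCII; the
function name and the sample string are mine, and the input is assumed to
be 7-bit.

```python
# Positions where DIN 66003 assigns a different character than ASCII.
DIN_66003_DIFFERENCES = {
    0x40: "\u00a7",  # section sign where ASCII has @
    0x5B: "\u00c4",  # A-umlaut where ASCII has [
    0x5C: "\u00d6",  # O-umlaut where ASCII has \
    0x5D: "\u00dc",  # U-umlaut where ASCII has ]
    0x7B: "\u00e4",  # a-umlaut where ASCII has {
    0x7C: "\u00f6",  # o-umlaut where ASCII has |
    0x7D: "\u00fc",  # u-umlaut where ASCII has }
    0x7E: "\u00df",  # sharp s where ASCII has ~
}


def decode_din_66003(data: bytes) -> str:
    """Interpret 7-bit octets under the German national version."""
    return "".join(DIN_66003_DIFFERENCES.get(octet, chr(octet)) for octet in data)


wire = b"a[0] = {x|y}"
as_ascii = wire.decode("ascii")
as_german = decode_din_66003(wire)
```

Nothing is lost or damaged in transport here; the octets arrive intact
and the two ends simply disagree about what they say.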

Craig writes:
   >What's wrong with base64 in 8 bits?  Or native 10646 bodies in
   >old-ZZZZ-style 8-bit SMTP?
  "Native 10646" is (or was until recently) a 32 bit character set that
can use any bit pattern in any of the four octets.  That means that one
must do one of two things: either have rather complex transport escaping
rules to prevent the middle of some character from being construed as,
e.g., CRLF, or figure out a way to treat it as 32 bit chunks (transport
at the applications level if you like).  The original RFC-ZZZZ tried the
first, I think unsuccessfully.  The current RFC-ZZZZ makes provision for
the second.
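
  A short illustration of the CRLF hazard, using Python's big-endian
UTF-32 codec as a stand-in for a four-octet character form.  Whether that
matches the "native 10646" of the drafts under discussion is an
assumption; the point is only that a line-oriented transport scanning
octets will find a CRLF inside a single character.

```python
# U+0D0A is one character, but its four-octet form ends in 0x0D 0x0A.
octets = "\u0d0a".encode("utf-32-be")
contains_crlf = b"\r\n" in octets

# A naive octet-level line splitter breaks the stream in the middle of
# that character, leaving fragments that are not whole 32-bit units.
stream = "A\u0d0aB".encode("utf-32-be")
pieces = stream.split(b"\r\n")
first_piece_is_whole_units = len(pieces[0]) % 4 == 0
```

Treating the body as 32 bit chunks avoids this because the scanner never
looks at octet pairs that straddle or sit inside a character.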
  Of course, if one prohibits ever transporting "native" 10646 and instead
relies on, e.g., the AUC proposal, at least part of this problem
disappears.  But I think that prohibiting it could turn out to be a lost
cause.  As we have seen with 8bit transport, unless one provides an
approved way to do something that people want to do, they will do what
they like and wrap themselves in cloaks of righteousness in the process.

MULTIPART/DIGEST:
  Consider this a placeholder.  Effective elimination of this
functionality would be a showstopper for me.
     --john