Re: gzip-8bit

> ned+ietf-822(_at_)mrochek(_dot_)com wrote:
>
> > Very unlikely, actually. CTE has always been extensible and has always allowed
> > X- tokens. The rules for handling unknown CTEs are also well defined.
>
> Actually, there is a problem; the rule seems to be the following
> from RFC 2045, section 6.4:
>
>     Any entity with an unrecognized Content-Transfer-Encoding must be
>     treated as if it has a Content-Type of "application/octet-stream",
>     regardless of what the Content-Type header field actually says.
>
> That basically means that the content is treated as opaque. Obviously,
> one cannot decode an unknown encoding, so that part of the CTE is
> moot.  But the other part, viz. the content domain (7bit, 8bit, or
> binary) is unspecified.  So an MTA presented with a message with
> unknown CTE cannot tell (based on MIME header fields) whether or
> not the next stage of transfer requires 8BITMIME or binary transport
> or plain old 7bit transport.
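The section 6.4 rule quoted above can be sketched in Python as follows. This is a minimal illustration, not code from any real MUA or MTA, and the name `effective_content_type` is hypothetical. It shows that the rule yields a substitute Content-Type for an unrecognized CTE but says nothing about the range of the encoded octets, which is the gap being pointed out.

```python
from typing import Optional

# The five CTE tokens defined by RFC 2045. CTE tokens are case-insensitive.
KNOWN_CTES = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def effective_content_type(content_type: str, cte: Optional[str]) -> str:
    """Apply the RFC 2045 section 6.4 rule for unrecognized CTEs (sketch)."""
    # A missing Content-Transfer-Encoding header means 7bit (RFC 2045 section 6.1).
    token = (cte or "7bit").strip().lower()
    if token in KNOWN_CTES:
        return content_type
    # Unrecognized CTE: the entity is opaque, whatever the
    # Content-Type header field actually says.
    return "application/octet-stream"
```

For example, `effective_content_type("text/plain", "x-gzip-8bit")` returns `"application/octet-stream"` (the label "x-gzip-8bit" is illustrative only). Nothing in the result tells the agent whether the raw octets are 7bit, 8bit or binary.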


First of all, a transport MTA is normally not supposed to need to know the
domain of the message as a whole or of its various parts. Only the range of the
message as a whole is supposed to matter to the transport. And the range is
determined by the mechanism used to get the message to the MTA in the first
place. That could be conventional SMTP (7bit), 8bitMIME (8bit), binary SMTP
(binary), or some other mechanism that lies outside of the standards.

Of course the mechanism may not represent the true state of affairs, e.g. a
message may be 8bit while being transferred over regular SMTP. It could even be
7bit while being transferred with 8bitMIME. Regardless, if an MTA elects not to
believe the transport labelling, it would be well advised to check the actual
range of the message rather than believing what the various CTE labels say.
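Checking the actual range needs no knowledge of the CTE at all. The sketch below is a hypothetical helper (`classify_range` is not taken from any real MTA) that applies the RFC 2045 definitions of 7bit, 8bit and binary data (sections 2.7 to 2.9): at most 998 octets between CRLFs, no NUL, CR and LF only as a CRLF pair, and octets above 127 allowed only in 8bit. It consumes the data in chunks, so the whole message never has to be held in memory.

```python
from typing import Iterable

MAX_LINE = 998  # RFC 2045 section 2.7: at most 998 octets between CRLFs


def classify_range(chunks: Iterable[bytes]) -> str:
    """Return "7bit", "8bit" or "binary" for a stream of raw octets (sketch)."""
    line_len = 0
    pending_cr = False  # last octet was a CR still waiting for its LF
    high = False        # saw at least one octet with value > 127
    for chunk in chunks:
        for b in chunk:  # iterating bytes yields ints
            if pending_cr:
                if b != 0x0A:
                    return "binary"  # bare CR
                pending_cr = False
                line_len = 0
                continue
            if b == 0x0D:
                pending_cr = True
            elif b == 0x0A or b == 0x00:
                return "binary"  # bare LF or NUL
            else:
                line_len += 1
                if line_len > MAX_LINE:
                    return "binary"  # over-long line
                if b > 127:
                    high = True
    if pending_cr:
        return "binary"  # data ended on a bare CR
    return "8bit" if high else "7bit"
```

Because the state (line length, pending CR, high-bit flag) carries across chunk boundaries, a CRLF split between two reads is still recognized correctly.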

> I suppose the MTA in question could peek inside the message body to
> determine the domain. Unless of course it's not all available (e.g.
> it's in a stream too large to be held in memory) -- oops.


Here you appear to be talking about the range of the message, not the domain.
But again, the range is something you're supposed to determine from the
transport, not by looking at the CTE labels of the parts.

And when it comes to upgrading or downgrading of an individual part, both the
domain and range of that part may be important. But the domain can only be
determined by decoding the entire part, so the oops you refer to already
exists with the present set of CTEs.

As far as the range of an individual part goes, it should be known if the CTE
is known. If the CTE is unknown then you cannot perform the upgrade or
downgrade operation anyway. If the operation is an optional one you move
on. If it isn't optional then you have two choices: (1) Bounce the message or
(2) Check the actual range of the data (which of course can be done without
decoding) and see if it is OK to move forward without performing the downgrade
operation.
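The choice described above can be written as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, the actual range is assumed to come from scanning the raw (still encoded) octets, and the next hop's capability is expressed as the range it accepts.

```python
from enum import Enum

# Ranges ordered from most to least restrictive; a hop that accepts one
# range also accepts every range ranked below it.
RANK = {"7bit": 0, "8bit": 1, "binary": 2}


class Action(Enum):
    SKIP = "optional conversion skipped; part forwarded unchanged"
    FORWARD = "required conversion impossible, but data already fits"
    BOUNCE = "required conversion impossible and data does not fit"


def handle_unknown_cte(optional: bool, actual_range: str,
                       next_hop_range: str) -> Action:
    """Decide what to do with a part whose CTE is unknown (hypothetical helper)."""
    if optional:
        # The part cannot be decoded, so an optional conversion is simply skipped.
        return Action.SKIP
    if RANK[actual_range] <= RANK[next_hop_range]:
        # The raw octets already fit the next hop; no downgrade is needed.
        return Action.FORWARD
    return Action.BOUNCE
```

For instance, a required downgrade of an unknown-CTE part whose octets turn out to be 7bit can proceed over a 7bit hop without touching the part, while binary octets headed for an 8bit-only hop lead to a bounce.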

The bottom line is that viable options exist for the handling of unknown CTEs
even when dealing with a stream, limited memory, or whatever. They may not be
particularly pleasant, but I will again point out that we do in fact have
experience with what happens when CTEs that lie outside the original set are
used.

> If compression is to be considered part of the CTE, then it is
> certainly conceivable that some CTE may have binary domain, and
> lacking explicit information an assumption of binary domain is
> safe (it won't result in foisting incompatible content on the
> next hop).  OTOH if compression is considered a separate
> attribute, then there doesn't seem to be much point in a CTE with
> a binary domain other than "binary", and one could assume 8bit
> domain for an unknown CTE.


We've been over this many times before. Compression cannot be a separate
attribute because current agents assume that once the CTE is removed they have
the data identified by the content-type in hand and no further processing is
required. Adding compression as, say, a different header is therefore something
that is guaranteed to cause massive breakage. Whereas adding compression
through a new CTE is something that should not cause problems with
standards-compliant agents.
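To illustrate why compression fits naturally as a CTE, here is a hypothetical decoder table in Python. The token "x-gzip64" (gzip compression followed by base64) is invented for this example and is not a registered or proposed CTE. An agent that knows the token removes it and has the Content-Type data in hand with no further step; an agent that does not know it falls back to treating the body as opaque, as RFC 2045 section 6.4 requires.

```python
import base64
import gzip
import quopri
from typing import Callable, Dict

# Hypothetical: "x-gzip64" is an illustrative X- token, not a registered CTE.
# It means "gzip-compressed, then base64-encoded", so its range is 7bit.
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "7bit": lambda data: data,
    "8bit": lambda data: data,
    "binary": lambda data: data,
    "base64": base64.b64decode,  # non-alphabet octets (line breaks) are discarded
    "quoted-printable": quopri.decodestring,
    "x-gzip64": lambda data: gzip.decompress(base64.b64decode(data)),
}


class UnknownCTE(Exception):
    """Raised when the body must be treated as application/octet-stream."""


def decode_body(cte: str, data: bytes) -> bytes:
    """Remove the CTE; the result is the data named by Content-Type."""
    decoder = DECODERS.get(cte.strip().lower())
    if decoder is None:
        raise UnknownCTE(cte)
    return decoder(data)
```

Consumers call `decode_body` exactly as they would for base64, so the compression step is invisible above the CTE layer; a separate compression header would instead require every consumer to learn a second decoding step.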

> The danger is that as both of the 2045-specified encodings (as
> opposed to the identity CTE values) have 7bit domains, there may
> be implementations that presume a 7bit domain for any unrecognized
> CTE (if I were a gambler, I'd bet on it).


I'm not much of a gambler, and even if I were this certainly isn't a bet I would
ever have made. But regardless, my sympathy is limited since there never was
any statement that additional CTEs would always be 7bit.
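The conservative assumption discussed in this thread (treat an unknown CTE as if its range were binary, rather than betting on 7bit) can be sketched as below. `assumed_range` is a hypothetical helper; as noted earlier in this message, checking the actual range of the data is the more reliable option when it is feasible.

```python
from typing import Optional

# Range of the encoded octets for each RFC 2045 CTE. The identity encodings
# (7bit, 8bit, binary) label the data as-is; base64 and quoted-printable
# always produce 7bit output.
CTE_RANGES = {
    "7bit": "7bit",
    "8bit": "8bit",
    "binary": "binary",
    "quoted-printable": "7bit",
    "base64": "7bit",
}


def assumed_range(cte: Optional[str]) -> str:
    """Range implied by a CTE label, assuming the worst for unknown labels."""
    # A missing header means 7bit (RFC 2045 section 6.1).
    token = (cte or "7bit").strip().lower()
    # An unknown CTE carries no range information; assuming "binary" can
    # only make the MTA more cautious, never push 8bit or binary octets
    # onto a 7bit hop.
    return CTE_RANGES.get(token, "binary")
```

The design choice is the asymmetry of failure: wrongly assuming binary at worst forces a range check or a bounce, while wrongly assuming 7bit can corrupt data in transit.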

                                Ned