ietf-822

Re: Content-Transfer-Encoding and yEnc

2002-04-03 00:11:51

There is a proposal afoot, known as "yEnc", for encoding binary Netnews
articles so that they will pass through 8bit transport systems, such as
NNTP. See <http://www.yenc.org>. The perceived benefit is that it reduces
the size of an article by around 30%, as compared with Base64 or uuencode,
and that is a useful benefit not only for transport, but also for storage
space on newsservers.
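The "around 30%" figure can be sanity-checked with some back-of-envelope arithmetic (my own rough numbers, not taken from the yEnc spec): Base64 turns every 3 octets into 4 characters plus a line break roughly every 76 characters, while yEnc only has to escape 4 of the 256 possible byte values.

```python
# Back-of-envelope comparison of Base64 and yEnc expansion on random
# binary data. Line-break overhead for yEnc is ignored here; the real
# saving therefore comes out slightly lower than this estimate.
base64_growth = (4 / 3) * (78 / 76)      # 4-for-3 expansion plus CRLF per 76 chars
yenc_growth = 1 + 4 / 256                # only 4 of 256 byte values get escaped

saving = 1 - yenc_growth / base64_growth # fraction saved relative to Base64
print(f"estimated saving vs Base64: {saving:.0%}")
```

This lands in the neighborhood of a quarter to 30%, consistent with the claim above once uuencode's similar overhead is considered.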

If you look at the spec, you will see that it is a perfectly reasonable
encoding. Roughly speaking, octets appear as themselves, except for NUL,
CR and LF (and '='), which are escaped by putting an '=' in front and
adding a constant. OK, it is a bit more complicated than that, and there
are extra Bells and Whistles such as a byte count and a CRC check, and
provision for other extensions.
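The core escape rule just described can be sketched in a few lines of Python. This uses the constants from the published yEnc draft (add 42 to every octet; add a further 64 to escaped octets) and deliberately omits line wrapping, the =ybegin/=yend headers, the byte count, and the CRC:

```python
# Minimal sketch of the yEnc escape rule only; headers, wrapping,
# byte counts and CRC (the "bells and whistles") are left out.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, '='

def yenc_encode(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        c = (b + 42) % 256           # the basic constant offset
        if c in CRITICAL:
            out.append(0x3D)         # '=' escape marker
            c = (c + 64) % 256       # shift the critical byte out of the way
        out.append(c)
    return bytes(out)

def yenc_decode(data: bytes) -> bytes:
    out = bytearray()
    it = iter(data)
    for c in it:
        if c == 0x3D:                # escape marker: undo the extra +64
            c = (next(it) - 64) % 256
        out.append((c - 42) % 256)
    return bytes(out)
```

As the text says, octets mostly pass through unchanged (merely shifted), which is what makes the encoding so compact over an 8bit transport.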

So it could perfectly well have been formulated as a
Content-Transfer-Encoding, but instead it is being promoted as Yet Another
ad hoc protocol, and they have even persuaded some browser implementors to
support it. And when asked why they are doing it that way, they say

    "because we raised this in the MIME newsgroups, and we were told that
    it is not possible to add new CTEs, and in any case it would take
    years, and the Gods of the Internet would never agree to it."

Which is as poor an excuse as they come, even though there may be a grain
of truth behind it.

There isn't even that. I was approached about this. I explained that
it is indeed possible to define new CTEs and how to do it. But for whatever
reason I was unable to get the point across. There's a significant amount
of misunderstanding of MIME, of deployment issues, and of how to manage a
transition plan out there, it seems.

Now, if you look at RFC 2045 and RFC 2048, you will see that there ARE
mechanisms for establishing new CTEs, either by a standards track RFC, or
on a private basis using an x-token. BUT these mechanisms are accompanied
by a dire warning against "The standardization of a large number of
different transfer encodings". Which is fair enough, but does not amount
to an absolute "NEVER" prohibition.

Exactly.

    So the question I am asking of this list is whether a proposal for a
    CTE for encoding binaries within an 8bit domain would stand a
    reasonable chance of being accepted by the IESG (assuming for the
    moment that it was technically sound and properly specified)?

The key, of course, is restricting this to an 8bit domain like netnews. Do that
and the problem is vastly simplified. And the answer is that a reasonable
proposal would have a good chance of being accepted.

It would also be possible to specify this for domains like email where
8bit is not guaranteed. But for that you probably have to have a
negotiation/downgrade mechanism.

The benefit is a potential saving of around 30% in bandwidth and storage.
It would be primarily aimed at Usenet (over 90% by volume of which is
currently binaries), though obviously it could be used for Email too. It
would work immediately over the NNTP transport, though gateways to email
might have to downgrade it to Base64 until such time as email systems had
caught on. But I doubt that any significant quantity of the present binary
bulk of Usenet ever finds its way into email.

The downside of not doing it is that it will probably go ahead on an ad
hoc basis anyway. There is mention of including "yEnc:" in the Subject, or
mis-using Content-Type: application/yencoded with CTE: 8bit, or even of
resurrecting the obsolete "conversions" parameter. The mind boggles, but
mind-boggling is unlikely to stop the browser implementors whose loyalty
to IETF standards is tenuous at best :-( .

FWIW, such misuse of a content type is explicitly forbidden.

But as you say, the mind boggles.

But "where will all this stop" you ask. Indeed it is difficult to see what
further encodings would be needed once you had this one which, I reckon,
would barely increase the size of the transported message by more than a
couple of percent beyond a pure binary encoding. Indeed, the only other
CTE I can envisage would be one for 'gzip', or other compression methods
(but those do not apply to the present situation, because most image and
audio media types are already compressed as tight as they will go).

Well, yEnc as presently formulated has a failure mode where some messages
will grow to 2X their original size. I also don't like the fact that
it shifts the range of almost every character. It is possible to fix both
of these problems and have an encoding with an upper bound guarantee that
leaves the original data mostly untouched.
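That failure mode is easy to demonstrate: exactly four input byte values shift onto NUL, LF, CR or '=' and so cost two output bytes each, meaning a file consisting entirely of one of them doubles in size. A quick sketch (again assuming the +42 offset from the yEnc draft):

```python
# Demonstrating the 2X worst case: input bytes whose shifted value
# (b + 42) % 256 lands on a critical character must be escaped,
# producing two output bytes instead of one.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, '='

def encoded_size(data: bytes) -> int:
    return sum(2 if (b + 42) % 256 in CRITICAL else 1 for b in data)

worst = [b for b in range(256) if (b + 42) % 256 in CRITICAL]
print(worst)                                 # the four pathological values
print(encoded_size(bytes([worst[0]]) * 1000)) # 1000 input bytes -> 2000 out
```

An encoding with a guaranteed upper bound would cap this expansion instead of letting pathological inputs double.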

Just my personal take -- neither of these are necessarily showstoppers.

                                Ned