
Re: [ietf-smtp] Compressing SMTP streams

2016-02-01 07:20:49


--On Monday, February 01, 2016 10:53 AM +0000 Paul Smith
<paul@pscs.co.uk> wrote:

> On 31/01/2016 02:48, John R Levine wrote:
>
>> ...
>> The 8BITMIME extension was defined in 1994, and nearly all
>> current MTAs can handle it.  But all it says is that the MTA
>> can handle 8 bit characters.  It doesn't affect the rule that
>> the message must consist of lines no more than 1000 characters
>> long with \r\n at the end of each.

It also says that the MTA will do something sensible --downgrade
and encode or reject-- if some next-hop system will not accept
the extension.   As noted elsewhere, a lot of systems ignore
that requirement and just adopt what we used to call "just send
8", but that really isn't a good idea, if only because, today, a
supposedly 8-bit-clean system that doesn't bother to advertise
8BITMIME probably suffers from other symptoms of an ancient or
lazy implementation.
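
For illustration, here is a minimal Python sketch of what "do
something sensible" means at the point of relay (the function
name and the unimplemented downgrade step are mine, nothing
standardized): check the next hop's EHLO response for 8BITMIME
and either declare the 8-bit body or fall back to downgrade or
reject, never "just send 8".

    import smtplib

    def relay_8bit(msg: bytes, mail_from: str, rcpt_to: str,
                   next_hop: str) -> None:
        """Relay a message whose body contains 8-bit data."""
        with smtplib.SMTP(next_hop) as s:
            s.ehlo()
            if s.has_extn("8bitmime"):
                # Next hop advertises 8BITMIME (RFC 6152): declare
                # the body form on MAIL FROM and send as-is.
                s.sendmail(mail_from, [rcpt_to], msg,
                           mail_options=["BODY=8BITMIME"])
            else:
                # The rule is downgrade to a 7-bit encoding (e.g.
                # quoted-printable or base64) or reject; passing raw
                # 8-bit on and hoping is not among the options.
                raise NotImplementedError("downgrade or bounce here")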

>> As far as I can tell, the main effect is that people can send
>> ISO-8859-x and UTF-8 without encoding, which is useful but
>> generally not a big deal.

It certainly improves readability of the raw stream.  I agree it
is generally not a big deal for transmission message size, but
beware of falling into the same reasoning that brought us Q-P:
while it improves readability for Latin script with some
decorated characters, it tends to make an even bigger mess of
messages in which substantially every code point is outside the
ASCII range.
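
The trap is easy to demonstrate with Python's quopri module (the
sample strings below are just placeholders): a Latin-1 body with
a few accents grows modestly and stays legible, while a UTF-8
body containing no ASCII at all roughly triples in size and
becomes unreadable soup.

    import quopri

    samples = {
        "Latin-1, a few accents": "café crème brûlée".encode("latin-1"),
        "UTF-8, no ASCII at all": "Καλημέρα κόσμε, τι κάνεις;".encode("utf-8"),
    }
    for label, raw in samples.items():
        qp = quopri.encodestring(raw)
        print(f"{label}: {len(raw)} -> {len(qp)} octets")
        print(qp)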

>> BINARYMIME was defined in 2000 to avoid that issue, and invents
>> a new BDAT command that uses a byte count rather than escape
>> sequence, so the message body can be an arbitrary sequence of
>> octets.  Gmail, hotmail and icloud/me.com support it, Yahoo and
>> AOL don't, but I've never seen client software that would take
>> advantage of it.

IIRC, much of what motivated BINARYMIME at the time was concern
about non-text messages in which concepts like "line" were
meaningless and people tampering with characters that might
denote line-end (in other contexts) could cause a mess.  Yes,
it also allows arbitrary-length lines in text messages, but the
advantages of that are rarely worth the trouble and risks (see
below).
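
For anyone who has not seen BDAT on the wire, here is a rough
sketch of the client-side framing from RFC 3030 (the chunk size
and function name are arbitrary choices of mine).  The payload
is counted rather than scanned for a terminating dot, so no
dot-stuffing and no line-length limit apply to it.

    def bdat_chunks(message: bytes, chunk_size: int = 64 * 1024):
        """Yield (command, payload) pairs framing `message` as BDAT chunks."""
        offset = 0
        while True:
            chunk = message[offset:offset + chunk_size]
            offset += len(chunk)
            last = offset >= len(message)
            command = b"BDAT %d%s\r\n" % (len(chunk),
                                          b" LAST" if last else b"")
            # Send the command, then exactly len(chunk) raw octets.
            yield command, chunk
            if last:
                break

Each chunk is acknowledged with a 250 reply, and the reply to
the LAST chunk accepts (or rejects) the message as a whole.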

>> What we've never seen is a quoted-unprintable encoding, which
>> is like QP but intended for binary data.  It could be like QP
>> without soft line breaks and \r\n pairs are ignored.  It's an
>> obvious idea so there must be some reason.  Dave Crocker or
>> John Klensin or Ned Freed would know.

Obvious or not, I don't remember it coming up.  Remembering that
Q-P has very poor readability and encoding density properties
for anything but mostly-ASCII text with a fairly small number of
decorated Latin characters and is _far_ worse than Base64 for
non-text data in which the high-order bit is set in many or most
of the octets, the pain of having to deal with yet another
encoding almost certainly exceeds the small advantages.
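
The density point is easy to confirm with the standard library
(the test data below is just synthetic octets with the high bit
set): Q-P turns each such octet into three characters, roughly a
threefold expansion, while Base64 is a flat 4/3 regardless of
content.

    import base64, quopri

    data = bytes(range(0x80, 0x100)) * 64   # every octet has the high bit set
    qp = quopri.encodestring(data)
    b64 = base64.b64encode(data)
    print(len(qp) / len(data))    # about 3.1 (3 chars per octet plus soft breaks)
    print(len(b64) / len(data))   # about 1.33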

On the other hand, if one knows the message is entirely ASCII
and can transmit "binary", there is another compression method
that is fast, cheap, and yields about a 12% improvement with no
dictionary overhead.  That is, of course, to toss the first bit
of every octet, transmitting only the seven payload bits.   I
have vague memories of that being done by some systems
(especially those that natively encoded ASCII as five seven-bit
characters in a 36 bit word) during the mail-over-FTP period,
but don't recall seeing any use with SMTP.  Again, a pain with
not very much gain.
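
The scheme is trivial to express; this is only my sketch of the
idea, not anything that was ever specified for SMTP.  Dropping
the always-zero high bit packs eight ASCII characters into seven
octets, which is where the 12% (strictly 12.5%) comes from.

    def pack7(data: bytes) -> bytes:
        """Pack 7-bit ASCII into 7/8 of the space by dropping each high bit."""
        out, acc, nbits = bytearray(), 0, 0
        for b in data:
            if b > 0x7F:
                raise ValueError("input is not 7-bit ASCII")
            acc = (acc << 7) | b      # append 7 payload bits
            nbits += 7
            while nbits >= 8:         # emit each completed octet
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
        if nbits:                     # zero-pad the final partial octet
            out.append((acc << (8 - nbits)) & 0xFF)
        return bytes(out)

    assert len(pack7(b"x" * 800)) == 700   # 12.5% smaller, as advertised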

> The problem with any of these is what to do if YOU accept 8
> bit characters, but you have to send the message to a mail
> server which doesn't say it does. Some just pass it on and
> hope (which is against the rules AFAICS - e.g. we regularly
> receive messages with NULL characters in), other than that
> you can recode the message (which risks breaking DKIM etc) or
> reject the message (which risks upsetting someone).

And, today rather than in the 90s, any rejection risks getting
tangled up with backscatter or other spam issues and being
discarded, leaving the message neither delivered nor the sender
notified of the problem.
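
On the recoding option: the DKIM body hash is computed over the
canonicalized body as it was signed, so any re-encoding by a
relay invalidates the signature.  A toy Python illustration
(using only the "simple" body canonicalization and made-up
content, not a real DKIM implementation):

    import base64, hashlib, quopri

    def simple_body_hash(body: bytes) -> str:
        # DKIM "simple" body canonicalization: drop trailing empty
        # lines, keep the final CRLF, then hash and base64 the result.
        canon = body.rstrip(b"\r\n") + b"\r\n"
        return base64.b64encode(hashlib.sha256(canon).digest()).decode()

    original = "Grüße aus Köln\r\n".encode("utf-8")    # signed as raw 8-bit
    downgraded = quopri.encodestring(original)         # what a recoding relay emits
    print(simple_body_hash(original) == simple_body_hash(downgraded))   # False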
 
> That's why we don't support these extensions.  If every server
> supported BINARYMIME it wouldn't be a problem, but the
> transition period is nasty.

And that is the problem with every SMTP extension that causes a
failure unless every subsequent MTA accepts it.  To the
extent that message validity and integrity methods (DKIM and
friends included) take recoding, translation, etc., off the
table in order to solve other problems, we've made the problem
even more difficult.  The transition periods are nasty and
probably only worth it for features that either bring huge
advantages (as others have pointed out, in "normal" SMTP
environments, a bit more compression doesn't) or that are
visible and important to end users (many people believe
non-ASCII addresses fall into that category, but the deployment
rate is not exciting and their importance is still an unproven
hypothesis).

best,
     john
