
Re: On encodings : random thoughts....

1991-05-10 15:52:33
Alain FONTAINE writes:
2- The two main transfer-encodings proposed are of a very different
nature. BASE64 is a scheme where the octet stream is transfer-encoded
into a string of printable characters. One is of course aware that those
characters will themselves be natural-encoded as a binary stream for
transmission, but the whole scheme is designed so that the values used
for transmission don't play any role at all. So a BASE64
transfer-encoding prepared on a machine based on one natural-code will
decode properly on a machine based on another natural-code if a 'normal'
transcoding has been performed (BASE64 is of course a million or more
times better than uuencode, since a- it is documented outside source code,
and b- the characters selected for the transfer-encoded representation have
every chance of being correctly transcoded by any usable mail gateway). To
put it another way: each machine only has to be able to recognize the
64 characters as natural-encoded in the local code. With some care
(using character constants, etc.), it is possible to write a decoding
program that will work on machines based on different natural-codes by
simple recompilation (of the transcoded source, of course).

QUOTED-PRINTABLE is quite a different animal: there is no separation
between the encoded and the encoding values. Entering into the details
of what can happen would certainly bore everyone to death, but I am
ready to do the exercise if anyone cares to read it. The final
conclusion is that, to decode QUOTED-PRINTABLE faithfully, the
decoder should a- know the natural-code of the machine it is running on,
b- know the natural-code used by the transfer-encoder to write the
transfer-encoded message (how?), c- perform a reverse transcoding to
recover the transfer-encoded message as written by the transfer-encoder,
d- perform the transfer-decoding, and e- if the object is readable text,
transcode again into the local natural-code to make it readable. Of course,
all this will probably fail anyway if the mail has gone through a
gateway between different natural-codes, for the same reason that makes
uuencode fail in the same circumstances (anyone claiming the contrary
does not work in The Real World TM). It seems that QUOTED-PRINTABLE also
does not protect the trailing blanks from the voracious appetite of some

I agree with the analysis:

We have 2 major 8-to-7-bit encoding families:
1. base64
2. quoted-printable (I regard my quoted-readable in this class too).

Base64 is for transferring data as binary, without any change, whenever
mail is used for anything other than "plain text".
Base64 should be the 8-to-7-bit choice of the ietf-822 list.
I regard the base64 specification as a closed issue.
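Alain's point that a base64 decoder only has to recognize the 64 characters as they are encoded locally, written with character constants, can be sketched in Python as follows. This is an illustration under assumptions, not code from the list: the alphabet string plays the role of C character constants, the name `b64_decode_portable` is hypothetical, and characters outside the alphabet (such as line breaks) are simply skipped.

```python
# The 64 base64 symbols written as characters, not numeric codes, so the
# lookup table does not depend on the machine's natural-code.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}
PAD = "="


def b64_decode_portable(text: str) -> bytes:
    """Hypothetical decoder: look up each symbol as a character.

    Non-alphabet characters (line breaks, blanks) are ignored and
    decoding stops at the first pad character.
    """
    out = bytearray()
    bits = 0   # bit buffer holding not-yet-emitted bits
    nbits = 0  # number of valid bits in the buffer
    for ch in text:
        if ch == PAD:
            break
        if ch not in VALUE:
            continue
        bits = (bits << 6) | VALUE[ch]  # each symbol carries 6 bits
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)  # emit one full octet
            bits &= (1 << nbits) - 1            # keep only leftover bits
    return bytes(out)
```

For example, `b64_decode_portable("TWFu")` yields `b"Man"`; leftover bits after the last full octet are discarded, as padding implies.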

Quoted-printable/quoted-readable is used for transferring "plain text".
It should not be used for anything but "plain text",
and its objective is to be read by humans.
I think there is no real concern about trailing blanks here.
People do not care whether there are trailing blanks at the end of a
line when they read their mail. At least I do not notice them.

If we accept that we have a special content-type of "plain mail"
and that it is for human-to-human communication, then we should
make a stable four-wheel car that is designed for this purpose.
A car that can go as many places as possible, and be handy in
many environments. The key issue here is character sets.

Alain listed all the stages needed to do quoted-printable
right in the receiving UA. It was a bit complicated, and required
a UA capable of doing all of this at the receiving site.
To me that sounds more like a finely tuned racing bicycle than an all-round car.
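A minimal sketch of those stages, assuming Python's `quopri` module for the transfer-decoding and the `cp500` codec as a stand-in for an EBCDIC natural-code. The function name and its parameters are hypothetical; the point is that the decoder cannot proceed without knowing both the local code and the code the encoder wrote in.

```python
import quopri


def decode_qp_after_gateway(received: bytes, local_code: str,
                            encoder_code: str, sender_charset: str) -> str:
    """Stages a- to e- for quoted-printable (hypothetical helper).

    received:       the body as it arrived, in this machine's natural-code
    local_code:     a- codec of this machine's natural-code (e.g. "cp500")
    encoder_code:   b- codec the transfer-encoder wrote the QP text in
    sender_charset: character set of the original octets
    """
    # c- reverse transcoding: recover the QP text as the encoder wrote it
    qp_text = received.decode(local_code).encode(encoder_code)
    # d- transfer-decoding: each '=XX' escape becomes the octet it names
    octets = quopri.decodestring(qp_text)
    # e- interpret the octets in the sender's character set; returning a str
    #    leaves the final conversion into the local code to the caller
    return octets.decode(sender_charset)


original = "J\u00f8rn"                                # "Jørn"
qp = quopri.encodestring(original.encode("latin-1"))  # b"J=F8rn", in ASCII
received = qp.decode("ascii").encode("cp500")         # after an ASCII-to-EBCDIC gateway
result = decode_qp_after_gateway(received, "cp500", "ascii", "latin-1")
```

Passing the received EBCDIC octets straight to `quopri.decodestring` finds no escape at all, because `=` is a different octet in EBCDIC, so nothing is decoded.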

I will suggest (persistently :-) that the quoted-readable
notation is more suitable for this job. The requirements on an
intelligent UA are not as high as for quoted-printable,
where you need to know the character set of the sender and how to
convert to the reader's character set. This potentially means that
you need to be able to convert from a myriad of character sets.
With quoted-readable you only need to know how to transform from
a well-defined notation (specified in the RFC) into the reader's
character set. Thus a new UA with this capability
can be cheaper and more stable in its specification; it does not
need to be updated with new information on how to handle new character sets.
And new characters (defined in an updated RFC) will
probably not be in the reader's character set anyway, so they will fall back
just like any other undisplayable character, and thus come for free.
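To illustrate the claim that a quoted-readable UA needs only one table from the notation to characters, here is a sketch that assumes, as the "J&o/rn" example suggests, that `&` introduces a two-character mnemonic. The mnemonic table, the function names and the `displayable` predicate are all hypothetical, and how a literal `&` is escaped is not covered.

```python
# Hypothetical mnemonic table: a few illustrative entries only; the real
# notation and its full table would be defined in the RFC.
MNEMONICS = {
    "o/": "\u00f8",  # o with stroke
    "O/": "\u00d8",  # O with stroke
    "a:": "\u00e4",  # a with diaeresis
    "e'": "\u00e9",  # e with acute
}


def displayable_in(charset: str):
    """Return a predicate telling whether a character exists in `charset`."""
    def check(ch: str) -> bool:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            return False
        return True
    return check


def decode_quoted_readable(text: str, displayable) -> str:
    """Replace '&' plus a known two-character mnemonic by its character.

    Unknown mnemonics, and characters the reader cannot display, are left
    in their readable '&xx' form, like any other undisplayable character.
    """
    out = []
    i = 0
    while i < len(text):
        mnemonic = text[i + 1:i + 3]
        if text[i] == "&" and mnemonic in MNEMONICS:
            target = MNEMONICS[mnemonic]
            out.append(target if displayable(target) else "&" + mnemonic)
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With a Latin-1 display, "J&o/rn" becomes "Jørn"; with a plain ASCII display it stays "J&o/rn", which a human reader can still guess.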

With quoted-readable you can even manage without a new UA at
the receiving site, as you can guess what "J&o/rn" really stands for.
So for 7-bit downgrading I suggest: use quoted-readable for "plain mail"
and base64 for the rest.
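The downgrading rule above can be written as a small dispatch. This is a sketch under stated assumptions: `text/plain` stands in for the "plain mail" content-type, the function names are hypothetical, and a body is treated as needing no downgrade when no octet has the high bit set (line length and control characters are ignored).

```python
def needs_downgrade(body: bytes) -> bool:
    # Assumption: downgrading is only needed if some octet has the high bit set
    return any(b > 0x7F for b in body)


def choose_7bit_encoding(content_type: str, body: bytes) -> str:
    """Hypothetical helper: name the encoding to use on a 7-bit path."""
    if not needs_downgrade(body):
        return "7bit"             # already 7-bit clean, no downgrading needed
    if content_type == "text/plain":
        return "quoted-readable"  # "plain mail" for human readers
    return "base64"               # everything else travels as binary
```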

