This is a request for a change to the mime2 document (unlike some of
my other messages :-).
The Content-Transfer-Encoding (CTE) is specified in such a way that
there is a well-defined order in which to encode, and then to decode,
things. For example:
Content-Type: image/gif
Content-Transfer-Encoding: base64
The order in which such a body is *encoded* is:
image -> gif -> base64
So it *must* be decoded in the following order:
base64 -> gif -> image
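To make the ordering concrete, here is a minimal sketch in Python. It assumes the standard base64 module, and the "GIF" below is only a signature followed by made-up placeholder bytes, not a real image:

```python
import base64

# Placeholder for the output of the "image -> gif" step: a GIF signature
# followed by made-up bytes. This is not a complete, valid image.
gif_bytes = b"GIF89a" + bytes(range(16))

# Encoding: the GIF data exists first, then base64 is applied on top of it.
wire_body = base64.b64encode(gif_bytes)

# Decoding: base64 is the outermost layer, so it has to be removed first.
recovered = base64.b64decode(wire_body)

# Only now does the data look like something a GIF decoder could accept.
looks_like_gif = recovered.startswith(b"GIF89a")
```

Handing wire_body directly to a GIF decoder would fail, since the GIF signature is not visible until the base64 layer has been removed.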
My suggestion is that the MIME draft be updated to clarify the order
in which CTE, charset and text subtype are encoded and decoded. If
you have a message like this:
Content-Type: text/foo; charset=bar
Content-Transfer-Encoding: blurfl
Then my suggestion is that the *encoding* order be:
text -> foo -> bar -> blurfl
You may ask: "Why should it be in this order? Why can't you reverse the
order in which text subtype and charset are encoded/decoded?"
The answer is that, otherwise, you would end up with data that
current software cannot display. A very good example of this is the
"richtext and iso-2022-jp" problem, often discussed on this list. If
you take some text, first encode it in iso-2022-jp, and *then* encode
it in richtext, then every byte with the same value as "<" is converted
to "<lt>", even when that byte is one half of a two-byte Japanese
character. This makes the result unreadable in current iso-2022-jp
software. (Japanese characters take up 2 bytes each, and replacing one
of those bytes with the four bytes of "<lt>" changes the byte count
from an even number to an odd number, so everything after it is
misaligned and displayed as garbage.)
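The problem can be reproduced with a small sketch in Python, using its built-in "iso2022_jp" codec. The richtext_escape and richtext_unescape helpers are hypothetical and deliberately simplified: they handle only the "<" to "<lt>" substitution, not the rest of richtext. The sample text uses a hiragana character whose second byte in iso-2022-jp is 0x3C, the same value as "<":

```python
def richtext_escape(text: str) -> str:
    # Simplified stand-in for richtext encoding: only "<" is escaped.
    return text.replace("<", "<lt>")


def richtext_unescape(text: str) -> str:
    # Inverse of the simplified escape above.
    return text.replace("<lt>", "<")


# U+304B U+305C are two hiragana; U+305C is encoded as the bytes "$<"
# (0x24 0x3C) in iso-2022-jp, so it collides with ASCII "<".
original = "\u304b\u305c < \u304b\u305c"

# Suggested order: text subtype first, then charset.
good_bytes = richtext_escape(original).encode("iso2022_jp")
good_round_trip = richtext_unescape(good_bytes.decode("iso2022_jp"))

# Reversed order: charset first, then a byte-level "<" -> "<lt>" substitution.
jis_bytes = original.encode("iso2022_jp")
bad_bytes = jis_bytes.replace(b"<", b"<lt>")

# The two-byte sequences are now misaligned; undecodable bytes are replaced
# instead of raising, so the damage can be inspected.
bad_text = bad_bytes.decode("iso2022_jp", errors="replace")
```

In the suggested order the text survives the round trip unchanged. In the reversed order the substitution hits the literal "<" and also both two-byte characters, so an iso-2022-jp reader no longer sees the original characters.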
I'm advocating the order "text -> subtype -> charset -> CTE" because
of the richtext/iso-2022-jp problem. But I should also point out that
it is even more important to specify *an* order, even if it's not this
particular order. For example, if we have
Content-Type: text/enriched; charset=super-10646
Content-Transfer-Encoding: quoted-printable
and both "enriched" and "super-10646" happen to use "=" as some kind
of quoting character (or other syntax marker), then the order in which
the en- and de-codings are applied is critical. For example, if
"enriched" uses
=whatever=
for fixed-width text, and "super-10646" uses
=xxx
for some of its characters, and quoted-printable uses
=3D
to represent the "=" itself, then the encodings would be applied like
this (where "S" stands for a Super-10646 character):
This (S) is a Super-10646 character.
|
| enriched
V
This (S) is a =Super-10646= character.
|
| super-10646
V
This (=xxx) is a ==Super-10646== character.
|
| quoted-printable
V
This (=3Dxxx) is a =3D=3DSuper-10646=3D=3D character.
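The same three steps can be sketched in Python. Both "enriched" with its =whatever= markers and "super-10646" are hypothetical here, so their encoders are invented stand-ins. In particular, the rule that super-10646 doubles a literal "=" is only inferred from the diagram, and a private-use code point stands in for "S". The quoted-printable step uses the standard quopri module:

```python
import quopri
import re

SPECIAL = "\ue000"  # stand-in for the character shown as "S" in the diagram


def enriched_encode(text: str) -> str:
    # Hypothetical subtype rule: mark one phrase as fixed-width with =...=
    return text.replace("Super-10646", "=Super-10646=")


def enriched_decode(text: str) -> str:
    # Strip the hypothetical =...= fixed-width markers again.
    return re.sub(r"=([^=]*)=", r"\1", text)


def super10646_encode(text: str) -> bytes:
    # Hypothetical charset rule: double every "=", then write the special
    # character as "=xxx".
    return text.replace("=", "==").replace(SPECIAL, "=xxx").encode("ascii")


def super10646_decode(data: bytes) -> str:
    # "==" becomes a literal "=", "=xxx" becomes the special character.
    def undo(match: re.Match) -> str:
        return "=" if match.group(1) == "=" else SPECIAL

    return re.sub(r"=(=|xxx)", undo, data.decode("ascii"))


text = f"This ({SPECIAL}) is a Super-10646 character."

# Encode: text subtype -> charset -> CTE
marked = enriched_encode(text)
charset_bytes = super10646_encode(marked)
wire = quopri.encodestring(charset_bytes)

# Decode in exactly the reverse order: CTE -> charset -> text subtype
decoded = enriched_decode(super10646_decode(quopri.decodestring(wire)))
```

Because all three layers give "=" a meaning, the round trip only works when the decoder knows which layer was applied last; that is the order the draft needs to state.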
Note that the change I'm requesting is simply a clarification. We
should make the relation between "charset" and text subtype more
explicit.
Erik