the order in which encodings are applied

This is a request for a change to the mime2 document (unlike some of
my other messages :-).


The Content-Transfer-Encoding (CTE) is specified in such a way that
there is a well-defined order in which things are encoded, and then
decoded.  For example:

    Content-Type: image/gif
    Content-Transfer-Encoding: base64

The order in which this thing is *encoded* is:

    image -> gif -> base64

So it *must* be decoded in the following order:

    base64 -> gif -> image
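To make the "reverse order" point concrete, here is a minimal Python sketch using the standard `base64` module.  The GIF "file" is a hypothetical byte string: only its 6-byte GIF89a signature is real, and the rest is filler standing in for image data.

```python
import base64

# Hypothetical stand-in for a GIF file: only the 6-byte "GIF89a" signature
# is real; the rest is filler.
gif_bytes = b"GIF89a" + bytes(range(16))

# Encoding: image -> gif (gif_bytes) -> base64
body = base64.b64encode(gif_bytes)

# The base64 body no longer looks like a GIF ...
assert not body.startswith(b"GIF89a")

# ... so decoding must remove the base64 layer first, and only then
# can the bytes be handed to a GIF decoder.
decoded = base64.b64decode(body)
assert decoded.startswith(b"GIF89a")
print(body[:8])  # b'R0lGODlh'
```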

My suggestion is that the MIME draft be updated to clarify the order
in which CTE, charset and text subtype are encoded and decoded.  If
you have a message like this:

    Content-Type: text/foo; charset=bar
    Content-Transfer-Encoding: blurfl

Then my suggestion is that the *encoding* order be:

    text -> foo -> bar -> blurfl

and therefore that the *decoding* order be:

    blurfl -> bar -> foo -> text
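One way to picture the proposed order is as a stack of layers: encoding walks the list front to back, decoding walks it back to front.  This is a minimal Python sketch, assuming concrete stand-ins for the placeholder names: a toy richtext quoting (only "<" becomes "<lt>") for the subtype "foo", Python's built-in `iso2022_jp` codec for the charset "bar", and base64 for the CTE "blurfl".  The names `LAYERS`, `encode` and `decode` are hypothetical, chosen for illustration only.

```python
import base64

# Each layer is an (encode, decode) pair, listed in *encoding* order:
# text subtype first, then charset, then Content-Transfer-Encoding.
LAYERS = [
    # subtype (toy richtext quoting) works on characters: str -> str,
    # so it never sees the individual bytes of a Japanese character
    (lambda s: s.replace("<", "<lt>"), lambda s: s.replace("<lt>", "<")),
    # charset turns characters into bytes: str -> bytes
    (lambda s: s.encode("iso2022_jp"), lambda b: b.decode("iso2022_jp")),
    # CTE makes the bytes safe for transport: bytes -> bytes
    (base64.b64encode, base64.b64decode),
]


def encode(text):
    """Apply every layer, first to last."""
    data = text
    for enc, _ in LAYERS:
        data = enc(data)
    return data


def decode(data):
    """Undo every layer, last to first."""
    for _, dec in reversed(LAYERS):
        data = dec(data)
    return data


sample = "a < b, \u305c"  # \u305c is HIRAGANA LETTER ZE
assert decode(encode(sample)) == sample
```

Because `decode` simply iterates over `reversed(LAYERS)`, fixing the encoding order automatically fixes the decoding order, which is the point of specifying it.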

You might ask, "Why should it be in this order?  Why can't you reverse
the order in which the text subtype and charset are encoded/decoded?"

The answer is that, otherwise, you would end up with data that is
unreadable in current software.  A very good example of this is the
"richtext and iso-2022-jp" problem, often discussed on this list.  If
you take some text, first encode it in iso-2022-jp, and *then* encode
it in richtext, then every byte that happens to equal the ASCII code
for "<" (0x3C) is converted to "<lt>", including bytes that are really
one half of a two-byte Japanese character.  This makes the result
unreadable in current iso-2022-jp software.  (Japanese characters take
up 2 bytes each, so a run of Japanese text has an even number of
bytes.  Replacing one byte with the four bytes of "<lt>" adds three,
making the count odd and throwing every following byte pair out of
alignment, hence the garbage on the screen.)
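The parity problem can be reproduced with Python's built-in `iso2022_jp` codec.  The hiragana character "ze" has JIS X 0208 code 0x243C, so in iso-2022-jp its second byte is 0x3C, the ASCII code for "<".  In this sketch, `richtext_escape` is a hypothetical, deliberately byte-oriented toy that stands in for richtext quoting applied *after* the charset, and `double_byte_run` is a hypothetical helper that extracts the bytes between the JIS X 0208 and ASCII escape sequences.

```python
import re


def richtext_escape(data: bytes) -> bytes:
    """Toy, byte-oriented richtext quoting: every b'<' becomes b'<lt>'."""
    return data.replace(b"<", b"<lt>")


def double_byte_run(data: bytes) -> bytes:
    """Return the bytes between ESC $ B (JIS X 0208) and ESC ( B (ASCII)."""
    match = re.search(rb"\x1b\$B(.*?)\x1b\(B", data, re.S)
    return match.group(1)


text = "\u305c"  # HIRAGANA LETTER ZE: JIS 0x243C, i.e. the bytes "$<"

# Wrong order: charset first, then the text subtype (on raw bytes).
charset_first = richtext_escape(text.encode("iso2022_jp"))

# Proposed order: text subtype first (on characters), then charset.
subtype_first = text.replace("<", "<lt>").encode("iso2022_jp")

print(len(double_byte_run(subtype_first)))  # 2: one whole character
print(len(double_byte_run(charset_first)))  # 5: odd, so the byte pairs misalign

# Decoding the wrong-order bytes does not give the original character back.
garbled = charset_first.decode("iso2022_jp", errors="replace")
```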

I'm advocating the order "text -> subtype -> charset -> CTE" because
of the richtext/iso-2022-jp problem.  But I should also point out that
it is even more important to specify *an* order, even if it's not this
particular order.  For example, if we have

    Content-Type: text/enriched; charset=super-10646
    Content-Transfer-Encoding: quoted-printable

and both "enriched" and "super-10646" happen to use "=" as some kind
of quoting character (or other syntax marker), then the order in which
the en- and de-codings are applied is critical.  For example, if
"enriched" uses

    =whatever=

for fixed-width text, and "super-10646" uses

    =xxx

for some of its characters (writing a literal "=" as "=="), and
quoted-printable uses

    =3D

to represent the "=" itself, then, in the order I am proposing, the
encodings would be applied like this:

    This (S) is a Super-10646 character.
                    |
                    | enriched
                    V
    This (S) is a =Super-10646= character.
                    |
                    | super-10646
                    V
    This (=xxx) is a ==Super-10646== character.
                    |
                    | quoted-printable
                    V
    This (=3Dxxx) is a =3D=3DSuper-10646=3D=3D character.
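The diagram can be checked mechanically.  This is a minimal sketch under loudly stated assumptions: "enriched" and "super-10646" here are hypothetical toy encodings that follow only the rules given above (=...= for fixed-width text, "=xxx" for the special character, "==" for a literal "="), the character "\u00a7" is an arbitrary stand-in for "(S)", and the real quoted-printable step uses Python's standard `binascii.b2a_qp`/`a2b_qp`.

```python
import binascii
import re

# Hypothetical stand-in for the "(S)" Super-10646 character in the diagram.
SPECIAL = "\u00a7"


def enriched_encode(text: str, phrase: str) -> str:
    """Toy 'enriched': mark `phrase` as fixed-width by writing =phrase=."""
    return text.replace(phrase, f"={phrase}=")


def enriched_decode(text: str) -> str:
    """Toy inverse: strip each =...= marker pair."""
    return re.sub(r"=([^=]*)=", r"\1", text)


def super_encode(text: str) -> str:
    """Toy 'super-10646': double every '=', then write SPECIAL as '=xxx'."""
    return text.replace("=", "==").replace(SPECIAL, "=xxx")


def super_decode(text: str) -> str:
    """Toy inverse: scan left to right, '==' -> '=' and '=xxx' -> SPECIAL."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("==", i):
            out.append("=")
            i += 2
        elif text.startswith("=xxx", i):
            out.append(SPECIAL)
            i += 4
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_all(text: str) -> str:
    # Proposed order: subtype (enriched) -> charset (super-10646) -> CTE (QP).
    s = super_encode(enriched_encode(text, "Super-10646"))
    return binascii.b2a_qp(s.encode("ascii")).decode("ascii")


def decode_all(wire: str) -> str:
    # Exactly the reverse: CTE -> charset -> subtype.
    s = binascii.a2b_qp(wire.encode("ascii")).decode("ascii")
    return enriched_decode(super_decode(s))


original = f"This ({SPECIAL}) is a Super-10646 character."
wire = encode_all(original)
print(wire)  # This (=3Dxxx) is a =3D=3DSuper-10646=3D=3D character.
assert decode_all(wire) == original

# Peeling off the charset before the CTE leaves the "=3D" escapes
# untouched, and the Super-10646 character is never recovered.
wrong = enriched_decode(
    binascii.a2b_qp(super_decode(wire).encode("ascii")).decode("ascii")
)
assert SPECIAL not in wrong
```

Because all three toy layers give "=" a special meaning, only the agreed order recovers the original text; any other order silently produces different output.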


Note that the change I'm requesting is simply a clarification.  We
should make the relation between "charset" and text subtype more
explicit.


Erik