ietf-822

printable multibyte encodings

1992-12-16 09:00:17
[Apologies if this has been discussed already; I should probably
read through the archives.  Also, apologies if you somehow see
this twice, since the first attempt disappeared into the ether...]

There's been quite a bit of discussion about incorporating
multibyte characters (Unicode, ISO 10646) in messages.  So far,
most of the suggestions I've seen have been along the lines of
UTF-1, or FSS-UTF (UTF-2?), or straight 2-byte (assuming an 8-bit
binary-clean path).  All of these rely on eighth-bit characters,
and would require further encoding (MIME quoted-printable or
base64) to pass over a 7-bit link.

Does anyone think it would be worthwhile to define single-step
encodings of multibyte characters down to 7-bit printable
characters?  Two simple extensions to MIME quoted-printable
spring to mind:

     1. The '#' character is also special, and the sequence #ABCD
        represents the single 16-bit character with value ABCD hex.

     2. A doubled equals sign introduces a 16-bit character, so
        ==ABCD would represent the single 16-bit character with
        value ABCD hex.

Method 1 allows 5- rather than 6-character encoding, and thus a
bit less waste, but introduces a new special character (literal
#'s would have to be encoded as =23).
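
As a rough illustration of the two escapes, here is a minimal
sketch in Python.  The names encode_qp16 and decode_qp16 are made
up for this example, and the sketch assumes several things the
proposal leaves open: hex digits are uppercase, characters in the
range 80-FF hex use the ordinary =XX escape, and quoted-printable's
line-length limit and soft line breaks are ignored.

```python
def encode_qp16(text, method=1):
    """Encode text as 7-bit printable ASCII, using #ABCD (method 1)
    or ==ABCD (method 2) for characters above FF hex."""
    prefix = "#" if method == 1 else "=="
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("this sketch only covers 16-bit characters")
        if cp > 0xFF:
            # 16-bit character: prefix plus four hex digits
            out.append("%s%04X" % (prefix, cp))
        elif ch == "=" or (method == 1 and ch == "#") or not (0x20 <= cp <= 0x7E):
            # ordinary quoted-printable escape; '#' is special only in method 1
            out.append("=%02X" % cp)
        else:
            out.append(ch)
    return "".join(out)


def decode_qp16(data, method=1):
    """Inverse of encode_qp16 for well-formed input (no error recovery)."""
    out = []
    i = 0
    while i < len(data):
        if method == 1 and data[i] == "#":
            out.append(chr(int(data[i + 1:i + 5], 16)))
            i += 5
        elif method == 2 and data.startswith("==", i):
            out.append(chr(int(data[i + 2:i + 6], 16)))
            i += 6
        elif data[i] == "=":
            out.append(chr(int(data[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

For the five-character input a, #, b, the character 3042 hex, and
an equals sign, method 1 gives a=23b#3042=3D and method 2 gives
a#b==3042=3D.  A literal equals sign is always sent as =3D, so a
doubled equals sign never occurs by accident under method 2.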

It would also be straightforward to define a base64-16
content-transfer-encoding, in which eight base64 characters
would represent three 16-bit characters.
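
The arithmetic is 3 x 16 = 48 bits = 8 x 6 bits.  A minimal Python
sketch of such an encoding follows; the function names are
hypothetical, and big-endian byte order within each 16-bit
character is an assumption, since nothing above fixes it.

```python
import base64
import struct


def encode_base64_16(text):
    """Pack each character as one big-endian 16-bit unit, then base64 it."""
    # struct.pack(">H", ...) raises struct.error for values above FFFF hex
    raw = b"".join(struct.pack(">H", ord(ch)) for ch in text)
    return base64.b64encode(raw).decode("ascii")


def decode_base64_16(data):
    """Inverse of encode_base64_16."""
    raw = base64.b64decode(data)
    units = struct.unpack(">%dH" % (len(raw) // 2), raw)
    return "".join(chr(u) for u in units)
```

Three characters come out as exactly eight base64 characters with
no padding: encode_base64_16("ABC") is AEEAQgBD.  Text whose
length is not a multiple of three picks up the usual base64 '='
padding.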

I grant that it's undesirable to describe a multiplicity of
encodings when fewer (or one) will do.  However, something
bothers me about using a multibyte encoding (i.e. one of the UTF
variants) that assumes the eighth bit is reliable, and that will
therefore almost always have to be turned right around and fed
through quoted-printable or base64.  (In particular, I think
we're going to have to acknowledge the possibility of multiple,
cascaded encodings specified in the Content-transfer-encoding:
header.)
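
To put rough numbers on the two-step path, here is a small Python
sketch that measures it for a single character.  It uses Python's
"utf-8" codec as a stand-in for FSS-UTF, which is an assumption on
my part, and the function name two_step_sizes is made up for the
example.

```python
import base64
import quopri


def two_step_sizes(ch):
    """Return (bytes after the UTF step, characters after
    quoted-printable, characters after base64) for one character."""
    utf_bytes = ch.encode("utf-8")        # step 1: 8-bit multibyte form
    qp = quopri.encodestring(utf_bytes)   # step 2, quoted-printable
    b64 = base64.b64encode(utf_bytes)     # step 2, base64
    return len(utf_bytes), len(qp), len(b64)
```

For the character 3042 hex this gives 3 bytes, then 9 characters
through quoted-printable (three =XX escapes), or 4 through base64.
The single-step forms above would need 5 (#ABCD) or 6 (==ABCD)
characters, or 8/3 of a character on average with base64-16.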

I have more thoughts on this issue, but I think I'll wait for a
round of people's comments first.  (You do not need to write just
to let me know about the ongoing work with ISO-2022-JP; I'm well
aware of it.)

                                        Steve Summit
                                        scs(_at_)adam(_dot_)mit(_dot_)edu
