ietf-822

16/32-bit charsets and MIME-encoding

1993-02-11 01:06:35

There is a lot of talk about ISO 10646 and UTF encodings, but what we
really need to talk about is how to use 16/32-bit character sets in MIME
and how to encode them in 7 bits. I expect ISO 10646, rather than
ISO 8859-*, ASCII or others, to be the character coding used in MIME
message transfer in the future. Locally at a site you may use
whatever character coding you like, including UTF encodings. This will
simplify matters, since you only have to handle ISO 10646 <-> local coding
instead of "many character codings" <-> local coding.
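
As a rough illustration of that simplification, here is a minimal Python
sketch in which every local coding is converted only to and from one common
form. The function names are hypothetical, and Python's built-in codecs and
Unicode strings stand in for the ISO 10646 <-> local coding tables; this is
not a proposal for how a mail system should implement it.

```python
# Each site needs only two converters per local coding:
# local -> common form, and common form -> local.

def to_ucs(data: bytes, local_charset: str) -> str:
    """Convert bytes in a local coding to the common form (a Python str)."""
    return data.decode(local_charset)

def from_ucs(text: str, local_charset: str) -> bytes:
    """Convert the common form back to bytes in a local coding."""
    return text.encode(local_charset)

def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two local codings by going through the common form."""
    return from_ucs(to_ucs(data, src), dst)
```

With N local codings this needs 2N converters instead of one for every
pair of codings.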

Note: A well-defined ISO 10646 encoding allows ASCII mail to be nearly
unchanged.

What we need, then, is to define an encoding suitable for 16/32-bit codes.

Erik van der Poel some time ago defined a Base64-like encoding for 16-bit codes.
Something along those lines is what is needed, though the encoding should
probably have several escape codes to reduce overhead. By encoding 6 bits per
encoding character we could encode as follows (x means a Base64 character
encoding 6 bits):
  =xx           (encode 12 bits)
  ?xxx          (encode 18 bits)
  etc.
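
To make the escape idea concrete, here is a minimal Python sketch of one
way such a scheme could work. Only the two escapes shown above ("=" for 12
bits, "?" for 18 bits) come from this message; everything else is an
assumption made for illustration: that plain ASCII passes through
unchanged, that the escape characters themselves are sent in escaped form,
that the digits use the MIME Base64 alphabet with the most significant
digit first, and the names ESCAPES, encode and decode.

```python
# MIME Base64 alphabet; each character carries 6 bits.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Assumed escape table: escape character -> number of 6-bit digits following.
# Ordered from shortest to longest so the cheapest form is tried first.
ESCAPES = {"=": 2, "?": 3}

def encode(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and ch not in ESCAPES:
            out.append(ch)  # plain ASCII is left unchanged
            continue
        for esc, ndigits in ESCAPES.items():
            if cp < 1 << (6 * ndigits):
                # Emit the code as big-endian 6-bit digits after the escape.
                digits = [B64[(cp >> (6 * i)) & 0x3F]
                          for i in reversed(range(ndigits))]
                out.append(esc + "".join(digits))
                break
        else:
            # Longer escapes (the "etc." above) are not defined in this sketch.
            raise ValueError("code %#x needs more than 18 bits" % cp)
    return "".join(out)

def decode(data: str) -> str:
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch in ESCAPES:
            ndigits = ESCAPES[ch]
            digits = data[i + 1:i + 1 + ndigits]
            if len(digits) != ndigits:
                raise ValueError("truncated escape sequence")
            cp = 0
            for d in digits:
                cp = (cp << 6) | B64.index(d)
            out.append(chr(cp))
            i += 1 + ndigits
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumptions, text without "=" or "?" encodes to itself, a code
below 4096 costs three characters, and a code below 262144 costs four.
Codes above 18 bits would need a further escape, which this sketch does
not invent.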


For declaring ISO 10646 in MIME we could use UCS as the character code name.
But we may also have to include a "level", since what Unicode uses will
probably be level 3 in the IS, which allows any combination of combining
characters. It would be better if we could restrict ourselves to allowing
combining characters only when no code exists for the combined character
(probably level 2 in the IS), as this would simplify UCS <-> local coding.


When we have 8-bit and binary transfer of bodies, MIME/SMTP will need to talk
about a binary encoding that can be space efficient (like some version of UTF),
but not now. I suggest that, to begin with, UTF and other binary ISO 10646
encodings be used only locally at a site.

    Dan
