what we're looking for here seems to be complex messages, where a
message contains multiple parts and each part can be encoded using
a different method. some folks want the encoding methods to be
nestable, so that a uuencode'd GIF can be represented directly.
everybody wants this to be more or less compatible with the "------"
method that we use for digests now. most people want to be able to
represent non-textual bodyparts in 8-bit format; some people want to
be able to represent even textual body parts in 8-bit format.

this is a hellacious hairball and we need to start by acknowledging
that everybody can't get what they want. more than that, probably
no single person will get everything they want. so listen up, all:
prepare to be disappointed. prepare for a final solution that is a
rotten compromise that makes your stomach turn.

to that end :-), let me spec a little of this out:

    message :: headers blank-line body
    headers :: header [...]
    body    :: line [...]
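as a rough illustration of the grammar above, here is a minimal python sketch that splits a raw message at the first blank line. the function name split_message is made up for this example, and it deliberately ignores folded (continued) header lines, so treat it as a sketch under those assumptions rather than a real parser.

```python
def split_message(raw: bytes):
    """Split a raw message per: message :: headers blank-line body.

    Hypothetical helper for illustration only; folded header lines
    are NOT joined, each physical line is returned as-is.
    """
    # bytes.splitlines() handles \n, \r\n and \r line endings
    lines = raw.splitlines()
    try:
        # the first empty line separates headers from body
        blank = lines.index(b"")
    except ValueError:
        # no blank line at all: assume headers only, empty body
        blank = len(lines)
    headers = lines[:blank]
    body = lines[blank + 1:]
    return headers, body
```

for example, split_message(b"Subject: hi\nTo: a@b\n\nline one\n") returns the two header lines in one list and [b"line one"] as the body.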
i'd like to keep the name part of the header (everything to the left of the
colon) spec'd to 7-bit ASCII. i'd like all the reserved words in complex
headers like "received:" spec'd to 7-bit ASCII too. actual text (like the subject
field or the full-name/comment parts of to/cc/from) should somehow be
allowed to be 8-bit. i don't know what to suggest for the stuff inside of
<brackets> in to/cc/from. the restrictions on domain names (anything to
the right of an @) should be whatever DNS spec's, which is probably 7-bit
ASCII.
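a small sketch of the header-name rule, assuming the name is everything to the left of the first colon and must be printable 7-bit ASCII with no spaces (which is how RFC 822 defines a field name). the function name is hypothetical, and the value part is deliberately left unchecked so it can carry 8-bit text.

```python
def header_name_is_7bit(header: bytes) -> bool:
    """Return True if the part left of the first colon is printable
    7-bit ASCII (0x21..0x7E). The value is not inspected at all,
    so 8-bit text in it is allowed. Hypothetical helper."""
    name, sep, _value = header.partition(b":")
    if not sep or not name:
        # no colon, or nothing before it: not a valid header line
        return False
    return all(0x21 <= b <= 0x7E for b in name)
```

under this check b"Subject: bl\xe5b\xe6r" passes, since only the value has 8-bit data, while b"S\xfcbject: x" does not.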
without saying anything about body parts or encodings, i'm already
assuming an 8-bit transport. if a message that has 8-bit data in its Subject: field
needs to be sent over a 7-bit transport, it has to be encoded. this
encoding needs to be something that a user or user-agent can make sense
of, since it may never reach another 8-bit transport and even if it does
i'm not sure it should be decoded back into 8-bit data. we can argue that
one. an encoding like \NNN where NNN is an octal number would suit those
of us in the UNIX(tm) community pretty well but we aren't the whole world
and no doubt a Norwegian recipient would rather not see a \221 where an
accented character would normally appear. we may want to consider a new
encoding enumeration which is painful to generate or tear down but which
has equivalences which are chosen to be readable in their encoded form.
like i said, this can be argued.
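to make the \NNN idea concrete, here is a minimal sketch of an octal escape for 8-bit bytes. the function names are made up, and doubling a literal backslash so the encoding can be reversed is an assumption of this example, not something proposed above.

```python
def encode_octal(data: bytes) -> str:
    """Encode 8-bit bytes as \\NNN (three octal digits).
    7-bit bytes pass through; a literal backslash is doubled
    (an assumption made here so decoding is unambiguous)."""
    out = []
    for b in data:
        if b == 0x5C:            # literal backslash
            out.append("\\\\")
        elif b < 0x80:           # plain 7-bit byte
            out.append(chr(b))
        else:                    # 8-bit byte -> \NNN
            out.append("\\%03o" % b)
    return "".join(out)


def decode_octal(text: str) -> bytes:
    """Reverse encode_octal; assumes well-formed 7-bit input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1:i + 2] == "\\":
                out.append(0x5C)
                i += 2
            else:
                out.append(int(text[i + 1:i + 4], 8))
                i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

with this sketch the byte 0x91 comes out as \221, and decode_octal(encode_octal(x)) gives back x for any byte string x.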
the headers will have to have some kind of magic cookie added to them
when a message (headers and/or body) has 8-bit data in it. this cookie
can be munched slightly when/if a transport or user-agent needs to
encode it into the "readable 7-bit" notation i mentioned earlier. i
believe that the presence of this magic cookie (really "a new header")
should tell anyone who cares about the distinction, that this message
conforms to the newer mail RFCs and all else that that may imply.
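a sketch of how the cookie might be added, assuming headers and body are lists of byte strings. the header name X-Eight-Bit and its value are purely hypothetical placeholders; no name has been proposed above.

```python
# hypothetical cookie name, chosen only for this illustration
COOKIE_NAME = b"X-Eight-Bit"


def has_8bit(lines) -> bool:
    """True if any byte in any line has the high bit set."""
    return any(b >= 0x80 for line in lines for b in line)


def add_cookie(headers, body):
    """Return a copy of headers, with the cookie header appended
    when the headers and/or body contain 8-bit data and the
    cookie is not already present."""
    result = list(headers)
    if has_8bit(headers) or has_8bit(body):
        prefix = COOKIE_NAME.lower() + b":"
        if not any(h.lower().startswith(prefix) for h in result):
            result.append(COOKIE_NAME + b": yes")
    return result
```

a pure 7-bit message comes back unchanged, so older software never sees the new header unless 8-bit data is actually present.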
let's try to agree on a framework with arguable variables, and then
argue about the variables. does anyone think that what i've said above
will work? (it's a restatement of what several other people have said,
so i know that *somebody* thinks it's reasonable). does anyone know a
reason why we cannot or should not start with the above framework?

cheers,
Paul Vixie
DEC Western Research Lab <vixie(_at_)pa(_dot_)dec(_dot_)com>
<paul(_at_)vixie(_dot_)sf(_dot_)ca(_dot_)us>
Palo Alto, California, USA ...!decwrl!vixie ...!vixie!paul