(1) I think it should be a separate RFC. We want it to apply to all
messages, not just to rfc-xxxx messages with a Content-type header. The
reason is this: you send an rfc-xxxx message to someone with an old mail
reader/writer, and they hit reply. Any header text that was in an alternate
character set, and that the reply program copies into the new headers,
should become visible in its original form again when the reply reaches a
new mail reader.
That being the case, it might be nice to separate out the common character
set and encoding stuff into a separate RFC: "Character Sets and Encodings
for Internet Mail Messages". So there would be 3 RFCs. Alternatively,
Keith's RFC can reference rfc-xxxx for this stuff. But the latter would
be a bit strange, because some aspects are not used in rfc-xxxx
(namely the short [single character] names for encodings and character
sets, and the _ in quoted-printable). The one-character "names" for
character sets and encodings are an essential feature of Keith's proposal:
they let it interoperate with existing software without burying the real
info in very large amounts of junk.
(2) Numbering things is fine but I want the ability to name them as well.
The example "numbers" in Keith's proposal should be "small number
[normally one] of letters or digits". We should grab "M" for mnemonic.
(3) I don't like the yet-another-encoding problem it raises. If we need to
change quoted-printable to align it with the needs of headers, we should
change it NOW. (Note that this in particular requires a change in RFC-XXXX
and not in Keith's proposal.) Is there any problem with replacing the ":"
with an "="?
We also need to add "_" to quoted-printable encoding, with "=_" as an allowed
escape.
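To make the "_" suggestion concrete, here is a minimal sketch in Python. It
rests on one reading of the proposal that the text above does not spell out:
"_" stands for a space, "=_" is the escape for a literal underscore, and
"=XX" (two uppercase hex digits) carries any other octet. The names q_encode
and q_decode are made up for this illustration.

```python
import re

# Assumed token grammar (an interpretation, not taken from any draft):
#   "=_"  -> literal underscore
#   "=XX" -> the octet with hex value XX
#   "_"   -> space
#   any other octet stands for itself
_TOKEN = re.compile(rb"=_|=[0-9A-F]{2}|_|[^=_]")


def q_encode(raw: bytes) -> bytes:
    """Encode octets so the result is printable and contains no spaces."""
    out = bytearray()
    for octet in raw:
        ch = bytes([octet])
        if ch == b" ":
            out += b"_"
        elif ch == b"_":
            out += b"=_"
        elif ch in b"=?" or octet < 33 or octet > 126:
            # "=" and "?" are escaped so they cannot be mistaken for syntax.
            out += b"=%02X" % octet
        else:
            out += ch
    return bytes(out)


def q_decode(encoded: bytes) -> bytes:
    """Reverse q_encode; raise ValueError on a malformed escape."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        m = _TOKEN.match(encoded, pos)
        if m is None:
            raise ValueError(f"bad escape at offset {pos}")
        tok = m.group()
        if tok == b"_":
            out += b" "
        elif tok == b"=_":
            out += b"_"
        elif tok.startswith(b"="):
            out.append(int(tok[1:].decode("ascii"), 16))
        else:
            out += tok
        pos = m.end()
    return bytes(out)
```

With this reading, a body decoder and a header decoder can share the same
code, which is the point of changing quoted-printable now rather than adding
a second, nearly identical encoding later.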
Not only should the Q encoding be made the same as quoted-printable,
but the B encoding should be the same as base64. The only difference for
the latter was the removal of "," from B. However, this doesn't solve
the whole problem of vertical motion. The correct answer is to prohibit
vertical motion; the ban on "," then follows without requiring a
separate encoding. You can lift text from the mnemonic proposal on this
(or any other) matter: that will save it from going to waste.
Finally, Keith's proposal makes [cq ]text look more horrible than is
necessary. I think space should be allowed to stand for itself in
[cq ]text. It isn't hard to search past one word looking for the 4th "?".
With this change, it makes sense to allow 7bit encoding as well [code "7",
usable when there is no "=" or "?" in the text]. This change is only necessary
for improved interoperability with existing mail readers, but that is
likely to be an issue for quite a while.
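As a rough illustration of why a bare space in the text is harmless, here is
a small Python sketch of the scan for the 4th "?". It assumes the encoded
word has the shape =?charset?encoding?text?= (my assumption about the
syntax; this message does not spell it out). The function names and the
charset name "1" in the usage below are made up for the example.

```python
def split_encoded_word(header: str, start: int = 0):
    """Return (charset, encoding, text, end) for the word at `start`.

    Assumed shape: =?charset?encoding?text?=
    Spaces inside `text` are allowed; only "?" delimits the fields.
    """
    if not header.startswith("=?", start):
        raise ValueError("no encoded word at this offset")
    marks = [start + 1]  # position of the 1st "?"
    while len(marks) < 4:
        nxt = header.find("?", marks[-1] + 1)
        if nxt == -1:
            raise ValueError("fewer than four '?' characters")
        marks.append(nxt)
    if not header.startswith("=", marks[3] + 1):
        raise ValueError("4th '?' is not followed by '='")
    charset = header[marks[0] + 1:marks[1]]
    encoding = header[marks[1] + 1:marks[2]]
    text = header[marks[2] + 1:marks[3]]
    return charset, encoding, text, marks[3] + 2


def can_use_7(text: str) -> bool:
    """True when the text may be sent as-is under the proposed "7" code."""
    return text.isascii() and "=" not in text and "?" not in text
```

For example, given header = "Subject: =?1?7?Hello there world?= (fwd)",
calling split_encoded_word(header, header.index("=?")) yields the charset
"1", the encoding "7", and the text "Hello there world" with its spaces
intact, and can_use_7 confirms that this text needs no escaping.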
Bob Smart
P.S. RFC-xxxx [or the separate rfc on character sets and encodings] should
define a few terms. In particular we are using "Character set" in a
rather specialized way:
Character set:
As used here, a character set is a set of glyphs and a way of representing
a sequence of lines of glyphs from that set as a sequence of octets,
and inversely of interpreting a sequence of octets as a sequence of
lines of glyphs. There doesn't have to be a one-to-one correspondence between
octets and glyphs: e.g. there can be multiple octets to a glyph, or there
can be sequences of octets which only shift the interpretation of the
following octets and don't themselves generate any glyphs. There can
also be (sequences of) octets representing horizontal or vertical motion.
Where you have a set of glyphs with each glyph having a unique number, I
would call that a character table. So ISO-2022 is a character set which
works by using shift sequences to shift between character tables (one
of which is the character table used by us-ascii).
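The distinction can be shown with a toy Python sketch. Everything in it is
invented for illustration: the "toy-greek" table, its numbering, and the
three-octet shift sequences are assumptions of mine and are not claimed to
match any ISO-2022 registration. A character table is just a mapping from
numbers to glyphs; the character set is the whole decoding procedure,
including the shifts.

```python
ESC = 0x1B

# Character tables: each glyph has a unique number within its table.
TABLES = {
    "ascii": {n: chr(n) for n in range(0x20, 0x7F)},
    "toy-greek": {0x61: "α", 0x62: "β", 0x67: "γ"},  # hypothetical table
}

# Shift sequences (illustrative only): they select a table and
# generate no glyph themselves.
SHIFTS = {
    bytes([ESC, 0x28, 0x42]): "ascii",
    bytes([ESC, 0x28, 0x58]): "toy-greek",
}


def decode(octets: bytes) -> str:
    """Interpret octets as glyphs, honoring shift sequences."""
    table = TABLES["ascii"]  # assumed initial state
    glyphs = []
    i = 0
    while i < len(octets):
        if octets[i] == ESC:
            seq = octets[i:i + 3]
            if seq not in SHIFTS:
                raise ValueError(f"unknown shift sequence at offset {i}")
            table = TABLES[SHIFTS[seq]]
            i += 3
            continue
        glyphs.append(table[octets[i]])
        i += 1
    return "".join(glyphs)
```

Note that the same octet 0x61 yields "a" or "α" depending on which table is
currently selected, so the octet-to-glyph correspondence is not one-to-one.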
These are just matters of definition, not of debate. It doesn't matter
what words we use as long as we agree on the meaning.