About the mnemonic-in-headers idea as the way to get non-USASCII text
into mail headers. I want to expand on Nathaniel's problem and remind
the audience of another one. Also, Robert Ullmann's suggestion deserves
a comment.
Sure, Nathaniel's example was unquoted, and thus was a straw-man. Too
bad. Perhaps his problem could have been re-phrased something like:
Lots of systems out there will break when exposed to the morass of
quoted characters that this proposal will engender. That's a different
sort of problem, indeed. Some of us try to be careful in our rfc-822
lexical scanners and quoters, and RFC 822 is pretty rigorous in how
correct quoters and scanners will behave, but that doesn't mean that
those rules are followed rigorously. If I can be forgiven for raising
some hackles a little (Mark Crispin's, I think), I'll remind folks of
the problems that MM had on CMU Tops-20 systems. We guys on
andrew.cmu.edu started using plus signs in mail addresses (legitimate by
RFC 822), but MM treated such addresses as obviously invalid.
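To make the plus-sign point concrete, here is a minimal Python sketch of
the RFC 822 rule for an atom (any CHAR except specials, SPACE, and CTLs).
The function names and the sample local-part are illustrative
assumptions of mine, not taken from MM or any other real mailer.

```python
# RFC 822 "specials": characters that may not appear in an atom.
SPECIALS = set('()<>@,;:\\".[]')


def is_atom_char(ch):
    """Return True if ch may appear in an RFC 822 atom."""
    code = ord(ch)
    if code > 127:                  # outside CHAR altogether
        return False
    if code <= 32 or code == 127:   # CTLs (0-31, 127) and SPACE (32)
        return False
    return ch not in SPECIALS


def is_atom(text):
    """Return True if text is a non-empty run of atom characters."""
    return len(text) > 0 and all(is_atom_char(c) for c in text)


# "jdoe+mail" is a hypothetical local-part used only for illustration.
print(is_atom("jdoe+mail"))   # True: '+' is not a special
print(is_atom("jdoe@host"))   # False: '@' is a special
```

Under this reading a local-part containing a plus sign is a single valid
atom, so a scanner that rejects it is stricter than RFC 822 requires.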
What do you put in From and what goes in Real-From? The From address
can be stripped down to the mailbox name, and the Real-From can be any
user-oriented decoration at all. Thus, I'd express mail from somebody
with mailbox he@idt.unit.no and real-name (in mnemonic) ``H&XFard
Eidnes'' as the combination (forgive my misunderstanding of the details)
    From: he@idt.unit.no
    Real-From: US-ASCII/mnemonic: H&XFard Eidnes
instead of
    From: H&XFard Eidnes <he@idt.unit.no>
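The split described above can be sketched in a few lines of Python. This
is only an illustration under my own assumptions: Real-From is the
hypothetical header from this message, the function name split_from is
made up, and the address someone@example.org is a placeholder. The
standard library's email.utils.parseaddr does the RFC 822 parsing.

```python
from email.utils import parseaddr


def split_from(from_value, charset_tag="US-ASCII/mnemonic"):
    """Split a combined From value into From and Real-From header lines.

    The mailbox goes into From untouched; the user-oriented decoration
    goes into the (hypothetical) Real-From field with its encoding tag.
    """
    display_name, mailbox = parseaddr(from_value)
    headers = ["From: " + mailbox]
    if display_name:
        headers.append("Real-From: " + charset_tag + ": " + display_name)
    return headers


for line in split_from("H&XFard Eidnes <someone@example.org>"):
    print(line)
```

The point of the split is that a scanner looking only at From never sees
encoded text, and only software that understands Real-From has to decode
anything.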
A severe additional problem with the mnemonic proposal (besides its
western-centricity) is this. Given that the special header appears that
identifies key text strings as being in mnemonic encoding, what fields
and sub-fields are subject to this encoding (and therefore decoding)?
Clearly, lines like Subject:/Comments: are to be decoded. And the
intent is that the ``mailbox'' RFC 822 type is to receive special
treatment:
    mailbox = addr-spec             ; simple address
            / phrase route-addr     ; name & addr-spec
I'd guess that the ``phrase'' should be decoded, but in addr-spec, the
``local-part'' should be left alone. How about comments? Are they to
be decoded in From: lines? How about in other lines, like Received: or
Date:? How about all the extension fields--are they to be decoded? How
about the Received: header: its optional ``for'' clause contains an
addr-spec. Is that to be decoded? What about the message massagers
along the way that add Received: lines and possibly manipulate other
headers? What should they do about interpreting encoded text, or making
sure that the text they generate is encoded if that's indicated?
Now, I can make guesses about these questions as well as the next guy,
but implementors can't be allowed to guess. The proposal must settle
these decisions as part of its specification, and I'm not convinced that
the problem will admit any such solution.
As for Robert Ullmann's observation, which I take to be that 8-bit
characters can be encoded in RFC 822 simply by prefixing them with the
appropriate quote character: my copy of RFC 822 says that the
characters that can appear in header fields must all be drawn from its
CHAR class, which is listed as:
    CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
As far as I can tell, this excludes the possibility of putting 8-bit
characters in verbatim. The 7-bit restriction isn't just an SMTP issue;
rather, I have to read this line as applying it to the headers (and
bodies) of RFC 822 messages. Either Mr. Ullmann made a mistake here, or
this is another provocative and contentious assertion that the 7-bit
issues can go away just by pretending that they don't exist, and
declaring all non-8-bit mail handlers broken.
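The CHAR restriction is easy to check mechanically. The Python sketch
below is illustrative only (the function name is my own); it reports the
offsets of any octets above 127 in a raw header line. Note that RFC 822
defines quoted-pair as a backslash followed by a CHAR, so putting a
backslash in front of an 8-bit octet does not bring it inside the grammar.

```python
def non_char_positions(header_bytes):
    """Return the offsets of octets outside RFC 822 CHAR (0-127)."""
    return [i for i, b in enumerate(header_bytes) if b > 127]


plain = b"Subject: hello"
raw_8bit = "From: H\u00e5vard".encode("latin-1")   # a-ring sent verbatim
quoted_8bit = b"From: H\\\xe5vard"                  # same octet after a backslash

print(non_char_positions(plain))        # []
print(non_char_positions(raw_8bit))     # [7]
print(non_char_positions(quoted_8bit))  # [8]
```

Both the verbatim and the backslash-quoted forms still contain an octet
that CHAR does not cover, which is why I read the restriction as applying
to the message format itself and not just to SMTP transport.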
Craig