ietf-822

Re: new-ish idea on non-ascii headers

1991-09-21 17:43:52
John C Klensin writes:

> While the language of RFC822 somewhat obscures it, the intent of the
>      phrase <mailbox@domain>
> syntax was, at least in part, to permit routing, by means other than
> RFC821/822, to multiple distinct individuals who share the same mailbox.
> This is in spite of the fact that some UAs (including the one I'm using
> :-( ) can't get this syntax right.

I've never heard this before. If it is true, I think the people who came up
with this idea should get a stern talking-to for being so silly. This idea is
completely unworkable in practice.

Routing and delivery of a given message instance should depend totally on the
message envelope. The message header has nothing whatsoever to do with it.
There's nothing that says an address in the envelope has to appear in the
header. It often does not in practice. There's also nothing to prevent an
address in the envelope from appearing 10,000 times in the header, each time
associated with a different phrase.

The intent of the standards is clear in this regard, I believe. Messages
are delivered to the recipients listed in the message envelope. That's it.
Since the RFC821 envelope cannot contain phrases, they have no impact on
delivery. This principle is reiterated in X.400, incidentally, and any
hope of X.400 interoperability depends totally on this being true.

I realize that some nonstandard mail systems use corrupted versions of RFC822,
and in these systems phrases (or their equivalent) are active chunks of
information (CC:Mail is a good example here). This sort of aberrant behavior is
not a topic this group need concern itself with, however.

> While no resolution emerged that was clear enough to justify a
> SHOULD/MUST statement, there was discussion in HR about whether some
> sending agent that was cleaning things up to avoid multiple delivery
> could/should construe
>      a <mailbox@domain>
> and
>      b <mailbox@domain>
> as equivalent, i.e., whether one or the other could be deleted and, if
> so, which one.

This is a non-issue unless the routing protocol you're using allows this form
of address in the envelope. RFC821 does not. (We're also not talking about
RFC821 extensions here, so even if it did it would not be relevant to this
discussion.) Delivery, therefore, cannot be dependent on this form of address
if things are being done according to the standards.
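As a rough illustration of why the phrase cannot matter to RFC821 delivery, here is a minimal Python sketch that turns header addresses into RCPT TO commands. It assumes Python's standard email.utils.getaddresses for the RFC822 parsing; the helper name rcpt_commands is hypothetical. A forward-path has room only for <mailbox@domain>, so the phrases "a" and "b" simply vanish and the two entries become identical.

```python
from email.utils import getaddresses


def rcpt_commands(header_values):
    """Turn RFC 822 address header values into RFC 821 RCPT TO commands.

    getaddresses() yields (phrase, addr-spec) pairs. Only the addr-spec
    fits in a forward-path, so the phrase is parsed and then discarded.
    """
    commands = []
    for _phrase, addr_spec in getaddresses(header_values):
        if addr_spec:  # skip empty or unparseable entries
            commands.append(f"RCPT TO:<{addr_spec}>")
    return commands


for command in rcpt_commands(["a <mailbox@domain>, b <mailbox@domain>"]):
    print(command)  # RCPT TO:<mailbox@domain>, printed twice
```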

I see no reason for HR to have dealt with this -- the intent of the original
standards is completely clear. HR did make the phrase optional in front of a
route, but as far as I can tell that's the only reference to this issue in the
actual HR that was produced. And frankly, stuff that the authors debated is not
of interest -- if it didn't make it into the document there's no reason we
should concern ourselves with it.

The only time that headers impinge on delivery issues is when a message is
replied to and the header addresses are used to construct the headers and
envelope of the reply. But when this is done, the preceding phrase attached
to the address is lost (again assuming RFC821) so how could it possibly have
any delivery implications?

A properly constructed UA will provide facilities for flexible selection of
addresses from another message's header as well as automatic duplicate
elimination. While relatively few UAs provide reasonable services of this
kind (in my opinion), there is no intrinsic implementation problem here.
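A sketch of the duplicate elimination described above, again assuming Python's email.utils.getaddresses; reply_recipients and _dedup_key are hypothetical names. Recipients are compared on the addr-spec alone, so the phrase never enters into it. Lowercasing only the domain is a deliberate choice: domains are case-insensitive, while the local part is for the destination host to interpret.

```python
from email.utils import getaddresses


def _dedup_key(addr_spec):
    # Compare domains case-insensitively; leave the local part alone,
    # since only the destination host is entitled to interpret it.
    local, at, domain = addr_spec.rpartition("@")
    return f"{local}{at}{domain.lower()}" if at else addr_spec


def reply_recipients(header_values, exclude=()):
    """Pick envelope recipients for a reply, eliminating duplicates.

    Entries are compared on the addr-spec only; phrases play no part,
    so 'a <mailbox@domain>' and 'b <mailbox@domain>' count once.
    """
    seen = {_dedup_key(addr) for addr in exclude}
    recipients = []
    for _phrase, addr_spec in getaddresses(header_values):
        key = _dedup_key(addr_spec)
        if addr_spec and key not in seen:
            seen.add(key)
            recipients.append(addr_spec)
    return recipients


print(reply_recipients(
    ["Joe <joe@Example.COM>", "a <mailbox@domain>, b <mailbox@domain>"],
    exclude=["me@domain"],
))  # ['joe@Example.COM', 'mailbox@domain']
```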

> The relationship of this problem to, e.g., the "Real-XXX" proposal is
> that one cannot treat this phrase subfield, as distinct from "(...)"
> comment fields, as information-free for addressing purposes.  Some clear
> convention is needed: either that the phrase in the RFC-XXXX address is
> the critical one and the corresponding Real-XXXX header becomes "correct
> spelling if anyone is interested", or one that removes the phrase and
> points to "Real-XXX".

I disagree. I think that in practice you have to treat it as information-free
for delivery purposes. Now, there's nothing to prevent a user from using this
information (for example, if you put an offensive phrase in there, a recipient
may ignore your message or flame you, or whatever). Recipients may in fact want
to use the phrase as a key to perform special actions on messages. But as far
as addressing and delivery goes any attempt to use the phrase leads directly to
terrible, terrible trouble. (I could repeat Dave Crocker's comments about the
problem of correlating envelope addresses with header information here but I
won't bother.)

> For the latter, one might require
>    From: "*" <user@domain>
>    Real-from: ...
> or something similar, rather than permitting arbitrary text in the
> "From:"-phrase.

As long as the phrase is syntactically correct, it can be anything you want.
This is allowed by the current specification -- it is what the standards call
for and I think declaring it illegal now is the same thing in principle as
declaring 7-bit-only systems broken. While I don't generally condone the
use of fringe facilities in the standards, the use of quoting in comments
and phrases is NOT a fringe facility. It is a fundamental facility and
vitally important if you want to have usable software.
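To show that quoting in phrases is ordinary, well-supported machinery rather than a fringe facility, here is a small Python example using the standard library's email.utils.formataddr and parseaddr. The sample name and address are made up. A conforming parser round-trips the quoted phrase; a naive comma-splitter does not.

```python
from email.utils import formataddr, parseaddr

# A phrase containing RFC 822 specials (a comma and a period), so it has
# to be sent as a quoted string.
phrase = "Klensin, John C."
header_value = formataddr((phrase, "klensin@domain"))
print(header_value)             # "Klensin, John C." <klensin@domain>

# A conforming parser recovers the phrase and the address intact...
print(parseaddr(header_value))  # ('Klensin, John C.', 'klensin@domain')

# ...whereas a "parser" that just splits on commas sees two addresses.
print(header_value.split(","))
```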

> I think it is important to not lose track of this particular bit of
> interoperability in the process of solving broader problems.

I think it is VITAL that we do lose track of this notion as soon as possible.
If it takes writing additional prose that clarifies that ignoring this text as
a source of delivery information is MANDATORY, I will be happy to write it.

> I want to add my support to those who are distrustful of the ability of
> systems to properly handle RFC-822 quoting conventions when those are
> used intensely.  That is an argument against "just use mnemonic or
> quoted-printable in the primary headers".  Please keep in mind that we
> have mail systems around that will bounce mail because they don't like
> certain headers, even headers that are none of their business.  These
> problems are also the sources of some elegant and user-friendly error
> messages of the "Unknown error N" persuasion.

If you cannot handle RFC822 quoting convention your mailer is BROKEN. The use
of quoted printable and/or mnemonic does not change this. These mechanisms
only, repeat ONLY, change the way that you are supposed to DISPLAY the contents
of these headers.

Note that this is on a completely different level from mailers that wrap lines
or otherwise engage in activities that are somewhat dubious. You won't find
anything in the standards that explicitly prohibits this sort of thing
(especially when you're talking about a UA or a gateway, where the lack of
tight specifications on these things is intentional). These dubious activities
violate the spirit of the specifications somewhat, but that's all. By contrast,
the inability to deal with basic parsing issues is not something I have any
tolerance for. This is spelled out explicitly in the standards and conformant
software must conform.

However, for the moment let's posit your distrust, and ask whether the use of
quoted-printable or mnemonic is going to increase the use of what you call
"intense" RFC822 quoting conventions, especially the ones that tend to break
mailers. In order to assess this you have to examine the representations a bit
more closely than we have done previously. So let's do this now. First of all,
what characters are considered special in an RFC822 quoted string or comment?
The list is pretty short -- the characters are (, ), ", and \.
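The short list of specials can be made concrete with a minimal Python sketch (the function names are hypothetical) that wraps text as an RFC822 quoted string or as a comment, backslash-escaping only the characters special in that context. Parentheses pass through a quoted string untouched, while a double quote passes through a comment untouched.

```python
# Characters that need a backslash (an RFC 822 quoted-pair) in each context.
# RFC 822 also excludes bare CR from both; this sketch assumes one-line text.
QUOTED_STRING_SPECIALS = {'"', "\\"}
COMMENT_SPECIALS = {"(", ")", "\\"}


def _escape(text, specials):
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def as_quoted_string(text):
    return '"' + _escape(text, QUOTED_STRING_SPECIALS) + '"'


def as_comment(text):
    # Balanced parentheses may legally nest inside a comment; escaping
    # them anyway is always safe and keeps the sketch simple.
    return "(" + _escape(text, COMMENT_SPECIALS) + ")"


sample = 'O(n) "fast" C:\\tmp'
print(as_quoted_string(sample))  # "O(n) \"fast\" C:\\tmp"
print(as_comment(sample))        # (O\(n\) "fast" C:\\tmp)
```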

Right away we find out something interesting. \ is not used by mnemonic at all!
In fact, mnemonic provides an alternate representation for this troublesome
character. This is actually very, very nice -- in my experience this is the
character that broken parsers mess up on most.

" is used to represent itself. In addition it is used to represent a double
acute accent. This is not one of the more common accent characters, in my
experience. I suspect that in most languages the use of a double quote is more
common in practice than is the use of this particular accent. As such, I don't
see that this is going to cause any significant increase in the use of
"intense" quoting.

Now, the ( and ) characters are used fairly heavily. But these characters are
not exceptional except in comments -- they have no special status in quoted
strings, which are the case you were most worried about here, I think. (Note that
comments can in all cases be removed from headers as a preprocessing step, and
many mailers do this. As such, there are fewer problems in dealing with them
than there are in dealing with quoted strings.)
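Removing comments as a preprocessing step might look like the following minimal Python sketch; strip_comments is a hypothetical name, not a library function. It follows RFC822's rules that comments nest, that a backslash quotes the next character, and that parentheses inside a quoted string are ordinary text. It should only be applied to structured fields such as address headers, since in unstructured fields like Subject parentheses carry no special meaning.

```python
def strip_comments(value):
    """Remove RFC 822 comments from a structured header field body.

    Comments may nest, a backslash quotes the next character, and
    parentheses inside a quoted string are ordinary text. Each comment is
    replaced by one space so the tokens on either side stay separate.
    """
    out = []
    depth = 0         # current comment nesting level
    in_quote = False  # inside a "..." quoted string (only possible at depth 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it unless it sits inside a comment
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quote = not in_quote
            out.append(ch)
        elif in_quote:
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


print(strip_comments('"Smith (J)" <js@domain> (work)').strip())
# "Smith (J)" <js@domain>
print(strip_comments("a (b (c) d) e").split())  # ['a', 'e']
```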

On balance, my opinion is that the use of mnemonic will not increase the
use of "intense" RFC822 quoting conventions significantly. In fact, it
may reduce the use of the more troubling ones.

Quoted-printable is even more interesting. The only special character that is
of significance in quoted-printable is a colon. This character has no special
status in either quoted strings or comments. As such, quoted-printable can be
used to ELIMINATE the need to use any of the fancier quoting facilities in
RFC822. I therefore claim that, when dealing with broken parsers,
quoted-printable could be a major WIN rather than a problem.
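The post does not spell out the quoted-printable syntax being proposed for headers, only that a colon is its one special character. The following Python sketch therefore assumes a hypothetical scheme in which every byte outside a small safe set becomes a colon followed by two hex digits, with Latin-1 as the assumed charset; colon_encode and colon_decode are illustrative names. Under that assumption the encoded text contains no parentheses, double quotes, or backslashes, so it can sit in a phrase or comment without a single quoted-pair.

```python
import re
import string

# HYPOTHETICAL encoding for illustration only: every byte outside a
# conservative safe set becomes ':' plus two uppercase hex digits.
# The colon itself is never safe, so decoding is unambiguous.
SAFE = set((string.ascii_letters + string.digits + " -.").encode("ascii"))
RFC822_QUOTING_SPECIALS = set('()"\\')


def colon_encode(text, charset="iso-8859-1"):
    return "".join(chr(b) if b in SAFE else f":{b:02X}"
                   for b in text.encode(charset))


def colon_decode(encoded, charset="iso-8859-1"):
    raw = re.sub(rb":([0-9A-F]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 encoded.encode("ascii"))
    return raw.decode(charset)


print(colon_encode('Gödel (the "logician")'))
# G:F6del :28the :22logician:22:29
```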

> That said, the notion of a lot of headers that have subtle
> interactions and that must all be considered together doesn't appeal to
> me very much either.  In the present RFC822, if I find a From field, it
> is plausible and rational for me to assume that it contains the "From"
> information.  A proposal that says, "but, hey, you also have to scan all
> of the other headers looking for Real-From in case there is
> supplemental information there" is scary.  If nothing else, it feels
> subjectively as if it is one of those slippery slope problems, e.g., we
> might want to evolve to
>    From: ??? <user@domain>
>    Real-From-phrase: @#$%^&
>    Real-From-Address: /ADMD=Internet/SU=User/...

This is the problem I've had with this duplicate header business from
the beginning.

It does have its attractions, but...
> And that is an argument against the "Real-XXX" strategy.  So I agree
> with what I understand Nathaniel to be saying: there may be no really
> clean solution here.

I disagree. I think the use of mnemonic is clean, simple, and elegant.

> If we are going to do Real-From, Real-to, etc., we also need
> Real-Resent- and all of those, and maybe Real-Forwarded-... and all of
> *those*.  Also Real-reply-to, which I don't remember seeing listed.
>
> The other concern is that there are a number of commonly-used, but not
> RFC-822-specified, headers that one might want to represent using this
> model.  For example, would one expect to want to see:
>    Organization: some ASCII-ized spelling
>    Real-organization: organization name in mnemonic or quoted-printable.

You bet. This is yet another can of worms that I don't want to open.

> The other arguments against variations on "put mnemonic into the phrases
> themselves", supplemented by a "Header-charset" header, are that unless
> ordering is imposed, the headers may need to be scanned twice and we
> violate the "try to not dump garbage on the user" principle for 822 (but
> not XXXX-conforming) mail receiving/reading systems.

Nope. This is precisely the argument for using mnemonic encoding rather than
quoted-printable. The nice thing about mnemonic is that, even without a fancy
UA and double scanning, the contents of these strings are NOT garbage. Rather,
they are text that is as close to the original as you can get within the
confines of US-ASCII (actually
a subset of US-ASCII). You really should take a look at mnemonic if you haven't
done so already -- I can usually figure out the meaning of the "chords" it uses
without looking at the document. Keld did a fantastic job in this regard.
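For readers who have not seen mnemonic, a minimal Python sketch of the idea follows. The chord table is a small hypothetical subset modeled on Keld Simonsen's character mnemonics (later published as RFC 1345), and the '&' intro character is likewise an assumption borrowed from RFC 1345; to_mnemonic is an illustrative name. The output stays legible without any decoding, no backslash ever appears, and the double acute chord is the only one here that introduces a double quote.

```python
# HYPOTHETICAL subset of mnemonic "chords", modeled on Keld Simonsen's
# character mnemonics (later published as RFC 1345). The '&' intro
# character is also taken from RFC 1345 and is an assumption here.
INTRO = "&"
CHORDS = {
    "é": "e'",   # e acute
    "è": "e!",   # e grave
    "ä": "a:",   # a diaeresis
    "ö": "o:",   # o diaeresis
    "ü": "u:",   # u diaeresis
    "ő": 'o"',   # o double acute: the only chord here using '"'
    "ß": "ss",   # sharp s
}


def to_mnemonic(text):
    out = []
    for ch in text:
        if ch in CHORDS:
            out.append(INTRO + CHORDS[ch])
        elif ch == INTRO:
            # a literal intro character would need its own escape
            raise ValueError("literal intro character not handled in this sketch")
        elif ch.isascii():
            out.append(ch)
        else:
            raise ValueError(f"no chord for {ch!r} in this sketch")
    return "".join(out)


for name in ["Gödel", "Erdős", "Straße"]:
    print(to_mnemonic(name))  # G&o:del, Erd&o"s, Stra&sse
```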

Real- headers are the thing that dumps garbage on the user, not the use of
mnemonic in phrases. As such, I think that mnemonic phrases are true to the
principles of RFC822, whereas Real- headers and/or the use of quoted-printable
are definitely not.

Thus, even if I accept your views on these matters as valid (which I do not) I
find upon further analysis that quoted-printable satisfies the need you see
for avoidance of fancy RFC822 quoting better than normal text does now, while 
mnemonic satisfies the need you see for avoiding "garbage dump". Regardless
of the direction you jump, I think the one sure way you lose is with the
Real- headers or with the status quo.

                                Ned