Re: new-ish idea on non-ascii headers

Ned Freed writes, in response to my attempt to do a little 
analysis...

I've never heard this before. If it is true, I think the people who came up
with this idea should get a stern talking-to for being so silly. This idea is
completely unworkable in practice.

Routing and delivery of a given message instance should depend totally on the
message envelope. The message header has nothing whatsoever to do with it.
There's nothing that says an address in the envelope has to appear in the
header. It often does not in practice. There's also nothing to prevent an
...


I think I was not clear enough here.  I don't disagree about any of 
this.  At the level of "delivery" we normally mean when discussing 
"mail handling systems" of any flavor, systems deliver to mailboxes, 
mailboxes are specified in envelopes (and, secondarily, in headers), 
and that is the end of the story.

But there can be systems that touch mail post-mailbox-delivery.  The 
only point is that we should not deliberately write software that 
interferes with their operation.  If it helps to write the words 
"not-mail" in front of "systems" in that sentence, I have no problem 
with it.

This principle is reiterated in X.400, incidentally, and any
hope of X.400 interoperability depends totally on this being true.

  I don't want to debate this issue, but it is worth noting that the 
X.400 address itself contains provision for identifying the concepts 
that we would call "mailbox" and the concepts we might call "extended 
personal name" separately.  So one could reverse this argument by 
claiming that X.400 responded to our being vague about this issue and 
its semantics by providing for incorporating that information into the 
address itself, resulting in both having the information retained in 
canonical form *and* in "reiterating the principle".
  It is interesting to note that at least one significant commercial 
vendor of Internet-to-X.400 gateway services picks up the "phrase" from 
an Internet address being routed into its system and uses it as 
"surname".  They apparently do not provide any other way of specifying 
"surname" in the syntax they use when translating X.400 addresses back 
into the Internet.  And, for most X.400 systems, surname is pretty 
important, not miscellaneous noise.
  Now I think they are doing bad stuff here, and are seriously broken.  
They have been flamed, lavishly and with extensive technical discussion, 
in private.  They have not felt it necessary to respond to those 
comments.  But it is precisely this type of behavior that started me 
thinking, again (it is a cyclic disease with me), that we need to take 
address phrases a little more seriously.
   And it *is* a topic with which this group should concern itself, 
if only because nothing in the protocols justifies discarding that 
information or otherwise trashing it.

And frankly, stuff that the authors debated is not
of interest -- if it didn't make it into the document there's no reason we
should concern ourselves with it.

  The Host Requirements (HR) discussion was raised only to indicate that
this is not a new issue that I'm raising out of sheer perversity.  A lot
of things were left out of the documents because they would have
required breaking new ground (which the WG chair wisely, IMHO, kept that
group out of) or
because no clear agreement could be reached about what needed to be done
(or, in fairness, whether anything needed to be done).  But, in many of
the latter cases, there was a clear hope that, when WGs came along to
deal with the embedding context, they would take up the issues and deal
with them. 

That raises two issues, on which I'd appreciate comments from the Chair. 
The first is one of agenda: if the goal of this group is "finish 
RFC-XXXX", then I strongly favor clearing away as much of everything 
else as possible (presumably including attempts to fold in PEM and 
sender authentication, a lot of essentially transport issues, etc.) and 
adopting only a "try to do no harm" guideline.  If it is "fix 822", then 
all of these complex issues are on the table and won't go away.  
Personally, I'd prefer to see RFC-XXXX in my lifetime, and that argues 
for the first approach.

  The second issue is whether, given that we are trying to extend 822 
and not replace it, it is reasonable to argue for a doctrine of 
being very conservative about the assumptions we make about what is 
going on in practice.  Doing so will tend to preserve interoperability 
with systems that are now doing things that are marginally within the 
specifications but in the "strange", "bizarre", "bad idea", or 
"stretched interpretation" categories.  I'm not talking about things 
that Ned and I (and any other right-thinking person :-) ) would conclude 
are clearly banned, either in RFC-821/822 or in HR, but about things 
that lie out past where the specifications stop.
  Take my brain-damaged Internet-PrivateSystem-X.400 gateway situation.  
The specifications in RFC-821/822 stop with mail transport and mail 
delivery to a mailbox.  As I said, Ned and I don't disagree about that.  
But, in this particular case, what that means is:
    S>  RCPT TO: <funny-string-with-more-%-signs%mumble@gateway.domain>
    R<  250 OK I know what to do with that.
    ...
    To: Smith <funny-string-with-%-signs%mumble@gateway.domain>
This is accepted for delivery, and the gateway host effectively delivers 
it to a mailbox with semantics specified by the local-part.  That is as 
far as we [can] specify.  Now the process that starts rewriting this for 
forwarding (an explicitly invoked gateway, with all of the authority 
that goes with that) might check the header address for consistency (as 
it defines that) against the envelope address, but we can't even require 
that it do that as long as we can't detect the symptoms of certain 
misbehavior from "the outside".  It then builds an X.400 address from 
the header address, and, if that process sucks up the "phrase" and gives 
it a specific X.400 meaning, well... I think it is a really stupid 
idea, but I don't think it is prohibited unless we suddenly and 
retroactively adopt an "anything not explicitly permitted is prohibited" 
doctrine.  This is simply beyond the scope, not only of what 821/822 
specified, but of the areas in which they presumed to specify.
   Note that, if we adopt such a doctrine, it probably kills RFC-XXXX, 
so let's not :-).
   Also note that this argument doesn't overlap into the "just send 
8bit" stories: RFC821 and 822 are very clear on those issues.
   Similarly, if we know that certain features cause lots of 
interoperability problems already, is it permitted to argue that we 
should avoid stressing those features further, or is that argument 
prohibited on the grounds that the deviant and inadequate systems should 
be fixed?  Note that it is plausible to argue that precisely those 
features that have caused problems in the past, especially if they are 
marginal, should be used *more* because that is the best way to get them 
fixed in all cases.
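To make the gateway scenario concrete, here is a minimal Python sketch of the post-delivery rewriting step. The function name `gateway_rewrite` and its output fields are hypothetical and do not reflect any real gateway's code. It assumes the envelope recipient and the header `To:` value are available as plain strings, parses both with the standard library's `email.utils.parseaddr`, and shows the questionable step of reusing the RFC 822 phrase as an X.400 surname. The consistency check is something such a gateway might define for itself; 821/822 cannot require it.

```python
from email.utils import parseaddr


def gateway_rewrite(envelope_rcpt: str, header_to: str) -> dict:
    """Hypothetical forwarding step run by a gateway after mailbox delivery.

    Delivery has already happened on the envelope address; this only models
    what the gateway process might do with the header afterwards.
    """
    _, env_addr = parseaddr(envelope_rcpt)
    phrase, header_addr = parseaddr(header_to)
    return {
        # Optional check the gateway defines for itself; 821/822 don't require it.
        "matches_envelope": header_addr.lower() == env_addr.lower(),
        # The dubious step: the 822 "phrase" silently becomes an X.400 surname.
        "surname": phrase or None,
        # Local-part whose semantics the gateway interprets for itself.
        "local_part": header_addr.rpartition("@")[0],
    }


result = gateway_rewrite(
    "<funny-string-with-more-%-signs%mumble@gateway.domain>",
    "Smith <funny-string-with-%-signs%mumble@gateway.domain>",
)
print(result)
```

With the addresses from the exchange above, `surname` comes back as `"Smith"` and `matches_envelope` as `False`, because the envelope local-part carries an extra `more-`. Whether a gateway rejects, ignores, or silently proceeds on that mismatch is exactly the behavior 821/822 leave unspecified.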

What I'm looking for here is a ruling/suggestion on whether these are 
legitimate areas of inquiry, or whether they should be dropped as not 
part of the WG mandate.  Mr Chairman?

-------------------

I disagree. I think the use of mnemonic is clean, simple, and elegant.

  With the understanding that I like mnemonic, and have always liked 
mnemonic, and will probably continue to like mnemonic...
  Every time mnemonic comes up, one of two objections is raised (the 
second more in the earlier days, and the first more lately):
   (i)  Mnemonic is much better adapted to the character sets of 
languages that have a relatively small repertoire of alphabetic 
or phonetic characters than it is to languages with, e.g., potentially 
unbounded collections of ideographic characters.
   (ii) However mnemonic is expanded, and no matter what character 
collections are registered, there will always be "one more character 
set" that it does not accommodate today, even if it might accommodate it 
tomorrow.  Unless we are going to tell people to not use those character 
sets (tempting, indeed), we will always need an escape mechanism that 
depends on a pairing of character set identification and recoding of the 
bit patterns (e.g., quoted-printable) to supplement a system that 
depends on a glyph registry (e.g., mnemonic).

Perhaps what we need is a little structuring, internal to the "phrase", 
that permits either mnemonic or quoted-printable to be used as 
appropriate.  I think the idea stinks, but the alternatives may all be 
worse.   Maybe that is the two-sentence summary of my long note.
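One possible shape for the "character set identification plus recoding" half of that structuring is sketched below in Python. It borrows the `=?charset?Q?...?=` form of MIME encoded-words (RFC 2047) purely as an illustration; the function `encode_phrase` is hypothetical, the mnemonic branch is not shown, and the Q-style encoding is deliberately simplified (only ASCII letters and digits pass through unescaped). Decoding uses the standard library's `email.header.decode_header` to check that the result round-trips.

```python
from email.header import decode_header


def encode_phrase(phrase: str, charset: str = "iso-8859-1") -> str:
    """Hypothetical: tag a display phrase with its charset and Q-encode its bytes."""
    out = []
    for byte in phrase.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")  # Q encoding writes a space as an underscore
        elif ch.isascii() and ch.isalnum():
            out.append(ch)  # plain ASCII letters/digits pass through unchanged
        else:
            out.append(f"={byte:02X}")  # everything else becomes =XX hex
    return f"=?{charset}?Q?{''.join(out)}?="


encoded = encode_phrase("José García")
print(encoded)  # =?iso-8859-1?Q?Jos=E9_Garc=EDa?=
for raw, cs in decode_header(encoded):
    print(raw.decode(cs))  # José García
```

This sketch wraps even a plain-ASCII phrase such as "Smith"; deciding per phrase between this form, a mnemonic form, or no encoding at all is the structuring question the paragraph above leaves open.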

You really should take a look at mnemonic if you haven't
done so already -- I can usually figure out the meaning of the "chords" it uses
without looking at the document. Keld did a fantastic job in this regard.

   I have.  Several times.  And I agree about the fantastic job.  But, 
being of limited imagination, my ability to figure out "chords" seems to 
decrease as the underlying character set has more and more glyphs that 
don't have obvious analogies in the writing systems derived from Greek, 
Latin, and maybe North Semitic.  I wouldn't have expected otherwise.

Regardless
of the direction you jump, I think the one sure way you lose is with the
Real- headers or with the status quo.

   I think I agree with this analysis.  The only other case that is 
plausibly worth considering would be the "complete alternate form" 
headers, which are not part of the "hunt for Real- headers" problem.
  -john