ietf-822
Re: New-ish idea on non-ascii headers

1991-09-17 21:33:54
Bob Smart writes:

A. Nobody (I hope) is suggesting _any_ change to the rfc822 rules
   for how headers are constructed as sequences of octets. What is
   felt to be required is new rules for how these are displayed as
   glyphs [new rules to apply when there are Content headers].

I don't think this distinction is clear to everyone, and it is worth
repeating.

B. If we don't let the Icelanders put their funny glyphs in the Subject
   lines and other similar places in the headers then they will go off
   and ignore us. Ditto just about everybody. So I don't think you
   can say it is unimportant. Nor do I think the Real headers will
   be regarded as satisfactory.

I agree completely.

C. Stuff in the headers has _NO_ immediate delivery implications.
   It is true that the From (/Reply-to/etc) header is used by mail
   agents to generate return envelope addresses but I fail to see
   anything in the proposals that have been made in this area which
   should have any negative impact on this.

I agree completely with Bob on this too. In proposing that mnemonic encoding be
used in headers, we are in a position to make NO changes to existing header
syntax. No change at all, period. The only, repeat ONLY, change would be to the
semantics applied to headers when they are DISPLAYED. And this change would
only apply when a new (preferably never before used) header line is present. 

I think it is important to reiterate the advantages mnemonic encoding has at
this point. Mnemonic encoding uses only invariant characters from the US-ASCII
character set. As such it is simply a way of specifying glyphs using multiple
characters. This has great advantages over using any extended character set
that assigns meanings to characters outside the confines of US-ASCII -- doing
this means you have to change some syntax, whereas mnemonic requires no syntax
changes at all.
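
To make the point concrete, here is a minimal sketch in Python of what
"display-only" decoding might look like. The mnemonic pairs and the "&"
introducer below are illustrative assumptions, not the table under discussion
on this list; the names MNEMONICS, INTRO, and decode_for_display are likewise
hypothetical. What matters is that the text on the wire stays pure US-ASCII and
only the rendering step changes.

```python
# Hypothetical mnemonic table. These pairs and the "&" introducer are
# assumptions chosen for illustration, not an agreed-upon encoding.
MNEMONICS = {
    "a:": "\u00e4",  # a with diaeresis
    "e'": "\u00e9",  # e with acute accent
    "o/": "\u00f8",  # o with stroke
    "TH": "\u00de",  # capital thorn
    "d-": "\u00f0",  # small eth
}
INTRO = "&"


def decode_for_display(text: str) -> str:
    """Replace INTRO + two-character mnemonic with the glyph it names.

    Anything that is not a known mnemonic is passed through unchanged,
    so ordinary ASCII text (including a stray "&") survives as-is.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i + 1 : i + 3]
        if text[i] == INTRO and pair in MNEMONICS:
            out.append(MNEMONICS[pair])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


wire = "Re: caf&e' in Troms&o/"   # what actually travels in the header
assert wire.isascii()             # nothing outside US-ASCII on the wire
shown = decode_for_display(wire)  # what a capable display would render
```

A reader without any decoding support still sees the wire form, which remains
roughly legible on its own.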

Mnemonic also has the advantage that characters are replaced with entities that
roughly look the same as the original glyph. This is not perfect -- it cannot
possibly be perfect -- but it is a hell of a lot nicer looking than what you get
if you look at ISO10646, Unicode, quoted-printable, or base64 directly without
the necessary display hardware. Moreover, Keld has been nice enough to tally up
mnemonic encodings for most of the European characters, and there is some hope
that this can be extended to cover Asian characters as well (yes, the fit is
less satisfactory, but in comparison with the alternatives it is still a
win).

Are there any problems? Of course there are. We have to define, in precise
terms, what parts of what headers this display option applies to. I would
recommend it apply to:

(1) Comments (you know, the stuff enclosed in parentheses -- it can appear
    anywhere in a structured RFC822 header).
(2) Certain headers whose contents are pure, uninterpreted strings of text.
    This includes Subject: and Comments: from RFC822. It might include
    other headers defined elsewhere, but these are the only two in
    RFC822.
(3) Some phrases. Specifically, I'd recommend making this apply ONLY to quoted
    strings. (A phrase is defined as a sequence of atoms or quoted strings.) 
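
The three cases above can be sketched roughly as follows, in Python. This is
a simplified scanner under stated assumptions, not a full RFC822 parser: the
function name display_header, the UNSTRUCTURED set, and the toy decoder (with
its "&o/" mnemonic) are all hypothetical. It decodes the whole body of
Subject: and Comments:, and in any other header it decodes only the text
inside comments and quoted strings, leaving atoms and addresses untouched.

```python
# Headers whose bodies are uninterpreted text (case 2 above).
UNSTRUCTURED = {"subject", "comments"}


def display_header(name, body, decode):
    """Return the display form of a header body.

    `decode` is any function mapping mnemonic-encoded text to display
    text. Only comments and quoted strings are passed to it when the
    header is structured.
    """
    if name.lower() in UNSTRUCTURED:
        return decode(body)

    out, buf = [], []            # finished pieces / current run
    depth, in_quote = 0, False   # comment nesting depth / inside "..."

    def flush(decodable):
        run = "".join(buf)
        buf.clear()
        out.append(decode(run) if decodable else run)

    i = 0
    while i < len(body):
        c = body[i]
        if (in_quote or depth) and c == "\\" and i + 1 < len(body):
            buf.append(body[i:i + 2])  # quoted-pair: copy both characters
            i += 2
            continue
        if in_quote:
            if c == '"':
                flush(True)            # end of quoted string: decode it
                in_quote = False
                out.append(c)
            else:
                buf.append(c)
        elif depth:
            if c == "(":
                depth += 1             # comments may nest
                buf.append(c)
            elif c == ")":
                depth -= 1
                if depth == 0:
                    flush(True)        # end of outermost comment: decode it
                    out.append(c)
                else:
                    buf.append(c)
            else:
                buf.append(c)
        elif c == '"' or c == "(":
            flush(False)               # text outside is left untouched
            out.append(c)
            if c == '"':
                in_quote = True
            else:
                depth = 1
        else:
            buf.append(c)
        i += 1
    flush(False)                       # unterminated runs are left as-is
    return "".join(out)


def toy_decode(s):
    # Stand-in decoder for the example: one hypothetical mnemonic only.
    return s.replace("&o/", "\u00f8")


sender = '"J&o/rgen" <j&o/@example.org> (Troms&o/)'
shown = display_header("From", sender, toy_decode)
```

Note that the address between the angle brackets is displayed exactly as
sent, so nothing a mail agent uses to build a return address is altered.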

There may be some problems with (3) -- there are cases where phrases in RFC822
may be somewhat active (Keywords: is the one I'm thinking of), and as such if a
gateway is converting things there may be trouble if the conversion is not
applied consistently (note, however, that a gateway tasked with converting
something beyond US-ASCII has to do something -- this proposal may fix a
problem rather than create one). However, the number of references to "phrase"
in RFC822 is small and these can be considered on a case-by-case basis if need
be.

If there is a problem I'd really like to understand it. Can we please
have more details of that nightmare that woke you up at 4am :-).

Ditto. I want to hear what objections there are, if any. Keep in mind, however,
that the alternative seems to be to disallow display of such material
completely; I for one don't think this solution is really viable.

                                Ned