Patrik,
I found your proposal very useful in describing the issues and
problems, many of which it appears to solve. It also eliminates my
objection to solutions proposed for one or two headers, leaving other
headers for "later", for which I am grateful.
A few brief comments on a few aspects of it.
(1) As one or two others have pointed out in the last few days, the one
major issue which I left out of my summary is the case where, for
example, someone in Sweden or Germany addresses a letter to someone in
Russia, and both wish to spell their names in their own languages and
character sets. This would result in one character set in the phrase of
the From field and a different one in the phrase of the To field.
In a pathological case--possibly involving distribution to people in
both countries--someone might want to include both a German and a
Russian translation in a Subject field.
While I am personally less concerned with the multi-character-set
Subject problem than with the personal names, it seems to me that
neither of these can be dealt with under your proposal without use of
multioctet "universal" character sets or mnemonic (quoted-readable). Is
that your intent?
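To make the concern concrete, here is a minimal sketch, assuming
Python's standard codecs as stand-ins for the single-octet character
sets under discussion. The two display names and the helper function
are hypothetical, chosen only for illustration:

```python
def encodable(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical phrases for the From and To fields.
from_phrase = "Göran Åkesson"
to_phrase = "Иван Петров"

# A few single-octet sets, named as Python's codec registry names them.
single_octet_sets = ["ascii", "iso8859_1", "iso8859_5", "koi8_r"]

covers_from = [cs for cs in single_octet_sets if encodable(from_phrase, cs)]
covers_to = [cs for cs in single_octet_sets if encodable(to_phrase, cs)]
covers_both = [cs for cs in covers_from if cs in covers_to]

print("From phrase fits in:", covers_from)
print("To phrase fits in:  ", covers_to)
print("Both fit in:        ", covers_both)
```

None of the sets listed can carry both phrases, which is why a single
character set declared for the whole header appears to force either a
"universal" set or a mnemonic representation.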
(2) Just to be sure that we do not misunderstand, when you say...
A) Header-Transfer-Encoding
Allowed Header-Transfer-Encoding types are:
....
- 8-bit
This would, of course, require agreement (however that is determined)
on an 8-bit transport arrangement.
(3) I think we must be very careful about the potential for 2nd DIS
10646. We could have, I believe, gotten ourselves into serious trouble
in the first part of the year had we followed the advice of several
people that both the general structure of 10646 and the availability of
compaction method 5 were fixed and would not change.
The notion of "Internet standard approval pending final approval of
ISO Standard" really does not work, since, if many people implement and
deploy a solution, and then the ISO Standard changes drastically (as
will certainly be the case between 1st DIS 10646 and 2nd DIS 10646), we
are faced with having to choose between:
- invalidating and changing a number of existing implementations,
upsetting users and vendors alike and
- ending up with an Internet standard that is based on an obsolete
draft version of an ISO standard and which, as a result, is not
supported for any purpose other than Internet transport and which must
be maintained within the Internet concept.
Neither is attractive.
As a result, I think we must examine very carefully any solutions
that remain only "partial" "until 10646 is approved".
This argument applies with even more force to the AUC proposal. I
think we can all be quite confident that, sooner or later, there will be
an IS 10646, even though I am reluctant to speculate on what might be in
it. But to focus on a proposal for a particular feature or supplement
to 10646 seems to me to be premature, as we should have learned from
compaction method 5.
In particular, you say, under "header type"...
Only one will not be used:
- ISO-10646
Out of date. (We suppose ISO-10646-ACU will be used instead)
Again, I don't think we disagree but, since minor misunderstandings on
this list have periodically led to major explosions: (i) there never has
been an ISO 10646; there has only been First DIS 10646. The latter is
obsolete; the former has not yet existed. (ii) See above for remarks on
the AUC (I assume that is what you meant) proposal.
(4) Special (non-ASCII) character sets
Because of the possible introduction of multi-octet character sets,
RFC-822 should be interpreted like this:
Any reference to a specific character in RFC-822 is
interpreted as a reference to the octet that represents
this character in US-ASCII.
Unless you propose to forever ban any character set that does not have
the glyphs of US-ASCII as an unambiguous subset (e.g., not permit
extensions from the sets you have listed), this language is not quite
sufficient. Such a prohibition might be reasonable, but, if it is what
you intend, the final proposal will need to be specific about it.
There is some language in the revised and commented RFC-ZZZZ, which
will be posted to the other list today or tomorrow. It deals with this
issue in the transport context and, I think, in a very general way.
(5) Quoted-readable/Mnemonic
Your proposal says...
Note that we only have to introduce extra quoting because of the character
set only arise if we use Quoted-Readable, ISO-2022 and ISO-10646-AUC.
I don't understand the problem in this situation with Mnemonic.
Indeed, mnemonic, since it is based on the glyphs of 10646 (and is a
superset of them) but not the structure of 10646, is the only proposal
on the table that gives us the properties of a universal character set
(e.g., different languages in different headers but within the context
of your proposal) without anticipating the approval of an International
Standard in a particular form. This, of course, assumes that the Asian
characters can be worked out in a reasonable fashion.
ISO 2022 raises several additional problems, arising primarily because
it is not a character set but a collection of separable rules. If we
can avoid providing for it in headers, we can avoid replaying all of the
arguments about it that ran through the list in the (northern
hemisphere) spring.
(6) Header parts...
C) What parts of the header does the new header control
Headers:
Subject:
Comments:
Content-Description:
Summary: (from RFC-1036)
Organization: (from RFC-1036)
I would appreciate a comment from the chair on this, but I think IETF
should be quite reluctant to authorize, or specify treatment for, header
fields for which no Internet standard, or at least standards-track,
description exists. An
implementor must know what to implement, and informational and
experimental RFCs are not satisfying in this regard. I would welcome a
standards-track RFC that would specify some of these optional fields,
but that document would be the place to specify their character set
encoding.
...and all user-defined fields in RFC-822.
I don't understand what this is intended to mean. If you are
referring to the X- fields, the comments above apply even more strongly:
I don't know how one can specify things one hasn't seen, and whatever
agreements govern the use of these fields can also include their
interpretation.
(7) The Received line...
Your discussion should be extended to all "trace fields", including
Return-path. Even though Return-path should be inserted into headers
only by the final delivery MTA, such fields have a tendency to creep
into transported headers and the treatment should be clear.
Otherwise, your analysis exactly parallels the analysis I recently
completed from the transport perspective. Either it is correct or we
are simultaneously very confused.
(8) Remarks on remarks...
At the beginning we had RFC-822. It defined the characters in the headers
to be 0-127. Some parts of the headers included special characters.
No, it didn't. This has been the source of much of the confusion and
criticism of the last few days. It defined the characters in the
headers to be [US] ASCII, ANSI X3.4, not semi-arbitrary patterns of 7
bits.
When we start to use more characters than the 0-127 we have to encode
them in some way. Why not do that in the same, or at least equivalent way,
as the Body of the mail. We therefore saw the possibility to introduce
the Header-Transport-Encoding and the Header-Type.
Strictly speaking, the first sentence above should begin "When we
start to use any characters, or interpretations of character positions,
that are not in [US] ASCII (ANSI X3.4), we have to encode...". In other
words, while no implementation has been able to detect the violation or
enforce the rule (perhaps fortunately), use of a national variation on
ASCII has technically always required some encoding.
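A small illustration of why the octets alone do not settle the
question, assuming the Swedish national variant of ISO 646 as the
example. The mapping table below is a partial one written by hand for
this sketch (only six positions), and the function names and the sample
header line are hypothetical:

```python
# Partial, hand-written table: six code positions whose meaning in the
# Swedish national variant of ISO 646 differs from US-ASCII.
SWEDISH_VARIANT = {
    0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",
    0x7B: "ä", 0x7C: "ö", 0x7D: "å",
}

def read_as_us_ascii(octets):
    """Interpret the octets strictly as US-ASCII (ANSI X3.4)."""
    return octets.decode("ascii")

def read_as_swedish_variant(octets):
    """Interpret the same 7-bit octets using the partial table above."""
    octets.decode("ascii")  # raises if any octet is outside 0-127
    return "".join(SWEDISH_VARIANT.get(b, chr(b)) for b in octets)

# One hypothetical header line, every octet in the range 0-127.
octets = b"Subject: G|ran {r h{r"

print(read_as_us_ascii(octets))
print(read_as_swedish_variant(octets))
```

Every octet is within 0-127 in both readings, yet the two readers see
different text, and no receiving implementation can tell from the
octets which reading the sender intended.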
--john