Re: non-ascii headers


% Now, since I have my flameproof suit on already, let me suggest in very 
% general terms that we might think about patching a kludge onto mnemonic 
% in the hope of making this problem go away.  Note that it is "make go away" 
% not "solve"--I think the gist of both Ran's and my comments is that 
% there isn't going to be a neat solution.  We ask Keld to think about an 
% escape convention that would permit, within mnemonic, representing a 
% character by a notational pair consisting of 
%     { character-set-designation, code point }
% The set of "character-set-designation"s would be equivalent to the 
% candidates for "character set" in a separate header or a content-type 
% text subfield or...
        
%   As an even more obnoxious variation, one could provide for 2022-like 
% shifting in and out of this mode, designating the character set as part 
% of the shifting activity.  At that point, by a little more magic and 
% handwaving, we could say "the following types of header fields are in 
% mnemonic when RFC-XXX is in use" and treat quoted-printable as a subset
% of mnemonic.  That really cleans up the inter-header referencing mess. 
        
%   Now, these paired values have no mnemonic significance at all.   Too
% bad. But it is possible to designate *anything* at the simple cost of
% finding it in some standardized or registered character set, or by
% dashing off to ECMA to register another one. 
        
% These are terrible kludges.  Maybe they let us get on with our lives.

  I think that John has done a good job of summarising the problem and
directing focus towards a solution.

  Here are some (hopefully constructive) solution-oriented observations.

  1. Mnemonic encoding is really useful for some languages.
  2. Some (many?) Japanese users already use ISO 2022-derived
        schemes and would like to be able to continue using them.
  3. Mnemonic does not work well for ideographic languages.
  4. BASE64 encoding of some 8/16/32-bit character set is probably
        more useful than mnemonic for ideographic languages.
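  To make point 4 concrete, here is a minimal Python sketch.  It assumes
Python's standard "euc_jp" codec as a stand-in for "some 8/16/32-bit
character set" (any charset codec would do), and the function names are
hypothetical.  The text is first turned into bytes in that charset, and
BASE64 then maps those bytes onto plain 7-bit ASCII, which survives
header transport and decodes back losslessly.

```python
import base64


def to_base64_header_text(text: str, charset: str = "euc_jp") -> str:
    """Encode text in the given charset, then BASE64 the bytes into ASCII."""
    raw = text.encode(charset)                     # charset-specific bytes
    return base64.b64encode(raw).decode("ascii")   # 7-bit safe string


def from_base64_header_text(encoded: str, charset: str = "euc_jp") -> str:
    """Reverse the process: BASE64-decode, then interpret the bytes in the charset."""
    return base64.b64decode(encoded).decode(charset)


if __name__ == "__main__":
    original = "日本語のヘッダ"
    wire = to_base64_header_text(original)
    print(wire)                                    # ASCII-only representation
    print(from_base64_header_text(wire) == original)
```

  The BASE64 form is about a third larger than the raw bytes (4 output
characters for every 3 input bytes), which is the price paid for 7-bit
safety.  Unlike mnemonic, it carries no human-readable hint of the
original characters.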

  So, I suggest that, however extended-width (i.e. non-US-ASCII)
header text is to be handled, we devise a mechanism permitting at
least BASE64 encoding and mnemonic encoding.  There are probably a
lot of ways that this could be done.

  Perhaps one of the header fields (or a header of its own; I'm not
religious on such matters :-) could denote which encoding scheme has
been applied to the headers.  This would permit BASE64, mnemonic, and
any wonderful, better approaches that might arrive in future.  Let the
default value be "no encoding" (i.e. US-ASCII plain text) for
backwards compatibility.
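  The declaration idea can be sketched in Python as follows.  The header
name "Header-Encoding", the scheme names, and the helper functions are
all hypothetical, since the message deliberately leaves them open, and
the "base64" scheme is assumed to carry UTF-8 purely for illustration.
Absent a declaration, headers pass through unchanged as US-ASCII;
unknown schemes are rejected rather than guessed at; and
register_scheme() shows how a locally defined identifier (for example,
one for a site's own ISO 2022 usage) could be plugged in without the
standard having to define it.  A mnemonic decoder would be registered
the same way but is omitted, since it needs the full mnemonic tables.

```python
import base64
from typing import Callable, Dict, Mapping

# Hypothetical name for the header declaring how the other headers are encoded.
ENCODING_HEADER = "Header-Encoding"

# Registry of decoders keyed by scheme name.  "none" is the default, so a
# message without the declaration is read as plain US-ASCII, as today.
DECODERS: Dict[str, Callable[[str], str]] = {
    "none": lambda value: value,
    # Assumption for illustration only: BASE64 payloads carry UTF-8.
    "base64": lambda value: base64.b64decode(value).decode("utf-8"),
}


def register_scheme(name: str, decoder: Callable[[str], str]) -> None:
    """Add a scheme, e.g. a locally defined identifier for ISO 2022 use."""
    DECODERS[name.strip().lower()] = decoder


def decode_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Decode every header according to the declared scheme (default "none")."""
    # Header names are case-insensitive, so look the declaration up that way.
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme = lowered.get(ENCODING_HEADER.lower(), "none").strip().lower()
    decoder = DECODERS.get(scheme)
    if decoder is None:
        # Refuse rather than guess; the user at least learns that
        # something encoded is present.
        raise ValueError(f"unknown header encoding scheme: {scheme!r}")
    return {
        name: value if name.lower() == ENCODING_HEADER.lower() else decoder(value)
        for name, value in headers.items()
    }


if __name__ == "__main__":
    # No declaration: treated as "no encoding".
    print(decode_headers({"Subject": "hello"}))

    # A hypothetical local scheme for ISO 2022-JP text wrapped in BASE64.
    register_scheme(
        "x-local-iso2022",
        lambda v: base64.b64decode(v).decode("iso2022_jp"),
    )
    wire = base64.b64encode("日本語".encode("iso2022_jp")).decode("ascii")
    print(decode_headers({"Header-Encoding": "x-local-iso2022", "Subject": wire}))
```

  Keeping the declaration separate from the individual fields means new
schemes can be added to the registry without changing how any existing
field is parsed.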

  This also has the virtue of helping the user to know that something
has been mnemonically encoded even if he's never heard of the RFCs.
If the user has a clue as to what is going on, there is a better
chance of figuring out how to deal with whatever is there.

  Additionally, Japanese users could use ISO 2022 if they feel the
need, by locally defining an identifier for their use of ISO 2022
mechanisms (without that identifier or mechanism being something that
RFC-XXXX needs to define or address).

  Finally, this mechanism could probably be shared with whatever
mechanism is going to be used to mark the encoding of non-7-bit
headers in mail transported via Extended SMTP.  There has to be
linkage between Extended SMTP and RFC-XXXX in how things are encoded
in the SMTP envelope header fields when anything but 7-bit US-ASCII
is used.

  Many (all?) of these ideas are not new.  The key thing is that I
really want to avoid making "mnemonic" the default or only value used
in any extended header or extended text mail body so that we are
even-handed linguistically towards all users in the mechanisms that we
standardise.

Regards,

  Ran
  atkinson(_at_)itd(_dot_)nrl(_dot_)navy(_dot_)mil