ietf-822

Non-ASCII hdrs: Encoded-Variables Redux

1991-10-17 12:33:14
As the debate over non-ASCII headers has raged on, I find myself more
and more drawn back to one of my earliest proposals in this area, the
"encoded-variable" approach.  After reading Harald's latest proposal
(which is thoughtful and well-intentioned but still has problems, most
notably the inability to mix character sets within one header) and after
getting only one negative response to my earlier question ("was
encoded-variable such a bad idea?") I've decided to dust off this old
bird and see if it flies.

What follows is from the April draft of RFC-XXXX.  It's worth noting
that this is the most general solution on the table.  It permits headers
to contain not merely multiple character sets, but other stuff as well
-- your subject could even be a multipart object that includes text and
a picture, for example.  If this generality frightens people (it
actually appeals to me), we could always back off to a requirement that
the only valid content-type in encoded-variables is "text."

I also happen to think that this is more elegant than the other
solutions on the table, but then elegance isn't really the goal here
anyway...

I know you're all busy reading the new draft of RFC-XXXX, but those of
you who are most impassioned about non-ASCII headers might want to read
this as well.  Enjoy.   -- Nathaniel
----------------------------------------------------------------
The Encoded-Variable Header Field

A particularly thorny problem, not addressed by the Content-Encoding
header field specified earlier in this memo, is the problem of including
data other than MAILASCII in a message header.  

It is tempting, to many, to simply declare that such inclusion is too
problematic, and that message headers should always be entirely
MAILASCII.  After all, most of the information in the header is not
intended for human consumption anyway.  However, there are certain parts
of the header that are intended entirely for human viewing, and these
are the parts where MAILASCII is deemed most unsatisfactory.  In
particular, there is widespread desire to have the contents of the
Subject field and the names of message senders and recipients appear in
languages that cannot be represented in MAILASCII.

The heart of the problem is the fact that RFC822 prescribes a great deal
of syntax and semantics for the message header area, all of it based on
MAILASCII.  Tampering with this, it would seem, could introduce a great
deal of complexity, as well as bugs involving backward compatibility.

Instead, this memo proposes a mechanism by which the header area remains
entirely MAILASCII, but encodes non-MAILASCII information in a manner
from which it can easily be restored by conforming user agents.

The basic idea is that, in certain parts of the headers which are never
machine-interpreted, the human-readable data might best be represented
in a content-type other than MAILASCII.  In such cases, the data are to
be represented, in the header field, by a "variable reference" -- a
placeholder for a value defined elsewhere in the message header area. 
The variables are defined by one or more "Encoded-Variable" headers,
with a syntax as specified below.

For example, if a user's name includes characters that cannot be
represented in MAILASCII, it can be replaced by the name of a variable
that is defined elsewhere.  To improve readability for UAs that handle
only MAILASCII, it is recommended that the variable name itself be as
close an approximation as possible to the correct name.  Thus one might
have:

From: $Keld_JXrn_Simonsen <keld@dkuug.dk>
Encoded-Variable: Keld_JXrn_Simonsen = quoted-printable, iso646, 
        Keld_J&0Crn_Simonsen

*** NOTE:  It would be nice to get the character set & hex code right
for the above example.

Where multiple variables need to be defined, multiple Encoded-Variable
header fields may be used.

It is important to constrain the use of encoded-variables to places
where they will not interfere with the established syntax or semantics
of header fields.  For that reason, their use is explicitly restricted
to the Subject and Comments header fields, and to the "phrase" portion
of RFC 822 addresses.  This implies a small redefinition of RFC 822's
"optional-field", "mailbox", and "group" syntax:

optional-field =
                 /  "Message-ID"        ":"   msg-id
                 /  "Resent-Message-ID" ":"   msg-id
                 /  "In-Reply-To"       ":"  *(phrase / msg-id)
                 /  "References"        ":"  *(phrase / msg-id)
                 /  "Keywords"          ":"  #phrase
                 /  "Subject"           ":"  var-text
                 /  "Comments"          ":"  var-text
                 /  "Encrypted"         ":" 1#2word
                 /  extension-field           ; To be defined
                 /  user-defined-field        ; May be pre-empted

mailbox     =  addr-spec                    ; simple address
                 /  var-phrase route-addr   ; name & addr-spec

group       =  var-phrase ":" [#mailbox] ";"

The two new syntactic entities, "var-text" and "var-phrase", are defined
as follows:

var-text =  *text / var-ref

var-phrase =  phrase / var-ref

var-ref =  "$" var-name

var-name = atom

NOTE that the definition of "atom" permits underscores, but not spaces
or any other "specials" as defined by RFC 822.  Note also that this does
not actually change the legal syntax defined by RFC 822: since "$" is
not one of RFC 822's "specials", a "var-ref" is itself an atom, and
hence a valid instance of "phrase" or "*text".  Thus, no correct
existing parser should be broken by the new definitions.
However, the old parsers will not recognize a difference between a
var-ref and any other instance of *text or phrase, and will therefore
not do any variable substitution.
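A conforming UA, by contrast, would perform the substitution before display.  The Python sketch below shows one possible way to do that, assuming the Encoded-Variable values have already been decoded into a dictionary of strings.  The helper names and the simplified mailbox split (a phrase followed by "<...>") are assumptions for illustration, not a full RFC 822 parser.  An undefined variable is left as literal text, which is exactly what an old UA would show anyway.

```python
import re

RFC822_SPECIALS = set('()<>@,;:\\".[]')


def is_atom(s):
    # Codes 33..126 exclude SPACE, CTLs and non-ASCII characters.
    return bool(s) and all(32 < ord(ch) < 127 and ch not in RFC822_SPECIALS
                           for ch in s)


def lookup(token, variables):
    """Value of a defined var-ref, or None if token is not one."""
    token = token.strip()
    name = token[1:]
    if token.startswith("$") and is_atom(name):
        return variables.get(name)
    return None


def resolve_var_text(body, variables):
    """Subject/Comments body: var-text = *text / var-ref."""
    value = lookup(body, variables)
    return body if value is None else value


def resolve_mailbox(mailbox, variables):
    """mailbox = addr-spec / var-phrase route-addr.
    Simplified: only a phrase directly before "<...>" is considered."""
    m = re.fullmatch(r"\s*(.*?)\s*(<[^<>]*>)\s*", mailbox)
    if m is None or not m.group(1):
        return mailbox              # bare addr-spec, or no phrase at all
    value = lookup(m.group(1), variables)
    return mailbox if value is None else value + " " + m.group(2)


# Hypothetical table of already-decoded Encoded-Variable values.
variables = {"Keld_JXrn_Simonsen": "Keld J\u00f8rn Simonsen"}

print(resolve_mailbox("$Keld_JXrn_Simonsen <keld@dkuug.dk>", variables))
# Keld Jørn Simonsen <keld@dkuug.dk>
print(resolve_var_text("$No_Such_Variable", variables))
# $No_Such_Variable
```

Substitution is done only for display; the header as transmitted stays entirely MAILASCII.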

The syntax of the Encoded-Variable field is defined as follows:

Encoded-variable = var-name "=" Content-Encoding 
                   "," Content-Type "," var-contents

var-contents = *text

Here the var-contents is the encoded value of the variable, of a type
given by Content-Type and encoded with the encoding given in
Content-Encoding.  Both a Content-Type and a Content-Encoding are
required for each Encoded-Variable header field.
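To make the field syntax concrete, here is a hedged Python sketch that unfolds a field body, splits it into its four parts, and reverses the Content-Encoding.  It assumes the encoding names "quoted-printable" and "base64", and it assumes the var-name contains no "=" (RFC 822 does not count "=" among the specials, so the grammar alone does not forbid it).  Interpreting the decoded bytes according to the Content-Type is left to the caller; the field value in the usage lines is a hypothetical example, not taken from this draft.

```python
import base64
import quopri
import re


def unfold(value):
    """Undo RFC 822 folding: a line break followed by SPACE or TAB."""
    return re.sub(r"\r?\n[ \t]+", " ", value)


def parse_encoded_variable(value):
    """Split an Encoded-Variable field body into
    (var-name, Content-Encoding, Content-Type, raw var-contents).
    Assumes the var-name itself contains no "=" character."""
    name, sep, rest = unfold(value).partition("=")
    if not sep:
        raise ValueError("missing '=' after var-name")
    parts = rest.split(",", 2)      # var-contents may itself contain commas
    if len(parts) != 3:
        raise ValueError("expected encoding, type and contents")
    encoding, ctype, contents = (p.strip() for p in parts)
    return name.strip(), encoding.lower(), ctype, contents


def decode_contents(encoding, contents):
    """Reverse the Content-Encoding, returning raw bytes.
    Only two encodings are handled in this sketch."""
    data = contents.encode("ascii")    # the header area is MAILASCII
    if encoding == "quoted-printable":
        return quopri.decodestring(data)
    if encoding == "base64":
        return base64.b64decode(data)
    raise ValueError("unsupported Content-Encoding: " + encoding)


# Hypothetical folded field body (continuation line starts with spaces).
field = "Gruss = quoted-printable, iso-8859-1,\n        Gr=FC=DF"
name, enc, ctype, raw = parse_encoded_variable(field)
print(name, enc, ctype, raw)   # Gruss quoted-printable iso-8859-1 Gr=FC=DF
print(decode_contents(enc, raw).decode("latin-1"))   # Grüß
```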
