ietf-822

NEW proposal on encoded-headers

1991-10-16 05:58:30
oh no, not another one.......

well, but as usual, I think that I am the one with the right solution.
Regard this as something to be hashed over at Santa Fe.

                                Harald Tveit Alvestrand


SUGGESTION FOR THE TEXT OF A HEADER-DEFINING RFC

REQUIREMENT:

- That the header fields of an RFC-822 and RFC-XXXX compliant message
which are regarded by the sender and receiver as "human information"
should be able to transport characters outside the USASCII range

- That this ability should NOT be limited to any pre-given set of
fields

- That this ability should NOT break existing mailers

The fields in question include, but are not limited to:

From, To, CC: phrase <user@domain> - the "phrase" part
From, To, CC: user@domain (comment) - the "comment" part
Subject: string - the whole string

DESCRIPTION:

- We define a new field, Header-encoding:

- This Header-encoding: field contains three things:
  - A Character-set identifier, from RFC-CHAR
  - An Encoding, from RFC-XXXX
  - A list of the fields encoded with this character set and encoding,
    each possibly with delimiters identifying the encoded parts of the field.

- The header may occur one or more times, but the sets of fields
  described by different occurrences must be disjoint (no field may be
  listed twice).
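A minimal Python sketch of the disjointness rule, assuming each Header-encoding: occurrence has already been reduced to its charset, encoding, and field names. The names HeaderEncoding and check_disjoint are hypothetical. Field names are compared case-insensitively, since RFC 822 treats field names that way.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeaderEncoding:
    """One Header-encoding: occurrence (hypothetical representation)."""
    charset: str                                      # e.g. "ISO_8859-1"
    encoding: str                                     # e.g. "Quoted-Printable"
    fields: List[str] = field(default_factory=list)   # described field names


def check_disjoint(decls: List[HeaderEncoding]) -> None:
    """Raise ValueError if a field is described by two different occurrences."""
    seen: Dict[str, int] = {}
    for i, decl in enumerate(decls):
        for name in decl.fields:
            key = name.lower()  # RFC 822 field names are case-insensitive
            if key in seen and seen[key] != i:
                raise ValueError(
                    f"field {name!r} is described by Header-encoding "
                    f"#{seen[key] + 1} and #{i + 1}"
                )
            seen[key] = i


decls = [
    HeaderEncoding("ISO_8859-1", "Quoted-Printable", ["To", "From"]),
    HeaderEncoding("RFC-MNEM", "7bit", ["Subject"]),
]
check_disjoint(decls)  # passes: each field is described only once
```

Adding a third occurrence that lists "to" again would make check_disjoint raise ValueError.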

Grammar for the header ([] means "may be omitted")

Header-encoding-description ::= charset "," encoding "," fieldlist
charset ::= atom
encoding ::= atom
fieldlist ::= fieldname [delimiters] [";" fieldlist]
fieldname ::= atom
delimiters ::= ":" [fromchar] "-" [tochar] [restartchar]
fromchar ::= char
tochar ::= char
restartchar ::= char

The delimiter string ":--" is ambiguous: it could mean "from a dash to
the end" (fromchar "-", no tochar) or "from the beginning to a dash"
(no fromchar, tochar "-"). To make it unambiguous, ":--" is always
interpreted as "dash to end".
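The grammar can be sketched as a small Python parser. This is an illustration, not a reference implementation: FieldSpec, parse_delimiters, and parse_header_encoding are hypothetical names, and the sketch assumes that whitespace and ";" never serve as delimiter characters. It generalizes the ":--" rule by reading the first delimiter character as fromchar whenever it is followed by a dash.

```python
from typing import List, NamedTuple, Optional, Tuple


class FieldSpec(NamedTuple):
    name: str
    fromchar: Optional[str]     # None: decode from the beginning
    tochar: Optional[str]       # None: decode to the end
    restartchar: Optional[str]  # None: never restart decoding


def parse_delimiters(d: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Parse the text after 'fieldname:' into (fromchar, tochar, restartchar)."""
    # Treat the first character as fromchar whenever a dash follows it;
    # this makes ":--" mean "dash to end", as the proposal requires.
    if len(d) >= 2 and d[1] == "-":
        fromchar, rest = d[0], d[2:]
    elif d.startswith("-"):
        fromchar, rest = None, d[1:]
    else:
        raise ValueError(f"delimiters must contain '-': {d!r}")
    if len(rest) > 2:
        raise ValueError(f"too many delimiter characters: {d!r}")
    tochar = rest[0] if len(rest) >= 1 else None
    restartchar = rest[1] if len(rest) == 2 else None
    return fromchar, tochar, restartchar


def parse_header_encoding(value: str) -> Tuple[str, str, List[FieldSpec]]:
    """Split a Header-encoding: value into (charset, encoding, [FieldSpec])."""
    # Only the first two commas separate the parts; later commas may be
    # restart characters inside the fieldlist.
    charset, encoding, fieldlist = (p.strip() for p in value.split(",", 2))
    specs: List[FieldSpec] = []
    for entry in fieldlist.split(";"):
        entry = entry.strip()  # ASSUMPTION: whitespace is never a delimiter
        name, sep, delims = entry.partition(":")
        if sep:
            specs.append(FieldSpec(name.strip(), *parse_delimiters(delims)))
        else:
            specs.append(FieldSpec(name, None, None, None))
    return charset, encoding, specs
```

For the value `ISO_8859-1, Quoted-Printable, To:-<,; From:(-)` this yields the charset and encoding plus `FieldSpec("To", None, "<", ",")` and `FieldSpec("From", "(", ")", None)`.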

The purpose of "restartchar" is to handle cases like "every phrase in
a To: line, but none of the addresses". The restart character is
normally a comma, which is why ";" rather than "," separates the
entries of the fieldlist.
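To show what "every phrase but none of the addresses" means in practice, the following Python sketch walks a field body and returns the index ranges that would be decoded for a given fromchar, tochar, and restartchar. The function name select_spans is hypothetical, and two behaviours are assumptions rather than part of the proposal: delimiter characters themselves are never decoded, and after a restart, decoding resumes at once when no fromchar is given and otherwise waits for the next fromchar.

```python
from typing import List, Optional, Tuple


def select_spans(text: str,
                 fromchar: Optional[str],
                 tochar: Optional[str],
                 restartchar: Optional[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of the parts of `text` to decode.

    Delimiter characters are never part of a span. A delimiter given as
    None never matches a character, so its default behaviour applies.
    """
    spans: List[Tuple[int, int]] = []
    # "waiting": looking for fromchar; "decoding": inside a span;
    # "stopped": past tochar, looking for restartchar.
    state = "decoding" if fromchar is None else "waiting"
    start = 0
    for i, ch in enumerate(text):
        if state == "waiting" and ch == fromchar:
            state, start = "decoding", i + 1
        elif state == "decoding" and ch == tochar:
            spans.append((start, i))
            state = "stopped"
        elif state == "stopped" and ch == restartchar:
            # Restart: decode at once if there is no fromchar,
            # otherwise wait for the next fromchar.
            if fromchar is None:
                state, start = "decoding", i + 1
            else:
                state = "waiting"
    if state == "decoding":
        spans.append((start, len(text)))  # no tochar seen: decode to the end
    return spans


line = "Ogrim Aberg <Ogrim.Aberg@sics.se>, Alen Flatmark <alen@sics.se>"
print([line[a:b] for a, b in select_spans(line, None, "<", ",")])
# prints ['Ogrim Aberg ', ' Alen Flatmark ']
```

A comma inside a quoted phrase or a comment would also trigger a restart here; single-character delimiters cannot express that distinction.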


Example headers:

Header-encoding: ISO_8859-1, Quoted-Printable, To:-<,; From:(-)
Header-encoding: RFC-MNEM, 7bit, Subject
To: \:AEgrim \:8Fberg <Ogrim.Aberg@sics.se>, \:97len Flatmark
    <alen@sics.se>
From: <haavard@idt.unit.no> (H\:8Fvard)
Subject: Writing &AE in a header line


ALGORITHM:

When, and ONLY when, *the message is shown to a user who is
interested in seeing the non-USASCII characters*, the following steps
are applied, and ONLY to those fields that are listed in the third
part of a Header-encoding: header:

1) Remove any quoting done by \ or " according to RFC-822 rules

2) Decode the string, starting from "fromchar" (default: beginning of
string) and stopping at "tochar" (default: end of string). If
"restartchar" occurs after "tochar", start decoding again.
All characters that are not decoded are interpreted as USASCII.

3) Display the string as if it were in the given character set.
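The three steps can be sketched in Python as follows. Several assumptions are made here and should not be read as part of the proposal: the quote removal in step 1 is simplified (backslash quoting and double quotes are simply dropped); "Quoted-Printable" is taken to mean an escape character followed by two hex digits, with ":" as the escape because that is what the example headers appear to use; only "Quoted-Printable" and "7bit" are handled; and the charset identifier is passed straight to Python's codec lookup, so RFC-MNEM is not supported. All function names are hypothetical.

```python
import string
from typing import List, Optional, Tuple


def unquote_822(text: str) -> str:
    """Step 1, simplified: drop quoted-pair backslashes and double quotes."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the quoted character literally
            i += 2
            continue
        if text[i] != '"':           # quoted-string delimiters are dropped
            out.append(text[i])
        i += 1
    return "".join(out)


def select_spans(text: str, fromchar: Optional[str], tochar: Optional[str],
                 restartchar: Optional[str]) -> List[Tuple[int, int]]:
    """Index ranges to decode; delimiters are excluded, None never matches."""
    spans: List[Tuple[int, int]] = []
    state = "decoding" if fromchar is None else "waiting"
    start = 0
    for i, ch in enumerate(text):
        if state == "waiting" and ch == fromchar:
            state, start = "decoding", i + 1
        elif state == "decoding" and ch == tochar:
            spans.append((start, i))
            state = "stopped"
        elif state == "stopped" and ch == restartchar:
            if fromchar is None:
                state, start = "decoding", i + 1
            else:
                state = "waiting"
    if state == "decoding":
        spans.append((start, len(text)))
    return spans


def decode_qp(text: str, escape: str = ":") -> bytes:
    """Step 2: escape + two hex digits -> one byte; other chars are US-ASCII.

    ASSUMPTION: ':' as the escape character, following the example headers.
    """
    out, i = bytearray(), 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == escape and len(pair) == 2 and all(c in string.hexdigits for c in pair):
            out.append(int(pair, 16))
            i += 3
        else:
            out += text[i].encode("ascii")
            i += 1
    return bytes(out)


def decode_field(body: str, charset: str, encoding: str,
                 fromchar: Optional[str] = None, tochar: Optional[str] = None,
                 restartchar: Optional[str] = None) -> str:
    """Apply steps 1-3 to one field body listed in Header-encoding:."""
    text = unquote_822(body)                              # step 1
    pieces, pos = [], 0
    for start, end in select_spans(text, fromchar, tochar, restartchar):
        pieces.append(text[pos:start])                    # undecoded: US-ASCII
        raw = text[start:end]
        if encoding.lower() == "quoted-printable":
            data = decode_qp(raw)                         # step 2
        elif encoding.lower() == "7bit":
            data = raw.encode("ascii")
        else:
            raise ValueError(f"unsupported encoding: {encoding}")
        pieces.append(data.decode(charset))               # step 3
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)
```

For instance, `decode_field('<haavard@idt.unit.no> (H\\:E5vard)', "ISO_8859-1", "Quoted-Printable", "(", ")")` returns `<haavard@idt.unit.no> (Håvard)`, since E5 is "å" in ISO 8859-1; the address outside the parentheses is left untouched.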


DISCUSSION:

The problems that have been identified include:
- Headers that may or may not be in encoded format (for example,
  Received:). The fact that Received: does not occur in the list of
  encoded headers should solve that.
- Headers may need to be parsed twice to identify the correct charset.
  Because this is done only on the interested end user's machine, that
  argument should be largely void.
  USASCII-satisfied people who need the speed should turn the feature off.
- Headers may contain values that merely look encoded (Subject:
  +477597094).
  Again, either Subject: is listed in the Header-encoding: header, in
  which case the + gets escaped as ++ or \+ or whatever the encoding
  requires, or it is not listed and the value is left alone.
- Address fields may be damaged if they are en/decoded, or the
  implementation becomes complex if they are not.
  It is the sender's choice how the address field is formatted, and the
  delimiters let the sender leave the addresses themselves unencoded.
- The problem with wanting to use different charsets in part of the
  same header (To: to a Japanese and a Scandinavian at the same time)
  is not solved.
- Delimiters longer than a single character are not supported.

I am prepared to defend making this proposal into an RFC.

  

