Comments on draft-resnick-2822upd-02.txt


My apologies for not making these comments sooner, but I have now read
carefully through the whole draft and have encountered various nits and
niggles. Also various things I do not understand (to which I have raised
questions) and things which seem a little odd and might be worth thinking
further about. Also some specific suggestions for improvement (some of
which, I appreciate, may be hard to incorporate at this late stage). And
also some plain bugs, which of course must be fixed.

                       Internet Message Format
                       draft-resnick-2822upd-02

Abstract

  This document specifies a syntax for text messages that are sent
  between computer users, within the framework of "electronic mail"
  messages.  This specification is a revision Request For Comments

                                               ^
                                               of

  (RFC) 2822, which itself superseded Request For Comments (RFC) 822,

1.  Introduction

1.1.  Scope

  This document specifies a syntax only for text messages.  In
  particular, it makes no provision for the transmission of images,
  audio, or other sorts of structured data in electronic mail messages.
  There are several extensions published, such as the MIME document
  series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
  for the transmission of such data through electronic mail,........


No mention of RFC 2047, or of RFC 2231?


     Note: This specification is not intended to dictate the internal
     formats used by sites, the specific message system features that
     they are expected to support, or any of the characteristics of
     user interface programs that create or read messages.  In
     addition, this document does not specify an encoding of the
     characters for either transport or storage; that is, it does not
     specify the number of bits used or how those bits are specifically
     transferred over the wire or stored on disk.


That last sentence seems wrong/confusing, given that the document specifies
everything in terms of US-ASCII, presumably with the intent that US-ASCII
should be the normal means of interchange over the 'wire' between agents (in
the absence of explicit agreement otherwise).

1.2.  Notational conventions

1.2.3.  Structure of this document

  Section 4 of this document specifies an "obsolete" syntax.  There are
  references in section 3 to these obsolete syntactic elements.  The
  rules of the obsolete syntax are elements that have appeared in
  earlier revisions of this specification or have previously been
  widely used in Internet messages.  As such, these elements MUST be
  interpreted by parsers of messages in order to be conformant to this
  specification.  However, since items in this syntax have been
  determined to be non-interoperable or to cause significant problems
  for recipients of messages, they MUST NOT be generated by creators of
  conformant messages.


Can we be clear about the _intent_ of this obs-syntax?

Is the intent to be able to read/display/print ancient messages which
people still have on file? In which case, please can we say that there is
no longer any expectation that obs messages can still be transmitted and
delivered (by RFC2821 or otherwise), and hence only MUAs (but not MTAs)
are REQUIRED to accept them.

Or, alternatively, is the intent that some ancient software still
generates messages using the obs-syntax, and hence MTAs MUST still accept
them? In which case, for how much longer?

Moreover, does it make sense to distinguish between obs constructs that
were theoretically permitted by RFC822 but in practice never seen in the
wild, as opposed to those which actually saw significant usage at one
time? IOW what possibilities exist for the eventual removal of at least
some of the weirder obs constructs?


2.  Lexical Analysis of Messages

2.1.  General Description


     Note: This document specifies that messages are made up of
     characters in the US-ASCII range of 1 through 127.  There are
     other documents, specifically the MIME document series ([RFC2045],
     [RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that
     extend this specification to allow for values outside of that
     range. ...


It is likely that the extensions from the EAI WG will shortly be added to
that list. Any possibility of a mention now?

2.1.1.  Line Length Limits

  There are two limits that this specification places on the number of
  characters in a line.  Each line of characters MUST be no more than
  998 characters, and SHOULD be no more than 78 characters, excluding
  the CRLF.


Can we de-emphasise that SHOULD, and make it clear that this is a matter
of good practice (in the sense of BCP) rather than a normative feature?
Perhaps s/SHOULD/should/? Too many agents have used this as an excuse to
rewrite lines en route (maybe there should be a SHOULD NOT for that).

Perhaps also some informative reference to 'format=flowed' might be in
order.
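
To make the distinction between the two limits concrete, here is a rough
sketch (in Python) of a checker that reports the hard limit and the
good-practice limit separately and never rewrites anything. The names
classify_lines, MAX_LINE and SOFT_LINE are mine, not the draft's, and
splitting on CRLF alone is a simplifying assumption.

```python
MAX_LINE = 998   # the MUST limit from section 2.1.1, excluding the CRLF
SOFT_LINE = 78   # the SHOULD (good practice) limit, excluding the CRLF


def classify_lines(message: bytes):
    """Return (line_number, length, verdict) for every over-long line.

    Lines are split on CRLF only, and the CRLF itself is not counted.
    The message is only inspected; no line is ever rewritten.
    """
    report = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        if len(line) > MAX_LINE:
            report.append((number, len(line), "violates MUST"))
        elif len(line) > SOFT_LINE:
            report.append((number, len(line), "exceeds recommendation"))
    return report
```

An agent in transit would, on this view, act only on the first kind of
verdict and leave lines of the second kind alone.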


  The more conservative 78 character recommendation is to accommodate
  the many implementations of user interfaces that display these
  messages which may truncate, or disastrously wrap, the display of
  more than 78 characters per line, in spite of the fact that such
  implementations are non-conformant to the intent of this
  specification (and that of [I-D.klensin-rfc2821bis] if they actually


Where did that '78' come from? I am aware of lots of systems that do
horrid things such as you mention if there are 80 characters in a line,
but I am aware of none where problems arise with exactly 79. In other fora
where I have seen this discussed, the consensus was that exceeding '79'
was the signal for troubles to start.


2.2.  Header Fields

2.2.3.  Long Header Fields

  The process of moving from this folded multiple-line representation
  of a header field to its single line representation is called
  "unfolding".  Unfolding is accomplished by simply removing any CRLF
  that is immediately followed by WSP.  Each header field should be
  treated in its unfolded form for further syntactic and semantic

                                             ^^^^^^^^^

  evaluation.


'Semantic' yes, but why is that 'syntactic' there?
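
For reference, the unfolding rule quoted above amounts to no more than
the following purely textual step (a Python sketch; the name unfold is
mine, and it assumes the field has already been isolated as a string):

```python
import re

# A CRLF that is immediately followed by WSP (SP or HTAB).  The lookahead
# leaves the WSP itself in place, as the quoted text requires.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(field: str) -> str:
    """Remove every CRLF that is immediately followed by WSP."""
    return _FOLD.sub("", field)
```

The CRLF that terminates the field is not followed by WSP, so it is left
untouched.
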

2.3.  Body

  o  Lines of characters in the body MUST be limited to 998 characters,
     and SHOULD be limited to 78 characters, excluding the CRLF.


Again, that 'SHOULD' is a matter of BCP.

3.  Syntax


3.2.  Lexical Tokens

     Note: Readers of this specification need to pay special attention
     to how these lexical tokens are used in both the lower-level and
     higher-level syntax later in the document.  Particularly, the
     white space tokens and the comment tokens defined in section 3.2.3
     get used in the lower-level tokens defined here, and those lower-
     level tokens are in turn used as parts of the higher-level tokens
     defined later.  Therefore, the white space and comments may be
     allowed in the higher-level tokens even though they may not
     explicitly appear in a particular definition.


All of which can be _exceedingly_ confusing. There was much discussion on
this list about 12 months ago about an alternative syntactic improvement
suggested by Bruce Lily, and I thought we had reached a consensus then to
bring it in. Is it possible, even at this late stage, to do so?

3.2.2.  Quoted characters


     Note: The "\" character may appear in a message where it is not
     part of a quoted-pair.  A "\" character that does not appear in a
     quoted-pair is not semantically invisible.  The only places in
     this specification where quoted-pair currently appears are
     ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.


We have already noted that no-fold-quote, and no-fold-literal can go. But,
as I have pointed out in a separate thread, you would remove a severe
interoperability problem with Netnews if you removed it from <dcontent> as
well (allowing just a "\" to appear as a normal character).

3.2.3.  Folding white space and comments

  Strings of characters enclosed in parentheses are considered comments
  so long as they do not appear within a "quoted-string", as defined in
  section 3.2.5.  Comments may nest.


That is not strictly correct, as a "\)" may appear in a <comment> without
closing the comment.

ctext           =       NO-WS-CTL /     ; Non white space controls
                                       ;
                       %d33-39 /       ; The rest of the US-ASCII
                       %d42-91 /       ;  characters not including "(",
                       %d93-126        ;  ")", or "\"


Do you _really_ want to permit NO-WS-CTL in a <comment>?

  Throughout this specification, where FWS (the folding white space
  token) appears, it indicates a place where folding, as discussed in
  section 2.2.3, may take place.  Wherever folding appears in a message
  (that is, a header field body containing a CRLF followed by any WSP),
  unfolding (removal of the CRLF) is performed before any further
  lexical analysis is performed on that header field according to this
  specification.  That is to say, any CRLF that appears in FWS is
  semantically "invisible."


Eh? That seems to be confusing "lexical analysis" with "semantic
analysis". If it had said "before any semantic analysis is performed" I
would have understood it.

3.2.5.  Quoted strings

qtext           =       NO-WS-CTL /     ; Non white space controls
                                        ;
                        %d33 /          ; The rest of the US-ASCII
                        %d35-91 /       ;  characters not including "\"
                        %d93-126        ;  or the quote character


Again, do you really want to permit NO-WS-CTL in a <quoted-string>?

3.2.6.  Miscellaneous tokens

  Three additional tokens are defined, word and phrase for combinations
  of atoms and/or quoted-strings, and unstructured for use in
  unstructured header fields and in some places within structured
  header fields.

  word            =       atom / quoted-string

  phrase          =       1*word / obs-phrase

  utext           =       NO-WS-CTL /     ; Non white space controls
                          %d33-126        ; The rest of US-ASCII

  unstructured    =       (*([FWS] utext) *WSP) / obs-unstruct


<phrase>s, <unstructured>s and <comment>s are the places where RFC 2047
raises its ugly head. It is the most confusingly written RFC I have
encountered (and it could be considered as separate from the rest of
the MIME standards, since it can be used without the MIME-Version header).

For a truly outrageous suggestion, we might incorporate the whole of RFC
2047 into here, cleaning it up in the process. No, that is too much to
propose at this juncture, but there are a couple of lesser things we might
do to help:

1. Include <encoded-word> in the syntax at all the proper places (which
might at least encourage inventors of new extension headers to follow
suit). It would need a convincing explanation, of course.

2. And if that is a step too far, we could still point out that sequences
of the form "=? ... ? ... ? ... ?=" have a special significance within RFC
2047 (whether they exceed that 76 character limit or not), and that such
sequences SHOULD NOT be used within <phrase>s, <unstructured>s and
<comment>s unless that special significance is intended.
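
As an illustration of the kind of sequence meant in point 2, a generator
of messages could scan its text with something like the following
(Python). The pattern is deliberately looser than the <encoded-word>
grammar of RFC 2047 and applies no length limit; both the pattern and
the name suspicious_sequences are my own assumptions.

```python
import re

# "=?" x "?" y "?" z "?=" where x, y and z contain neither "?" nor white
# space.  This is an approximation, not the RFC 2047 grammar.
LOOKS_ENCODED = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")


def suspicious_sequences(text: str):
    """Return every substring that a reader might take for an encoded-word."""
    return LOOKS_ENCODED.findall(text)
```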

3.3.  Date and Time Specification

  A date-time specification MUST be semantically valid.  That is, the
  day-of-week (if included) MUST be the day implied by the date, the
  numeric day-of-month MUST be between 1 and the number of days allowed
  for the specified month (in the specified year), the time-of-day MUST
  be in the range 00:00:00 through 23:59:60 (the number of seconds
  allowing for a leap second; see [RFC1305]), and the zone MUST be
  within the range -9959 through +9959.


Why not "within the range -2359 through +2359"?
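
To make the question concrete, here is a sketch (Python) of a zone check
that reads the four digits as hours and minutes. The name zone_ok and
the max_hours parameter are mine; max_hours=99 mimics the range in the
draft and the default of 23 the tighter range I am asking about. The
limit of 59 on the minutes is a further assumption of mine.

```python
import re

_ZONE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def zone_ok(zone: str, max_hours: int = 23) -> bool:
    """Check a numeric zone such as "+0100", read as sign, hours, minutes."""
    m = _ZONE.fullmatch(zone)
    if m is None:
        return False
    hours, minutes = int(m.group(2)), int(m.group(3))
    return hours <= max_hours and minutes <= 59
```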

3.4.  Address Specification

  When it is desirable to treat several mailboxes as a single unit
  (i.e., in a distribution list), the group construct can be used.
   ...
  Because the list of mailboxes can be empty, using the group construct
  is also a simple way to communicate to recipients that the message
  was sent to one or more named sets of recipients, without actually
  providing the individual mailbox address for each of those
  recipients.


s/each of/any of/ or s/each of/some of/


3.4.1.  Addr-spec specification

  An addr-spec is a specific Internet identifier that contains a
  locally interpreted string followed by the at-sign character ("@",
  ASCII value 64) followed by an Internet domain.  The locally
  interpreted string is either a quoted-string or a dot-atom.  If the
  string can be represented as a dot-atom (that is, it contains no
  characters other than atext characters or "." surrounded by atext
  characters), then the dot-atom form SHOULD be used and the quoted-
  string form SHOULD NOT be used.  Comments and folding white space
  SHOULD NOT be used around the "@" in the addr-spec.  A liberal syntax
  for the domain portion of addr-spec is given here; it is left to
  other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
  [I-D.klensin-rfc2821bis]) to give more precise limitations on the
  syntax.


Can we strengthen that by saying that the 'liberal syntax' MUST be further
restricted to conform to some published specification such as the ones you
have listed (without precluding further such specifications in the future,
of course)?



dcontent        =       dtext / quoted-pair

dtext           =       NO-WS-CTL /     ; Non white space controls
                                       ;
                       %d33-90 /       ; The rest of the US-ASCII
                       %d94-126        ;  characters not including "[",
                                       ;  "]", or "\"


I have already pointed out, in a separate thread, the severe
interoperability problems with Netnews of this definition of <dcontent>
(at least insofar as its use within <msg-id> is concerned). The
troublesome items are <quoted-pair>, NO-WS-CTL, SP and ">" (though "\" in
dtext would be OK). My suggestion for restricting the 'liberal syntax'
above was also directed at mitigating this problem.

  ....  In both cases, how addressing is
  used and how messages are transported to a particular host is covered
  in [I-D.klensin-rfc2821bis].  These mechanisms are outside of the
  scope of this document.


There may be other transport mechanisms than I-D.klensin-rfc2821bis. So it
would be better to say "is covered in separate documents such as
[I-D.klensin-rfc2821bis]".


3.5.  Overall message syntax

  A message consists of header fields, optionally followed by a message
  body.  Lines in a message MUST be a maximum of 998 characters
  excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
  characters excluding the CRLF. ...


Again, that is a matter of BCP. "recommended" would be quite strong
enough.

3.6.  Field definitions

  The header fields of a message are defined here.  All header fields
  have the same general syntactic structure: A field name, followed by
  a colon, followed by the field body.  The specific syntax for each
  header field is defined in the subsequent sections.


I have already pointed out, in a separate thread, the severe
interoperability problem that arises with Netnews if you do not require a
SP after the colon. Since every MUA I am aware of routinely inserts that
SP, I cannot see that anything would be lost by requiring it here.
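
A receiving agent that wanted to know whether a given field would
survive in Netnews could test for that SP along the following lines
(Python; the function name split_field is hypothetical, and the field is
assumed to be already unfolded):

```python
def split_field(line: str):
    """Split a header field at its first colon.

    Returns (field_name, field_body, sp_after_colon), where the last item
    says whether the colon is immediately followed by a SP.
    """
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("not a header field: no colon found")
    return name, body, body.startswith(" ")
```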

  |                |        |            |                            |
  | keywords       | 0      | unlimited  |                            |
  |                |        |            |                            |


Why is Keywords unlimited (in Netnews it is 1)? It is no big deal since
this field is so seldom used. But its presumed intended use of indexing
collections of email messages using this field would be simplified if only
one occurrence was allowed (the obs syntax would still allow multiple
occurrences, of course, and wording similar to that in 4.5.3 could be
used).

3.6.2.  Originator fields

  The originator fields indicate the mailbox(es) of the source of the
  message.  The "From:" field specifies the author(s) of the message,
  that is, the mailbox(es) of the person(s) or system(s) responsible
  for the writing of the message....


Are those sentences intended to be normative, BCP (or even deliberately
vague :-) )?

For example, some people 'munge' their From: addresses in order to appear
anonymous, or to confuse address harvesters. Whether that is a desirable
practice or not is none of our business, but a normative interpretation of
those words would seem to rule it out. I might well agree that it is not
BCP, but it happens.

The wording currently proposed by the USEFOR WG for this is:

   Contrary to [RFC2822], which implies that the mailbox or mailboxes in
   the From header field should be that of the poster or posters, a
   poster who does not, for whatever reason, wish to use his own mailbox
   MAY use any mailbox ending in the top level domain ".invalid"
   [RFC2606].

But if RFC2822 does not actually imply that, then we might have to think
again.

Or maybe this issue really belongs under Security Considerations?

  In all cases, the "From:" field SHOULD NOT contain any mailbox that
  does not belong to the author(s) of the message.  See also section
  3.6.3 for more information on forming the destination addresses for a
  reply.


Hmmm! That has a distinctly normative feel to it :-( .

3.6.3.  Destination address fields

  The destination fields specify the recipients of the message.  Each
  destination field may have one or more addresses, and each of the
  addresses indicate the intended recipients of the message.  The only

              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              indicates an intended recipient

  difference between the three fields is how each is used.

3.6.4.  Identification fields

  The "Message-ID:" field contains a single unique message identifier.
  The "References:" and "In-Reply-To:" field each contain one or more
  unique message identifiers, optionally separated by CFWS.

                                ^^^^^^^^^^

Interoperability with Netnews would be improved without that "optionally".
AFAICS all current MUAs routinely include some WSP there (probably
because they are following the lead of RFC 1036).

 no-fold-literal =       "[" *dcontent "]"


As already mentioned, the present definition of dcontent causes severe
interoperability problems with Netnews.

  The "In-Reply-To:" and "References:" fields are used when creating a
  reply to a message.  ..., while the
  "References:" field may be used to identify a "thread" of

                        ^^^^^^
                       is often

  conversation.

  The "References:" field will contain the contents of the parent's
  "References:" field (if any) followed by the contents of the parent's
  "Message-ID:" field (if any).  If the parent message does not contain
  a "References:" field but does have an "In-Reply-To:" field
  containing a single message identifier, then the "References:" field
  will contain the contents of the parent's "In-Reply-To:" field
  followed by the contents of the parent's "Message-ID:" field (if
  any).  If the parent has none of the "References:", "In-Reply-To:",
  or "Message-ID:" fields, then the new message will have no
  "References:" field.


It would be useful to mention that when the References field gets too long
it MAY be pruned (the minimum requirement being to retain the first and
the last two entries - including the one just being added). I have known
of cases where References fields grew to such a length (and MUAs in the
followup chain had failed to introduce folding, or even removed folding
already present) that the 998 limit was breached with disastrous
consequences.
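
The construction quoted above, together with the pruning I am
suggesting, might be sketched as follows (Python). The function names
and the max_entries parameter are mine. The case of a parent with
several identifiers in In-Reply-To and no References is not covered by
the quoted text, so falling back to the parent's Message-ID alone is an
assumption.

```python
def build_references(parent_refs, parent_in_reply_to, parent_msg_id):
    """Build the msg-id list for the References field of a reply.

    parent_refs and parent_in_reply_to are lists of msg-id strings
    (empty if the parent lacks the field); parent_msg_id is a string or
    None.  An empty result means "generate no References field".
    """
    if parent_refs:
        refs = list(parent_refs)
    elif len(parent_in_reply_to) == 1:
        refs = list(parent_in_reply_to)
    else:
        refs = []
    if parent_msg_id:
        refs.append(parent_msg_id)
    return refs


def prune_references(refs, max_entries):
    """Shorten refs, always retaining the first and the last two entries."""
    keep = max(max_entries, 3)       # never go below the suggested minimum
    if len(refs) <= keep:
        return list(refs)
    return [refs[0]] + refs[-(keep - 1):]
```

The pruned list would still need to be folded when written out, so that
no line approaches the 998 limit.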

  The message identifier (msg-id) itself MUST be a globally unique
  identifier for a message.  ...


It would be useful to say here that two msg-ids can always be compared for
equality by a simple octet-by-octet comparison (but, of course, one would
first have to ensure that property was true).

  .....  Since the msg-id has
  a similar syntax to addr-spec (identical except that comments and
  folding white space are not allowed), a good method ...

                       ^
           and quoted-strings
          (as already noted)

3.6.5.  Informational fields

  The informational fields are all optional.  The "Subject:" and
  "Comments:" fields are unstructured fields as defined in section
  2.2.1, and therefore may contain text or folding white space.  The
  "Keywords:" field contains a comma-separated list of one or more
  words or quoted-strings.


That last sentence is redundant, because a <word> can already be a
<quoted-string>.


  subject         =       "Subject:" unstructured CRLF

  comments        =       "Comments:" unstructured CRLF

  keywords        =       "Keywords:" phrase *("," phrase) CRLF


Why did Organization get left out? It is a defined field within Netnews,
but it is also already widely used within Email.


  ....  When used in a reply, the field body MAY start with the
  string "Re: " (from the Latin "res", in the matter of) followed by
  the contents of the "Subject:" field body of the original message.


If we are going to discuss Latin Grammar, then please let us do so
correctly. "Res" is the nominative form of the fifth declension noun
meaning "thing", "matter", "issue", etc.  "Re" is an abbreviation of the
phrase "in re" meaning "in the matter of", and in which "re" is the
ablative form of the same noun (the preposition "in" is always followed by
an ablative in static cases such as this, though it takes the accusative
form - e.g. "in rem" - in dynamic cases where the meaning is "into").

So if, instead of
    the string "Re: " (from the Latin "res", in the matter of)
you write
    the string "Re: " (an abbreviation of the Latin "in re", meaning "in
    the matter of")
all will be correct.

  ....  The
  "Keywords:" field contains a comma-separated list of important words
  and phrases that might be useful for the recipient.


Actually, its main intent was to be picked up by search engines and the
like (agreed, the search engine might belong to the recipient).


3.6.6.  Resent fields

  When resent fields are used, the "Resent-From:" and "Resent-Date:"
  fields MUST be sent.  The "Resent-Message-ID:" field SHOULD be sent.
  "Resent-Sender:" SHOULD NOT be used if "Resent-Sender:" would be
  identical to "Resent-From:".


Why the SHOULD for Resent-Message-ID? What evil befalls if it is omitted?

3.6.7.  Trace fields

  The trace fields are a group of header fields consisting of an
  optional "Return-Path:" field, and one or more "Received:" fields.
  The "Return-Path:" header field contains a pair of angle brackets
  that enclose an optional addr-spec.  The "Received:" field contains a
  (possibly empty) list of tokens followed by a semicolon and a date-
  time specification.  Each token must be a word, angle-addr, addr-
  spec, or a domain.  Further restrictions are applied to the syntax of
  the trace fields by specifications that provide for their use, such
  as [I-D.klensin-rfc2821bis].


Can we find a better word instead of "token" here? "Token" usually means
some sort of keyword (e.g. as used in the MIME standards). "Item" or
"sub-field" are possible alternatives.

3.6.8.  Optional fields

  Fields may appear in messages that are otherwise unspecified in this
  document.  They MUST conform to the syntax of an optional-field.
  This is a field name, made up of the printable US-ASCII characters
  except SP and colon, followed by a colon, followed by any text which
  conforms to unstructured.

  The field names of any optional-field MUST NOT be identical to any
  field name specified elsewhere in this document.

 optional-field  =       field-name ":" unstructured CRLF

 field-name      =       1*ftext

 ftext           =       %d33-57 /               ; Any character except
                         %d59-126                ;  controls, SP, and
                                                 ;  ":".

  For the purposes of this specification, any optional field is
  uninterpreted.


This is misleading, because it has to cover all new header fields
introduced by extensions and these will be, in general, structured. For
example, does the use of <unstructured> here imply that RFC 2047 may be
used freely on any header field not defined by this document - clearly
2047 was not intended to imply that, but it would not be hard to read it
that way. Better to invent some new term <foobar> with the same syntax as
<unstructured>, and then to say in that last sentence that any internal
structure of a <foobar> has to be defined elsewhere.

4.  Obsolete Syntax

  Earlier versions of this specification allowed for different (usually
  more liberal) syntax than is allowed in this version.  Also, there
  have been syntactic elements used in messages on the Internet whose
  interpretation have never been documented.  Though some of these

    ^^^^^^^^^^^^^^                                     ^^^^
    interpretations                             Eh? I thought none of them
                                                was to be generated.

  syntactic forms MUST NOT be generated according to the grammar in
  section 3, they MUST be accepted and parsed by a conformant receiver.


Which begs the question "what is a 'receiver'?" See earlier discussion.

  ....  This
  allowed many complex forms that have proven difficult for some
  implementations to parse.


Which sounds like a good reason for no longer requiring agents to parse
them :-( .

4.1.  Miscellaneous obsolete tokens

     Note: The "period" (or "full stop") character (".") in obs-phrase
     is not a form that was allowed in earlier versions of this or any
     other specification.  Period (nor any other character from
     specials) was not allowed in phrase because it introduced a
     parsing difficulty distinguishing between phrases and portions of
     an addr-spec (see section 4.4).  It appears here because the
     period character is currently used in many messages in the
     display-name portion of addresses, especially for initials in
     names, and therefore must be interpreted properly.  In the future,
     period may appear in the regular syntax of phrase.


But this is not an "obsolete" construct. We discussed this around 12
months ago, and the consensus then was that it ought to be renamed as an
<extended-phrase>, and moved out of the Obsolete Syntax.

Observe also the threat that it might one day be promoted to the regular
syntax. Are we ready yet to implement that threat? Probably not, but its
day will surely come. But it needs to be said in a more conspicuous place,
not hidden away in a section with "obsolete" in its name.

Likewise for <obs-phrase-list>, which is only used in <obs-keywords>. But
do we really care whether or not you are allowed to write
    Keywords: Joe Q. Public ?



obs-qp          =       "\" (%d0-127)

obs-body        =       *((*LF *CR *(obs-text *LF *CR)) / CRLF)

obs-text        =       %d0-9 / %d11-12 /       ; %d0-127 except CR and
                       %d14-127                ;  LF

obs-unstruct    =       *((*LF *CR *(obs-utext *LF *CR)) / FWS)

obs-utext       =       %d0-8 / %d11-12 /       ; %d0-127 except CR, LF,
                       %d14-31 / %d33-127      ; and white space


The syntax given for these obs-constructs includes also the syntax for
their regular counterparts, which makes it very hard work to discover
exactly where the difference lies because of the huge redundancy that is
introduced. For example, if you had written

   obs-qp        =       "\" %d0

nothing would have changed, but it would be immediately obvious what the
difference was. Most of the obs-syntax could be re-written in such a
non-redundant manner, though I grant you there are a few cases which would
be difficult.

4.3.  Obsolete Date and Time

  The syntax for the obsolete date format allows a 2 digit year in the
  date field and allows for a list of alphabetic time zone
  specifications that were used in earlier versions of this
  specification.  It also permits comments and folding white space
  between many of the tokens.

 obs-day-of-week =       [CFWS] day-name [CFWS]

 obs-year        =       [CFWS] 2*DIGIT [CFWS]

 obs-day         =       [CFWS] 1*2DIGIT [CFWS]

 obs-hour        =       [CFWS] 2DIGIT [CFWS]

 obs-minute      =       [CFWS] 2DIGIT [CFWS]

 obs-second      =       [CFWS] 2DIGIT [CFWS]

 obs-zone        =       (( "+" / "-" ) [CFWS] 4DIGIT) /

                           ..............

In this lot it was particularly difficult to spot the differences. I believe
that in <obs-hour> the first [CFWS] is unnecessary, and the [CFWS] at the
end of <obs-second> might be better moved to the start of <obs-zone>,
since that reflects better the way FWS is treated in the regular syntax.

4.4.  Obsolete Addressing

  There are three primary differences in addressing.  First, mailbox
  addresses were allowed to have a route portion before the addr-spec
  when enclosed in "<" and ">".  The route is simply a comma-separated
  list of domain names, each preceded by "@", and the list terminated
  by a colon.  Second, CFWS were allowed between the period-separated
  elements of local-part and domain (i.e., dot-atom was not used).  In
  addition, local-part is allowed to contain quoted-string in addition

                         ^^                                  ^^^^^^^^^^^
                        was

  to just atom.  Finally, ....

    ^^^^^^^^^^^^
    in lieu of any of those period-separated atoms

5.  Security Considerations

  Care needs to be taken when displaying messages on a terminal or
  terminal emulator.  Powerful terminals may act on escape sequences
  and other combinations of ASCII control characters with a variety of
  consequences.  They can remap the keyboard or permit other
  modifications to the terminal which could lead to denial of service
  or even damaged data.  They can trigger (sometimes programmable)
  answerback messages which can allow a message to cause commands to be
  issued on the recipient's behalf.  They can also affect the operation
  of terminal attached devices such as printers.  Message viewers may
  wish to strip potentially dangerous terminal escape sequences from
  the message prior to display.  However, other escape sequences appear
  in messages for useful purposes (cf. [ISO.2022.1994], [RFC2045],
  [RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]) and therefore
  should not be stripped indiscriminately.


Eh? what "other escape sequences" do you have in mind? I am not aware of
any meaningful use of NO-WS-CTLs in any of those standards. OTOH, if that
list is correct, should RFC 2231 be added to it?

  Many implementations use the "Bcc:" (blind carbon copy) field
  described in section 3.6.3 to facilitate sending messages to
  recipients without revealing the addresses of one or more of the
  addressees to the other recipients.  Mishandling this use of "Bcc:"
  may disclose confidential information which could eventually lead to
  security problems through knowledge of even the existence of a
  particular mail address.  For example, if using the first method
  described in section 3.6.3, where the "Bcc:" line is removed from the
  message, blind recipients have no explicit indication that they have
  been sent a blind copy, except insofar as their address does not
  appear in the message header section.  Because of this, one of the
  blind addressees could potentially send a reply to all of the shown
  recipients and accidentally reveal that the message went to the blind
  recipient.  When the second method from section 3.6.3 is used, the
  blind recipient's address appears in the "Bcc:" field of a separate
  copy of the message.  If the "Bcc:" field sent contains all of the
  blind addressees, all of the "Bcc:" recipients will be seen by each
  "Bcc:" recipient.  Even if a separate message is sent to each "Bcc:"
  recipient with only the individual's address, implementations still
  need to be careful to process replies to the message as per section
  3.6.3 so as not to accidentally reveal the blind recipient to other
  recipients.


But how could such revelation be avoided? Surely, if a blind recipient
replies to any message, his identity will be given away by the From header
of the reply? I see nothing in 3.6.3 that helps with this problem.

6.  IANA Considerations

  This document has no actions for IANA.


Oh yes it does!

RFC 3864 requires IANA to maintain a registry of header fields within
Email, News and HTTP. Anybody who defines a header field is obliged to
provide a template for use in that registry.

In the case of RFC 2822, Graham Klyne wrote a set of templates in RFC
4021. But these all reference RFC 2822 as the relevant specification
document, and so will become outdated as soon as 2822bis appears. On top
of that, they are all (IMHO) unnecessarily verbose.

So you need to write a full set of revised templates in here. I suggest
you look at draft-ietf-usefor-usefor-12 for how we laid out those
templates for USEFOR (carefully avoiding such verbosities :-) ).

Appendix A.  Example messages

  Messages are delimited in this section between lines of "----".  The
  "----" lines are not part of the message itself.


That is indeed an excellent notation. The Bad News is that you have
nowhere used it :-( .

Appendix A.1.2.  Different types of mailboxes

  This message includes multiple addresses in the destination fields
  and also uses several different forms of addresses.

  From: "Joe Q. Public" <john.q.public@example.com>
  To: Mary Smith <mary@x.test>, jdoe@example.org, Who? <one@y.test>
  Cc: <boss@nil.test>, "Giant; \"Big\" Box" <sysservices@example.net>
  Date: Tue, 1 Jul 2003 10:52:37 +0200
  Message-ID: <5678.21-Nov-1997@example.com>

  Hi everyone.

  Note that the display names for Joe Q. Public and Giant; "Big" Box
  needed to be enclosed in double-quotes because the former contains
  the period and the latter contains both semicolon and double-quote
  characters (the double-quote characters appearing as quoted-pair
  construct).  ...

    ^^^^^^^^^
    constructs


Appendix A.1.3.  Group addresses

From: Pete <pete@silly.example>
To: A Group:Chris Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;
Cc: Undisclosed recipients:;
Date: Thu, 13 Feb 1969 23:32:54 -0330
Message-ID: <testabcd.1234@silly.example>

Testing.

  In this message, the "To:" field has a single group recipient named A

                                                                       ^
                                                                       "

  Group which contains 3 addresses, and a "Cc:" field with an empty

         ^
         "

  group recipient named Undisclosed recipients.


Wouldn't it be better to show a Bcc: header for the "Undisclosed
recipients" example?
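
For anyone wanting to check how an agent sees these groups, the following
is a minimal sketch using the Python standard library's email package
with its "default" policy. The helper name summarize_groups is
hypothetical, and the message text is simply the A.1.3 example written
out by hand; this is one possible way to inspect the fields, not a
statement of how any particular agent behaves.

```python
from email import message_from_string
from email.policy import default

# The A.1.3 example message, written out by hand for this sketch.
RAW = (
    "From: Pete <pete@silly.example>\r\n"
    "To: A Group:Chris Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;\r\n"
    "Cc: Undisclosed recipients:;\r\n"
    "Date: Thu, 13 Feb 1969 23:32:54 -0330\r\n"
    "Message-ID: <testabcd.1234@silly.example>\r\n"
    "\r\n"
    "Testing.\r\n"
)


def summarize_groups(raw, field):
    """Return [(group display name, [addr-spec, ...]), ...] for one field."""
    msg = message_from_string(raw, policy=default)
    header = msg[field]
    return [
        (group.display_name, [addr.addr_spec for addr in group.addresses])
        for group in header.groups
    ]


if __name__ == "__main__":
    for field in ("To", "Cc"):
        print(field, summarize_groups(RAW, field))
```

The empty group in the "Cc:" field comes back with a display name and no
addresses, which is all a recipient can learn from it.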

Appendix A.4.  Messages with trace fields

  As messages are sent through the transport system as described in
  [I-D.klensin-rfc2821bis], trace fields are prepended to the message.
  The following is an example of what those trace fields might look
  like.  Note that there is some folding white space in the first one

                                   ^^^^^^^
                                   folded

  since these lines can be long.

Appendix A.5.  White space, comments, and other oddities

  White space, including folding white space, and comments can be
  inserted between many of the tokens of fields.  Taking the example
  from A.1.3, white space and comments can be inserted into all of the
  fields.

From: Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>
To:A Group(Some people)
    :Chris Jones <c@(Chris's host.)public.example>,
        joe@example.org,
 John <jdoe@one.test> (my dear friend); (the end of the group)
Cc:(Empty list)(start)Undisclosed recipients  :(nobody(that I know))  ;
Date: Thu,
     13
       Feb
         1969
     23:32
              -0330 (Newfoundland Time)
Message-ID:              <testabcd.1234@silly.test>

Testing.

  The above example is aesthetically displeasing, but perfectly legal.


Though it is legal, you should point out that it contains things that are
deprecated by 3.3 and by 3.4.1, and that agents might well choke on them.

7.  References

7.2.  Informative References

  [RFC1036]  Horton, M. and R. Adams, "Standard for interchange of
             USENET messages", RFC 1036, December 1987.


RFC 1036 is not actually referenced anywhere in the document. Moreover, if
you DO have a need to reference it (please do :-) ) then you ought to
refer to draft-ietf-usefor-usefor-12 instead.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, 
CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5