Re: Comments on draft-resnick-2822upd-02.txt

My apologies for not making these comments sooner, but I have now read
carefully through the whole draft and have encountered various nits and
niggles. Also various things I do not understand (to which I have raised
questions) and things which seem a little odd and might be worth thinking
further about. Also some specific suggestions for improvement (some of
which, I appreciate, may be hard to incorporate at this late stage). And
also some plain bugs, which of course must be fixed.

                       Internet Message Format
                       draft-resnick-2822upd-02

Abstract

  This document specifies a syntax for text messages that are sent
  between computer users, within the framework of "electronic mail"
  messages.  This specification is a revision Request For Comments

                                               ^
                                               of

  (RFC) 2822, which itself superseded Request For Comments (RFC) 822,

1.  Introduction

1.1.  Scope

  This document specifies a syntax only for text messages.  In
  particular, it makes no provision for the transmission of images,
  audio, or other sorts of structured data in electronic mail messages.
  There are several extensions published, such as the MIME document
  series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
  for the transmission of such data through electronic mail,........

No mention of RFC 2047, or of RFC 2231?


RFC 2047 specifies a means to include non-ASCII text in email headers and has
essentially nothing to do with the transmission of structured data through
email. So why should it be mentioned in this context?

As for RFC 2231, I suppose an argument can be made that it touches on
transmission of nontext material by providing added labelling options, but it's
frankly a stretch and given 2231's lower standards status I am opposed to
mentioning it here.

     Note: This specification is not intended to dictate the internal
     formats used by sites, the specific message system features that
     they are expected to support, or any of the characteristics of
     user interface programs that create or read messages.  In
     addition, this document does not specify an encoding of the
     characters for either transport or storage; that is, it does not
     specify the number of bits used or how those bits are specifically
     transferred over the wire or stored on disk.

That last sentence seems wrong/confusing, given that the document specifies
everything in terms of US-ASCII, presumably with the intent that US-ASCII
should be the normal means of interchange over the 'wire' between agents (in
the absence of explicit agreement otherwise).


What it says seems fairly clear to me: Systems exist that store things in units
other than octets, use counts rather than inline terminators, and so on and so
forth and this document isn't attempting to proscribe such behavior.

In any case, if you want the wording changed I think you'd best suggest
some alternative.

1.2.  Notational conventions

1.2.3.  Structure of this document

  Section 4 of this document specifies an "obsolete" syntax.  There are
  references in section 3 to these obsolete syntactic elements.  The
  rules of the obsolete syntax are elements that have appeared in
  earlier revisions of this specification or have previously been
  widely used in Internet messages.  As such, these elements MUST be
  interpreted by parsers of messages in order to be conformant to this
  specification.  However, since items in this syntax have been
  determined to be non-interoperable or to cause significant problems
  for recipients of messages, they MUST NOT be generated by creators of
  conformant messages.

Can we be clear about the _intent_ of this obs-syntax?

Is the intent to be able to read/display/print ancient messages which
people still have on file? In which case, please can we say that there is
no longer any expectation that obs messages can still be transmitted and
delivered (by RFC2821 or otherwise), and hence only MUAs (but not MTAs)
are REQUIRED to accept them.

Or, alternatively, is the intent that some ancient software still
generates messages using the obs-syntax, and hence MTAs MUST still accept
them? In which case, for how much longer?


I see little if any justification for additional elaboration of the intent
here. As far as I'm concerned the intent covers both your "alternatives" and
quite a few other things as well.

More generally, the problem with trying to nail down intent is that once you do
so the ability of the construct to meet other, as-yet-unplanned needs may be
compromised. See below for an example of another possible use of the obs-
syntax - helping with interop requirements - should we make our own lives
harder down the road simply because this document didn't say that one
intent of this was to make feature enumeration easier?

Moreover, does it make sense to distinguish between obs constructs that
were theoretically permitted by RFC822 but in practice never seen in the
wild, as opposed to those which actually saw significant usage at one
time? IOW what possibilities exist for the eventual removal of at least
some of the weirder obs constructs?


The grammar is always quite complex and confusing. Adding another splitting
point would make it even more complex and confusing. Additionally, I really
don't think we need yet another round of interminable arguing over whether or
not some system that hasn't been seen or heard of in several decades did such
and such or so and so.

I will also point out that the obsolete syntax split will be very helpful if we
have to engage in a supported feature enumeration exercise as part of moving to
draft standard. As it stands all we would have to do is demonstrate that
support exists to consume this junk, not produce it. Identification of some
subset of obsolete features that are supposedly never used could easily
bring up the question of whether or not such features should be removed
completely, which I fear would unavoidably send us straight down a huge
rathole.

So, while I get your point and see some benefit to tracking the implementation
status of obsolete features (and non-obsolete ones as well), I don't believe
the benefits of doing so exceed the costs. If this is to be done it should be
done in some other document or perhaps just a web page. 

(Incidentally, I suspect that any such collection exercise will, if conducted
properly, surprise you in how many of the features you haven't seen personally
were actually used some time and some where. But that's just a guess on my
part.)

2.  Lexical Analysis of Messages

2.1.  General Description


     Note: This document specifies that messages are made up of
     characters in the US-ASCII range of 1 through 127.  There are
     other documents, specifically the MIME document series ([RFC2045],
     [RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that
     extend this specification to allow for values outside of that
     range. ...

It is likely that the extensions from the EAI WG will shortly be added to
that list. Any possibility of a mention now?


Given that EAI is chartered to produce experimental specifications, IMO a
mention here is really not appropriate.

2.1.1.  Line Length Limits

  There are two limits that this specification places on the number of
  characters in a line.  Each line of characters MUST be no more than
  998 characters, and SHOULD be no more than 78 characters, excluding
  the CRLF.

Can we de-emphasise that SHOULD, and make it clear that this is a matter
of good practice (in the sense of BCP) rather than a normative feature?


That would be a significant change in normative language from RFC 2822 and
could easily be seen as requiring a reset to proposed. Moreover, if memory
serves, this language went into RFC 2822 not because the WG wanted it but
rather because the IESG insisted on it.

The specific member of the IESG that insisted on it is gone now, but the
sentiment behind the requirement may live on for all I know.

Besides, I think a SHOULD is actually appropriate here. SHOULD means you should
do it unless you have a really good reason not to.
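
To make the distinction between the two limits concrete, here is a minimal
sketch of a checker that treats the 998 limit as a hard failure and the 78
limit as a warning only. The function name and the status strings are
illustrative assumptions, not anything taken from the draft.

```python
MAX_LINE = 998   # hard limit (MUST), excluding the CRLF
SOFT_LINE = 78   # recommended limit (SHOULD), excluding the CRLF

def classify_lines(message: str):
    """Return (line_number, status) for every line over either limit.

    Hypothetical helper: assumes the message uses CRLF line endings and
    contains only US-ASCII, so characters and octets coincide.
    """
    findings = []
    # Lines are delimited by CRLF; the CRLF itself is not counted.
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line) > MAX_LINE:
            findings.append((number, "violates MUST"))
        elif len(line) > SOFT_LINE:
            findings.append((number, "exceeds SHOULD"))
    return findings
```

Note that nothing in this sketch rewrites a line; it only reports.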

Perhaps s/SHOULD/should/? Too many agents have used this as an excuse to
rewrite lines en route (maybe there should be a SHOULD NOT for that).


If so, they are relying on a flagrant misreading of the document. Attempting to
prevent such exercises is itself an exercise in futility - the best you can
ever do is point out that the claim isn't supported by the actual language.

As for having a SHOULD NOT about agents altering messages in transit, IMO the
place for that - assuming it makes sense to do - would be the SMTP
specification, not here.

Perhaps also some informative reference to 'format=flowed' might be in
order.


That cannot be done meaningfully without also referring to MIME. While the
present layering of these specifications is awkward when it comes to stuff like
this, I think having all sorts of informational cross references would confuse
as much as it clarifies.

  The more conservative 78 character recommendation is to accommodate
  the many implementations of user interfaces that display these
  messages which may truncate, or disastrously wrap, the display of
  more than 78 characters per line, in spite of the fact that such
  implementations are non-conformant to the intent of this
  specification (and that of [I-D.klensin-rfc2821bis] if they actually

Where did that '78' come from? I am aware of lots of systems that do
horrid things such as you mention if there are 80 characters in a line,
but I am aware of none where problems arise with exactly 79. In other fora
where I have seen this discussed, the consensus was that exceeding '79'
was the signal for troubles to start.


I've always felt that the 78 character limit was one byte lower than it really
needed to be. But I am a long way from convinced that now is the time to change
this.

2.2.  Header Fields

2.2.3.  Long Header Fields

  The process of moving from this folded multiple-line representation
  of a header field to its single line representation is called
  "unfolding".  Unfolding is accomplished by simply removing any CRLF
  that is immediately followed by WSP.  Each header field should be
  treated in its unfolded form for further syntactic and semantic

                                             ^^^^^^^^^

  evaluation.

'Semantic' yes, but why is that 'syntactic' there?


Don't you have to parse things like address fields in order to then perform
semantic analysis? Syntactic not only seems appropriate, IMO it would be an
error not to have it.
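
The unfolding rule quoted above is small enough to sketch directly: remove
any CRLF that is immediately followed by WSP, keeping the WSP itself. The
function name is an illustrative assumption.

```python
import re

# A CRLF immediately followed by WSP (space or horizontal tab).
# The lookahead leaves the WSP character in place.
_FOLD = re.compile(r"\r\n(?=[ \t])")

def unfold(field: str) -> str:
    """Return the single-line form of a folded header field."""
    return _FOLD.sub("", field)
```

Any syntactic parsing of, say, an address field would then run on the
result of unfold(), which is the point being made above.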

2.3.  Body

  o  Lines of characters in the body MUST be limited to 998 characters,
     and SHOULD be limited to 78 characters, excluding the CRLF.

Again, that 'SHOULD' is a matter of BCP.


Same argument as before.

3.  Syntax


3.2.  Lexical Tokens

     Note: Readers of this specification need to pay special attention
     to how these lexical tokens are used in both the lower-level and
     higher-level syntax later in the document.  Particularly, the
     white space tokens and the comment tokens defined in section 3.2.3
     get used in the lower-level tokens defined here, and those lower-
     level tokens are in turn used as parts of the higher-level tokens
     defined later.  Therefore, the white space and comments may be
     allowed in the higher-level tokens even though they may not
     explicitly appear in a particular definition.

All of which can be _exceedingly_ confusing. There was much discussion on
this list about 12 months ago about an alternative syntactic improvement
suggested by Bruce Lily, and I thought we had reached a consensus then to
bring it in. Is it possible, even at this late stage, to do so?


I agree that it is confusing. However, I did not find Bruce's alternative
helped matters all that much. Nor do I recall any strong consensus to bring it
in. In fact what I recall is almost exactly the opposite: That since we're
attempting to move to draft now we would make every effort to keep the changes
to the absolute minimum. A wholesale change to the grammar doesn't even come
close to meeting that goal.

3.2.2.  Quoted characters


     Note: The "\" character may appear in a message where it is not
     part of a quoted-pair.  A "\" character that does not appear in a
     quoted-pair is not semantically invisible.  The only places in
     this specification where quoted-pair currently appears are
     ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.

We have already noted that no-fold-quote, and no-fold-literal can go. But,
as I have pointed out in a separate thread, you would remove a severe
interoperability problem with Netnews if you removed it from <dcontent> as
well (allowing just a "\" to appear as a normal character).


As I believe I stated in an earlier response, I am opposed to removing it. It is
simply not possible to know everything that's out there and just because we
don't know about something is no excuse to break it.  I could live with moving
it to the obsolete syntax but that's as far as I'll go.

3.2.3.  Folding white space and comments

  Strings of characters enclosed in parentheses are considered comments
  so long as they do not appear within a "quoted-string", as defined in
  section 3.2.5.  Comments may nest.

That is not strictly correct, as a "\)" may appear in a <comment> without
closing the comment.


But that's a literal, not an enclosing parenthesis. I really don't think making
the text explicit about this case will clarify things, but again, if you have
text to suggest...
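
For illustration, a rough sketch of how a scanner might find the end of a
comment while honouring both nesting and quoted-pairs, so that "\)" does not
close anything. This is a simplification under stated assumptions: it does
not validate ctext, and the function name is hypothetical.

```python
def comment_end(text: str, start: int = 0) -> int:
    """Return the index just past the comment that opens at text[start].

    Raises ValueError if text[start] is not "(" or the comment never closes.
    """
    if text[start:start + 1] != "(":
        raise ValueError("no comment at this position")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # quoted-pair: skip the backslash and the next char
            continue
        if ch == "(":
            depth += 1      # comments may nest
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")
```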

ctext           =       NO-WS-CTL /     ; Non white space controls
                                       ;
                       %d33-39 /       ; The rest of the US-ASCII
                       %d42-91 /       ;  characters not including "(",
                       %d93-126        ;  ")", or "\"

Do you _really_ want to permit NO-WS-CTL in a <comment>?


RFC 2822 did, so the question becomes one of do we want to change
this away from what 2822 said?

Like it or not, control characters have long been allowed in a lot of places
where they really don't belong. I remain to be convinced that this one
narrow case is worth worrying about.

  Throughout this specification, where FWS (the folding white space
  token) appears, it indicates a place where folding, as discussed in
  section 2.2.3, may take place.  Wherever folding appears in a message
  (that is, a header field body containing a CRLF followed by any WSP),
  unfolding (removal of the CRLF) is performed before any further
  lexical analysis is performed on that header field according to this
  specification.  That is to say, any CRLF that appears in FWS is
  semantically "invisible."

Eh? That seems to be confusing "lexical analysis" with "semantic
analysis". If it had said "before any semantic analysis is performed" I
would have understood it.


I agree - "semantic" would be better here.

3.2.5.  Quoted strings

qtext           =       NO-WS-CTL /     ; Non white space controls
                                        ;
                        %d33 /          ; The rest of the US-ASCII
                        %d35-91 /       ;  characters not including "\"
                        %d93-126        ;  or the quote character

Again, do you really want to permit NO-WS-CTL in a <quoted-string>?


Same response.

3.2.6.  Miscellaneous tokens

  Three additional tokens are defined, word and phrase for combinations
  of atoms and/or quoted-strings, and unstructured for use in
  unstructured header fields and in some places within structured
  header fields.

  word            =       atom / quoted-string

  phrase          =       1*word / obs-phrase

  utext           =       NO-WS-CTL /     ; Non white space controls
                          %d33-126        ; The rest of US-ASCII

  unstructured    =       (*([FWS] utext) *WSP) / obs-unstruct

<phrase>s, <unstructured>s and <comment>s are the places where RFC 2047
raises its ugly head. It is the most confusingly written RFC I have
encountered (and it could be considered as separate from the rest of
the MIME standards, since it can be used without the MIME-Version header).

For a truly outrageous suggestion, we might incorporate the whole of RFC
2047 into here, cleaning it up in the process. No, that is too much to
propose at this juncture, but there are a couple of lesser things we might
do to help:

1. Include <encoded-word> in the syntax at all the proper places (which
might at least encourage inventors of new extension headers to follow
suit). It would need a convincing explanation, of course.


I am strongly opposed to this. If RFC 2047 is confusing, the time to argue that
is when it is revised. We cannot fix its problems (assuming there actually are
any to fix) by incorporating some subset of references to it in another
document.

2. And if that is a step too far, we could still point out that sequences
of the form "=? ... ? ... ? ... ?=" have a special significance within RFC
2047 (whether they exceed that 76 character limit or not), and that such
sequences SHOULD NOT be used within <phrase>s, <unstructured>s and
<comment>s unless that special significance is intended.


An informational reference to RFC 2047 would be OK with me.
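
As a rough illustration of the sequences in question, a generator could scan
a <phrase>, <unstructured> or <comment> for anything shaped like
"=?...?...?...?=" before emitting it. The pattern below is a deliberately
loose approximation, not the RFC 2047 grammar, and the function name is
hypothetical.

```python
import re

# Loose approximation: "=?" charset "?" encoding "?" text "?=", where no
# part contains "?" or white space. It does not check that the charset or
# encoding is one that RFC 2047 actually defines.
_LOOKALIKE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]*\?=")

def find_encoded_word_lookalikes(text: str):
    """Return every substring that a reader might treat as an encoded-word."""
    return _LOOKALIKE.findall(text)
```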

3.3.  Date and Time Specification

  A date-time specification MUST be semantically valid.  That is, the
  day-of-week (if included) MUST be the day implied by the date, the
  numeric day-of-month MUST be between 1 and the number of days allowed
  for the specified month (in the specified year), the time-of-day MUST
  be in the range 00:00:00 through 23:59:60 (the number of seconds
  allowing for a leap second; see [RFC1305]), and the zone MUST be
  within the range -9959 through +9959.

why not "within the range -2359 through +2359"?


I have no objection to restricting the range, but whatever we do needs to agree
with other specifications that deal in time zones. RFC 3339 appears to allow
-2459 through +2459.
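
The "semantically valid" requirement in the draft text can be sketched as a
set of checks on already-parsed values. This is a minimal sketch: the
function name, argument layout and messages are assumptions, the zone range
is the -9959 through +9959 of the draft as quoted, and treating the last two
zone digits as minutes (at most 59) is an added assumption.

```python
import calendar
import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def date_problems(day_name, day, month, year, hour, minute, second, zone):
    """Return a list of problems; zone is a string such as "+0100".

    day_name may be None, since day-of-week is optional.
    Assumes a four-digit year.
    """
    if not 1 <= month <= 12:
        return ["month out of range"]
    problems = []
    if not 1 <= day <= calendar.monthrange(year, month)[1]:
        problems.append("day-of-month out of range")
    elif day_name is not None:
        actual = DAY_NAMES[datetime.date(year, month, day).weekday()]
        if actual != day_name:
            problems.append("day-of-week should be " + actual)
    # 23:59:60 is allowed to leave room for a leap second.
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 60):
        problems.append("time-of-day out of range")
    valid_zone = (len(zone) == 5 and zone[0] in "+-" and zone[1:].isdigit()
                  and int(zone[3:]) <= 59)
    if not valid_zone:
        problems.append("zone out of range")
    return problems
```

Narrowing the zone range would only change the valid_zone test.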

3.4.  Address Specification

  When it is desirable to treat several mailboxes as a single unit
  (i.e., in a distribution list), the group construct can be used.
   ...
  Because the list of mailboxes can be empty, using the group construct
  is also a simple way to communicate to recipients that the message
  was sent to one or more named sets of recipients, without actually
  providing the individual mailbox address for each of those
  recipients.

s/each of/any of/ or s/each of/some of/


Agreed.

3.4.1.  Addr-spec specification

  An addr-spec is a specific Internet identifier that contains a
  locally interpreted string followed by the at-sign character ("@",
  ASCII value 64) followed by an Internet domain.  The locally
  interpreted string is either a quoted-string or a dot-atom.  If the
  string can be represented as a dot-atom (that is, it contains no
  characters other than atext characters or "." surrounded by atext
  characters), then the dot-atom form SHOULD be used and the quoted-
  string form SHOULD NOT be used.  Comments and folding white space
  SHOULD NOT be used around the "@" in the addr-spec.  A liberal syntax
  for the domain portion of addr-spec is given here; it is left to
  other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
  [I-D.klensin-rfc2821bis]) to give more precise limitations on the
  syntax.

Can we strengthen that by saying that the 'liberal syntax' MUST be further
restricted to conform to some published specification such as the ones you
have listed (without precluding further such specifications in the future,
of course)?


No, because that would usurp the prerogative of other specifications to
specify what conformance criteria apply to their additional restrictions.

More generally, it strikes me as a really bad idea to have a MUST for which
compliance can only be determined by consulting an incomplete and open-ended
series of other specifications. This in effect would make it impossible to be
able to claim you have a compliant implementation since you cannot possibly
know when another specification would be added to this list.

dcontent        =       dtext / quoted-pair

dtext           =       NO-WS-CTL /     ; Non white space controls
                                       ;
                       %d33-90 /       ; The rest of the US-ASCII
                       %d94-126        ;  characters not including "[",
                                       ;  "]", or "\"

I have already pointed out, in a separate thread, the severe
interoperability problems with Netnews of this definition of <dcontent>
(at least insofar as its use within <msg-id> is concerned). The
troublesome items are <quoted-pair>, NO-WS-CTL, SP and ">" (though "\" in
dtext would be OK). My suggestion for restricting the 'liberal syntax'
above was also directed at mitigating this problem.


And IMO you failed to achieve sufficient support to result in a specification
change. As I said previously, I can live with making the use of quoted-pairs in
dtext part of the obsolete syntax, but that's as far as I can see us going.

The alternative approach I actually favor is one I have previously described:
Add some text that says that domain literals in message ids should be generated
using the most restrictive syntax and with well-defined semantics, i.e. an IPv4
or IPv6 literal. To my mind the bigger problem here is that someone will generate
something like [foobar] here instead of putting in an actual global IPv4 or
IPv6 address. If we encourage people to use domain literals with defined
semantics we solve several problems at one go.
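
A minimal sketch of what "domain literals with defined semantics" might mean
to a generator: accept only a literal whose contents parse as an IP address.
The function name is hypothetical, and the "IPv6:" tag is an assumption
borrowed from the SMTP address-literal form, not something this draft
defines.

```python
import ipaddress

def literal_has_defined_semantics(domain_literal: str) -> bool:
    """True if the literal is "[" IPv4 "]" or "[IPv6:" IPv6 "]"."""
    if not (domain_literal.startswith("[") and domain_literal.endswith("]")):
        return False
    body = domain_literal[1:-1]
    # Assumption: IPv6 literals carry the "IPv6:" tag, as in SMTP.
    if body.lower().startswith("ipv6:"):
        candidate, wanted = body[5:], 6
    else:
        candidate, wanted = body, 4
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return address.version == wanted
```

This checks form only. Whether the address is actually global would be a
separate test on top of it.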

  ....  In both cases, how addressing is
  used and how messages are transported to a particular host is covered
  in [I-D.klensin-rfc2821bis].  These mechanisms are outside of the
  scope of this document.

There may be other transport mechanisms than I-D.klensin-rfc2821bis. So it
would be better to say "is covered in separate documents such as
[I-D.klensin-rfc2821bis]".


I don't feel strongly about this either way.

3.5.  Overall message syntax

  A message consists of header fields, optionally followed by a message
  body.  Lines in a message MUST be a maximum of 998 characters
  excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
  characters excluding the CRLF. ...

Again, that is a matter of BCP. "recommended" would be quite strong
enough.


Again, such recommendations are what SHOULDs and RECOMMENDED are for and I'm
opposed to making such a change at this point.

3.6.  Field definitions

  The header fields of a message are defined here.  All header fields
  have the same general syntactic structure: A field name, followed by
  a colon, followed by the field body.  The specific syntax for each
  header field is defined in the subsequent sections.

I have already pointed out, in a separate thread, the severe
interoperability problem that arises with Netnews if you do not require a
SP after the colon. Since every MUA I am aware of routinely inserts that
SP, I cannot see that anything would be lost by requiring it here.


And as I commented previously, not everything that has a submission client in
it is an MUA. There are quick and dirty submission clients embedded in all
sorts of places - one of the advantages of SMTP is that you can code a quick
and dirty client very easily - and leaving out every possible character is
exactly the sort of thing these gizmos do. Heck, they even do it when they've
actually got plenty of space to spare - unnecessary optimization is RAMPANT in
the embedded systems world.

I am therefore strongly opposed to disenfranchising such systems.
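
For what it is worth, accepting both forms costs a parser very little. A
minimal sketch, assuming an already unfolded field line; the function name
is hypothetical, and dropping the leading white space from the body is a
choice made here for illustration, not something the draft mandates.

```python
def split_field(line: str):
    """Split an unfolded header field into (name, body).

    Works whether or not a SP follows the colon.
    """
    name, sep, body = line.partition(":")
    if not sep or not name:
        raise ValueError("not a header field")
    # Illustrative choice: discard any leading SP or HTAB in the body.
    return name, body.lstrip(" \t")
```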

  |                |        |            |                            |
  | keywords       | 0      | unlimited  |                            |
  |                |        |            |                            |

Why is Keywords unlimited (in Netnews it is 1)? It is no big deal since
this field is so seldom used. But its presumed intended use of indexing
collections of email messages using this field would be simplified if only
one occurrence was allowed (the obs syntax would still allow multiple
occurrences, of course, and wording similar to that in 4.5.3 could be
used).


I could live with such a change, but OTOH I'm not sure I see much point in
making it. The same set of arguments that it doesn't have an impact also make a
case for leaving it alone.

3.6.2.  Originator fields

  The originator fields indicate the mailbox(es) of the source of the
  message.  The "From:" field specifies the author(s) of the message,
  that is, the mailbox(es) of the person(s) or system(s) responsible
  for the writing of the message....

Are those sentences intended to be normative, BCP (or even deliberately
vague :-) )?


Don't see any capitalized words there, do you? So I guess there are
no compliance implications.

For example, some people 'munge' their From: addresses in order to appear
anonymous, or to confuse address harvesters. Whether that is a desirable
practice or not is none of our business, but a normative interpretation of
those words would seem to rule it out. I might well agree that it is not
BCP, but it happens.

The wording currently proposed by the USEFOR WG for this is:

   Contrary to [RFC2822], which implies that the mailbox or mailboxes in
   the From header field should be that of the poster or posters, a
   poster who does not, for whatever reason, wish to use his own mailbox
   MAY use any mailbox ending in the top level domain ".invalid"
   [RFC2606].

But if RFC2822 does not actually imply that, then we might have to think
again.


And IMO it should not imply that. The last thing email needs at this point is
more license to use invalid addresses.

Or maybe this issue really belongs under Security Considerations?

  In all cases, the "From:" field SHOULD NOT contain any mailbox that
  does not belong to the author(s) of the message.  See also section
  3.6.3 for more information on forming the destination addresses for a
  reply.

Hmmm! That has a distinctly normative feel to it :-( .


There are capitalized words there so it isn't just a feeling. And IMO those
words are, if anything, too weak.

3.6.3.  Destination address fields

  The destination fields specify the recipients of the message.  Each
  destination field may have one or more addresses, and each of the
  addresses indicate the intended recipients of the message.  The only

              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              indicates an intended recipient


Seems reasonable.

  difference between the three fields is how each is used.

3.6.4.  Identification fields

  The "Message-ID:" field contains a single unique message identifier.
  The "References:" and "In-Reply-To:" field each contain one or more
  unique message identifiers, optionally separated by CFWS.

                                ^^^^^^^^^^

Interoperability with Netnews would be improved without that "optionally".


Perhaps, but like it or not Netnews compatibility is not our primary goal here.

AFAICS all current MUAs routinely include some WSP there (probably
because they are following the lead of RFC 1036).


At one point I actually had to change our parser to handle the case where there
was no space there, so I'm fairly sure that not everything out there inserts
the spaces.

 no-fold-literal =       "[" *dcontent "]"

As already mentioned, the present definition of dcontent causes severe
interoperability problems with Netnews.


Yes, and your repetition of the point without further concrete arguments in
favor of making the change is if anything weakening your case, not
strengthening it.

  The "In-Reply-To:" and "References:" fields are used when creating a
  reply to a message.  ..., while the
  "References:" field may be used to identify a "thread" of

                        ^^^^^^
                       is often

  conversation.

  The "References:" field will contain the contents of the parent's
  "References:" field (if any) followed by the contents of the parent's
  "Message-ID:" field (if any).  If the parent message does not contain
  a "References:" field but does have an "In-Reply-To:" field
  containing a single message identifier, then the "References:" field
  will contain the contents of the parent's "In-Reply-To:" field
  followed by the contents of the parent's "Message-ID:" field (if
  any).  If the parent has none of the "References:", "In-Reply-To:",
  or "Message-ID:" fields, then the new message will have no
  "References:" field.

It would be useful to mention that when the References field gets too long
it MAY be pruned (the minimum requirement being to retain the first and
the last two entries - including the one just being added). I have known
of cases where References fields grew to such a length (and MUAs in the
followup chain had failed to introduce folding, or even removed folding
already present) that the 998 limit was breached with disastrous
consequences.


Adding such a suggestion would be fine with me were it not for the context of
this effort - every such change increases the likelihood of a problem getting
to draft.
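
The rule in the draft text for forming "References:" can be sketched as
follows. It implements only what the quoted paragraph says and does no
pruning. The function name and argument layout are assumptions, as is the
handling of a parent whose "In-Reply-To:" holds several identifiers (treated
here as contributing nothing), since the quoted text is silent on that case.

```python
def build_references(parent_references, parent_in_reply_to, parent_message_id):
    """Return the body of the new "References:" field, or None for no field.

    parent_references and parent_in_reply_to are lists of msg-id strings
    (empty if the field is absent); parent_message_id is a string or None.
    """
    if parent_references:
        ids = list(parent_references)
    elif len(parent_in_reply_to) == 1:
        # No "References:" but a single-identifier "In-Reply-To:".
        ids = list(parent_in_reply_to)
    else:
        ids = []
    if parent_message_id:
        ids.append(parent_message_id)
    return " ".join(ids) if ids else None
```

A pruning step, if one were ever agreed, would operate on ids just before
the join.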

  The message identifier (msg-id) itself MUST be a globally unique
  identifier for a message.  ...

It would be useful to say here that two msg-ids can always be compared for
equality by a simple octet-by-octet comparison (but, of course, one would
first have to ensure that property was true).


But that's exactly the problem, isn't it? The property is NOT guaranteed
at the present time.

  .....  Since the msg-id has
  a similar syntax to addr-spec (identical except that comments and
  folding white space are not allowed), a good method ...

                       ^
           and quoted-strings
          (as already noted)


Seems OK to make this change.

3.6.5.  Informational fields

  The informational fields are all optional.  The "Subject:" and
  "Comments:" fields are unstructured fields as defined in section
  2.2.1, and therefore may contain text or folding white space.  The
  "Keywords:" field contains a comma-separated list of one or more
  words or quoted-strings.

That last sentence is redundant, because a <word> can already be a
<quoted-string>.


The problem is that it isn't clear whether this refers to the generic idea of a
word or the specific ABNF production for word. So, while it may be somewhat
redundant, I think it is actually clearer the way it is.

  subject         =       "Subject:" unstructured CRLF

  comments        =       "Comments:" unstructured CRLF

  keywords        =       "Keywords:" phrase *("," phrase) CRLF

Why did Organization get left out? It is a defined field within Netnews,
but it is also already widely used within Email.


Organization was not defined in either RFC 822 or RFC 2822. According to the IANA
registry it is only defined in Netnews, but regardless of other places it is
defined, now is absolutely not the time to be adding it to the 822 family of
specifications.

  ....  When used in a reply, the field body MAY start with the
  string "Re: " (from the Latin "res", in the matter of) followed by
  the contents of the "Subject:" field body of the original message.

If we are going to discuss Latin Grammar, then please let us do so
correctly. "Res" is the nominative form of the fifth declension noun
meaning "thing", "matter", "issue", etc.  "Re" is an abbreviation of the
phrase "in re" meaning "in the matter of", and in which "re" is the
ablative form of the same noun (the preposition "in" is always followed by
an ablative in static cases such as this, though it takes the accusative
form - e.g. "in rem" - in dynamic cases where the meaning is "into").

so if, instead of
    the string "Re: " (from the Latin "res", in the matter of)
you write
    the string "Re: " (an abbreviation of the Latin "in re", meaning "in
    the matter of")
all will be correct.


Yep, now that I think about it you're correct. This is a reasonable change.
Alternately, the  whole thing about the Latin could be omitted.
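
Whatever wording is chosen, the behaviour being described is small. A minimal
sketch follows, assuming that a reply should not stack a second "Re: " on a
subject that already has one; that assumption is common practice rather than
anything the draft says, and the name reply_subject is purely illustrative.

```python
def reply_subject(original_subject: str) -> str:
    """Build a Subject field body for a reply to a message."""
    stripped = original_subject.strip()
    # Assumption (not from the draft): an existing "Re:" prefix, in any
    # letter case, is kept as-is instead of being repeated.
    if stripped[:3].lower() == "re:":
        return stripped
    # The draft's MAY: "Re: " followed by the original Subject field body.
    return "Re: " + stripped
```

Since the draft only says MAY, an agent that leaves the subject untouched is
equally conformant.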

  ....  The
  "Keywords:" field contains a comma-separated list of important words
  and phrases that might be useful for the recipient.

Actually, its main intent was to be picked up by search engines and the
like (agreed, the search engine might belong to the recipient).


Um, not hardly. The definition and use of keywords in email dates back to the
mid 70s, long before storage was cheap enough to waste on compiling indices of
interpersonal or even list mail. At that point the intent really was to aid the
recipient in classifying messages, nothing more and nothing less.

In any case, since "useful to the recipient" seems to cover search engines as
well as countless other uses, I really don't see much need for additional
elaboration here.

3.6.6.  Resent fields

  When resent fields are used, the "Resent-From:" and "Resent-Date:"
  fields MUST be sent.  The "Resent-Message-ID:" field SHOULD be sent.
  "Resent-Sender:" SHOULD NOT be used if "Resent-Sender:" would be
  identical to "Resent-From:".

Why the SHOULD for Resent-Message-ID? What evil befalls if it is omitted?


I'm not a big fan of how resent- has been handled in more recent
specifications, but in this particular case I think the intent is obvious: the
message is in some sense "new" to the transport system so having a new id is a
good idea. It's not like there's an id shortage...

3.6.7.  Trace fields

  The trace fields are a group of header fields consisting of an
  optional "Return-Path:" field, and one or more "Received:" fields.
  The "Return-Path:" header field contains a pair of angle brackets
  that enclose an optional addr-spec.  The "Received:" field contains a
  (possibly empty) list of tokens followed by a semicolon and a date-
  time specification.  Each token must be a word, angle-addr, addr-
  spec, or a domain.  Further restrictions are applied to the syntax of
  the trace fields by specifications that provide for their use, such
  as [I-D.klensin-rfc2821bis].

Can we find a better word instead of "token" here? "Token" usually means
some sort of keyword (e.g. as used in the MIME standards). "Item" or
"sub-field" are possible alternatives.


I would prefer item as well. But I don't feel strongly about it.

3.6.8.  Optional fields

  Fields may appear in messages that are otherwise unspecified in this
  document.  They MUST conform to the syntax of an optional-field.
  This is a field name, made up of the printable US-ASCII characters
  except SP and colon, followed by a colon, followed by any text which
  conforms to unstructured.

  The field names of any optional-field MUST NOT be identical to any
  field name specified elsewhere in this document.

 optional-field  =       field-name ":" unstructured CRLF

 field-name      =       1*ftext

 ftext           =       %d33-57 /               ; Any character except
                         %d59-126                ;  controls, SP, and
                                                 ;  ":".

  For the purposes of this specification, any optional field is
  uninterpreted.

This is misleading, because it has to cover all new header fields
introduced by extensions and these will be, in general, structured. For
example, does the use of <unstructured> here imply that RFC 2047 may be
used freely on any header field not defined by this document - clearly
2047 was not intended to imply that, but it would not be hard to read it
that way. Better to invent some new term <foobar> with the same syntax as
<unstructured>, and then to say in that last sentence that any internal
structure of a <foobar> has to be defined elsewhere.
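
For reference, the field-name syntax quoted above is easy to check
mechanically. The following is only a sketch built from the ftext ranges shown
in the draft text; the function names are hypothetical, and the field body is
deliberately returned uninterpreted, which is all that this specification says
about it.

```python
import re

# ftext = %d33-57 / %d59-126, i.e. printable US-ASCII except SP and ":".
FIELD_NAME = re.compile(r"[\x21-\x39\x3b-\x7e]+")


def is_field_name(name: str) -> bool:
    """True if name matches field-name = 1*ftext."""
    return FIELD_NAME.fullmatch(name) is not None


def split_optional_field(line: str):
    """Split one unfolded header line into (field name, raw field body).

    The body is handed back as-is; any internal structure would have to be
    defined by whatever specification introduces the field.
    """
    name, colon, body = line.rstrip("\r\n").partition(":")
    if not colon or not is_field_name(name):
        raise ValueError("not a syntactically valid optional-field")
    return name, body
```

Note that this checks only the syntax; the MUST NOT about reusing field names
defined elsewhere in the document would need a separate lookup.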

4.  Obsolete Syntax

  Earlier versions of this specification allowed for different (usually
  more liberal) syntax than is allowed in this version.  Also, there
  have been syntactic elements used in messages on the Internet whose
  interpretation have never been documented.  Though some of these

    ^^^^^^^^^^^^^^                                     ^^^^
    interpretations                             Eh? I thought none of them
                                                was to be generated.


These both seem like good changes to make to me.

  syntactic forms MUST NOT be generated according to the grammar in
  section 3, they MUST be accepted and parsed by a conformant receiver.

which begs the question "what is a 'receiver'?" See earlier discussion.


I think this is reasonably clear and I don't see much value in  trying to get
into defining the overall email architecture here. We have another document for
that.

  ....  This
  allowed many complex forms that have proven difficult for some
  implementations to parse.

Which sounds like a good reason for no longer requiring agents to parse
them :-( .


Perhaps if this case had been made effectively six years ago. Now is not the
time.

4.1.  Miscellaneous obsolete tokens

     Note: The "period" (or "full stop") character (".") in obs-phrase
     is not a form that was allowed in earlier versions of this or any
     other specification.  Period (nor any other character from
     specials) was not allowed in phrase because it introduced a
     parsing difficulty distinguishing between phrases and portions of
     an addr-spec (see section 4.4).  It appears here because the
     period character is currently used in many messages in the
     display-name portion of addresses, especially for initials in
     names, and therefore must be interpreted properly.  In the future,
     period may appear in the regular syntax of phrase.

But this is not an "obsolete" construct. We discussed this around 12
months ago, and the consensus then was that it ought to be renamed as an
<extended-phrase>, and moved out of the Obsolete Syntax.


I don't recall the discussion, but even if that was the consensus at the time
we were unaware of the constraints of this revision exercise. I am therefore
opposed to making this change at this point.

Observe also the threat that it might one day be promoted to the regular
syntax. Are we ready yet to implement that threat? Probably not, but its
day will surely come. But it needs to be said in a more conspicuous place,
not hidden away in a section with "obsolete" in its name.


I agree with the overall sentiment but now is not the time...

Likewise for <obs-phrase-list>, which is only used in <obs-keywords>. But
do we really care whether or not you are allowed to write
    Keywords: Joe Q. Public ?



obs-qp          =       "\" (%d0-127)

obs-body        =       *((*LF *CR *(obs-text *LF *CR)) / CRLF)

obs-text        =       %d0-9 / %d11-12 /       ; %d0-127 except CR and
                       %d14-127                ;  LF

obs-unstruct    =       *((*LF *CR *(obs-utext *LF *CR)) / FWS)

obs-utext       =       %d0-8 / %d11-12 /       ; %d0-127 except CR, LF,
                       %d14-31 / %d33-127      ; and white space

The syntax given for these obs-constructs includes also the syntax for
their regular counterparts, which makes it very hard work to discover
exactly where the difference lies because of the huge redundancy that is
introduced. For example, if you had written

   obs-qp        =       "\" %d0

nothing would have changed, but it would be immediately obvious what the
difference was. Most of the obs-syntax could be re-written in such a
non-redundant manner, though I grant you there are a few cases which would
be difficult.


The argument about fixing really obscure issues once again applies.
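
That said, the kind of difference being asked for can be computed rather than
read off by eye. The sketch below uses only the octet ranges quoted above for
obs-text and obs-utext and lists what each one leaves out of %d0-127; the
helper names are illustrative and nothing here is taken from the regular
(non-obsolete) productions, which are not quoted in this message.

```python
def octets(*ranges):
    """Expand inclusive (low, high) pairs into a set of octet values."""
    values = set()
    for low, high in ranges:
        values.update(range(low, high + 1))
    return values


ALL_ASCII = octets((0, 127))

# Ranges copied from the obs-text and obs-utext rules quoted above.
OBS_TEXT = octets((0, 9), (11, 12), (14, 127))
OBS_UTEXT = octets((0, 8), (11, 12), (14, 31), (33, 127))


def excluded(production):
    """Sorted list of octets in %d0-127 that the production does not allow."""
    return sorted(ALL_ASCII - production)
```

The same set subtraction against a regular production would give exactly the
"what is different" view the comment asks for.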

4.3.  Obsolete Date and Time

  The syntax for the obsolete date format allows a 2 digit year in the
  date field and allows for a list of alphabetic time zone
  specifications that were used in earlier versions of this
  specification.  It also permits comments and folding white space
  between many of the tokens.

 obs-day-of-week =       [CFWS] day-name [CFWS]

 obs-year        =       [CFWS] 2*DIGIT [CFWS]

 obs-day         =       [CFWS] 1*2DIGIT [CFWS]

 obs-hour        =       [CFWS] 2DIGIT [CFWS]

 obs-minute      =       [CFWS] 2DIGIT [CFWS]

 obs-second      =       [CFWS] 2DIGIT [CFWS]

 obs-zone        =       (( "+" / "-" ) [CFWS] 4DIGIT) /

                           ..............

This lot was particularly difficult to spot the differences. I believe
that in <obs-hour> the first [CFWS] is unnecessary,


Yes, it appears to be because the trailing [CFWS] in obs-year takes
care of it. I'm not sure that it is worth eliminating, however - it
is entirely possible I've missed something.

and the [CFWS] at the
end of <obs-second> might be better moved to the start of <obs-zone>,
since that reflects better the way FWS is treated in the regular syntax.


Again, seems reasonable, but a bit late in the game to make such changes.

4.4.  Obsolete Addressing

  There are three primary differences in addressing.  First, mailbox
  addresses were allowed to have a route portion before the addr-spec
  when enclosed in "<" and ">".  The route is simply a comma-separated
  list of domain names, each preceded by "@", and the list terminated
  by a colon.  Second, CFWS were allowed between the period-separated
  elements of local-part and domain (i.e., dot-atom was not used).  In
  addition, local-part is allowed to contain quoted-string in addition

                         ^^                                  ^^^^^^^^^^^
                        was

  to just atom.  Finally, ....

    ^^^^^^^^^^^^
    in lieu of any of those period-separated atoms


Agreed.

5.  Security Considerations

  Care needs to be taken when displaying messages on a terminal or
  terminal emulator.  Powerful terminals may act on escape sequences
  and other combinations of ASCII control characters with a variety of
  consequences.  They can remap the keyboard or permit other
  modifications to the terminal which could lead to denial of service
  or even damaged data.  They can trigger (sometimes programmable)
  answerback messages which can allow a message to cause commands to be
  issued on the recipient's behalf.  They can also affect the operation
  of terminal attached devices such as printers.  Message viewers may
  wish to strip potentially dangerous terminal escape sequences from
  the message prior to display.  However, other escape sequences appear
  in messages for useful purposes (cf. [ISO.2022.1994], [RFC2045],
  [RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]) and therefore
  should not be stripped indiscriminately.

Eh? what "other escape sequences" do you have in mind? I am not aware of
any meaningful use of NO-WS-CTLs in any of those standards. OTOH, if that
list is correct, should RFC 2231 be added to it?


You appear to have NO-WS-CTLs on the brain. This section is pretty clearly
focused on control characters in body text, not headers. And since MIME allows
the use of alternate charsets, some of which make extensive use of ISO 2022
escape sequences, the point about not stripping them indiscriminately seems
especially well taken. Indeed, if memory serves, it is not all that uncommon to
find Asian text in one of these charsets that's been damaged by control
character removal, so much so that fixup utilities exist to try and correct
such damage.
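
To make the damage concrete, here is a minimal sketch using Python's
"iso2022_jp" codec. The sanitizer strip_escapes is a deliberately naive,
hypothetical one that drops every ESC octet, and the sample text is just three
Japanese characters chosen for illustration.

```python
ESC = b"\x1b"


def strip_escapes(raw: bytes) -> bytes:
    """Naive sanitizer: remove every ESC octet from the body."""
    return raw.replace(ESC, b"")


# Sample text, written with escapes: the three characters of "Japanese language".
original = "\u65e5\u672c\u8a9e"

# ISO-2022-JP switches character sets with ESC sequences inside the body.
encoded = original.encode("iso2022_jp")

# Untouched bytes decode back to the original text.
roundtrip = encoded.decode("iso2022_jp")

# With the ESC octets removed, the charset switches are gone and the
# remaining bytes are read as plain ASCII, so the text is lost.
damaged = strip_escapes(encoded)
garbled = damaged.decode("iso2022_jp", errors="replace")
```

A viewer that wants to protect the terminal would do better to decode according
to the declared charset first and filter control characters from the result.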

  Many implementations use the "Bcc:" (blind carbon copy) field
  described in section 3.6.3 to facilitate sending messages to
  recipients without revealing the addresses of one or more of the
  addressees to the other recipients.  Mishandling this use of "Bcc:"
  may disclose confidential information which could eventually lead to
  security problems through knowledge of even the existence of a
  particular mail address.  For example, if using the first method
  described in section 3.6.3, where the "Bcc:" line is removed from the
  message, blind recipients have no explicit indication that they have
  been sent a blind copy, except insofar as their address does not
  appear in the message header section.  Because of this, one of the
  blind addressees could potentially send a reply to all of the shown
  recipients and accidentally reveal that the message went to the blind
  recipient.  When the second method from section 3.6.3 is used, the
  blind recipient's address appears in the "Bcc:" field of a separate
  copy of the message.  If the "Bcc:" field sent contains all of the
  blind addressees, all of the "Bcc:" recipients will be seen by each
  "Bcc:" recipient.  Even if a separate message is sent to each "Bcc:"
  recipient with only the individual's address, implementations still
  need to be careful to process replies to the message as per section
  3.6.3 so as not to accidentally reveal the blind recipient to other
  recipients.

But how could such revelation be avoided? Surely, if a blind recipient
replies to any message, his identity will be given away by the From header
of the reply? I see nothing in 3.6.3 that helps with this problem.


In the case of reply-to-all, sure. But the case this appears to be addressing
is when the reply is being sent to the originator only.  The originator
presumably knows who he or she bcc'ed so nothing is directly revealed by doing
this. However, if the reply manages to promote a bcc address to a Cc or To
field, a reply to that reply could easily reveal the original blind carbon
action.

More generally, it just seems like a good idea to try and keep bcc and other
recipient addresses separate when possible.

6.  IANA Considerations

  This document has no actions for IANA.

Oh yes it does!

RFC 3864 requires IANA to maintain a registry of header fields within
Email, News and HTTP. Anybody who defines a header field is obliged to
provide a template for use in that registry.

In the case of RFC 2822, Graham Klyne wrote a set of templates in RFC
4021. But these all reference RFC 2822 as the relevant specification
document, and so will become outdated as soon as 2822bis appears. On top
of that, they are all (IMHO) unnecessarily verbose.

So you need to write a full set of revised templates in here. I suggest
you look at draft-ietf-usefor-usefor-12 for how we laid out those
templates for USEFOR (carefully avoiding such verbosities :-) ).


I'm very much afraid that you are correct about this. I see little choice but
to include such templates in this revision. I'd really rather not if there
was a way out, but I just don't see one.

Appendix A.  Example messages

  Messages are delimited in this section between lines of "----".  The
  "----" lines are not part of the message itself.

That is indeed an excellent notation. The Bad News is that you have
nowhere used it :-( .


Yes, and so this sentence would best be removed.

Appendix A.1.2.  Different types of mailboxes

  This message includes multiple addresses in the destination fields
  and also uses several different forms of addresses.

  From: "Joe Q. Public" <john.q.public@example.com>
  To: Mary Smith <mary@x.test>, jdoe@example.org, Who? <one@y.test>
  Cc: <boss@nil.test>, "Giant; \"Big\" Box" <sysservices@example.net>
  Date: Tue, 1 Jul 2003 10:52:37 +0200
  Message-ID: <5678.21-Nov-1997@example.com>

  Hi everyone.

  Note that the display names for Joe Q. Public and Giant; "Big" Box
  needed to be enclosed in double-quotes because the former contains
  the period and the latter contains both semicolon and double-quote
  characters (the double-quote characters appearing as quoted-pair
  construct).  ...

    ^^^^^^^^^
    constructs


Agreed.

Appendix A.1.3.  Group addresses

From: Pete <pete@silly.example>
To: A Group:Chris Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;
Cc: Undisclosed recipients:;
Date: Thu, 13 Feb 1969 23:32:54 -0330
Message-ID: <testabcd.1234@silly.example>

Testing.

  In this message, the "To:" field has a single group recipient named A

                                                                       ^
                                                                       "

  Group which contains 3 addresses, and a "Cc:" field with an empty

         ^
         "

  group recipient named Undisclosed recipients.

Wouldn't it be better to show a Bcc: header for the "Undisclosed
recipients" example?


Not necessarily - it depends on what's intended, and the document is silent on
that. A cc: field with "undisclosed recipients: ;" is intended to convey the
fact that there are other recipients to everyone who gets the message. Putting
this in a bcc field, which often as not is only included in some message copies
but not others, could be intended to convey that information to only a subset
of the recipients. Or it could have gone to everyone. There's no way
to tell.
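
As an aside, the group syntax in this example, including the empty group, can
be exercised with an off-the-shelf parser. The sketch below feeds the A.1.3
header section to Python's standard email package using its default policy;
describe_groups is a hypothetical helper, and this shows one parser's reading
of the syntax, not anything normative.

```python
from email import message_from_string
from email.policy import default

# The A.1.3 example message, one header field per line.
RAW = (
    "From: Pete <pete@silly.example>\n"
    "To: A Group:Chris Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;\n"
    "Cc: Undisclosed recipients:;\n"
    "Date: Thu, 13 Feb 1969 23:32:54 -0330\n"
    "Message-ID: <testabcd.1234@silly.example>\n"
    "\n"
    "Testing.\n"
)

msg = message_from_string(RAW, policy=default)


def describe_groups(header):
    """List (group display name, [addr-spec, ...]) for an address header."""
    return [
        (group.display_name, [addr.addr_spec for addr in group.addresses])
        for group in header.groups
    ]


to_groups = describe_groups(msg["To"])
cc_groups = describe_groups(msg["Cc"])
```

The parser reports the group name and membership only; it has no more idea
than we do what the sender meant by an empty group in Cc: versus Bcc:.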

Appendix A.4.  Messages with trace fields

  As messages are sent through the transport system as described in
  [I-D.klensin-rfc2821bis], trace fields are prepended to the message.
  The following is an example of what those trace fields might look
  like.  Note that there is some folding white space in the first one

                                   ^^^^^^^
                                   folded

  since these lines can be long.


No, this is a term of art, not a description of what's there.

Appendix A.5.  White space, comments, and other oddities

  White space, including folding white space, and comments can be
  inserted between many of the tokens of fields.  Taking the example
  from A.1.3, white space and comments can be inserted into all of the
  fields.

From: Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>
To:A Group(Some people)
    :Chris Jones <c@(Chris's host.)public.example>,
        joe@example.org,
 John <jdoe@one.test> (my dear friend); (the end of the group)
Cc:(Empty list)(start)Undisclosed recipients  :(nobody(that I know))  ;
Date: Thu,
     13
       Feb
         1969
     23:32
              -0330 (Newfoundland Time)
Message-ID:              <testabcd.1234@silly.test>

Testing.

  The above example is aesthetically displeasing, but perfectly legal.

Though legal, you should point out that it contains things that are
deprecated by 3.3 and by 3.4.1, and that agents might well choke on them.


I don't oppose it but OTOH I can't get excited about repeating the warning in
this context.

7.  References

7.2.  Informative References

  [RFC1036]  Horton, M. and R. Adams, "Standard for interchange of
             USENET messages", RFC 1036, December 1987.

RFC 1036 is not actually referenced anywhere in the document. Moreover, if
you DO have a need to reference it (please do :-) ) then you ought to
refer to draft-ietf-usefor-usefor-12 instead.


The RFC Editor will yank any orphaned (Or is it widowed? Who cares?) references
so this really isn't worth worrying about. If we want to do something about it,
let's remove it entirely.

                                Ned