Re: Comments on draft-resnick-2822upd-02.txt


In <01MK8ESGFC4M005BGY(_at_)mauve(_dot_)mrochek(_dot_)com> 
ned+ietf-822(_at_)mrochek(_dot_)com writes:

1.  Introduction

1.1.  Scope

  This document specifies a syntax only for text messages.  In
  particular, it makes no provision for the transmission of images,
  audio, or other sorts of structured data in electronic mail messages.
  There are several extensions published, such as the MIME document
  series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
  for the transmission of such data through electronic mail,........

No mention of RFC 2047, or of RFC 2231?

RFC 2047 specifies a means to include non-ASCII text in email headers and has
essentially nothing to do with the transmission of structured data through
email. So why should it be mentioned in this context?


OK, if this paragraph is intended for Content-Type stuff in bodies. And
there is a mention of RFC 2047 later on in a more relevant context (but
then RFC 2231 should probably get a mention at that place).

1.2.  Notational conventions

1.2.3.  Structure of this document

Can we be clear about the _intent_ of this obs-syntax?

Is the intent to be able to read/display/print ancient messages which
people still have on file? In which case, please can we say that there is
no longer any expectation that obs messages can still be transmitted and
delivered (by RFC2821 or otherwise), and hence only MUAs (but not MTAs)
are REQUIRED to accept them.

Or, alternatively, is the intent that some ancient software still
generates messages using the obs-syntax, and hence MTAs MUST still accept
them? In which case, for how much longer?

I see little if any justification for additional elaboration of the intent
here. As far as I'm concerned the intent covers both your "alternatives" and
quite a few other things as well.

More generally, the problem with trying to nail down intent is that once you do
so the ability of the construct to meet other, as-yet-unplanned needs may be
compromised. See below for an example of another possible use of the obs-
syntax - helping with interop requirements - should we make our own lives
harder down the road simply because this document didn't say that one
intent of this was to make feature enumeration easier?


It's a question of the remote possibility that some future benefit of
keeping the obs-syntax in place will appear; versus the certainty that
implementors will for ever be REQUIRED to accept some things that are
notoriously difficult to parse (such as bare CR or LF or NULL) and which
will surely never be encountered. Keeping these things off the wire and
with no requirement for agents that take things off the wire to continue
to accept them would be a good start.

Being cautious in what changes you introduce at Draft Standard stage is a
fine thing, but it has to be set against the fact that this is your last
chance to remove things which have outlived their usefulness and would not
be missed. Beyond Draft Standard, the concrete in which these things are
set becomes so hard that it can NEVER be broken.

2.  Lexical Analysis of Messages

2.1.1.  Line Length Limits

  There are two limits that this specification places on the number of
  characters in a line.  Each line of characters MUST be no more than
  998 characters, and SHOULD be no more than 78 characters, excluding
  the CRLF.

Can we de-emphasise that SHOULD, and make it clear that this is a matter
of good practice (in the sense of BCP) rather than a normative feature?


...............

Besides, I think a SHOULD is actually appropriate here. SHOULD means you should
do it unless you have a really good reason not to.

Perhaps s/SHOULD/should/? Too many agents have used this as an excuse to
rewrite lines en route (maybe there should be a SHOULD NOT for that).

If so, they are relying on a flagrant misreading of the document. Attempting to
prevent such exercises is itself an exercise in futility - the best you can
ever do is point out that the claim isn't supported by the actual language.


But people DO flagrantly misread documents, and have done so in this case.
So that SHOULD has caused actual harm. RFC 2119 itself admits that the
interpretation of MUST and SHOULD might well be different in BCP and other
informational documents, but this document is clearly intended to be
normative, except where it chooses to make it clear that it is only giving
advice.

As for having a SHOULD NOT about agents altering messages in transit, IMO the
place for that - assuming it makes sense to do - would be the SMTP
specification, not here.


That would be a good thing to say in 2821bis. But it should apply equally
to ANY transport mechanism - old or new.

  The more conservative 78 character recommendation is to accommodate
  the many implementations of user interfaces that display these
  messages which may truncate, or disastrously wrap, the display of
  more than 78 characters per line, in spite of the fact that such
  implementations are non-conformant to the intent of this
  specification (and that of [I-D.klensin-rfc2821bis] if they actually

Where did that '78' come from? I am aware of lots of systems that do
horrid things such as you mention if there are 80 characters in a line,
but I am aware of none where problems arise with exactly 79. In other fora
where I have seen this discussed, the consensus was that exceeding '79'
was the signal for troubles to start.

I've always felt that the 78 character limit was one byte lower than it really
needed to be. But I am a long way from convinced that now is the time to change
this.


It's your last chance :-( . And existing systems that use 78 would still
be compliant, or at least as compliant as they were before.

2.2.3.  Long Header Fields

  ......  Each header field should be
  treated in its unfolded form for further syntactic and semantic

                                             ^^^^^^^^^

  evaluation.

'Semantic' yes, but why is that 'syntactic' there?

Don't you have to parse things like address fields in order to then perform
semantic analysis? .........


Ah! you mean that you have

   To: <a-very-long-name-such-as-frederickickickick
        @example.com>

(which is ugly but allowed). So you have to unfold before you can
recognize that you have an <addr-spec>. Point taken.
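
A minimal sketch of that point, assuming that unfolding means removing each
CRLF that is immediately followed by a space or tab. The function name unfold
and the pattern ANGLE_ADDR are invented for illustration, and the pattern is
deliberately simplistic; it is nowhere near a full <addr-spec> parser.

```python
import re


def unfold(field: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", field)


# Simplistic stand-in for an address parser: local part, optional blanks
# around "@", then a domain, all inside angle brackets.
ANGLE_ADDR = re.compile(r"<([^\s<>@]+)[ \t]*@[ \t]*([^\s<>@]+)>")

folded = ("To: <a-very-long-name-such-as-frederickickickick\r\n"
          "        @example.com>")

# Applied to the folded text the pattern fails, because the CRLF sits
# between the local part and the "@".
before = ANGLE_ADDR.search(folded)

# Applied to the unfolded text the same pattern succeeds.
after = ANGLE_ADDR.search(unfold(folded))
```

Here `before` is None, while `after` matches and yields the local part and
the domain as its two groups.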

3.2.2.  Quoted characters


     Note: The "\" character may appear in a message where it is not
     part of a quoted-pair.  A "\" character that does not appear in a
     quoted-pair is not semantically invisible.  The only places in
     this specification where quoted-pair currently appears are
     ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.

.... But,
as I have pointed out in a separate thread, you would remove a severe
interoperability problem with Netnews if you removed it from <dcontent> as
well (allowing just a "\" to appear as a normal character).

As I believe I stated in an earlier response, I am opposed to removing it. It is
simply not possible to know everything that's out there and just because we
don't know about something is no excuse to break it.  I could live with moving
it to the obsolete syntax but that's as far as I'll go.


Indeed, moving it to obs-syntax is all I am asking for (though allowing
"\" in <dtext> might be tricky - I shall respond to Pete's remarks on
that).

3.2.3.  Folding white space and comments

Do you _really_ want to permit NO-WS-CTL in a <comment>?

RFC 2822 did, so the question becomes one of do we want to change
this away from what 2822 said?

Like it or not, control characters have long been allowed in a lot of places
where they really don't belong. I remain to be convinced that this one
narrow case is worth worrying about.


It (and other similar cases) is a good candidate for the obs-syntax, then.
There may be a few cases where they may be meaningful in the protocol (so
it is up to 2821bis to make the final pronouncement), but this is not one
of them. We had a purge on them in USEFOR, notably in Message-ID where
characters that are not visible on the screen could provide a golden
opportunity for all sorts of scams.

<phrase>s, <unstructured>s and <comment>s are the places where RFC 2047
raises its ugly head. It is the most confusingly written RFC I have
encountered (and it could be considered as separate from the rest of
the MIME standards, since it can be used without the MIME-Version header).

For a truly outrageous suggestion, we might incorporate the whole of RFC
2047 into here, cleaning it up in the process. No, that is too much to
propose at this juncture, but there are a couple of lesser things we might
do to help:

1. Include <encoded-word> in the syntax at all the proper places (which
might at least encourage inventors of new extension headers to follow
suit). It would need a convincing explanation, of course.

I am strongly opposed to this. If RFC 2047 is confusing, the time to argue that
is when it is revised. We cannot fix its problems (assuming there actually are
any to fix) by incorporating some subset of references to it in another
document.


Yes, I didn't expect that one to fly ;-( .

2. And if that is a step too far, we could still point out that sequences
of the form "=? ... ? ... ? ... ?=" have a special significance within RFC
2047 (whether they exceed that 76 character limit or not), and that such
sequences SHOULD NOT be used within <phrase>s, <unstructured>s and
<comment>s unless that special significance is intended.

An informational reference to RFC 2047 would be OK with me.


"Within 'comment's, 'phrase's and 'unstructured's, sequences of the form
"=? ...  ? ... ? ... ?=" have a special significance within RFC 2047 for
encoding characters outside the range of US-ASCII. Such sequences SHOULD
NOT therefore be used unless that special significance is intended."

3.2.6 might be a possible home for such a remark. Possibly as a Note.
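
To show the sort of sequence such a Note would be warning about, here is a
rough sketch that finds anything of the form "=? ... ? ... ? ... ?=" in a
piece of text. The name find_encoded_word_lookalikes and the pattern are
invented for illustration; the pattern is a loose approximation of the
RFC 2047 grammar and, as suggested above, ignores the 76 character limit.

```python
import re

# "=?" charset "?" encoding "?" encoded-text "?=", where no part contains
# a "?" or white space. This is looser than the real RFC 2047 grammar.
LOOKALIKE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")


def find_encoded_word_lookalikes(text: str):
    """Return every substring that a reader may take for an encoded-word."""
    return LOOKALIKE.findall(text)
```

A generator of <phrase>s, <unstructured>s or <comment>s could run such a
check on text that is not meant to carry the RFC 2047 significance.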

3.3.  Date and Time Specification

why not "within the range -2359 through +2359"?

I have no objection to restricting the range, but whatever we do needs to agree
with other specifications that deal in time zones. RFC 3339 appears to allow
-2459 through +2459.


That would be fine (apparently funny things can happen around the Date
Line). Though I didn't actually find anything about that in RFC 3339.
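
Whichever bound is chosen, the check itself is trivial. A minimal sketch,
assuming the numeric zone form of a sign followed by four digits; the name
zone_in_range and the max_hours parameter are invented for illustration, and
the default of 23 merely reflects the "-2359 through +2359" suggestion, not
anything the draft or RFC 3339 says.

```python
import re

ZONE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def zone_in_range(zone: str, max_hours: int = 23) -> bool:
    """True if zone is +hhmm or -hhmm with hh <= max_hours and mm <= 59."""
    match = ZONE.fullmatch(zone)
    if match is None:
        return False
    hours, minutes = int(match.group(2)), int(match.group(3))
    return hours <= max_hours and minutes <= 59
```

Passing max_hours=24 gives the wider "-2459 through +2459" range instead.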

3.4.1.  Addr-spec specification

  .......  A liberal syntax
  for the domain portion of addr-spec is given here; it is left to
  other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
  [I-D.klensin-rfc2821bis]) to give more precise limitations on the
  syntax.

Can we strengthen that by saying that the 'liberal syntax' MUST be further
restricted to conform to some published specification such as the ones you
have listed (without precluding further such specifications in the future,
of course)?

No, because that would usurp the prerogative of other specifications to
specify what conformance criteria apply to their additional restrictions.

I have already pointed out, in a separate thread, the severe
interoperability problems with Netnews of this definition of <dcontent>
(at least insofar as its use within <msg-id> is concerned).....

And IMO you failed to achieve sufficient support to result in a specification
change. As I said previously, I can live with making the use of quoted-pairs in
dtext part of the obsolete syntax, but that's as far as I can see us going.


Yes, that would be the best way IMHO.

The alternative approach I actually favor is one I have previously described:
Add some text that says that domain literals in message ids should be generated
using the most restrictive syntax and with well-defined semantics, i.e. an IPv4
or IPv6 literal. To my mind the bigger problem here is that someone will generate
something like [foobar] here instead of putting in an actual global IPv4 or
IPv6 address. If we encourage people to use domain literals with defined
semantics we solve several problems at one go.
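
A rough sketch of what "domain literals with defined semantics" could mean in
practice. The name is_ip_domain_literal is invented for illustration, and the
"IPv6:" tag follows the address-literal form used for SMTP; whether the same
form would be wanted in message ids is an assumption. The sketch checks the
syntax only and makes no attempt to decide whether an address is global.

```python
import ipaddress


def is_ip_domain_literal(literal: str) -> bool:
    """True for "[a.b.c.d]" or "[IPv6:...]" holding a well-formed address."""
    if not (literal.startswith("[") and literal.endswith("]")):
        return False
    inner = literal[1:-1]
    try:
        if inner.lower().startswith("ipv6:"):
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:
        # Raised by the ipaddress module for anything that is not an address.
        return False
    return True
```

With this check, something like [foobar] is rejected while [192.0.2.1] and
[IPv6:2001:db8::1] are accepted.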


Sure, but that was the intent of my suggested "'liberal syntax' MUST be further
restricted to conform to some published specification" which you did not
like. But I see that Pete has suggested a possible text which achieves
much the same effect.

3.6.  Field definitions

I have already pointed out, in a separate thread, the severe
interoperability problem that arises with Netnews if you do not require a
SP after the colon. Since every MUA I am aware of routinely inserts that
SP, I cannot see that anything would be lost by requiring it here.

And as I commented previously, not everything that has a submission client in
it is an MUA. There are quick and dirty submission clients embedded in all
sorts of places - one of the advantages of SMTP is that you can code a quick
and dirty client very easily - and leaving out every possible character is
exactly the sort of things these gizmos do. Heck, they even do it when they've
actually got plenty of space to spare - unnecessary optimization is RAMPANT in
the embedded systems world.


But surely anybody writing a script to do a "dirty submission" is going to
write something like:

   sprintf(buffer, "Foo-Header: %s, %d, ...\n", stuff1, stuff2)

either in C, or in one of the many scripting languages (e.g. Perl) which
supports constructs like that, because that is the easiest way to generate
such things. And they will tend to put that SP in simply because that is
how they always expect to see headers. It would take a conscious
decision to be "different" for them to do otherwise.
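
The same thing in a scripting language, as a minimal sketch. The names
make_field and has_space_after_colon are invented for illustration; the
second function is the sort of check a strict consumer might apply if the SP
were required.

```python
def make_field(name: str, body: str) -> str:
    """Build a header field line the way most scripts naturally would."""
    return f"{name}: {body}\r\n"


def has_space_after_colon(field_line: str) -> bool:
    """True if the first colon in the line is immediately followed by SP."""
    _name, colon, rest = field_line.partition(":")
    return colon == ":" and rest.startswith(" ")
```

A field produced by make_field always passes the check; a line such as
"Foo-Header:stuff" does not.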

3.6.2.  Originator fields

  The originator fields indicate the mailbox(es) of the source of the
  message.  The "From:" field specifies the author(s) of the message,
  that is, the mailbox(es) of the person(s) or system(s) responsible
  for the writing of the message....

Are those sentences intended to be normative, BCP (or even deliberately
vague :-) )?

Don't see any capitalized words there, do you? So I guess there are
no compliance implications.

For example, some people 'munge' their From: addresses in order to appear
anonymous, or to confuse address harvesters. ...

The wording currently proposed by the USEFOR WG for this is:

   Contrary to [RFC2822], which implies that the mailbox or mailboxes in
   the From header field should be that of the poster or posters, a
   poster who does not, for whatever reason, wish to use his own mailbox
   MAY use any mailbox ending in the top level domain ".invalid"
   [RFC2606].

But if RFC2822 does not actually imply that, then we might have to think
again.

And IMO it should not imply that. The last thing email needs at this point is
more license to use invalid addresses.


No, you misread what I said. "If RFC2822 does not actually imply that
munged_addresses_etc_are_disallowed" (and you seem to be saying that it
does not), then the USEFOR WG needs to review that wording which says
"Contrary to [RFC2822]...", because it would not be contrary.

But, since people seem to be reading it both ways, it needs to be
clarified.

3.6.4.  Identification fields

  The "Message-ID:" field contains a single unique message identifier.
  The "References:" and "In-Reply-To:" field each contain one or more
  unique message identifiers, optionally separated by CFWS.

                                ^^^^^^^^^^

Interoperability with Netnews would be improved without that "optionally".

Perhaps, but like it or not Netnews compatibility is not our primary goal here.


It is a very common practice to gateway message-lists into Netnews (you
can find this list on Usenet if you look for it). And it is also common
for people to both mail and post the same message/article. Therefore,
interoperability with Netnews is an important goal, especially where it
can be achieved with zero or minimal disruption to present practices.

Actually, the CFWS in the References header is one of the less urgent
problems. A "SHOULD include" it would be strong enough, given that USEFOR
felt able to say that its absence "SHOULD be accepted", so that eventually
the two will come into line.

  The "References:" field will contain the contents of the parent's
  "References:" field (if any) followed by the contents of the parent's
  "Message-ID:" field (if any). ...

It would be useful to mention that when the References field gets too long
it MAY be pruned (the minimum requirement being to retain the first and
the last two entries - including the one just being added). I have known
of cases where References fields grew to such a length (and MUAs in the
followup chain had failed to introduce folding, or even removed folding
already present) that the 998 limit was breached with disastrous
consequences.
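
A minimal sketch of such pruning, assuming the References field has already
been split into a list of message identifiers with the one just being added
at the end. The name prune_references and the max_entries parameter are
invented for illustration; the floor of three entries corresponds to "the
first and the last two".

```python
def prune_references(msg_ids, max_entries):
    """Keep the first entry plus the most recent ones if the list is too long."""
    if max_entries < 3:
        raise ValueError("must retain at least the first and the last two")
    if len(msg_ids) <= max_entries:
        return list(msg_ids)
    # First entry, then the last (max_entries - 1) entries.
    return [msg_ids[0]] + list(msg_ids[-(max_entries - 1):])
```

The pruned list would still need to be folded when it is written out, so that
no single line approaches the 998 limit.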

Adding such a suggestion would be fine with me were it not for the context of
this effort - every such change increases the likelihood of a problem getting
to draft.


All it needs is a MAY. Given that lack of it has actually caused breakage
in the past, and certainly does no harm, it would seem wise to allow it.
USEFOR intends to allow (and even to encourage) it.

3.6.5.  Informational fields

  ....  When used in a reply, the field body MAY start with the
  string "Re: " (from the Latin "res", in the matter of) followed by
  the contents of the "Subject:" field body of the original message.

If we are going to discuss Latin Grammar, then please let us do so
correctly. "Res" is the nominative form of the fifth declension noun
meaning "thing", "matter", "issue", etc.  "Re" is an abbreviation of the
phrase "in re" meaning "in the matter of", and in which "re" is the
ablative form of the same noun (the preposition "in" is always followed by
an ablative in static cases such as this, though it takes the accusative
form - e.g. "in rem" - in dynamic cases where the meaning is "into").

So if, instead of
    the string "Re: " (from the Latin "res", in the matter of)
you write
    the string "Re: " (an abbreviation of the Latin "in re", meaning "in
    the matter of")
all will be correct.

Yep, now that I think about it you're correct. This is a reasonable change.
Alternately, the whole thing about the Latin could be omitted.


I think it is needed because too many people imagine that it is an
abbreviation of "reference", and then try to use an abbreviation of an
equivalent word in their own language. And that definitely causes things
to break.

Appendix A.  Example messages

  Messages are delimited in this section between lines of "----".  The
  "----" lines are not part of the message itself.

That is indeed an excellent notation. The Bad News is that you have
nowhere used it :-( .

Yes, and so this sentence would best be removed.


I would prefer that the feature be used as intended.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, 
CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5