ietf-822

Re: Comments on draft-resnick-2822upd-02.txt

2007-08-16 21:27:40

I just got through reading Ned's response. Glad I don't have to polish my own responses, as his are almost entirely spot-on. On to the things that Ned didn't address (or to which I have additions):

On 8/15/07 at 9:44 PM +0000, Charles Lindsey wrote:

 >   messages.  This specification is a revision Request For Comments
                                               ^
                                               of

Got it.

That last sentence seems wrong/confusing, given that the document specifies everything in terms of US-ASCII, presumably with the intent that US-ASCII should be the normal means of interchange over the 'wire' between agents (in the absence of explicit agreement otherwise).

You can have a protocol that sends messages over the wire in UCS-2 or UCS-4 (or a file format that so stores them). So long as they use only code points 1-127, they are legal 2822 messages. (Over the wire in US-ASCII might imply septets instead of octets. We certainly don't want that.)

 >   There are two limits that this specification places on the number of
 >   characters in a line.  Each line of characters MUST be no more than
 >   998 characters, and SHOULD be no more than 78 characters, excluding
 >   the CRLF.

Can we de-emphasise that SHOULD, and make it clear that this is a matter of good practice (in the sense of BCP) rather than a normative feature?

It's not just good practice. Some agents screw up the display of long lines so badly as to make them unreadable to the user, and that's an interoperation problem. I believe some old ones actually choked on long lines (see below).

Where did that '78' come from? I am aware of lots of systems that do horrid things such as you mention if there are 80 characters in a line, but I am aware of none where problems arise with exactly 79. In other fora where I have seen this discussed, the consensus was that exceeding '79' was the signal for troubles to start.

My memory (and you may wish to search through the DRUMS archive; I'm not so motivated at the moment) was that there were some old systems that had fixed 80 character records which had room for 78 plus the CR plus the LF. 78 was considered the safest.
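
To make the arithmetic concrete, here is a minimal Python sketch of the two limits quoted above. The names check_line_lengths, MAX_LINE and SOFT_LINE are made up for illustration and do not come from the draft, and the sketch assumes the message is held as bytes with CRLF line endings.

```python
MAX_LINE = 998   # MUST limit, excluding the CRLF
SOFT_LINE = 78   # SHOULD limit, excluding the CRLF


def check_line_lengths(message: bytes):
    """Return (too_long, over_soft): 1-based numbers of offending lines."""
    too_long, over_soft = [], []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        if len(line) > MAX_LINE:
            too_long.append(number)      # breaks the MUST
        elif len(line) > SOFT_LINE:
            over_soft.append(number)     # legal, but past the SHOULD
    return too_long, over_soft
```

A 78-character line plus the CR plus the LF fills an 80-character record exactly, which is the fixed-record case I was recalling.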

 >   Each header field should be
 >   treated in its unfolded form for further syntactic and semantic
                                             ^^^^^^^^^
 >   evaluation.

'Semantic' yes, but why is that 'syntactic' there?

OK, I see what you're asking. You're saying that if you want to syntactically see whether something is an address, it may contain folding (syntactically), so there's no need to unfold to do "syntactic evaluation". I was thinking of, "You can't just randomly choose some line in a message and see if it's syntactically a legitimate field, because that line might be the result of a fold". (*Shrug*) I can't get excited about making a change.
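
A minimal Python sketch of what I had in mind, assuming the usual rule that unfolding removes any CRLF that is immediately followed by white space (SP or HTAB). The function name unfold is hypothetical.

```python
import re


def unfold(header_field: str) -> str:
    """Remove each CRLF that is immediately followed by SP or HTAB."""
    # The lookahead keeps the white space itself; only the CRLF is dropped.
    return re.sub(r"\r\n(?=[ \t])", "", header_field)
```

For instance, the second physical line of "Subject: This\r\n is a test" is just " is a test", which is not a field on its own; only the unfolded "Subject: This is a test" can be examined as one.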

 >3.2.2.  Quoted characters

We have already noted that no-fold-quote, and no-fold-literal can go.

No-fold-quote is gone in message-id (though still accepted in the obs- syntax). I am still not sure what to do about no-fold-literal.

But, as I have pointed out in a separate thread, you would remove a severe interoperability problem with Netnews if you removed it from <dcontent> as well (allowing just a "\" to appear as a normal character).

There are too many implementations that have a dcontent (and qcontent and ccontent) parser that will not deal with free "\" in any such construct. So the only thing we could do would be to abolish "\" completely in dcontent. And this is a path that I think would be terrible to start down. So, no, I don't think we can make this change.

 >   within the range -9959 through +9959.

why not "within the range -2359 through +2359"?

I invite you to write up the review of the DRUMS discussion on this, provide text, and tell us why we should change it.
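
For reference, the current range is simply what a sign plus four digits allows when the last two digits are minutes (00 through 59) and the first two are otherwise unconstrained. A rough Python sketch, with the hypothetical name zone_offset_minutes:

```python
import re

ZONE = re.compile(r"([+-])([0-9]{2})([0-9]{2})")


def zone_offset_minutes(zone: str):
    """Return the offset in minutes, or None if the zone is out of range."""
    match = ZONE.fullmatch(zone)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    if int(minutes) > 59:
        return None                      # outside -9959 through +9959
    total = int(hours) * 60 + int(minutes)
    return -total if sign == "-" else total
```

Under this check "+2400" is accepted, which is exactly the kind of value a -2359 through +2359 limit would start rejecting.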

 >   Because the list of mailboxes can be empty, using the group construct
 >   is also a simple way to communicate to recipients that the message
 >   was sent to one or more named sets of recipients, without actually
 >   providing the individual mailbox address for each of those
 >   recipients.

s/each of/any of/ or s/each of/some of/

Done.

 >   A liberal syntax
 >   for the domain portion of addr-spec is given here; it is left to
 >   other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
 >   [I-D.klensin-rfc2821bis]) to give more precise limitations on the
 >   syntax.

Can we strengthen that by saying that the 'liberal syntax' MUST be further restricted to conform to some published specification such as the ones you have listed (without precluding further such specifications in the future, of course)?

Like Ned, I'm opposed to the MUST, but would this suffice (and get us out of having to change the syntax for dcontent for message-id if we do a similar thing there)?

"Note: A liberal syntax for the domain portion is given here. However, the domain portion of addr-spec contains addressing information used in other protocols (e.g., [RFC1034], [RFC1035], [RFC1123], [I-D.klensin-rfc2821bis]). It is therefore incumbent upon implementations to conform to the syntax of addresses for the context in which they are used."

It's relatively strong language, but stops short of a compliance statement that, as Ned said, could only be satisfied by consulting an incomplete and open-ended series of other specifications.

There may be other transport mechanisms than I-D.klensin-rfc2821bis. So it would be better to say "is covered in separate documents such as [I-D.klensin-rfc2821bis]".

No problem.

Why is Keywords unlimited (in Netnews it is 1)?

I don't know, and I don't know for Comments either. Anyone? Is it worth changing?

...some people 'munge' their From: addresses in order to appear anonymous, or to confuse address harvesters. Whether that is a desirable practice or not is none of our business, but a normative interpretation of those words would seem to rule it out.
[...]
 >   In all cases, the "From:" field SHOULD NOT contain any mailbox that
 >   does not belong to the author(s) of the message.  See also section
 >   3.6.3 for more information on forming the destination addresses for a
 >   reply.

Wanting to appear anonymous or confuse address harvesters seems squarely in the category of "there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label." [RFC 2119] So the normative language seems fine, and so does the potential violation.

 >   The destination fields specify the recipients of the message.  Each
 >   destination field may have one or more addresses, and each of the
 >   addresses indicate the intended recipients of the message.  The only
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              indicates an intended recipient

Got it.

 >   "References:" field may be used to identify a "thread" of
                        ^^^^^^
                       is often

Why?

It would be useful to mention that when the References field gets too long it MAY be pruned (the minimum requirement being to retain the first and the last two entries - including the one just being added). I have known of cases where References fields grew to such a length (and MUAs in the followup chain had failed to introduce folding, or even removed folding already present) that the 998 limit was breached with disastrous consequences.

I am loath to put in pruning instructions at this point, and without such instructions, I don't see what else to say.

It would be useful to say here that two msg-ids can always be compared for equality by a simple octet-by-octet comparison (but, of course, one would first have to ensure that property was true).

I also don't want to put threading instructions into this document, which is the path the above starts down.

so if, instead of
    the string "Re: " (from the Latin "res", in the matter of)
you write
    the string "Re: " (an abbreviation of the Latin "in re", meaning "in
    the matter of")
all will be correct.

OK.

 >   The "Received:" field contains a
 >   (possibly empty) list of tokens followed by a semicolon and a date-
 >   time specification.  Each token must be a word, angle-addr, addr-
 >   spec, or a domain.

Can we find a better word instead of "token" here? "Token" usually means some sort of keyword (e.g. as used in the MIME standards).

I kinda like "token". "Lexeme" seems too syntactic. "Item" seems too generic.

 >3.6.8.  Optional fields

 >   Fields may appear in messages that are otherwise unspecified in this
 >   document.  They MUST conform to the syntax of an optional-field.
 >   This is a field name, made up of the printable US-ASCII characters
 >   except SP and colon, followed by a colon, followed by any text which
 >   conforms to unstructured.

This is misleading, because it has to cover all new header fields introduced by extensions and these will be, in general, structured.

That's not what that says. It says that it will conform to unstructured syntax.
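
To illustrate, a minimal Python sketch of a generic parser that accepts any such field without knowing its structure. It assumes "printable US-ASCII except SP and colon" means code points 33 through 126 minus 58; the name split_optional_field is hypothetical, and the body is not checked against unstructured here.

```python
import re

# Printable US-ASCII (33-126) except colon (58); SP (32) is already excluded.
FIELD_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")


def split_optional_field(line: str):
    """Return (name, body) for a well-formed field line, else None."""
    name, colon, body = line.partition(":")   # split at the first colon
    if not colon or not FIELD_NAME.fullmatch(name):
        return None
    return name, body
```

An extension may go on to parse the body with its own structured grammar, but that body still has to get through this unstructured-level syntax first.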

 >4.  Obsolete Syntax

 >   Earlier versions of this specification allowed for different (usually
 >   more liberal) syntax than is allowed in this version.  Also, there
 >   have been syntactic elements used in messages on the Internet whose
 >   interpretation have never been documented.  Though some of these
    ^^^^^^^^^^^^^^                                     ^^^^
    interpretations                             Eh? I thought none of them
                                                was to be generated.

OK. I'll fix those.

 >      Note: The "period" (or "full stop") character (".") in obs-phrase

But this is not an "obsolete" construct. We discussed this around 12 months ago, and the consensus then was that it ought to be renamed as an <extended-phrase>, and moved out of the Obsolete Syntax.

There was no such consensus; you were the only one who ever suggested it on this list. And I still see no reason to change it (as I stated back then).

The syntax given for these obs-constructs includes also the syntax for their regular counterparts, which makes it very hard work to discover exactly where the difference lies because of the huge redundancy that is introduced. For example, if you had written

   obs-qp        =       "\" %d0

nothing would have changed, but it would be immediately obvious what the difference was.

I will try to fix some of these. Certainly obs-qp is easy. But only the obvious ones.

 >4.3.  Obsolete Date and Time

In this lot it was particularly difficult to spot the differences.

I'm not sure I understand why. I'd prefer to leave it as is. There have been enough bugs in this section already that occurred by trying to over-simplify the syntax.

 >   addition, local-part is allowed to contain quoted-string in addition
                         ^^                                  ^^^^^^^^^^^
                        was

"Is" allowed in this syntax.

 >   to just atom.  Finally, ....
    ^^^^^^^^^^^^
    in lieu of any of those period-separated atoms

That's incorrect. You can mix atoms and quoted-strings.

 >6.  IANA Considerations

 >   This document has no actions for IANA.

Oh yes it does!

Oy. Let me see what I can do about that.

 >   Messages are delimited in this section between lines of "----".  The
 >   "----" lines are not part of the message itself.

That is indeed an excellent notation. The Bad News is that you have nowhere used it :-( .

I'll try to figure out how to do something useful in xml2rfc.

 >   characters (the double-quote characters appearing as quoted-pair
 >   construct).  ...
    ^^^^^^^^^
    constructs

Got it.

 >   In this message, the "To:" field has a single group recipient named A
                                                                       ^
                                                                       "
 >   Group which contains 3 addresses, and a "Cc:" field with an empty
         ^
         "

Yup.

Wouldn't it be better to show a Bcc: header for the "Undisclosed recipients" example?

I don't understand what you mean.

 >   The above example is aesthetically displeasing, but perfectly legal.

Though legal, you should point out that it contains things that are deprecated by 3.3 and by 3.4.1

Nope. A.5 does not (or shouldn't unless I missed something) contain anything not perfectly permissible in 3.3 and 3.4.1.

RFC 1036 is not actually referenced anywhere in the document.

Removed.

pr
--
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102