I just got through reading Ned's response. Glad I don't have to
polish my own responses, as his are almost entirely spot-on. To
things that Ned didn't address (or to which I have additions):
On 8/15/07 at 9:44 PM +0000, Charles Lindsey wrote:
> messages. This specification is a revision Request For Comments
                                            ^
                                            of
Got it.
That last sentence seems wrong/confusing, given that the document
specifies everything in terms of US-ASCII, presumably with the
intent that US-ASCII should be the normal means of interchange over
the 'wire' between agents (in the absence of explicit agreement
otherwise).
You can have a protocol that sends messages over the wire in UCS-2 or
UCS-4 (or a file format that so stores them). So long as they use
only code points 1-127, they are legal 2822 messages. (Over the wire
in US-ASCII might imply septets instead of octets. We certainly don't
want that.)
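
A minimal sketch of that point, not taken from the draft: the check is on
code points, not on how they are encoded on the wire. The helper name
is_2822_text is made up for this example, and Python's UTF-16/UTF-32
codecs stand in for UCS-2/UCS-4 here (an assumption that holds for the
code points in question).

```python
def is_2822_text(text: str) -> bool:
    """Return True if every character is a code point in 1..127.

    Hypothetical helper for illustration only: NUL (0) and anything
    above 127 fail the check.
    """
    return all(1 <= ord(ch) <= 127 for ch in text)


# The same message carried in two different encodings decodes to the
# same code points, so both pass the check.
message = "Subject: hello\r\n"
wire_16 = message.encode("utf-16-le")   # two octets per character
wire_32 = message.encode("utf-32-le")   # four octets per character
ok_16 = is_2822_text(wire_16.decode("utf-16-le"))
ok_32 = is_2822_text(wire_32.decode("utf-32-le"))
```

Neither wire form is septets, and neither is one octet per character,
yet both carry only code points 1-127.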
> There are two limits that this specification places on the number of
> characters in a line. Each line of characters MUST be no more than
> 998 characters, and SHOULD be no more than 78 characters, excluding
> the CRLF.
Can we de-emphasise that SHOULD, and make it clear that this is a
matter of good practice (in the sense of BCP) rather than a
normative feature?
It's not just good practice. Some agents screw up the display of long
lines so badly as to make them unreadable to the user, and that's an
interoperation problem. I believe some old ones actually choked on
long lines (see below).
Where did that '78' come from? I am aware of lots of systems that do
horrid things such as you mention if there are 80 characters in a
line, but I am aware of none where problems arise with exactly 79.
In other fora where I have seen this discussed, the consensus was
that exceeding '79' was the signal for troubles to start.
My memory (and you may wish to search through the DRUMS archive; I'm
not so motivated at the moment) was that there were some old systems
that had fixed 80 character records which had room for 78 plus the CR
plus the LF. 78 was considered the safest.
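
For what it's worth, the arithmetic and the two limits can be sketched as
follows. This is only an illustration under the assumption that lines
are CRLF-terminated and lengths are counted in characters excluding the
CRLF; the names MAX_HARD, MAX_SOFT and check_line_lengths are made up
for the example.

```python
MAX_HARD = 998  # MUST limit, excluding the CRLF
MAX_SOFT = 78   # SHOULD limit, excluding the CRLF
RECORD = 80     # the fixed-length record size recalled above


def check_line_lengths(message: str):
    """Return (over_hard, over_soft) lists of 1-based line numbers.

    A line over 998 characters is reported only in over_hard; a line
    over 78 but within 998 is reported in over_soft.
    """
    over_hard, over_soft = [], []
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line) > MAX_HARD:
            over_hard.append(number)
        elif len(line) > MAX_SOFT:
            over_soft.append(number)
    return over_hard, over_soft


# 78 characters plus CR plus LF exactly fills an 80-character record.
fits_record = MAX_SOFT + len("\r\n") == RECORD

sample = "a" * 78 + "\r\n" + "b" * 79 + "\r\n" + "c" * 999
result = check_line_lengths(sample)
```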
> Each header field should be
> treated in its unfolded form for further syntactic and semantic
                                           ^^^^^^^^^
> evaluation.
'Semantic' yes, but why is that 'syntactic' there?
OK, I see what you're asking. You're saying that if you want to
syntactically see whether something is an address, it may contain
folding (syntactically), so there's no need to unfold to do
"syntactic evaluation". I was thinking of, "You can't just randomly
choose some line in a message and see if it's syntactically a
legitimate field, because that line might be the result of a fold".
(*Shrug*) I can't get excited about making a change.
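
A rough sketch of what I had in mind, assuming the usual rule that a
CRLF immediately followed by a space or tab is a fold; the function name
unfold is made up for the example and this is not meant as a reference
parser.

```python
import re

# A CRLF that is immediately followed by SP or HTAB is a fold point.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_block: str) -> list[str]:
    """Split a CRLF-delimited header block into unfolded fields."""
    unfolded = _FOLD.sub("", header_block)
    return [line for line in unfolded.split("\r\n") if line]


block = "To: a@example.com,\r\n b@example.com\r\nSubject: hi\r\n"
raw_lines = [line for line in block.split("\r\n") if line]
fields = unfold(block)
```

Here raw_lines has three entries, and the middle one (" b@example.com")
is not a field on its own; fields has two entries, each of which can be
evaluated as a field.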
>3.2.2. Quoted characters
We have already noted that no-fold-quote, and no-fold-literal can go.
No-fold-quote is gone in message-id (though still accepted in the
obs- syntax). I am still not sure what to do about no-fold-literal.
But, as I have pointed out in a separate thread, you would remove a
severe interoperability problem with Netnews if you removed it from
<dcontent> as well (allowing just a "\" to appear as a normal
character).
There are too many implementations that have a dcontent (and qcontent
and ccontent) parser that will not deal with free "\" in any such
construct. So the only thing we could do would be to abolish "\"
completely in dcontent. And this is a path that I think would be
terrible to start down. So, no, I don't think we can make this change.
> within the range -9959 through +9959.
why not "within the range -2359 through +2359"?
I invite you to write up the review of the DRUMS discussion on this,
provide text, and tell us why we should change it.
> Because the list of mailboxes can be empty, using the group construct
> is also a simple way to communicate to recipients that the message
> was sent to one or more named sets of recipients, without actually
> providing the individual mailbox address for each of those
> recipients.
s/each of/any of/ or s/each of/some of/
Done.
> A liberal syntax
> for the domain portion of addr-spec is given here; it is left to
> other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
> [I-D.klensin-rfc2821bis]) to give more precise limitations on the
> syntax.
Can we strengthen that by saying that the 'liberal syntax' MUST be
further restricted to conform to some published specification such
as the ones you have listed (without precluding further such
specifications in the future, of course)?
Like Ned, I'm opposed to the MUST, but would this suffice (and get us
out of having to change the syntax for dcontent for message-id if we
do a similar thing there)?
"Note: A liberal syntax for the domain portion is given here.
However, the domain portion of addr-spec contains addressing
information used in other protocols (e.g., [RFC1034], [RFC1035],
[RFC1123], [I-D.klensin-rfc2821bis]). It is therefore incumbent upon
implementations to conform to the syntax of addresses for the context
in which they are used."
It's relatively strong language, but stops short of a compliance
statement that, as Ned said, could only be satisfied by consulting an
incomplete and open-ended series of other specifications.
There may be other transport mechanisms than I-D.klensin-rfc2821bis.
So it would be better to say "is covered in separate documents such
as [I-D.klensin-rfc2821bis]".
No problem.
Why is Keywords unlimited (in Netnews it is 1)?
I don't know, and I don't know for Comments either. Anyone? Is it
worth changing?
...some people 'munge' their From: addresses in order to appear
anonymous, or to confuse address harvesters. Whether that is a
desirable practice or not is none of our business, but a normative
interpretation of those words would seem to rule it out.
[...]
> In all cases, the "From:" field SHOULD NOT contain any mailbox that
> does not belong to the author(s) of the message. See also section
> 3.6.3 for more information on forming the destination addresses for a
> reply.
Wanting to appear anonymous or confuse address harvesters seems
squarely in the category of "there may exist valid reasons in
particular circumstances when the particular behavior is acceptable
or even useful, but the full implications should be understood and
the case carefully weighed before implementing any behavior described
with this label." [RFC 2119] So the normative language seems fine, and
so does the potential violation.
> The destination fields specify the recipients of the message. Each
> destination field may have one or more addresses, and each of the
> addresses indicate the intended recipients of the message. The only
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            indicates an intended recipient
Got it.
> "References:" field may be used to identify a "thread" of
                      ^^^^^^
                      is often
Why?
It would be useful to mention that when the References field gets
too long it MAY be pruned (the minimum requirement being to retain
the first and the last two entries - including the one just being
added). I have known of cases where References fields grew to such a
length (and MUAs in the followup chain had failed to introduce
folding, or even removed folding already present) that the 998 limit
was breached with disastrous consequences.
I am loath to put in pruning instructions at this point, and without
such instructions, I don't see what else to say.
It would be useful to say here that two msg-ids can always be
compared for equality by a simple octet-by-octet comparison (but, of
course, one would first have to ensure that property was true).
I also don't want to put threading instructions into this document,
which is the path the above starts down.
so if, instead of
the string "Re: " (from the Latin "res", in the matter of)
you write
the string "Re: " (an abbreviation of the Latin "in re", meaning "in
the matter of")
all will be correct.
OK.
> The "Received:" field contains a
> (possibly empty) list of tokens followed by a semicolon and a date-
> time specification. Each token must be a word, angle-addr, addr-
> spec, or a domain.
Can we find a better word instead of "token" here? "Token" usually
means some sort of keyword (e.g. as used in the MIME standards).
I kinda like "token". "Lexeme" seems too syntactic. "Item" seems too generic.
>3.6.8. Optional fields
> Fields may appear in messages that are otherwise unspecified in this
> document. They MUST conform to the syntax of an optional-field.
> This is a field name, made up of the printable US-ASCII characters
> except SP and colon, followed by a colon, followed by any text which
> conforms to unstructured.
This is misleading, because it has to cover all new header fields
introduced by extensions and these will be, in general, structured.
That's not what that says. It says that it will conform to unstructured syntax.
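
To illustrate, here is a sketch of that check applied to an already
unfolded line. The field-name part follows the quoted text (printable
US-ASCII except SP and colon); the body check is a deliberate
simplification that accepts any code point 1-127 other than CR and LF,
which is my assumption for the example and not the exact unstructured
grammar. The name split_optional_field is made up.

```python
import re

# field-name: printable US-ASCII (33-126) except colon (58); SP (32) is
# already outside that range.
# body: simplified stand-in for unstructured, any of 1-127 except CR/LF.
_FIELD = re.compile(
    r"([\x21-\x39\x3b-\x7e]+):([\x01-\x09\x0b\x0c\x0e-\x7f]*)"
)


def split_optional_field(unfolded_line: str):
    """Return (name, body) if the line fits the sketch, else None."""
    match = _FIELD.fullmatch(unfolded_line)
    return (match.group(1), match.group(2)) if match else None


# A body that some extension defines as structured still fits.
structured = split_optional_field("X-Example: token; param=value")
bad_name = split_optional_field("Bad Name: value")
```

The body of an extension field may well have structure of its own, but
it still has to fit within the unstructured syntax, as the first call
shows.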
>4. Obsolete Syntax
> Earlier versions of this specification allowed for different (usually
> more liberal) syntax than is allowed in this version. Also, there
> have been syntactic elements used in messages on the Internet whose
> interpretation have never been documented. Though some of these
  ^^^^^^^^^^^^^^                                    ^^^^
  interpretations                                   Eh? I thought none of them
                                                    was to be generated.
OK. I'll fix those.
> Note: The "period" (or "full stop") character (".") in obs-phrase
But this is not an "obsolete" construct. We discussed this around 12
months ago, and the consensus then was that it ought to be renamed
as an <extended-phrase>, and moved out of the Obsolete Syntax.
There was no such consensus; you were the only one who ever suggested
it on this list. And I still see no reason to change it (as I stated
back then).
The syntax given for these obs-constructs includes also the syntax
for their regular counterparts, which makes it very hard work to
discover exactly where the difference lies because of the huge
redundancy that is introduced. For example, if you had written
obs-qp = "\" %d0
nothing would have changed, but it would be immediately obvious what
the difference was.
I will try to fix some of these. Certainly obs-qp is easy. But only
the obvious ones.
>4.3. Obsolete Date and Time
This lot was particularly difficult to spot the differences.
I'm not sure I understand why. I'd prefer to leave it as is. There
have been enough bugs in this section already that occurred by trying
to over-simplify the syntax.
> addition, local-part is allowed to contain quoted-string in addition
                       ^^                                  ^^^^^^^^^^^
                       was
"Is" allowed in this syntax.
> to just atom. Finally, ....
  ^^^^^^^^^^^^
  in lieu of any of those period-separated atoms
That's incorrect. You can mix atoms and quoted-strings.
>6. IANA Considerations
> This document has no actions for IANA.
Oh yes it does!
Oy. Let me see what I can do about that.
> Messages are delimited in this section between lines of "----". The
> "----" lines are not part of the message itself.
That is indeed an excellent notation. The Bad News is that you have
nowhere used it :-( .
I'll try to figure out how to do something useful in xml2rfc.
> characters (the double-quote characters appearing as quoted-pair
> construct). ...
  ^^^^^^^^^
  constructs
Got it.
> In this message, the "To:" field has a single group recipient named A
                                                                     ^
                                                                     "
> Group which contains 3 addresses, and a "Cc:" field with an empty
       ^
       "
Yup.
Wouldn't it be better to show a Bcc: header for the "Undisclosed
recipients" example?
I don't understand what you mean.
> The above example is aesthetically displeasing, but perfectly legal.
Though legal, you should point out that it contains things that are
deprecated by 3.3 and by 3.4.1
Nope. A.5 does not (or shouldn't unless I missed something) contain
anything not perfectly permissible in 3.3 and 3.4.1.
RFC 1036 is not actually referenced anywhere in the document.
Removed.
pr
--
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102