Re: Comments on draft-resnick-2822upd-02.txt


In <p0625011dc2e94f8089a2(_at_)[74(_dot_)134(_dot_)5(_dot_)163]> Pete Resnick 
<presnick(_at_)qualcomm(_dot_)com> writes:

On 8/15/07 at 9:44 PM +0000, Charles Lindsey wrote:

That last sentence seems wrong/confusing, given that the document 
specifies everything in terms of US-ASCII, presumably with the 
intent that US-ASCII should be the normal means of interchange over 
the 'wire' between agents (in the absence of explicit agreement 
otherwise).

You can have a protocol that sends messages over the wire in UCS-2 or 
UCS-4 (or a file format that so stores them). So long as they use 
only code points 1-127, they are legal 2822 messages. (Over the wire 
in US-ASCII might imply septets instead of octets. We certainly don't 
want that.)


Point taken.

 >   There are two limits that this specification places on the number of
 >   characters in a line.  Each line of characters MUST be no more than
 >   998 characters, and SHOULD be no more than 78 characters, excluding
 >   the CRLF.

Can we de-emphasise that SHOULD, and make it clear that this is a 
matter of good practice (in the sense of BCP) rather than a 
normative feature?

It's not just good practice. Some agents screw up the display of long 
lines so badly as to make them unreadable to the user, and that's an 
interoperation problem....


Then such agents are clearly broken, because there are some situations
where long lines cannot be avoided (long URLs, for example - the
recommended method for splitting URLs seems to be seldom implemented, so
people tend not to risk using it).
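
For illustration only, here is a minimal Python sketch of the two limits in
the quoted paragraph (998 as the MUST, 78 as the SHOULD, both excluding the
CRLF). The function name and the choice to count octets are my own
assumptions; for pure US-ASCII content octets and characters coincide.

```python
MAX_LINE = 998   # MUST limit from the quoted text, excluding CRLF
SOFT_LINE = 78   # SHOULD limit from the quoted text, excluding CRLF


def classify_lines(message: bytes):
    """Return (too_long, over_soft): 1-based numbers of offending lines.

    Hypothetical helper: lines are assumed to be CRLF-terminated and
    lengths are counted in octets.
    """
    too_long, over_soft = [], []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        if len(line) > MAX_LINE:
            too_long.append(number)      # violates the MUST
        elif len(line) > SOFT_LINE:
            over_soft.append(number)     # legal, but exceeds the SHOULD
    return too_long, over_soft
```

A line carrying a long URL would land in the second list: still legal, merely
longer than recommended.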

 >   Each header field should be
 >   treated in its unfolded form for further syntactic and semantic
                                              ^^^^^^^^^
 >   evaluation.


'Semantic' yes, but why is that 'syntactic' there?

OK, I see what you're asking. You're saying that if you want to 
syntactically see whether something is an address, it may contain 
folding (syntactically), so there's no need to unfold to do 
"syntactic evaluation". I was thinking of, "You can't just randomly 
choose some line in a message and see if it's syntactically a 
legitimate field, because that line might be the result of a fold". 
(*Shrug*) I can't get excited about making a change.


I think John has persuaded me that it should stay.
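
The point about a randomly chosen line possibly being the result of a fold
can be seen in a small Python sketch. It assumes unfolding means removing any
CRLF that is immediately followed by SP or HTAB; the function name is
hypothetical.

```python
import re


def unfold(header_block: str) -> list:
    """Split a CRLF-delimited header block into unfolded field lines."""
    # Remove each CRLF that is directly followed by whitespace; the
    # whitespace itself is kept (assumed reading of "unfolding").
    unfolded = re.sub(r"\r\n(?=[ \t])", "", header_block)
    return [line for line in unfolded.split("\r\n") if line]


raw = "Subject: a\r\n long\r\n subject\r\nTo: x@example.com\r\n"
fields = unfold(raw)
```

Here the physical line " long" is not a field by itself; only after unfolding
do we get two lines that can each be checked against the field syntax.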

 >3.2.2.  Quoted characters

We have already noted that no-fold-quote, and no-fold-literal can go.

No-fold-quote is gone in message-id (though still accepted in the 
obs- syntax). I am still not sure what to do about no-fold-literal.

But, as I have pointed out in a separate thread, you would remove a 
severe interoperability problem with Netnews if you removed it from 
<dcontent> as well (allowing just a "\" to appear as a normal 
character).

There are too many implementations that have a dcontent (and qcontent 
and ccontent) parser that will not deal with free "\" in any such 
construct. So the only thing we could do would be to abolish "\" 
completely in dcontent. And this is a path that I think would be 
terrible to start down. So, no, I don't think we can make this change.


I agree that free "\" would not work in qcontent and ccontent.

But if it were allowed in dcontent, then the only thing allowed that is
not allowed currently would be domain-literals of the form [..........\].

Apart from that, it is only the semantic meaning that would change. And
the semantic meaning is of no interest internally within Email. It should
only arise within the semantics attached to whatever new addressing or
routeing protocol might choose to use it, and any special significance of
"\" within such a protocol could be defined by that protocol. Any
such domain-literal seen within Email should be regarded as an opaque
passenger - not to be interpreted or tinkered with, but simply to be
passed to agents implementing the new protocol as and when needed.

But if you do not want to make such a change to dcontent for use in
addr-specs, then just make the change within no-fold-literal where, for
sure, it will never be required to be passed to any other such new
protocol.

 >   within the range -9959 through +9959.

why not "within the range -2359 through +2359"?

I invite you to write up the review of the DRUMS discussion on this, 
provide text, and tell us why we should change it.


It seems that the archive at ftp://cs.utk.edu/pub/drums/mail-archive/ is
no longer accessible. In any case, there is no means to search through an
ftp archive. If it is available in HTML somewhere, then Google could be used
to find stuff in it.
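
For reference, this is roughly what the present range admits, as a Python
sketch. It assumes the zone is a sign followed by four digits, read as hours
then minutes, with minutes limited to 00-59; the names are hypothetical.

```python
import re

ZONE = re.compile(r"([+-])(\d{2})(\d{2})")


def zone_offset_minutes(zone: str):
    """Return the offset in minutes, or None if outside -9959..+9959."""
    match = ZONE.fullmatch(zone)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    if int(minutes) > 59:          # the last two digits are minutes
        return None
    total = int(hours) * 60 + int(minutes)
    return -total if sign == "-" else total
```

So "+9959" is accepted as an offset of more than four days, which is what
prompted my question about -2359 through +2359.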

 >   A liberal syntax
 >   for the domain portion of addr-spec is given here; it is left to
 >   other specifications (e.g., [RFC1034], [RFC1035], [RFC1123],
 >   [I-D.klensin-rfc2821bis]) to give more precise limitations on the
 >   syntax.


Can we strengthen that by saying that the 'liberal syntax' MUST be 
further restricted to conform to some published specification such 
as the ones you have listed (without precluding further such 
specifications in the future, of course)?

Like Ned, I'm opposed to the MUST, but would this suffice (and get us 
out of having to change the syntax for dcontent for message-id if we 
do a similar thing there)?

"Note: A liberal syntax for the domain portion is given here. 
However, the domain portion of addr-spec contains addressing 
information used in other protocols (e.g., [RFC1034], [RFC1035], 
[RFC1123], [I-D.klensin-rfc2821bis]). It is therefore incumbent upon 
implementations to conform to the syntax of addresses for the context 
in which they are used."


Yes, that would indeed help. But I would still prefer to fix it
syntactically, at least within no-fold-literal.

...some people 'munge' their From: addresses in order to appear 
anonymous, or to confuse address harvesters. Whether that is a 
desirable practice or not is none of our business, but a normative 
interpretation of those words would seem to rule it out.
[...]
 >   In all cases, the "From:" field SHOULD NOT contain any mailbox that
 >   does not belong to the author(s) of the message.  See also section
 >   3.6.3 for more information on forming the destination addresses for a
 >   reply.

Wanting to appear anonymous or confuse address harvesters seems 
squarely in the category of "there may exist valid reasons in 
particular circumstances when the particular behavior is acceptable 
or even useful, but the full implications should be understood and 
the case carefully weighed before implementing any behavior described 
with this label." [RFC 2119] So, the normative language seems good as 
well as the potential violation.


It is clearly impossible to stop people from doing this, both in Usenet
and also in highly visible mailing lists. Recognizing this, our main
concern in USEFOR was to discourage people from accidentally using an
address that might actually turn out to belong to someone else, and also
to reduce any possible load on the root DNS servers. So the SHOULD NOT you
quote above is fine for that as far as it goes.

My main concern was the earlier wording in 3.6.2 which can be (and has
been) interpreted as forbidding using a bogus address in From or Sender
(it would be stupid in Reply-To, of course). Whilst the practice should be
discouraged, of course, I think that bit needs to be reworded to make it
clear that it is not laying down a normative requirement.

Even just saying "The "From:" field normally specifies the author(s) of
the message ..." (and similarly for "Sender:") would suffice to clear up
any possible misunderstanding.

 >   "References:" field may be used to identify a "thread" of
                        ^^^^^^
                       is often

Why?


Because it better describes the actual way that field is currently used,
and increases (slightly, but every bit counts) the incentive for people
to create that field correctly.

It would be useful to mention that when the References field gets 
too long it MAY be pruned (the minimum requirement being to retain 
the first and the last two entries - including the one just being 
added). I have known of cases where References fields grew to such a 
length (and MUAs in the followup chain had failed to introduce 
folding, or even removed folding already present) that the 998 limit 
was breached with disastrous consequences.

I am loath to put in pruning instructions at this point, and without 
such instructions, I don't see what else to say.


The wording currently proposed for USEFOR is:

   If the resulting References header field would, after unfolding,
   exceed 998 characters in length (including its field name but not the
   final CRLF), it MUST be trimmed (and otherwise MAY be trimmed).
   Trimming means removing any number of message identifiers from its
   content, except that the first message identifier and the last two
   MUST NOT be removed.

I am not asking for wording that strong (though you can see that the
USEFOR WG takes the correct usage of References as a matter of highest
importance, because threading is universally regarded as an important and
desirable feature within Usenet). A MAY would be quite strong enough to
show that such pruning is _allowed_.

And in fact, the case I am aware of where a References chain got so long
that software actually choked on it was in a mailing list, and not in
Usenet, although the possibility can easily arise in both.
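
To show how little is involved, here is a minimal Python sketch of the
trimming rule in the USEFOR wording quoted above. The function name is
hypothetical, and the policy of dropping the oldest identifiers after the
first is my own assumption; the quoted text only says which ones MUST NOT be
removed.

```python
def trim_references(msg_ids, limit=998, field_name="References"):
    """Trim a list of message identifiers so the unfolded field fits.

    The first identifier and the last two are never removed.
    """
    ids = list(msg_ids)

    def unfolded_length(items):
        # Field name, colon, space, then identifiers separated by one
        # space; the final CRLF is not counted.
        return len(field_name + ": " + " ".join(items))

    # Drop the second-oldest identifier until the field fits or only
    # the three protected identifiers remain.
    while unfolded_length(ids) > limit and len(ids) > 3:
        del ids[1]
    return ids
```

A References field that is already short enough comes back unchanged, so the
same routine serves for the MAY case as well as the MUST case.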

It would be useful to say here that two msg-ids can always be 
compared for equality by a simple octet-by-octet comparison (but, of 
course, one would first have to ensure that property was true).

I also don't want to put threading instructions into this document, 
which is the path the above starts down.


Sure, that needn't be said, even though it is the only situation where
such comparisons are routinely used in Email (actually, I believe there do
also exist email archives that can be indexed by msg-id).

 >   The "Received:" field contains a
 >   (possibly empty) list of tokens followed by a semicolon and a date-
 >   time specification.  Each token must be a word, angle-addr, addr-
 >   spec, or a domain.

Can we find a better word instead of "token" here? "Token" usually 
means some sort of keyword (e.g. as used in the MIME standards).

I kinda like "token". "Lexeme" seems too syntactic. "Item" seems too generic.


But Ned likes "item". Anybody else want to play? There are probably other
words that people might like to suggest.

 >3.6.8.  Optional fields

 >   Fields may appear in messages that are otherwise unspecified in this
 >   document.  They MUST conform to the syntax of an optional-field.
 >   This is a field name, made up of the printable US-ASCII characters
 >   except SP and colon, followed by a colon, followed by any text which
 >   conforms to unstructured.

This is misleading, because it has to cover all new header fields 
introduced by extensions and these will be, in general, structured.

That's not what that says. It says that it will conform to unstructured syntax.


Indeed, but most of the headers that have been defined (e.g. all of
Content-* from MIME) are structured, and software that tries to treat them
as "unstructured" (as 2822 claims them to be) can easily do the Wrong
Thing. So better to remove any possible excuse for doing that.
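
To be clear about what the quoted syntax does and does not promise, here is a
minimal Python sketch. It assumes "printable US-ASCII characters except SP
and colon" means octets 33-126 other than 58; the function name is
hypothetical.

```python
import re

# Printable US-ASCII except SP (32) and colon (58).
FIELD_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")


def split_optional_field(line: str):
    """Return (name, body) for an unfolded field line, or None."""
    name, colon, body = line.partition(":")
    if not colon or not FIELD_NAME.fullmatch(name):
        return None
    # The body is returned untouched: all this check establishes is
    # that it may be carried as unstructured text, not that it has no
    # structure of its own.
    return name, body
```

A Content-Type field passes this check just as an X- field does, which is
exactly why treating everything that passes as unstructured is risky.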

 >      Note: The "period" (or "full stop") character (".") in obs-phrase

But this is not an "obsolete" construct. We discussed this around 12 
months ago, and the consensus then was that it ought to be renamed 
as an <extended-phrase>, and moved out of the Obsolete Syntax.

There was no such consensus; you were the only one who ever suggested 
it on this list. And I still see no reason to change it (as I stated 
back then).


OK, I have reviewed the discussion, and there was less of it than I seemed
to remember. I suggested it. Bruce Lilly agreed. You disagreed.

But I still think that renaming it as "extended-phrase" would better
reflect the realities of the situation.

 >4.3.  Obsolete Date and Time

In this lot it was particularly difficult to spot the differences.

I'm not sure I understand why. I'd prefer to leave it as is. There 
have been enough bugs in this section already that occurred by trying 
to over-simplify the syntax.

 >   addition, local-part is allowed to contain quoted-string in addition
                         ^^                                  ^^^^^^^^^^^
                        was

"Is" allowed in this syntax.

OK

 >   to just atom.  Finally, ....
    ^^^^^^^^^^^^
    in lieu of any of those period-separated atoms

That's incorrect. You can mix atoms and quoted-strings.


Which is what I said. In the regular syntax you can have a list of
period-separated atoms (by virtue of the newly-invented dot-atom). So I
said that, in the obs-syntax, any of those atoms can, instead, be a
quoted-string.

But it is not a huge deal.

 >   Messages are delimited in this section between lines of "----".  The
 >   "----" lines are not part of the message itself.


That is indeed an excellent notation. The Bad News is that you have 
nowhere used it :-( .

I'll try to figure out how to do something useful in xml2rfc.


Please do.

Wouldn't it be better to show a Bcc: header for the "Undisclosed 
recipients" example?

I don't understand what you mean.


Just more typical of the way the group syntax is used in practice. I have
often seen that "Undisclosed recipient" in a (deliberately emaciated)
Bcc:, but I have never seen it in a Cc:. Surely someone who wanted to do
that would be using a Bcc: in the first place.

Again, not a huge deal.

 >   The above example is aesthetically displeasing, but perfectly legal.

Though legal, you should point out that it contains things that are 
deprecated by 3.3 and by 3.4.1

Nope. A.5 does not (or shouldn't unless I missed something) contain 
anything not perfectly permissible in 3.3 and 3.4.1.


3.3 and 3.4.1 permit them, but also discourage/deprecate them.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, 
CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5