Re: ambiguities in RFC 2821 regarding end of mail data
Keith Moore wrote:
The second ambiguity relates to the first, and it concerns the
question: When I receive a period alone in the first line of mail data
should I consider that the end of the data?
you mean, you receive the sequence
44 41 54 41 0D 0A 2E 0D 0A
Yes, I suppose that would be the sequence which I am asking about.
But the way we do our parsing of data I do not see it all together
that way, since the first 0D 0A are consumed by a routine which reads
command lines (which are recognized as lines because they end with 0D
0A). When DATA is recognized in a command line then the InputStream
is handed to another parsing routine, specialized for reading data,
which starts reading where the command-line parser left off, after the
first 0D 0A.
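
To make the hand-off concrete, here is a minimal sketch of that two-stage arrangement in Python. It is not our actual code: the function names read_command_line and read_mail_data are hypothetical, an in-memory BytesIO stands in for the connection's InputStream, and dot-unstuffing and the server's replies are left out. It only shows what the data reader sees once the command reader has consumed the first 0D 0A.

```python
import io


def read_command_line(stream):
    """Read one line and consume its terminating CRLF (0D 0A)."""
    line = bytearray()
    while not line.endswith(b"\r\n"):
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a line")
        line += byte
    return bytes(line[:-2])  # strip the CRLF


def read_mail_data(stream):
    """Collect data lines until a line holding only a period."""
    lines = []
    while True:
        line = read_command_line(stream)  # data lines use the same CRLF framing
        if line == b".":
            return lines
        lines.append(line)


# The byte sequence under discussion: D A T A CR LF . CR LF
stream = io.BytesIO(bytes.fromhex("44 41 54 41 0D 0A 2E 0D 0A"))
command = read_command_line(stream)  # consumes "DATA" and the first CRLF
data = read_mail_data(stream)        # starts reading at the period
```

With this arrangement the data reader never sees the CRLF that ended the DATA command, so a purely line-based reader like this one treats the lone period as the end of an empty message.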
no it's not the end of mail data, since the CR LF after DATA
cannot be part of the end-of-data marker. OTOH it's also an SMTP
This exchange has stimulated me to do the homework which I probably
should have done before I started. I have just seen this text in RFC
2821, section 4.1.1.4, second paragraph:
"The mail data is terminated by a line containing only a period, that
is, the character sequence "<CRLF>.<CRLF>" (see section 4.5.2). This
is the end of mail data indication. Note that the first <CRLF> of
this terminating sequence is also the <CRLF> that ends the final line
of the data (message text) or, if there was no data, ends the DATA
command itself."
I note two things from this. First, "if there was no data, ends the
DATA command itself", allows the possibility that there may be no
data. And it seems to answer my second question: a period alone in
the first line of data ends the data.
Second, turning back now to the first question which I asked, notice
that the first sentence of that paragraph from RFC 2821 contradicts
itself. The character sequence "<CRLF>.<CRLF>" is not "a line
containing only a period". It is a CRLF -- put there to show that we
have really reached the end of the prior line -- followed by a line
containing only a period. So this sentence gives us two possible "end
of mail data indication"s. One is "<CRLF>.<CRLF>" and the other is
".<CRLF>" alone in a line. They are not the same. But I think the
latter is more consistent with other material in this RFC.
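
The difference between the two readings can be shown with a small Python sketch. The function names ends_by_sequence and ends_by_line are hypothetical, chosen only for this illustration, and the buffers are the bytes from the example discussed earlier in this message.

```python
CRLF = b"\r\n"


def ends_by_sequence(buffer):
    """Reading 1: the indication is the five bytes CRLF . CRLF."""
    return CRLF + b"." + CRLF in buffer


def ends_by_line(buffer):
    """Reading 2: the indication is any complete line holding only a period."""
    complete_lines = buffer.split(CRLF)[:-1]  # drop the unterminated remainder
    return b"." in complete_lines


whole_stream = b"DATA\r\n.\r\n"   # as seen on the wire
after_data_command = b".\r\n"     # as seen once the command's CRLF is consumed

results = {
    "sequence, whole stream": ends_by_sequence(whole_stream),
    "line, whole stream": ends_by_line(whole_stream),
    "sequence, after DATA": ends_by_sequence(after_data_command),
    "line, after DATA": ends_by_line(after_data_command),
}
```

Under these assumptions the two tests agree on the whole stream but disagree on what the data reader is actually handed: the five-byte search finds nothing after the DATA command's CRLF has been consumed, while the line-based test still recognizes the lone period.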