I am writing code to parse incoming SMTP bytes, and I have encountered
two ways in which RFC 2821 seems ambiguous regarding handling of the
end of mail data indicator. I hope that this list is the appropriate
place for me to make this report.
The first ambiguity concerns this question: When I receive the
sequence CRLF.CRLF, should the data which I capture include the first
CRLF or not?
I believe that the first CRLF should be part of the data, according to
my reading of RFC 2821, sections 2.3.7, 3.3, and 4.1.1.4. That CRLF
is part of the last line of data since, as I read 2.3.7, you don't
have a "line" of data unless it is terminated by CRLF.
But the RFC leaves room for confusion. This phrase from section 3.3
could be read either way: "...the SMTP server ... considers all
succeeding lines up to but not including the end of mail data
indicator to be the message text." What is "the end of mail data
indicator"? Section 4.1.1.4 calls the whole CRLF.CRLF the "end of
mail data indication". If 3.3 means that same five-byte sequence, then
the first CRLF is excluded from the message text, contrary to my
reading of 2.3.7.
I noticed this ambiguity because the MTA with which I am familiar,
james.apache.org, uses a parsing routine which presently does not
return the first CRLF as part of the mail data. In conformity with
one of the ways that the RFC can be understood, it checks for the
whole end of mail data indicator (CRLF.CRLF) and sends EOF without
returning any part of that indicator, including the first CRLF.
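To make the two readings concrete, here is a minimal sketch in Python.
It is not the James code, and the function names are my own, chosen
only for illustration. It assumes the whole transmission is already in
a single buffer and ignores dot-stuffing.

```python
# Illustrative only: these names are hypothetical, not from James or the RFC.
END_OF_DATA = b"\r\n.\r\n"  # the five bytes CRLF.CRLF


def data_excluding_first_crlf(received: bytes) -> bytes:
    """Reading 1: all of CRLF.CRLF is the indicator, so none of it is data."""
    end = received.index(END_OF_DATA)
    return received[:end]


def data_including_first_crlf(received: bytes) -> bytes:
    """Reading 2: the first CRLF terminates the last data line, so it is data."""
    end = received.index(END_OF_DATA)
    return received[:end + 2]  # keep the CRLF that ends the last line


received = b"Subject: test\r\n\r\nlast line\r\n.\r\n"
reading_1 = data_excluding_first_crlf(received)  # ends with b"last line"
reading_2 = data_including_first_crlf(received)  # ends with b"last line\r\n"
```

The two results differ only by the trailing CRLF, which is exactly the
two bytes in question.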
The second ambiguity relates to the first, and it concerns the
question: When I receive a period alone in the first line of mail data,
should I consider that the end of the data?
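To put the question in concrete terms, here is a small Python sketch
of the two possible behaviors when the data begins with a lone period.
The function and its flag are hypothetical names of mine, not anything
taken from the RFC or from an existing MTA, and the sketch assumes the
bytes received so far are in a single buffer.

```python
# Illustrative only: the function and parameter names are hypothetical.
def end_of_data_offset(received: bytes, start_is_line_start: bool) -> int:
    """Return the offset just past the end-of-data indicator, or -1 if absent."""
    # Option A: the data begins at the start of a line, so ".CRLF" as the
    # first three bytes is a line containing only a period.
    if start_is_line_start and received.startswith(b".\r\n"):
        return 3
    # Option B (and the general case): require the full CRLF.CRLF sequence.
    pos = received.find(b"\r\n.\r\n")
    return -1 if pos == -1 else pos + 5


empty_message = b".\r\n"
option_a = end_of_data_offset(empty_message, start_is_line_start=True)
option_b = end_of_data_offset(empty_message, start_is_line_start=False)
```

With the flag set, the three bytes .CRLF end the data immediately.
Without it, the parser reports no terminator yet and keeps waiting for
a later CRLF.CRLF.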
I believe that I should consider the sequence .CRLF to be the end of
the data if those are the first three bytes received by the data
parsing routine. But again the RFC seems unclear to me. The parsing
routine presently used by the MTA with which I am familiar does not
treat a period alone in the first line as the end of data; it waits
for the sequence CRLF.CRLF which could only be in the second line or