I am writing code to parse incoming SMTP bytes, and I have encountered
two ways in which RFC 2821 seems ambiguous regarding handling of the
end of mail data indicator. I hope that this list is the appropriate
place for me to make this report.
The first ambiguity concerns this question: When I receive the
sequence CRLF.CRLF, should the data that I capture include the first
CRLF or not?
strictly speaking, no. though you'll find many MTAs that do.
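
To make the two behaviours concrete, here is a minimal sketch in Python. It assumes the whole DATA payload is already sitting in one byte buffer; the name split_mail_data and the keep_final_crlf flag are made up for illustration and come from no RFC or library. The default follows the "strictly speaking, no" reading above, and the flag reproduces what the MTAs that do keep the CRLF would capture.

```python
END_OF_DATA = b"\r\n.\r\n"  # CRLF "." CRLF


def split_mail_data(buf: bytes, keep_final_crlf: bool = False):
    """Split buf at the first end-of-data marker.

    Returns (data, rest), or None if the marker has not arrived yet.
    Hypothetical helper: dot-unstuffing is deliberately left out.
    """
    idx = buf.find(END_OF_DATA)
    if idx < 0:
        return None  # marker not seen yet; the caller should read more bytes
    # The marker's leading CRLF occupies buf[idx:idx + 2].
    end = idx + 2 if keep_final_crlf else idx
    return buf[:end], buf[idx + len(END_OF_DATA):]


wire = b"Subject: hi\r\n\r\nbody\r\n.\r\nQUIT\r\n"
data_without_crlf, rest = split_mail_data(wire)
data_with_crlf, _ = split_mail_data(wire, keep_final_crlf=True)
```

Whichever variant you pick, the bytes after the marker (here the QUIT command) are the same; only the last two bytes of the captured data differ.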
The second ambiguity relates to the first, and it concerns the
question: When I receive a period alone on the first line of mail data,
should I consider that the end of the data?
you mean, you receive the sequence
44 41 54 41 0D 0A 2E 0D 0A
(that is, "DATA" CR LF "." CR LF)?
no it's not the end of mail data, since the CR LF after DATA
cannot be part of the end-of-data marker. OTOH it's also an SMTP
protocol error, since if the sender really were trying to transmit
a first line consisting of a single '.' then it would be expected
to double it. given that it wouldn't be a valid message anyway,
I wouldn't try too hard to handle this case - just make sure your
MTA doesn't crash or introduce a security hole when it happens.
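
A rough sketch of that approach in Python, assuming the input has already been split into lines with the CRLF stripped and that the first element is the first line after the DATA command. The function name and the returned flags are hypothetical. A lone "." on the first line is recorded as an anomaly and not treated as end of data, per the reading above; it is neither stored nor allowed to raise.

```python
def receive_data_lines(lines):
    """Collect mail data lines until a lone "." that follows at least one line.

    lines: iterable of bytes, one per received line, CRLF already removed.
    Returns (body_lines, complete, saw_bare_first_dot).
    """
    body = []
    saw_bare_first_dot = False
    for n, line in enumerate(lines):
        if line == b".":
            if n == 0:
                # Protocol error by the sender: note it and keep reading
                # rather than treating it as the end-of-data marker.
                saw_bare_first_dot = True
                continue
            return body, True, saw_bare_first_dot
        if line.startswith(b"."):
            line = line[1:]  # undo the sender's dot-stuffing
        body.append(line)
    # Input ran out (e.g. timeout or disconnect) before end of data.
    return body, False, saw_bare_first_dot
```

For example, receive_data_lines([b"a", b"..b", b"."]) yields ([b"a", b".b"], True, False), while receive_data_lines([b"."]) yields ([], False, True): the receiver keeps waiting and the caller can decide what to do once it times out.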
if your MTA waits for the rest of the message while the sending
MTA waits for your response to the end of the message and one of
you eventually times out, and you discard the message, the right
thing happens anyway - it just takes longer. but since it shouldn't
occur very often (like maybe ever) then there's no point in optimizing