Re: [ietf-smtp] Stray <LF> in the middle of messages

[ Sorry about being late to the party, I read this list sporadically. ]

On Sat, Jun 06, 2020 at 07:06:29PM +0200, Leo Gaspard wrote:

However, I notice that every single time I have tried to use `netcat` to
send emails for demo purposes, it succeeded *without* sending <CRLF> and
by sending only <LF>. While `telnet` does appear to convert typed <LF>
into <CRLF>, it looks like (my version of) `netcat` does not. So most of
the SMTP servers I have met with appear to consider <LF> as a valid line
ending.


Postfix always sends <CRLF>, but accepts <CR>*<LF> as a line ending.

    /*      smtp_get() reads the named stream up to and including
    /*      the next LF character and strips the trailing CR LF. 

            /*
             * Strip off the record terminator: either CRLF or just bare LF.
             *
             * XXX RFC 2821 disallows sending bare CR everywhere. We remove bare CR
             * if received before CRLF, and leave it alone otherwise.
             */
        case '\n':
            vstring_truncate(vp, VSTRING_LEN(vp) - 1);
            while (VSTRING_LEN(vp) > 0 && vstring_end(vp)[-1] == '\r')
                vstring_truncate(vp, VSTRING_LEN(vp) - 1);
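Below is a minimal standalone C sketch of the same terminator handling, for anyone who wants to reproduce the behaviour outside Postfix. The name strip_record_terminator is hypothetical (it is not a Postfix API), and it works on a plain buffer rather than a VSTRING. Like the excerpt, it treats <CR>*<LF> as a single line ending and leaves a CR elsewhere in the line alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper modelled on the excerpt above (not Postfix code):
 * if the line ends in LF, drop the LF and then any run of CRs directly
 * before it, so "abc\n", "abc\r\n" and "abc\r\r\n" all become "abc".
 * A CR anywhere else in the line is kept.  Returns the new length;
 * the buffer itself is not modified or NUL-terminated.
 */
size_t strip_record_terminator(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* Only a line that actually ends in LF has a terminator to strip. */
    if (len == 0 || buf[len - 1] != '\n')
        return len;
    len--;                              /* drop the LF */

    /* <CR>*<LF> counts as one line ending: drop the preceding CRs too. */
    while (len > 0 && buf[len - 1] == '\r')
        len--;
    return len;
}
```

With this rule, a bare <LF> and a proper <CRLF> produce the same record, which is why a bare <LF> received on input can never reappear on output.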

Perhaps at least some of the servers you tested were running Postfix.

However, there is one case where the semantics is important: should one
escape the <LF>. sequence while in a DATA block?


    https://tools.ietf.org/html/rfc5322#section-2.3

       The body of a message is simply lines of US-ASCII characters.  The
       only two limitations on the body are as follows:

       o  CR and LF MUST only occur together as CRLF; they MUST NOT appear
          independently in the body.
       o  Lines of characters in the body MUST be limited to 998 characters,
          and SHOULD be limited to 78 characters, excluding the CRLF.

          Note: As was stated earlier, there are other documents,
          specifically the MIME documents ([RFC2045], [RFC2046], [RFC2049],
          [RFC4288], [RFC4289]), that extend (and limit) this specification
          to allow for different sorts of message bodies.  Again, these
          mechanisms are beyond the scope of this document.

Since bare LF is invalid, you must not send it.  With Postfix that's
automatic, because a bare LF becomes a line ending on input, so it can
never occur in the output.  Otherwise, Postfix would have to reject
messages with bare LF, and it is easier to just accept these (e.g.,
"sendmail -bs" can then tolerate newline-terminated input).

I would guess that the fact that other SMTP servers appear to usually
accept <LF>.<LF> as a terminator indicates that <LF>. should be escaped
even though it is not strictly conforming with the RFC, but… I wanted to
have the opinion of other people on this, before diving too deep in the
implementation?


I think by escaped you mean "transparency":

    https://tools.ietf.org/html/rfc5321#section-4.5.2

In which case the answer is simply that you MUST NOT send either
<LF>.<LF> or the dot-stuffed <LF>..<LF>, because you MUST NOT send a
bare LF in the first place.
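On the sending side, transparency therefore only ever deals with <CRLF>-terminated lines. A minimal sketch, assuming the message has already been split into lines that contain no CR or LF (dot_stuff is a hypothetical name): every line is written with <CRLF>, a leading "." is doubled as RFC 5321 section 4.5.2 requires, and the data is closed with ".<CRLF>", which together with the last line's <CRLF> forms <CRLF>.<CRLF>.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of SMTP transparency on the client side.
 * lines[] hold message lines WITHOUT terminators (no CR or LF inside).
 * Each line is emitted with CRLF; a line starting with "." gets an
 * extra "."; the output ends with ".\r\n" to close the DATA phase.
 * out is NUL-terminated.  Returns bytes written (excluding the NUL),
 * or 0 if out is too small.
 */
size_t dot_stuff(const char *const *lines, size_t nlines,
                 char *out, size_t outsize)
{
    size_t pos = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nlines; i++) {
        size_t n = strlen(lines[i]);
        int stuff = (lines[i][0] == '.');
        size_t need = n + 2 + stuff;    /* line + CRLF (+ stuffed dot) */

        /* Precondition: no bare CR or LF can sneak into the output. */
        assert(strpbrk(lines[i], "\r\n") == NULL);
        if (outsize - pos < need + 1)   /* +1 keeps room for the NUL */
            return 0;
        if (stuff)
            out[pos++] = '.';           /* "." at line start becomes ".." */
        memcpy(out + pos, lines[i], n);
        pos += n;
        out[pos++] = '\r';
        out[pos++] = '\n';
    }
    if (outsize - pos < 3 + 1)
        return 0;
    memcpy(out + pos, ".\r\n", 3);      /* end of data: <CRLF>.<CRLF> */
    pos += 3;
    out[pos] = '\0';
    return pos;
}
```

Because the terminator is always generated here rather than copied from the input, the question of escaping <LF>. never arises.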

Should I understand this paragraph as meaning that if I ever receive
such an ill-formed message, I… can? should? must? accept it and… can?
should? must? convert the <LF> into proper <CRLF>?


You can reject the invalid input, or modify it in transit to send
something valid.  Choices are:

    * Convert <LF> to <CRLF>
    * Strip bare <LF> (and perhaps bare <CR>).
    * Apply a MIME quoted-printable or base64 encoding to the body,
      if not already encoded.
    * If already base64, you are at liberty to strip extraneous
      non-base64 characters without changing the payload.
    * If already quoted-printable, you could in principle decode
      and re-encode, but it is saner to either strip the bare <LF>
      or accept it as EOL.
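The first option might look like the following sketch. normalize_crlf is a hypothetical helper. Dropping a bare <CR> is only one of the reasonable choices listed above, and the caller is assumed to supply an output buffer of at least twice the input length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: rewrite a body so that CR and LF only ever
 * occur together as CRLF.
 *   - CRLF     is copied unchanged;
 *   - bare LF  becomes CRLF;
 *   - bare CR  is dropped (an assumed policy; other choices exist).
 * out must hold 2 * len bytes (worst case: every byte is a bare LF).
 * Returns the number of bytes written to out.
 */
size_t normalize_crlf(const char *in, size_t len, char *out)
{
    size_t o = 0;

    assert((in != NULL && out != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\r') {
            if (i + 1 < len && in[i + 1] == '\n') {
                out[o++] = '\r';        /* proper CRLF: keep it */
                out[o++] = '\n';
                i++;                    /* skip the LF just consumed */
            }
            /* otherwise a bare CR: drop it */
        } else if (in[i] == '\n') {
            out[o++] = '\r';            /* bare LF: promote to CRLF */
            out[o++] = '\n';
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

For example, the 9-byte input "a\nb\r\nc\rd\n" becomes the 10-byte "a\r\nb\r\ncd\r\n".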

On Sat, Jun 06, 2020 at 07:36:19PM +0100, Paul Smith wrote:

(To be honest, I'd be tempted to treat a lone LF as a 99.9999% reliable 
indicator of spam. Similarly with a NULL (0x00) character in the middle 
of an (RFC5322) message. Legitimate mail will just never have it unless 
it was generated by something very dodgy).


It may be worth noting that 0x00 (ASCII NUL) is a US-ASCII character,
and so is in fact allowed in RFC5322 message bodies, per the above
quoted section 2.3 of RFC5322.  The prohibition on NULs is a feature of
MIME (RFC2045):

    https://tools.ietf.org/html/rfc2045#section-2.7
    https://tools.ietf.org/html/rfc2045#section-2.8

And so MIME messages are obliged to apply a non-identity transfer
encoding to message bodies that would otherwise contain ASCII NULs.
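As a rough illustration, the check that decides whether a body can use an identity encoding could follow the RFC 2045 section 2.7/2.8 definitions: no NULs, CR and LF only as <CRLF>, and at most 998 octets per line, with 7bit additionally excluding octets above 127. The enum and function names below are hypothetical, and counting a final line that lacks <CRLF> like any other line is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes of the hypothetical check below. */
enum body_class {
    BODY_7BIT,          /* identity encoding "7bit" is possible */
    BODY_8BIT,          /* "8bit" is possible (needs an 8BITMIME path) */
    BODY_NEEDS_ENCODING /* must use quoted-printable or base64 */
};

/*
 * Sketch of the RFC 2045 sections 2.7/2.8 rules: no NUL octets,
 * CR and LF only as part of CRLF, at most 998 octets between CRLFs.
 * 7bit additionally requires every octet to be below 128.
 */
enum body_class classify_body(const unsigned char *p, size_t len)
{
    int high = 0;          /* saw an octet > 127 */
    size_t linelen = 0;    /* octets since the last CRLF */

    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = p[i];

        if (c == '\r' && i + 1 < len && p[i + 1] == '\n') {
            i++;                        /* consume the whole CRLF */
            linelen = 0;
            continue;
        }
        if (c == 0 || c == '\r' || c == '\n')
            return BODY_NEEDS_ENCODING; /* NUL, bare CR or bare LF */
        if (c > 127)
            high = 1;
        if (++linelen > 998)
            return BODY_NEEDS_ENCODING; /* line too long */
    }
    return high ? BODY_8BIT : BODY_7BIT;
}
```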

On Mon, Jun 08, 2020 at 06:22:47PM +0200, Alessandro Vesely wrote:

CRLF is required for IMAP and POP too.  The inconvenience is having to compute
the length of the message, in octets.  The FS can only tell how many octets the
native format takes.  One has to add the number of lines.  Storing that info in
the file name is a possibility, if you don't have a dedicated FS.


FWIW, Postfix stores each line (or sufficiently long partial line) of
the message as a "record", with each record having a one-byte type and a
variable width length of at least one byte.  As a consequence, the queue
file is approximately the same size as the message with CRLF line
endings and never smaller.  So Postfix just uses the queue file size.
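To make the size arithmetic concrete, here is a small sketch with hypothetical function names. crlf_size computes the <CRLF> size of an LF-terminated stored message (native size plus one octet per line, as described above), assuming the stored text contains no CRs. record_size models a record with a one-byte type and a variable-width length. The base-128 length encoding is an assumption made for illustration, not a description of the actual queue file format. The point is only that any per-record overhead of at least two octets makes the record form at least as large as the <CRLF> form.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Octet count of an LF-terminated message once every line ending is
 * sent as CRLF (e.g. for an IMAP or POP size): the native size plus
 * one CR per LF.  Assumes the stored text contains no CRs at all.
 */
size_t crlf_size(const char *p, size_t len)
{
    size_t size = len;

    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            size++;                     /* each LF gains a CR */
    return size;
}

/*
 * Rough model of one record: a 1-byte type, a variable-width length,
 * then the payload.  ASSUMPTION: the length uses 7 bits per byte
 * (base 128), so it takes at least one byte.  The result is therefore
 * always >= payload + 2, the size of the same line with CRLF.
 */
size_t record_size(size_t payload)
{
    size_t lenbytes = 1;

    for (size_t n = payload >> 7; n != 0; n >>= 7)
        lenbytes++;
    return 1 + lenbytes + payload;
}
```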

-- 
    Viktor.

_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp