ietf-smtp

Re: sequential processing (was Re: Abort data transfer?)

2009-11-18 17:13:12

Dave CROCKER wrote:
Murray S. Kucherawy wrote:
I don't understand how running the filters in series requires that the entire message be in a buffer.

MTA buffers the entire message, sends it to filter #1. Filter #1 changes the body. MTA sends the modified message to #2, including the new body. This can only happen if they're in series, and I can't see how it would be possible if there's not a buffer involved.

Here's an even better example: MTA buffers the entire message, sends it to filter #1. Filter #1 orders the message to be rejected (or discarded). Filter #2 is told "nevermind", and never has to go through the processing of the body. For a very large message, this can be a big performance win, and again can only happen if they're in series.

Strictly speaking, either full or partial buffering allows processes to be staged in sequence just fine. The only issue is whether the filters are staged in sequence, with early filters feeding later ones.

What full buffering does is to allow the current filter to change an earlier part of the message, based on a later part. You can't do that in a partial-buffering (hot potato) model, where the processing is done only on the current chunk, which is then passed back.

A simple example would be wanting to add a header field to the message, based on something in the body.

Seems like we have two issues: Series/Parallel and Full/Partial Buffering. Running filters in series is clearly the better strategy when the last filter is very time consuming, and earlier filters can provide a quick reject.
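
To make the series case concrete, here is a minimal sketch in Python. It is not Sendmail or milter code; the names run_in_series, cheap_size_check, and expensive_scan, and the 1000-byte limit, are all made up for illustration. The point is only that the slow filter is never called once a quick filter has rejected.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical verdict strings, not part of any MTA API.
ACCEPT, REJECT = "accept", "reject"

def run_in_series(message: bytes,
                  filters: Iterable[Callable[[bytes], str]]) -> Tuple[str, int]:
    """Run filters one after another, stopping at the first reject.

    Returns the verdict and the number of filters that actually ran.
    """
    ran = 0
    for f in filters:
        ran += 1
        if f(message) == REJECT:
            return REJECT, ran
    return ACCEPT, ran

def cheap_size_check(message: bytes) -> str:
    # Quick test: reject anything over an arbitrary illustrative limit.
    return REJECT if len(message) > 1000 else ACCEPT

scanned: List[int] = []  # records each call to the slow filter

def expensive_scan(message: bytes) -> str:
    # Stand-in for a slow content filter; it only records that it ran.
    scanned.append(len(message))
    return ACCEPT
```

With a 5000-byte message and the filters ordered [cheap_size_check, expensive_scan], only the first filter runs. Run in parallel, both would have processed the message.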

The need for full buffering is a separate issue. It depends on the filter process, of course, but the process Murray originally described (scanning the headers) could be done in a purely serial fashion, thus not requiring a buffer. A statistical filter, on the other hand, requires the entire body in a buffer, so it can go back and do further scanning based on, for example, the number of times the word Viagra appears in the entire body.

The particular filter I had in mind when I started this thread was the simple count of body blocks that I run after each block is received. This process is purely serial. When the count exceeds some limit, the rest of the body is rejected. Until Murray pointed out how Sendmail actually works, I was assuming this would protect my receiver from a broken or abusive transmitter.
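
A rough sketch of that block counter, in Python, might look like the following. The class and function names and the limit are assumptions for illustration, not my actual filter and not the milter API. It models the behavior I was expecting (stop reading as soon as the limit is passed), which is not necessarily what Sendmail does.

```python
from typing import Iterable, Tuple

class BlockCountFilter:
    """Counts body blocks as they arrive and asks for a reject past a limit.

    Hypothetical sketch: names and verdict strings are illustrative only.
    """

    def __init__(self, max_blocks: int) -> None:
        self.max_blocks = max_blocks
        self.count = 0

    def on_body_block(self, block: bytes) -> str:
        # Purely serial: only a counter is kept, never the block itself.
        self.count += 1
        return "reject" if self.count > self.max_blocks else "continue"

def receive(blocks: Iterable[bytes], flt: BlockCountFilter) -> Tuple[str, int]:
    """Feed blocks to the filter and stop reading at the first reject."""
    for block in blocks:
        if flt.on_body_block(block) == "reject":
            return "reject", flt.count
    return "accept", flt.count
```

With a limit of 3 and a sender offering 10 blocks, receive() returns after the fourth block and never touches the remaining six.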

I see no reason the buffer couldn't be just ahead of the eom (end-of-message) processing. This would allow the early filters to reject before transferring all the data, while preserving the ability of filters called from eom to work on the entire message at once.

   --> Filter#1 --> Filter#2 --> Buffer --> Filter#3 --> MDA
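
The same arrangement, sketched in Python under my own assumptions (process, no_nul_bytes, and keyword_count are hypothetical names, and the tests they perform are arbitrary). Streaming filters see each block before it is buffered; eom filters see the whole buffered message.

```python
from typing import Callable, Iterable, List, Tuple

BlockFilter = Callable[[bytes], str]    # sees one body block at a time
MessageFilter = Callable[[bytes], str]  # sees the whole buffered message

def process(blocks: Iterable[bytes],
            stream_filters: List[BlockFilter],
            eom_filters: List[MessageFilter]) -> Tuple[str, int]:
    """Return the verdict and the number of blocks buffered so far."""
    buffered: List[bytes] = []
    for block in blocks:
        # Early filters run before the buffer, so a reject stops the transfer.
        for f in stream_filters:
            if f(block) == "reject":
                return "reject", len(buffered)
        buffered.append(block)
    # End of message: later filters get the complete body at once.
    message = b"".join(buffered)
    for f in eom_filters:
        if f(message) == "reject":
            return "reject", len(buffered)
    return "accept", len(buffered)

def no_nul_bytes(block: bytes) -> str:
    # Illustrative streaming test: needs only the current block.
    return "reject" if b"\x00" in block else "continue"

def keyword_count(message: bytes) -> str:
    # Illustrative whole-body test: a word may straddle block boundaries.
    return "reject" if message.lower().count(b"viagra") >= 2 else "accept"
```

Note that keyword_count catches a word split across two blocks, which a per-block filter would miss. That is the case where the buffer earns its keep.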

It seems like Sendmail's filter API (https://www.milter.org/developers/api/index) is trying to accomplish this, but Sendmail itself is not taking full advantage of what the API allows. There are "callbacks" from Sendmail to allow running filters after the DATA command, after each header, at the end of all headers, after each body block, and at the end of all blocks (eom). Certain commands (message modification) cannot be done except at eom. Others, like sending a reject after too many blocks, *appear* to work before eom, but actually do wait for eom.

The one filter I have which *does* require full buffering is SpamAssassin. I use my own buffer for that purpose, and pass it to SpamAssassin at eom. Note: A filter which adds a header does *not* require a full buffer. In an early filter, for example, I gather all the data needed to construct an authentication header. Later, the authentication header is inserted using one of the commands that run at eom.
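
A sketch of the gather-early, insert-at-eom pattern, in Python. This is not the milter API; the class, the callback names, the header name X-Example-Auth, and the particular fields collected are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]

class AuthHeaderFilter:
    """Collects a few values early, then builds one header at eom.

    Hypothetical sketch: only small values are stored, never the body.
    """

    def __init__(self) -> None:
        self.helo: Optional[str] = None
        self.mail_from: Optional[str] = None

    def on_helo(self, name: str) -> None:
        self.helo = name

    def on_mail_from(self, addr: str) -> None:
        self.mail_from = addr

    def on_eom(self, headers: List[Header]) -> List[Header]:
        # Message modification happens only here, at end of message.
        value = f"helo={self.helo}; mailfrom={self.mail_from}"
        return [("X-Example-Auth", value)] + headers
```

The filter holds two short strings between callbacks, so the message body can stream past it without being buffered on its account.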

It bothers me that Sendmail does all this extra processing on messages that should get an early reject. I haven't noticed the problem, because I don't have a large mail load, but there are plans to use this setup on a larger mail system, and now I worry about efficiency. I like Sendmail's ability to let me control everything, but if that is only an illusion, I may need to look for a different MTA program.

************************************************************     *
* David MacQuigg, PhD    email: macquigg at ece.arizona.edu   *  *
* Research Associate                phone: USA 520-721-4583   *  *  *
* ECE Department, University of Arizona                       *  *  *
*                                 9320 East Mikelyn Lane       * * *
* http://purl.net/macquigg        Tucson, Arizona 85710          *
************************************************************     *
