Re: sequential processing (was Re: Abort data transfer?)


Dave CROCKER wrote:

Murray S. Kucherawy wrote:
I don't understand how running the filters in series requires that the entire message be in a buffer.
MTA buffers the entire message, sends it to filter #1. Filter #1 changes the body. MTA sends the modified message to #2, including the new body. This
can only happen if they're in series, and I can't see how it would be
possible if there's not a buffer involved.
Here's an even better example: MTA buffers the entire message, sends it to
filter #1.  Filter #1 orders the message to be rejected (or discarded).
Filter #2 is told "nevermind", and never has to go through the processing of the body. For a very large message, this can be a big performance win, and
again can only happen if they're in series.
Strictly speaking either fully-buffered or partial buffering allows processes to be staged in sequence just fine. The only issue is whether the filters are staged in sequence with early filters feeding later ones.
What full buffering does is to allow the current filter to change an earlier part of the message, based on a later part. You can't do that in a partial buffering (hot potato) model where the processing is done only on the current
chunk and is then passed back.
Simple example would be wanting to add a header field to the message, based on
something in the body.

Seems like we have two issues: Series/Parallel and Full/Partial Buffering. Running filters in series is clearly the better strategy when the last filter is very time consuming, and earlier filters can provide a quick reject.

The need for full buffering is a separate issue. It depends on the filter process, of course, but the process Murray originally described (scanning the headers) could be done in a purely serial fashion, thus not requiring a buffer. A statistical filter, on the other hand, requires the entire body in a buffer, so it can go back and do further scanning based on, for example, the number of times the word Viagra appears in the entire body.

The particular filter I had in mind when I started this thread was the simple count of body blocks that I run after each block is received. This process is purely serial. When the count exceeds some limit, the rest of the body is rejected. Until Murray pointed out how Sendmail actually works, I was assuming this would protect my receiver from a broken or abusive transmitter.
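
A minimal sketch of the block-count idea, written as a plain Python simulation rather than against the real milter API. The class name, the `on_body_block` method, and the verdict strings are all hypothetical. It models the behavior I was assuming (stop reading as soon as the limit is exceeded), not what Sendmail actually does.

```python
from typing import Iterable, Tuple

# Hypothetical verdict values, not taken from any real filter API.
CONTINUE, ACCEPT, REJECT = "continue", "accept", "reject"


class BlockCountFilter:
    """Hypothetical per-block filter: rejects once too many blocks arrive."""

    def __init__(self, max_blocks: int) -> None:
        self.max_blocks = max_blocks
        self.count = 0

    def on_body_block(self, block: bytes) -> str:
        # Purely serial: only a counter is kept, never the block itself.
        self.count += 1
        return REJECT if self.count > self.max_blocks else CONTINUE


def receive_body(blocks: Iterable[bytes], flt: BlockCountFilter) -> Tuple[str, int]:
    """Feed blocks to the filter one at a time; stop reading on reject."""
    consumed = 0
    for block in blocks:
        consumed += 1
        if flt.on_body_block(block) == REJECT:
            return REJECT, consumed
    return ACCEPT, consumed


if __name__ == "__main__":
    incoming = iter([b"x" * 10] * 10)          # ten body blocks
    print(receive_body(incoming, BlockCountFilter(3)))
    print(len(list(incoming)), "blocks were never read")
```

With a limit of 3, the fourth block triggers the reject and the remaining six blocks are left unread, which is the protection against an abusive transmitter that an early reject is supposed to give.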

I see no reason the buffer couldn't be just ahead of the eom (end-of-message) processing. This would allow the early filters to reject before transferring all the data, while preserving the ability of filters called from eom to work on the entire message at once.


   --> Filter#1 --> Filter#2 --> Buffer --> Filter#3 --> MDA
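
The arrangement in the diagram can be sketched as a small Python simulation. This is only an illustration under assumptions: `run_pipeline`, `no_nul_bytes`, and `word_limit` are made-up names, the two checks are stand-ins for real filters, and nothing here is the milter API.

```python
from typing import Callable, Iterable, Sequence, Tuple

ACCEPT, REJECT = "accept", "reject"        # hypothetical verdict values
BlockFilter = Callable[[bytes], str]       # sees one block at a time
MessageFilter = Callable[[bytes], str]     # sees the whole buffered message


def run_pipeline(
    blocks: Iterable[bytes],
    block_filters: Sequence[BlockFilter],
    eom_filters: Sequence[MessageFilter],
) -> Tuple[str, int]:
    """Return (verdict, bytes buffered when the verdict was reached)."""
    buffer = bytearray()
    for block in blocks:
        # Early filters run in series on each block, before it is buffered.
        for check in block_filters:
            if check(block) == REJECT:
                return REJECT, len(buffer)
        buffer += block
    # The buffer sits just ahead of eom: later filters get the full message.
    message = bytes(buffer)
    for check in eom_filters:
        if check(message) == REJECT:
            return REJECT, len(buffer)
    return ACCEPT, len(buffer)


def no_nul_bytes(block: bytes) -> str:
    """Stand-in for a cheap per-block check."""
    return REJECT if b"\x00" in block else ACCEPT


def word_limit(message: bytes) -> str:
    """Stand-in for a whole-message check: too many hits on one word."""
    return REJECT if message.lower().count(b"viagra") > 2 else ACCEPT


if __name__ == "__main__":
    print(run_pipeline([b"hello ", b"bad\x00", b"never read"],
                       [no_nul_bytes], [word_limit]))
```

In the usage example the second block is rejected by the early filter, so only the first six bytes were ever buffered and the eom filter never runs.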

It seems like Sendmail's filter API (https://www.milter.org/developers/api/index) is trying to accomplish this, but Sendmail itself is not taking full advantage of what the API allows. There are "callbacks" from Sendmail to allow running filters after the DATA command, after each header, at the end of all headers, after each body block, and at the end of all blocks (eom). Certain commands (message modification) cannot be done except at eom. Others, like sending a reject after too many blocks, *appear* to work before eom, but actually do wait for eom.

The one filter I have which *does* require full buffering is SpamAssassin. I use my own buffer for that purpose, and pass it to SpamAssassin at eom. Note: A filter which adds a header does *not* require a full buffer. In an early filter, for example, I gather all the data needed to construct an authentication header. Later, the authentication header is inserted using one of the commands that run at eom.
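
A rough sketch of that gather-early, insert-at-eom pattern, again as plain Python with hypothetical names. The callback methods only loosely mirror the callback points listed above, the choice of HELO name and envelope sender as the gathered data is an assumption made for illustration, and `X-Example-Auth` is a made-up header name.

```python
from typing import Callable, List, Optional, Tuple


class AuthHeaderFilter:
    """Hypothetical filter: collects a few facts early, adds a header at eom."""

    def __init__(self) -> None:
        self.helo: Optional[str] = None
        self.mail_from: Optional[str] = None

    def on_helo(self, name: str) -> None:
        self.helo = name                      # small piece of state, kept

    def on_mail_from(self, address: str) -> None:
        self.mail_from = address              # small piece of state, kept

    def on_body_block(self, block: bytes) -> None:
        pass                                  # body is not stored at all

    def on_eom(self, add_header: Callable[[str, str], None]) -> None:
        # Message modification happens only here, at end-of-message.
        add_header("X-Example-Auth",
                   f"helo={self.helo}; mailfrom={self.mail_from}")


if __name__ == "__main__":
    added: List[Tuple[str, str]] = []
    flt = AuthHeaderFilter()
    flt.on_helo("mail.example.com")
    flt.on_mail_from("alice@example.com")
    flt.on_body_block(b"x" * 1000)
    flt.on_eom(lambda name, value: added.append((name, value)))
    print(added)
```

The only state carried to eom is two short strings, so the header can depend on earlier stages of the transaction without the filter holding any of the body.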

It bothers me that Sendmail does all this extra processing on messages that should get an early reject. I haven't noticed the problem, because I don't have a large mail load, but there are plans to use this setup on a larger mail system, and now I worry about efficiency. I like Sendmail's ability to let me control everything, but if that is only an illusion, I may need to look for a different MTA program.


************************************************************     *
* David MacQuigg, PhD    email: macquigg at ece.arizona.edu   *  *
* Research Associate                phone: USA 520-721-4583   *  *  *
* ECE Department, University of Arizona                       *  *  *
*                                 9320 East Mikelyn Lane       * * *
* http://purl.net/macquigg        Tucson, Arizona 85710          *
************************************************************     *