Re: sequential processing (was Re: Abort data transfer?)
2009-11-18 17:13:12
Dave CROCKER wrote:
> Murray S. Kucherawy wrote:
>>> I don't understand how running the filters in series requires that
>>> the entire message be in a buffer.
>> MTA buffers the entire message, sends it to filter #1. Filter #1
>> changes the body. MTA sends the modified message to #2, including the
>> new body. This can only happen if they're in series, and I can't see
>> how it would be possible if there's not a buffer involved.
>>
>> Here's an even better example: MTA buffers the entire message, sends
>> it to filter #1. Filter #1 orders the message to be rejected (or
>> discarded). Filter #2 is told "nevermind", and never has to go through
>> the processing of the body. For a very large message, this can be a
>> big performance win, and again can only happen if they're in series.
>
> Strictly speaking, either full or partial buffering allows processes
> to be staged in sequence just fine. The only issue is whether the
> filters are staged in sequence, with early filters feeding later ones.
>
> What full buffering does is allow the current filter to change an
> earlier part of the message based on a later part. You can't do that
> in a partial-buffering (hot potato) model, where the processing is
> done only on the current chunk and the chunk is then passed back.
>
> A simple example would be wanting to add a header field to the
> message based on something in the body.

Seems like we have two issues: series vs. parallel, and full vs. partial
buffering. Running filters in series is clearly the better strategy
when the last filter is very time-consuming and earlier filters can
provide a quick reject.
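A minimal Python sketch of the series argument, under stated assumptions: `Verdict`, `run_in_series`, and both filter functions are hypothetical names, and each filter here is simply a function of the whole message. It shows only the ordering benefit: once a cheap filter rejects, the expensive one is never called.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Verdict(Enum):
    CONTINUE = "continue"
    REJECT = "reject"


# Hypothetical filter interface: takes the message text, returns a verdict.
Filter = Callable[[str], Verdict]


def run_in_series(message: str, filters: List[Filter]) -> Tuple[Verdict, int]:
    """Run filters in order and stop at the first REJECT.

    Returns the final verdict and how many filters actually ran.
    """
    ran = 0
    for f in filters:
        ran += 1
        if f(message) is Verdict.REJECT:
            return Verdict.REJECT, ran  # later (expensive) filters never run
    return Verdict.CONTINUE, ran


def cheap_size_check(message: str) -> Verdict:
    # Hypothetical quick test: reject anything over one million characters.
    return Verdict.REJECT if len(message) > 1_000_000 else Verdict.CONTINUE


def expensive_content_scan(message: str) -> Verdict:
    # Stand-in for a slow scanner such as a statistical spam filter.
    return Verdict.REJECT if "viagra" in message.lower() else Verdict.CONTINUE
```

Putting the cheap check first is the whole design choice: for an oversized message, `run_in_series` returns after one filter.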
The need for full buffering is a separate issue. It depends on the
filter process, of course, but the process Murray originally described
(scanning the headers) could be done in a purely serial fashion, thus
not requiring a buffer. A statistical filter, on the other hand,
requires the entire body in a buffer, so it can go back and do further
scanning based on, for example, the number of times the word Viagra
appears in the entire body.
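The difference between the two kinds of filter can be sketched as follows. Both functions are hypothetical illustrations, not SpamAssassin or Sendmail code: `stream_headers` consumes one line at a time and stops at the end of the header section, while `two_pass_scan` must keep the body because its second pass goes back over lines it has already seen.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_headers(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) pairs one header line at a time, with no buffer.

    Lines are assumed to have their line terminators already removed.
    Stops at the blank line that ends the header section, so no body
    line is ever read. Folded (continuation) lines are skipped to keep
    the sketch short; a real parser would unfold them.
    """
    for line in lines:
        if line == "":
            return
        if line[0] in " \t":
            continue
        name, _, value = line.partition(":")
        yield name.strip(), value.strip()


def two_pass_scan(body_lines: List[str], word: str, threshold: int) -> List[str]:
    """Pass 1 counts occurrences of `word` over the whole body. Only if the
    count reaches `threshold` does pass 2 go back over the same buffered
    lines to pull out the ones that mention it. Pass 2 is why the body
    has to be kept in a buffer.
    """
    target = word.lower()
    count = sum(line.lower().count(target) for line in body_lines)  # pass 1
    if count < threshold:
        return []
    return [line for line in body_lines if target in line.lower()]  # pass 2
```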
The particular filter I had in mind when I started this thread was the
simple count of body blocks that I run after each block is received.
This process is purely serial. When the count exceeds some limit, the
rest of the body is rejected. Until Murray pointed out how Sendmail
actually works, I was assuming this would protect my receiver from a
broken or abusive transmitter.
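A minimal sketch of the block-count filter, assuming a hypothetical per-block hook named `on_body_block` (loosely analogous to a milter body callback, but not the libmilter API). The `receive` driver models the behavior the author expected: as soon as the limiter says stop, no further blocks are read from the sender.

```python
from typing import Iterable, Tuple


class BodyBlockLimiter:
    """Count body blocks as they arrive and reject once a limit is passed.

    Purely serial: it keeps only a counter, never the blocks themselves.
    """

    def __init__(self, max_blocks: int) -> None:
        self.max_blocks = max_blocks
        self.blocks_seen = 0

    def on_body_block(self, block: bytes) -> bool:
        """Return True to keep receiving, False to reject the rest of the body."""
        self.blocks_seen += 1
        return self.blocks_seen <= self.max_blocks


def receive(blocks: Iterable[bytes], limiter: BodyBlockLimiter) -> Tuple[bool, int]:
    """Feed blocks to the limiter and stop pulling from the sender on reject.

    Returns (accepted, blocks_read).
    """
    read = 0
    for block in blocks:
        read += 1
        if not limiter.on_body_block(block):
            return False, read
    return True, read
```

With a limit of 10 and a sender offering 100 blocks, `receive` stops after reading the 11th block.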
I see no reason the buffer couldn't be just ahead of the eom
(end-of-message) processing. This would allow the early filters to
reject before transferring all the data, while preserving the ability of
filters called from eom to work on the entire message at once.
--> Filter#1 --> Filter#2 --> Buffer --> Filter#3 --> MDA
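The diagram can be modeled as a pipeline in which streaming filters run per block and a buffer sits just ahead of the eom stage. The stage types and `run_pipeline` are hypothetical; the point is that an early stage can reject before the remaining blocks are read, while an eom stage still receives the complete message and may rewrite any part of it, such as prepending a header.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical stage types for this sketch.
StreamFilter = Callable[[bytes], bool]          # per block: False means reject now
EomFilter = Callable[[bytes], Optional[bytes]]  # whole message: None means reject


def run_pipeline(blocks: Iterable[bytes],
                 stream_filters: List[StreamFilter],
                 eom_filters: List[EomFilter]) -> Optional[bytes]:
    """Filter#1 --> Filter#2 --> Buffer --> Filter#3 --> MDA.

    Stream filters see each block as it arrives and can reject before the
    rest of the data is read. Blocks that pass are appended to the buffer,
    which is handed whole to the eom filters. Returns the message to
    deliver, or None if any stage rejected.
    """
    buffer = bytearray()
    for block in blocks:
        for f in stream_filters:
            if not f(block):
                return None  # early reject: remaining blocks are never read
        buffer.extend(block)
    message = bytes(buffer)
    for g in eom_filters:
        result = g(message)
        if result is None:
            return None
        message = result  # eom filters may rewrite the whole message
    return message
```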
It seems like Sendmail's filter API
(https://www.milter.org/developers/api/index) is trying to accomplish
this, but Sendmail itself is not taking full advantage of what the API
allows. There are "callbacks" from Sendmail to allow running filters
after the DATA command, after each header, at the end of all headers,
after each body block, and at the end of all blocks (eom). Certain
commands (message modification) cannot be done except at eom. Others,
like sending a reject after too many blocks, *appear* to work before
eom, but actually do wait for eom.
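A plain Python model of the callback sequence listed above, with an MTA-side driver that honors a reject at whatever stage it is returned. Method names loosely echo libmilter's `xxfi_*` callbacks, but `Filter`, `Status`, and `deliver` are hypothetical; this is not a binding to libmilter, nor a description of how Sendmail schedules rejects.

```python
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    CONTINUE = 0
    REJECT = 1


class Filter:
    """Hypothetical filter with one hook per stage named in the text.

    The defaults accept everything; subclasses override the stages they need.
    """
    def data(self) -> Status: return Status.CONTINUE
    def header(self, name: str, value: str) -> Status: return Status.CONTINUE
    def eoh(self) -> Status: return Status.CONTINUE
    def body(self, block: bytes) -> Status: return Status.CONTINUE
    def eom(self) -> Status: return Status.CONTINUE


def deliver(f: Filter, headers: List[Tuple[str, str]],
            blocks: List[bytes]) -> Tuple[Status, List[str]]:
    """Invoke the stages in order and stop at the first REJECT.

    Returns the final status plus a trace of the stages that were invoked,
    so the effect of an early reject is visible.
    """
    trace: List[str] = []

    def step(label: str, status: Status) -> bool:
        trace.append(label)
        return status is Status.CONTINUE

    if not step("data", f.data()):
        return Status.REJECT, trace
    for name, value in headers:
        if not step("header", f.header(name, value)):
            return Status.REJECT, trace
    if not step("eoh", f.eoh()):
        return Status.REJECT, trace
    for block in blocks:
        if not step("body", f.body(block)):
            return Status.REJECT, trace
    if not step("eom", f.eom()):
        return Status.REJECT, trace
    return Status.CONTINUE, trace
```

With a filter that rejects on its third body block, the trace ends at that block and eom is never reached.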
The one filter I have which *does* require full buffering is
SpamAssassin. I use my own buffer for that purpose, and pass it to
SpamAssassin at eom. Note: A filter which adds a header does *not*
require a full buffer. In an early filter, for example, I gather all
the data needed to construct an authentication header. Later, the
authentication header is inserted using one of the commands that run at eom.
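A sketch of the gather-early, insert-at-eom pattern, assuming a hypothetical `Context` object that, like libmilter's `smfi_addheader` (callable only from `xxfi_eom`), permits header changes only at eom. The header name `X-Auth-Info` and its contents are placeholders, not the author's actual authentication header; the filter keeps a few fields from earlier callbacks and never stores body blocks.

```python
from typing import List, Optional, Tuple


class Context:
    """Hypothetical per-message context that refuses changes before eom."""

    def __init__(self) -> None:
        self.at_eom = False
        self.added: List[Tuple[str, str]] = []

    def add_header(self, name: str, value: str) -> None:
        if not self.at_eom:
            raise RuntimeError("headers may only be added at eom")
        self.added.append((name, value))


class AuthHeaderFilter:
    """Gathers what it needs from early callbacks; keeps no copy of the body."""

    def __init__(self) -> None:
        self.client_ip: Optional[str] = None
        self.from_domain: Optional[str] = None

    def connect(self, ctx: Context, client_ip: str) -> None:
        self.client_ip = client_ip

    def header(self, ctx: Context, name: str, value: str) -> None:
        if name.lower() == "from" and "@" in value:
            # "Alice <alice@example.com>" -> "example.com"
            self.from_domain = value.rsplit("@", 1)[1].strip(" >")

    def body(self, ctx: Context, block: bytes) -> None:
        pass  # blocks are not stored, so no full buffer is needed

    def eom(self, ctx: Context) -> None:
        ctx.add_header("X-Auth-Info",
                       f"client={self.client_ip}; from-domain={self.from_domain}")


def run(f: AuthHeaderFilter, client_ip: str,
        headers: List[Tuple[str, str]], blocks: List[bytes]) -> Context:
    """Hypothetical driver: call the stages in order, then open the eom window."""
    ctx = Context()
    f.connect(ctx, client_ip)
    for name, value in headers:
        f.header(ctx, name, value)
    for block in blocks:
        f.body(ctx, block)
    ctx.at_eom = True  # only now may the filter modify the message
    f.eom(ctx)
    return ctx
```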
It bothers me that Sendmail does all this extra processing on messages
that should get an early reject. I haven't noticed the problem, because
I don't have a large mail load, but there are plans to use this setup on
a larger mail system, and now I worry about efficiency. I like
Sendmail's ability to let me control everything, but if that is only an
illusion, I may need to look for a different MTA program.
*************************************************************
* David MacQuigg, PhD          email: macquigg at ece.arizona.edu
* Research Associate           phone: USA 520-721-4583
* ECE Department, University of Arizona
* 9320 East Mikelyn Lane
* http://purl.net/macquigg     Tucson, Arizona 85710
*************************************************************