procmail

Re: how does the grep work internally?

1997-06-18 00:04:00
On Tue, 17 Jun 97 12:50 EDT, 
process(_at_)qz(_dot_)little-neck(_dot_)ny(_dot_)us 
(Eli the Bearded) wrote:
> not something easily tested.
> How much gets fed to the internal "egrep" at a time? When scanning
> the headers, is it called once for each header, or once for all the
> headers? Or once for some fixed-size chunk? And the body: once per
> line, once for the whole body, or once per chunk? If the recipe checks
> body and headers, will the egrep ever get them together?

Well, I can answer that one:

 $ cat .prc
 MAILDIR=$HOME/scratch
 DEFAULT=$HOME/scratch/prc.out
 VERBOSE=yeah

 :0HB
 * ^^From(.*$)*Here ya go
 /dev/null

 $ echo "From: and some other random headers
Subject: When will this end?
Oh-Boy: I have my own header field here

This is the body. 
Here ya go" | procmail .prc
 procmail: [7213] Wed Jun 18 09:16:15 1997
 procmail: Match on "^^From(.*$)*Here ya go"
 procmail: Assigning "LASTFOLDER=/dev/null"
 procmail: Opening "/dev/null"
  Subject: When will this end?
   Folder: /dev/null                                                        136
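For anyone who wants to poke at this without procmail at hand, here is a minimal Python sketch of the same idea. It is not procmail's matcher: it uses Python's re module, and procmail_like is a made-up helper that assumes a leading ^^ anchors to the start of the search area and that each $ inside a pattern consumes a newline, which is how the recipes in this thread use them. It also assumes the HB search area is the header, one empty line, then the body. Searching that area in one piece matches; feeding the same pattern one line at a time never does.

```python
import re


def procmail_like(pattern: str) -> re.Pattern:
    """Hypothetical helper: roughly translate a procmail pattern of the kind
    used in this thread into Python's re syntax.

    Assumptions: a leading '^^' anchors to the start of the search area,
    and every '$' inside the pattern consumes a newline. procmail matches
    case-insensitively by default, so IGNORECASE is set as well.
    """
    if pattern.startswith("^^"):
        pattern = r"\A" + pattern[2:]
    return re.compile(pattern.replace("$", r"\n"), re.IGNORECASE | re.MULTILINE)


# The test message from the shell session above.
HEADER = ("From: and some other random headers\n"
          "Subject: When will this end?\n"
          "Oh-Boy: I have my own header field here\n")
BODY = "This is the body. \nHere ya go\n"

# Assumed HB search area: header, one empty line, then the body.
area = HEADER + "\n" + BODY

rx = procmail_like("^^From(.*$)*Here ya go")

# Whole area searched in one piece, as with :0HB.
whole = rx.search(area) is not None
# Naive alternative: hand the matcher one line at a time.
per_line = any(rx.search(line + "\n") for line in area.splitlines())

print(whole, per_line)  # True False
```

A naive line-at-a-time matcher could never report the match shown in the log, since the pattern itself spans the header, the empty line and the body.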

> Say I get a *.answers-formatted FAQ, which always has Keywords: as the
> last real header, and always has Archive-Name: as the first secondary
> header. Would something like this work:
>
>      * ^Keywords:.*^^^^Archive-Name:

I believe you need

     * ^Keywords:(.+$)*$Archive-Name:

This will also allow spurious fields after Keywords: in the header;
modify the parenthesized part if you don't want that.

I've been confused at times about exactly how many newlines Procmail
thinks make up the boundary between the header and the body, but I did
a quick test on this one and it does seem to behave as expected,
provided the recipe has the HB flags, or if you test it with

    * HB ?? ^Keywords:(.+$)*$Archive-Name:
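The suggested condition can be checked the same way. Below is a hand translation into Python's re (an approximation, with each $ inside the pattern turned into \n), run against a made-up *.answers-style message in which Keywords: is the last header and Archive-Name: opens the body. The sample message and the hb_area helper are hypothetical, and hb_area simply assumes one empty line between header and body, which is exactly the boundary question mentioned above.

```python
import re

# procmail condition from the reply:   ^Keywords:(.+$)*$Archive-Name:
# Python approximation (assumption: each '$' inside the pattern consumes a
# newline). MULTILINE lets '^' match at any line start; IGNORECASE mirrors
# procmail's default case-insensitive matching (so "Archive-name:" works too).
KEYWORDS_THEN_ARCHIVE = re.compile(r"^Keywords:(.+\n)*\nArchive-Name:",
                                   re.IGNORECASE | re.MULTILINE)


def hb_area(header: str, body: str) -> str:
    """Hypothetical helper: the assumed HB search area, i.e. the header,
    one empty line, then the body."""
    return header + "\n" + body


# Hypothetical *.answers-style message: Keywords: is the last real header,
# Archive-Name: is the first secondary header at the top of the body.
header = ("Newsgroups: news.answers\n"
          "Subject: Example FAQ\n"
          "Keywords: faq, example\n")
body = "Archive-Name: example/faq\n\nFAQ text here.\n"

# Header and body searched together (HB): matches.
print(bool(KEYWORDS_THEN_ARCHIVE.search(hb_area(header, body))))  # True

# Extra header fields after Keywords: are still accepted.
header_extra = header + "X-Extra: whatever\n"
print(bool(KEYWORDS_THEN_ARCHIVE.search(hb_area(header_extra, body))))  # True

# Header alone (H only) never contains Archive-Name:, so no match.
print(bool(KEYWORDS_THEN_ARCHIVE.search(header)))  # False
```

Because (.+\n)* only consumes non-empty lines, the final \n in the pattern can only be the empty separator line, so Archive-Name: has to be the very first line of the body.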

> When writing recipes that check for stuff like that, is it the egrep
> that does it, or is the RE broken up internally into line fragments
> to check?
> If a chunk model is used, is the chunk LINEBUF-sized? (If I get a
> piece of mail with a 20,000-byte subject, will I have problems
> with procmail?)

I'm only speculating here, but as for the LINEBUF part, that should
only affect the lines read from the rc file. (But, as demonstrated by
one of my own recent questions, it's apparently not too hard to write a
careless recipe that not only exceeds LINEBUF but crashes Procmail in
the process, given large enough headers.) So I would say that the
memory needed for the regexps and the input is dynamically allocated
and limited only by available memory, but like you, I have trouble
reading the source (I'm too lazy).

Let's hope the gurus on this list can answer your questions
definitively. Summoning Philip Guenther ...

/* era */

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>
