procmail

how does the grep work internally?

1997-06-17 10:52:00
This might be better suited to direct mail with the people who hack procmail, but...

In its famously accurate way, the man page for procmail explains
that it has full 'egrep' compatibility, even as it then goes on to
document how it is only vaguely compatible.  My question today is
not something easily tested.

How much gets fed to the internal "egrep" at a time? When scanning
the headers, is it called once for each header, or once for all the
headers? Or once for some fixed size chunk? And the body, once per
line, once per whole, or once per chunk? If the recipe checks body
and headers, will the egrep ever get them together?
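
To make the distinction concrete, here is a small Python sketch. It is
not procmail's code and says nothing about what procmail actually does;
the sample message, the pattern, and the function names are all made up
for illustration. It only shows why the answer matters: a pattern that
spans the header/body boundary can match only if the matcher sees both
sides in one piece.

```python
import re

# Hypothetical sample message: main headers, a blank line, then a body
# that starts with a secondary header, as in a *.answers style FAQ.
MESSAGE = (
    "Subject: example FAQ\n"
    "Keywords: faq, example\n"
    "\n"
    "Archive-Name: example/faq\n"
    "\n"
    "Body text.\n"
)

# Illustrative pattern (Python syntax, not procmail syntax): Keywords:
# as the last header line, directly followed by the blank line and
# Archive-Name:.
PATTERN = r"^Keywords:.*\n\nArchive-Name:"


def match_whole(pattern, text):
    """Feed the entire message to the matcher in one call."""
    return re.search(pattern, text, re.MULTILINE) is not None


def match_per_line(pattern, text):
    """Feed the matcher one line at a time."""
    return any(
        re.search(pattern, line, re.MULTILINE)
        for line in text.splitlines(keepends=True)
    )


def match_header_and_body_separately(pattern, text):
    """Feed the headers and the body to the matcher as two separate pieces."""
    header, _, body = text.partition("\n\n")
    return any(
        re.search(pattern, part, re.MULTILINE) for part in (header, body)
    )


if __name__ == "__main__":
    print(match_whole(PATTERN, MESSAGE))
    print(match_per_line(PATTERN, MESSAGE))
    print(match_header_and_body_separately(PATTERN, MESSAGE))
```

In this toy model only the whole-message variant can see the pattern;
which of the three models (if any) procmail follows is exactly what I
am asking.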

Say I get a *.answers formatted FAQ, which always has Keywords: as the
last real header, and always has Archive-Name: as the first secondary
header. Would something like this work:

        * ^Keywords:.*^^^^Archive-Name:

When writing recipes that check for stuff like that, is it the egrep
that does it, or is the RE broken up internally into line fragments to
check?

If a chunk model is used, is the chunk LINEBUF sized? (If I get a
piece of mail with a 20,000 byte subject, will I have problems
from procmail?)

I have tried reading the procmail source, but it is highly frustrating.
I don't think I have even seen IOCCC entries with so many gotos. I
tried feeding it to indent (GNU version 1.6) to improve the whitespace
situation, but that produced spectacularly broken results.

Elijah
------
Please do not CC me when replying to the list.  It is not my responsibility to
prove to you my mail is not spam, if mail to you bounces it will not be resent.
