Jason Marshall asked,
| My question is, if I'm trying to match a lot of items (about 1900 of them,
| and growing), is it "better" to have one recipe for each of the 1900
| items, or one recipe with all items |'ed together?
|
| A small example... Would it be "better" this way: (sorry about the long
| lines)
|
| :0 h
| * ^(To:|From:|Reply-To:|Comments: Authenticated).*@(validreturn.com|.*\.validreturn.com)
| procmail/spamfile
|
| :0 h
| * ^(To:|From:|Reply-To:|Comments: Authenticated).*@(audioforum.com|.*\.audioforum.com)
| procmail/spamfile
|
| ...insert 1900+ more individual recipes here...
|
|
| Or would it be better this way:
|
| :0 h
| * ^(To:|From:|Reply-To:|Comments: Authenticated).*@\
| (validreturn.com|.*\.validreturn)|\
| (audioforum.com|.*\.audioforum)|\
| ...insert 1900+ more lines here..
| procmail/spamfile
It's better to combine as many as you can, within the limits of $LINEBUF.
One thing that will cut the length almost in half is to change "@" at the
end of the first line to "@(.*\.)?". That way you can reduce all those

    (dom\.ain|.*\.dom\.ain)|

(not that you need the parentheses even now) to simply

    dom\.ain|
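Applied to the two domains from the question, the combined recipe might look something like the sketch below. The dots in the domain names are escaped so that they match only a literal ".", and the continuation lines rely on procmail skipping the leading whitespace after a trailing backslash. I've written plain ":0" rather than the question's ":0 h": with "h" alone, only the header would be written to procmail/spamfile, and conditions search the header by default either way.

```
# One alternative per domain; further domains are added the same way.
:0
* ^(To:|From:|Reply-To:|Comments: Authenticated).*@(.*\.)?\
   (validreturn\.com|\
    audioforum\.com)
procmail/spamfile
```

Each further domain is one more "name\.tld|\" line, as long as the condition, once joined into a single line, stays within $LINEBUF.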
| Would this cause the indentation to be included in the ingredients for this
| recipe?
No; indentation is ignored unless you stick a backslash or a pair of empty
parentheses in front of it (or parentheses around it).
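A small illustration of that rule, using a made-up Subject: test and a hypothetical folder procmail/testfile:

```
# Leading whitespace on the continuation line is skipped, so this
# condition is the same as ^Subject:.*(foo|bar)
:0
* ^Subject:.*(foo|\
              bar)
procmail/testfile

# The empty parentheses shield the space that follows them, so the
# second alternative here is " bar" with a literal leading space.
:0
* ^Subject:.*(foo|\
              () bar)
procmail/testfile
```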
| My kudos must go out to Stephen, who's written one kick-butt piece of
| software!
Absolutely.
If the recipe length does get unwieldy, you can break it up into several
recipes, putting the domains that spam most often in the earlier ones for
efficiency. Keeping the list of domains in a separate file would mean
running two outside processes on each piece of suspected spam: formail to
gather all those header lines into one text to search, and fgrep to check
that text against the file of blacklisted domains.
[Well, the formail call could be avoided with multiple extraction recipes.]
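A rough sketch of the separate-file approach, assuming a hypothetical file $HOME/procmail/blacklist with one domain per line (for example validreturn.com). The "?" condition feeds the header to the command; formail -x pulls out the listed fields, and fgrep -f compares them against every line of the file at once, exiting 0 (condition true) on any match. Two caveats: fgrep matches substrings, so an entry like validreturn.com also catches invalidreturn.com, and extracting every Comments: header is looser than the "Comments: Authenticated" test in the regex version.

```
# Hypothetical blacklist file: one domain per line.
BLACKLIST=$HOME/procmail/blacklist

:0
* ? formail -zxTo: -zxFrom: -zxReply-To: -zxComments: | fgrep -qi -f $BLACKLIST
procmail/spamfile
```

The list can then grow without touching the rc file, at the cost of the two extra processes per message mentioned above.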