Re: Cyberpromo named boot - list of domains to filter.

At 11:41 AM 8/17/97 -0500, Philip Guenther wrote:

:0 hir
* ?egrep -isFf $BLOCKDOM
/dev/null


Here's the variant I'm using:

($FORMAIL is set to the path of the formail executable, $FGREP to the path
of the fgrep executable, and $SPAMLIST to a file listing spam domains, one
per line):

:0
* ? $FORMAIL -ISubject: | $FGREP -i -f $SPAMLIST
/dev/null

(With the recent addition of the more or less complete cyberpromo domain
list, my spamlist of domains alone is at 843 entries.  Among other lists
matched in a similar fashion, I also have a twitlist, which is just
addresses/name components to be matched against address-type headers.)

This matches anything occurring in the headers except for the subject
(that is, when looking for a match, the contents of the Subject: header
aren't considered - this keeps us from matching on subjects that merely
mention some spammer domain, as happens when discussing spam), and it
doesn't give a rat's arse about the CaSe of the strings.
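
For anyone who wants to try the effect outside of procmail, here is a rough
sketch of the same pipeline.  Assumptions on my part: an awk function
(strip_subject, a made-up name) stands in for "formail -ISubject:" so the
example runs without formail installed, "grep -F" stands in for fgrep, and
the one-line spam list is a throwaway temp file rather than a real $SPAMLIST.

```shell
#!/bin/sh
# Stand-in for "formail -ISubject:" applied to a header block: drop the
# Subject field together with any folded continuation lines.
strip_subject() {
    awk '
        /^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]:/ { skip = 1; next }   # start of Subject
        /^[ \t]/ && skip                 { next }             # folded continuation
                                         { skip = 0; print }  # any other line
    '
}

# Hypothetical spam list: one domain per line, plain strings (no regexps).
SPAMLIST=$(mktemp)
printf 'cyberpromo.com\n' > "$SPAMLIST"

# Exit status 0 means "matched", i.e. the procmail condition would succeed.
is_spam() { strip_subject | grep -F -i -q -f "$SPAMLIST"; }

# A message that only talks about the domain in its subject is left alone.
printf 'From: friend@example.org\nSubject: how to block cyberpromo.com\n' \
    | is_spam || echo "discussion of spam: kept"

# The same domain in any other header matches, whatever its case.
printf 'From: offers@CyberPromo.com\nSubject: hello\n' \
    | is_spam && echo "spam domain in From: dropped"
```

In the real recipe formail does the header surgery; the awk function above
is only there to make the sketch self-contained.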

Performance-wise (minus my additional overhead of formail), is there a big
difference between the two invocations of (f)grep?  I know that I'm taking
a big performance hit by grepping out all the basic spam domains (OTOH,
look at all the extra disk space! :) ).

My spam domain list does not currently consist of regexp'd items.  I'm
considering changing it so that every item matches only when preceded by a
word break or a period (mostly one of " <.(@"), since not doing so can be a
problem when one domain is a shorter form of another (mentioned here in
this group: usa.net vs netusa.net - if I filter for the first one, I'll
whack the second).
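
A minimal sketch of what that change could look like, assuming the plain
list is converted on the fly and the match is done with "grep -E" (fgrep
only does fixed strings, so it can't be used once the items are regexps).
The function names anchor_domains and matches are made up for the example,
and the pattern file is a temp file, not my real list.

```shell
#!/bin/sh
# Turn a plain domain list (stdin, one per line) into anchored patterns:
# skip blank lines, escape the dots, then require either start of line or
# one of the characters " <.(@" immediately before the domain.
anchor_domains() {
    sed -e '/^$/d' -e 's/\./\\./g' -e 's/.*/(^|[ <.(@])&/'
}

PATTERNS=$(mktemp)
printf 'usa.net\n' | anchor_domains > "$PATTERNS"

# Exit status 0 if the given header line matches any anchored pattern.
matches() { printf '%s\n' "$1" | grep -i -q -E -f "$PATTERNS"; }

matches 'From: someone@usa.net'       && echo "usa.net: matched"
matches 'Received: from mail.usa.net' && echo "mail.usa.net: matched"
matches 'From: someone@netusa.net'    || echo "netusa.net: not matched"
```

This only anchors the front of each domain, which is all I was proposing;
something like usa.net.au would still match unless the tail is anchored too.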

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395