procmail

oversimplification bites (was: problem with fgrep)

1997-09-16 07:13:37
I just discovered, by accident, the discard of a legitimate message.
There was an In-Reply-To: <.....@triceratops.com> which was matched by

* ? formail -ISubject: | fgrep -i -f llv.domains

TjL:
Can't this just be changed to

* ? formail -ISubject: -IIn-Reply-To: | fgrep -i -f llv.domains

(and perhaps `References' line in case someone CCs you on a Usenet post?)

That's probably the cheapest way (in terms of human effort) to have
prevented this particular item.  But it would still discard items with
triceratops in a Received: line.
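To make that concrete, here is a rough Python sketch of the same idea: drop the named header fields, then look at what is left. It only imitates what formail -I does to the header block, and it is not how procmail or formail work internally. The sample message and the helper name headers_without are invented for illustration; only the triceratops.com domain comes from this thread.

```python
from email import message_from_string


def headers_without(raw_message, dropped):
    """Return the header block as text, minus the named header fields."""
    msg = message_from_string(raw_message)
    for name in dropped:
        # Deleting removes every occurrence and is a no-op if the field is absent.
        del msg[name]
    return "\n".join(f"{name}: {value}" for name, value in msg.items())


# Invented sample message (assumption); only the domain is from the thread.
raw = (
    "Received: from mail.triceratops.com by example.org\n"
    "From: friend@example.org\n"
    "Subject: triceratops.com sent me junk\n"
    "In-Reply-To: <abc@triceratops.com>\n"
    "\n"
    "body text\n"
)

remaining = headers_without(raw, ["Subject", "In-Reply-To"])
print(remaining)
print("triceratops" in remaining)
```

Even with Subject and In-Reply-To removed, the Received: line still carries the domain, so a plain substring test on the remaining headers still hits.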

JM:
Substrings vs. regular expressions! 

Right.  fgrep does not use regexps; it matches fixed strings.
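A small Python illustration of the difference, for anyone who wants to see it side by side. The list entry tops.com is made up (the thread does not say which entry in llv.domains caused the hit), and Python's \b is only a stand-in for whatever word-boundary notation your tool uses.

```python
import re


def substring_match(entry, text):
    """Fixed-string, case-insensitive test, roughly what fgrep -i does."""
    return entry.lower() in text.lower()


def word_match(entry, text):
    """Same entry, but it must start and end on a word boundary."""
    pattern = re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
    return pattern.search(text) is not None


entry = "tops.com"  # hypothetical list entry, not taken from llv.domains
legit = "In-Reply-To: <someone@triceratops.com>"
spam = "From: <seller@tops.com>"

for line in (legit, spam):
    print(line, substring_match(entry, line), word_match(entry, line))
```

The substring test flags both lines; the word-boundary test flags only the one where the entry stands alone as a domain.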

Is there a way to do the perlish /\btriceratops.com\b/ ?

Directly in procmail there is.  fgrep was nice because someone else is
maintaining the list of domains registered by the jerks^H^H^H^H^Hspammers
and I didn't have to modify it (until now).  Otherwise, the overhead would
have been absurd.

Does any listee have a general purpose (and not too expensive)
hook to use perl instead of fgrep?  

Almost as much overhead, plus you have to convert the list into the perl
format.  Might as well convert it into the procmail format.  (Which
someone at concordia or acordia or something-like-that has done.)

Eli:
You just noticed this? 

No, I was expecting it and watching for it in the backups folder.  I was
surprised that after several weeks it hadn't happened yet--till
yesterday.  "by accident" was misleading, sorry.  This one was "by
accident" in the sense that I found it when I was looking for something
else.

Where have you been every time I go off on
people who check spam domains without regard to word boundaries?

Must have been before I joined this list.  :-)  Or maybe I just missed
them--weeding through this list has become so time-consuming that I'm
thinking of unsubscribing.  The number of spams on my worst day has never
been as high as the recent daily counts of legit messages here.

I know that I have seen some messages from you, so I'm NOT discarding
"netusa"...  Just consider my post a boost to your crusade.

Folks may also note that the recipes recommended by the provider of the
domain lists will have the same problem, since they only put OR bars
between the domains.

GNU fgrep has a -w flag to check for word boundaries.

Now if this works, it will be great!  No transformation of the list needed!
If it doesn't work, then perhaps something like:

:0
* (`sed 's/^/\\</g' llv.domains | sed 's/$/\\>/g' | tr '\012' '|'`)
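For comparison, the same list-to-pattern transformation sketched in Python, outside procmail. This is an assumption-laden illustration, not a drop-in recipe: the function name build_domain_pattern and the sample entries are invented, and \b here is Python's word boundary rather than procmail's \< and \>. It also escapes the dots in each domain and avoids a trailing OR bar, two things the sed/tr pipeline above does not handle.

```python
import re


def build_domain_pattern(lines):
    """Build one case-insensitive, word-bounded alternation from list lines."""
    domains = [line.strip() for line in lines if line.strip()]
    if not domains:
        # An empty alternation would match everywhere, so never match instead.
        return re.compile(r"(?!)")
    alternation = "|".join(re.escape(domain) for domain in domains)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


# Invented entries standing in for the contents of llv.domains (assumption).
sample_lines = ["tops.com\n", "\n", "spam.example\n"]
pattern = build_domain_pattern(sample_lines)

print(bool(pattern.search("From: <a@SPAM.example>")))
print(bool(pattern.search("In-Reply-To: <b@triceratops.com>")))
```

In real use the lines would come from the list file itself, e.g. build_domain_pattern(open("llv.domains")), so the list still needs no hand editing.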

The moral of the story is:  oversimplification is OK, IF you can live with
the resultant inaccuracies.

  • oversimplification bites (was: problem with fgrep), W. Wesley Groleau x4923 <=