procmail

*****SPAM***** RE: [...] Some emails baypass Recipe

2003-10-16 11:19:31
On Tue, Oct 14, 2003 at 11:32:11AM -0400, Jacoub Sbaiti wrote:

> Sometimes an email bypasses the recipe, and when I forward the email
> again to myself the recipe captures it!
> For example, a message that had the word prescriptions or Enlargement
> or Penis in the body did bypass my recipe even though I am
> filtering those words in the recipe, BUT when I reply or forward those
> messages to my email they get filtered.

> Here is the part of my recipe where I filter for bad emails:
>
> :0HB
> * .*penis|.*p-e-n-i-s|.*p,e,n,i,s|.*enlargement|.*vi@gr@|.*v-i-a-g-r-a|.*v-i-@-g-r-@|.*v,i,a,g,r,a|.*viagra|.*v1agra|.*v i a g r a|.*fuck|.*tits|.*suck|.*vicodin|.*prescription|.*horny|.*amateur|.*XXX|.*pharmacy|.*pussy|.*sex|.*\<debt elimination\>|.*\<young girl\>|.*\<teen girl\>|.*\<bang gang\>
> /dbc/Procmail/bad_Emails

Your regex needs lots of help, I'm afraid.


> Here is the log file:
>
> This log is for a message that included the word
> Presscription in the body

Please don't send 30 lines of log (times two log entries) about the
virussnag INCLUDERC when it has absolutely nothing to do with your
question or any possible relation to what's wrong.

[snipped]
> procmail: No match on ".*^A^A^A"
> procmail: No match on ".*penis|.*p-e-n-i-s|.*p,e,n,i,s|.*enlargement|.*vi@gr@|.*v-i-a-g-r-a|.*v-i-@-g-r-@|.*v,i,a,g,r,a|.*viagra|.*v1agra|.*v i a g r a|.*fuck|.*tits|.*suck|.*vicodin|.*prescription|.*horny|.*amateur|.*XXX|.*pharmacy|.*pussy|.*sex|.*\<debt elimination\>|.*\<young girl\>|.*\<teen girl\>|.*\<bang gang\>"


> This log is for a message that included the word Penis and
> Enlargement in the body

[snipped]

There is very little we can tell from your question as posed, despite
its length.  We have nothing but your word that those words were
in the mail in question and spelled according to the regexes in your
condition.  Against your word, we have procmail's that they weren't.
So something is fishy.

Rather than send long *unrelated* log extracts (the related lines are
fine), it might help to send an *excised* section of the emails in
question, containing the words.  Your report of what happened is too
liable to distortion or misapprehension of what's going on; that is why
we need the actual text you're claiming didn't match.  For example, you
wrote up above that the "messages . . . included the word Presscription
[sic] in the body."  If it really said "Presscription", then I would
not expect your recipe condition that only looks for one S to catch the
word.  I suspect with dismay, however, that you simply mistyped in your
problem report.  If we saw the actual text, rather than your summary of
it, we'd know for sure.

I don't know off-hand what about that recipe snippet would cause the
problem you state.  But I do know of several problems with the recipe
that you should fix.  Perhaps doing so will also make your problem
go away.

First of all, do not use the HB options in the initial line of
recipes running under the current version of procmail.  There is
a known problem with H in that it later does not turn itself off
in subsequent recipes.  See list archives for a discussion of
this.  Use the alternative syntax you will find in such discussions.
I highly doubt that is the cause of your particular problem here,
however.  It is merely a general caveat.
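
For readers who have not seen that alternative syntax, here is a minimal
sketch.  It assumes a hypothetical mailbox named "spam-folder" and a single
placeholder word; the only point is that the initial line carries no H or B
flag, and the search area is named on the condition line with "HB ??".

```procmail
# Sketch only: "spam-folder" is a hypothetical mailbox name.
# No H or B flag on the ":0" line; the search area (header plus
# body) is selected for this one condition with "HB ??".
:0
* HB ?? viagra
spam-folder
```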

Second, and this is important, you do not need or want ".*" over
and over in ORed body-grep regexes.  Procmail is already, at your
direction, looking through the entire body of each mail for each
regex phrase you have ORed.  You do not need to tell procmail
redundantly again each time to look across each line.  You do
not have a left-anchor to your phrase, so there is no reason
procmail won't, on its own, scan the entire width of each line
without the ".*".  Try it in your test harness (also called sandbox),
which you do have, right? :)
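
A sandbox can be as small as one rcfile run by hand against a saved
message.  The sketch below is not taken from the original poster's setup:
the file name sandbox.rc, the scratch directory, the mailbox names and the
three sample words are all assumptions.  It would be run as
"procmail ./sandbox.rc < sample.msg".  With VERBOSE on, the log records
whether each condition matched, so the same message can be tried with and
without the ".*" prefixes.

```procmail
# sandbox.rc -- hypothetical test rcfile; every path here is an assumption.
# Run by hand on a saved message:  procmail ./sandbox.rc < sample.msg

# Scratch directory; it must already exist.
MAILDIR=$HOME/procmail-test
# Anything no recipe catches lands here instead of the real inbox.
DEFAULT=$MAILDIR/unmatched
LOGFILE=$MAILDIR/log
# Log each condition as it is tried.
VERBOSE=yes

# The regex is unanchored, so no leading ".*" is needed on any alternative.
:0
* B ?? (viagra|vicodin|pharmacy)
caught
```

Because everything is written under the scratch directory, real mail
delivery is not touched while experimenting.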

Anyway, ".*suck" (for one example of several) will almost surely give
you lots of false positives.  You would match on any of these words,
among other possible ones that aren't in the dictionary I'm using
to generate the list:

 bloodsuck bloodsucker bloodsucking bogsucker bullsucker cowsucker
 goatsucker hamesucken haysuck honeysuck honeysucker honeysuckle
 honeysuckled insucken lumpsucker mudsucker outsuck outsucken resuck
 sapsuck sapsucker seersucker suck suckable suckabob suckage suckauhock
 sucken suckener sucker suckerel suckerfish suckerlike suckfish
 suckhole sucking suckle suckler suckless suckling suckstone undersuck
 unsucked unsuckled upsuck waesuck windsucker

".*horny" (or even just "|horny|") is going to block, among others,

 hawthorny horny hornyhanded hornyhead semihorny thorny unhorny unthorny

And, frankly, I sure wouldn't want anybody blocking mail to me that
contained the word "amateur."  I am an amateur pianist, for example,
and it says so in my resume that you might block were I to send it
to you.  Ditto "enlargement", a word it would be foolish to block
in general email.
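
One way to limit matches like these, sketched here on the assumption that
whole-word matching is what is wanted, is to wrap the words in the "\<" and
"\>" markers that the original condition already uses for its multi-word
phrases.  The mailbox name is hypothetical, and the leading "()" simply
mirrors the form of the condition given near the end of this post.

```procmail
# Sketch only: "spam-folder" is a hypothetical mailbox name.
# \< and \> require a non-word character (or a line edge) on each
# side of the word, so "thorny" and "seersucker" are not matched,
# while "horny" and "suck" standing alone still are.
:0
* B ?? ()\<(horny|suck)\>
spam-folder
```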


Third, one point of regexes is the ability to group logical phrases
with concision.  Your coding will improve dramatically when you
learn to implement that ability in what you're expressing in your
own ORed regexes.  I'll give an example below.

But ultimately, the idea that you are going to be able to stay
one step ahead of the spammers by thinking up all the dumb permutations
of words they like to use is a fatuous one.  If you write "vi@g@ra",
they will send "v1agara".  If you write "p,e,n,i,s", they will send
"p,e.n,i,s".  And so on, until you check yourself in to the nearest
mental hospital.  :)  That is why I am philosophically opposed to
body-grepping for specific bad words in general.  But that's another
topic for another post.

I do find it ironic that SpamAssassin, which ran somewhere upstream
from me on your post, tagged it as spam, when your own flawed recipes
didn't.  Perhaps you should simply run SpamAssassin on your server.

Okay, look, you need to do something sort of like this, to keep your
sanity if for no other reason:

 PUNKY   = '[!-,:-?[-`{-~]'  # ordered range of select punctuation marks
 A       = "[A@]"
 PENIS   = "p$PUNKY?e$PUNKY?n$PUNKY?i$PUNKY?s"
 VIAGARA = "v$PUNKY?i$PUNKY?$A$PUNKY?g$PUNKY?($A$PUNKY?)?r$PUNKY?$A"
 # etc.

 * $  HB ?? ()\<($PENIS|$VIAGARA|$whatever)\>

Try that in your test harness.  Meanwhile, note two things; one of
which is that I have surrounded the words with the "\<" and "\>"
markers, to limit matches sanely.

The second thing to notice, looking back on your original condition,
I now believe to be the source of your problem: you have failed to
surround your ORed conditions with parentheses, as syntax requires.
The parens are in my example above.

P.S.  Just for grins, I tried the recipe snippet above on my last
100 saved spams.  It got 6!  Not bad at all.

--
dman


  • *****SPAM***** RE: [...] Some emails baypass Recipe, Dallman Ross <=