procmail

Re: procmail seems to not work consistently

2003-10-27 10:57:47
procmail(_at_)deliberate(_dot_)net wrote:

<> => VICODIN="[v(\\/)][1il\|][c\(][0o]d[1il\|]n"
<> 
<>      Great idea to use a variable, however you may have some confusion
<> about the syntax within character classes. How about this definition:
<> 
<>      VICODIN = "(v|\\/)[1il|][c(][0o]d[1il|]n"

Thanks. I think either approach works -- I chose to be consistent in
using alternation.  Because alternation works on single characters, I end
up wrapping those cutesie spammer/l33t variants in parentheses to make
them a "single" character.
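
For anyone comparing the two definitions quoted above: inside brackets
every character, including "(", ")" and "|", stands for itself, so a
class always matches exactly one character, while a branch of a
parenthesized alternation can be any length. A minimal sketch of the
difference follows. The names O_CLASS and O_ALT and the two-character
"()" stand-in for "o" are made up for illustration and do not come from
the recipes in this thread:

```procmail
  # Character class: one position, one character ("o" or "0").
  O_CLASS="[o0]"

  # Alternation: a branch may be longer than one character, so the
  # two-character sequence "()" can stand in for "o" as well.
  # Single quotes keep the backslashes exactly as typed.
  O_ALT='(o|0|\(\))'
```

Either variable can then be dropped into a condition that carries the
"$" flag, e.g. "pr${O_ALT}zac".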

The above is actually a variant of how I do it in my own recipes.  I
have a list of character variables similar to what Ruud proposed and
build up strings based on that. I use two sets of variables: one holds
only the lamer versions of a letter (e.g., "0" for "o"); the other
includes the letter itself. I do the latter mainly for ease of reading.

  # Vowels
  lame_a="[(/\)@]"
  a="[a(/\)@]"
  lame_e="3"
  e="[e3]"

  (And so on for both vowels and consonants)
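
The recipe further down also uses ${i}, ${o} and ${l} and their lame_
counterparts. The actual definitions are not shown in this post, so the
following is only a guess at what they might look like, following the
same pattern and reusing the character sets quoted at the top of this
message ("[1il|]" and "[0o]"):

```procmail
  # Hypothetical continuation; these values are assumptions, not the
  # real contents of defines.rc.
  lame_i="[1l|]"
  i="[i1l|]"
  lame_o="0"
  o="[o0]"

  # Consonants
  lame_l="[1|]"
  l="[l1|]"
```

With definitions like these, "v${i}c${o}d${i}n" would match "vicodin",
"v1codin" and "vic0d|n" alike, while "v${lame_i}codin" would match only
the obfuscated spellings.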

I added the "lame_<foo>" variables because one of my recipes wants to
give extra weight to obfuscated variants of words, because they are
uniformly a (sad) attempt to slip by content filters.
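
That recipe also depends on $SUBJECT, $wsp and $alpha, which are set in
a defines.rc that is not shown here. A rough sketch of what such
definitions could look like; every value below is an assumption made
for illustration, not the real file:

```procmail
  # Hypothetical stand-ins for defines.rc; the real file is not shown.

  # Character-class fragments, written without brackets so that they
  # can be combined, as in [$alpha$wsp] or [-$wsp].
  # procmail matches case-insensitively by default, so a-z is enough.
  alpha="a-z"
  # A real definition would probably add a literal tab after the space.
  wsp=" "

  # Capture the Subject: text; \/ marks the start of what ends up
  # in $MATCH.
  :0
  * ^Subject:[ ]*\/.*
  { SUBJECT="$MATCH" }
```

Matching against "SUBJECT ??" instead of the whole header keeps the
scoring conditions from hitting other header lines.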

The spam recipe itself is:

  # $Id: bogus_subj_drugs.rc,v 2.3 2003/10/12 17:00:34 rali Exp $

  # Filter for spams hawking various drugs ... 
  # $SUBJECT is extracted in defines.rc
  # ${a} ... and ${lame_a} ... are defined in defines.rc

  
  misspelled="((p(re|er)scrip|medica)t(${lame_i}o|i${lame_o})n|d${lame_i}scount|xan(nax|axx)|vall?i(am|umm)|(v(${lame_i}a|i${lame_a}|ia[-\.])gra|viagr${lame_a})|phenterm${lame_i}ne|v(icdoin|icod${lame_i}n|${lame_i}codin))"

  
  drugs="(v${i}${a}gr${a}|ph${e}nt${e}rm${i}n${e}|u${l}tr${a}m|${a}mb${i}${e}n|${a}d${i}p${e}x|b${o}ntr${i}${l}|p${a}x${i}${l}|x${a}n${a}x|v${a}${l}${l}?${i}um|pr${o}z${a}c|x${e}n${i}c${a}l|${a}t${i}v${a}n|s${o}m${a}|pr${o}p${e}c${i}${a}|v${i}c${o}d${i}n|v${i}${o}xx|c${a}r${i}s${o}pr${o}d${o}l|l${e}v${i}tr${a})"

  otc="(weight loss|diet pill|anti([-$wsp])?(aging|depressant)|pain relief|(block|stop)s? fat|sildenafil)"

  :0
  * -15^0
  * $ 16^1 SUBJECT ?? $misspelled
  * $ 16^1 SUBJECT ?? (l${lame_o}se we${lame_i}ght)
  * $ 13^1 SUBJECT ?? $drugs
  * $ 11^1 SUBJECT ?? $otc
  * $  9^1 SUBJECT ?? (pharmac(y|eutical)|ha${lame_l}f off)
  * $  6^1 SUBJECT ?? (generic|prescri(bed|ption)|med(s|ication))
  * $  5^0 SUBJECT ?? (Receive (your )?[$alpha$wsp]+order) 
  * $  4^1 SUBJECT ?? (doctor)
  * $  3^1 SUBJECT ?? (save|low as|(1/2|½|half) (off|price))
  * $  1^1 SUBJECT ?? (order|online|shipped to your door|overnight shipping)
  {
    # Do whatever you might do with an email that exceeds the
    # threshold.  I add it to a cumulative score and use that to
    # determine spamminess.
  }
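
To make the arithmetic concrete: a weight written as w^1 is added once
for every match of its pattern, w^0 is added only for the first match,
and the recipe matches when the total ends up positive. The leading
"-15^0" therefore acts as a threshold of 15. The sketch below works
through two subjects and shows one way the action block might keep the
score. The worked figures assume ${lame_i} matches "1", and the
variable name DRUG_SCORE is made up for this example:

```procmail
  # Worked scores (assuming ${lame_i} matches "1"):
  #   Subject: Viagra  ->  -15 + 13 ($drugs)                    = -2  no match
  #   Subject: V1agra  ->  -15 + 16 ($misspelled) + 13 ($drugs) = 14  match
  #
  # Hypothetical action block for the recipe above.  procmail exposes
  # the score of the last scored recipe as $=, so it should be possible
  # to save it here for a later cumulative total.
  {
    DRUG_SCORE=$=
  }
```

The obfuscated spelling clears the threshold on its own because it hits
both $misspelled and $drugs, which is the extra weight for "lame"
variants described earlier.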

Regards,

Reto
-- 
Reto Lichtensteiger     | Contrary to popular belief, Unix is user friendly.
rali(_at_)tifosi(_dot_)com              | It just happens to be very selective
                        | about who it decides to make friends with.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

