procmail(_at_)deliberate(_dot_)net wrote:
<> => VICODIN="[v(\\/)][1il\|][c\(][0o]d[1il\|]n"
<>
<> Great idea to use a variable, however you may have some confusion
<> about the syntax within character classes. How about this definition:
<>
<> VICODIN = "(v|\\/)[1il|][c(][0o]d[1il|]n"
Thanks. I think either approach works -- I chose to be consistent in
using alternation. Because alternation works on single characters, I end
up wrapping those cutesy spammer/l33t variants in parentheses to make
them a "single" character.
The above is actually a variant of how I do it in my own recipes. I
have a list of character variables similar to what Ruud proposed and
build up strings based on that. I use two sets of variables: one holds
only the lamer versions of a letter (e.g., "0" for "o"); the other
includes the letter itself. I do the latter mainly for ease of reading.
# Vowels
lame_a="[(/\)@]"
a="[a(/\)@]"
lame_e="3"
e="[e3]"
(And so on for both vowels and consonants)
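To show how the two sets combine into a word pattern, here is a minimal
sketch. The definitions for i and o are assumptions made up for
illustration (the real ones live in defines.rc and are not shown in this
post), and the variable name "vicodin" is likewise hypothetical.

```
# Hypothetical definitions in the same style as the vowels above
lame_i="[1l|]"
i="[i1l|]"
lame_o="0"
o="[o0]"

# Matches the plain spelling as well as variants such as "v1c0d1n"
vicodin="v${i}c${o}d${i}n"

# Matches only when the first "i" has been obfuscated, e.g. "v1codin"
vicodin_lame="v${lame_i}codin"

:0
# The "$" after "*" asks procmail to expand variables in the condition
* $ SUBJECT ?? $vicodin
{
# Do whatever you do with a match
}
```

Keeping the plain letter out of the lame_ set is what lets a recipe
tell "vicodin" apart from "v1codin" and weight the two differently.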
I added the "lame_<foo>" variables because one of my recipes wants to
give extra weight to obfuscated variants of words, because they are
uniformly a (sad) attempt to slip by content filters.
The spam recipe itself is:
# $Id: bogus_subj_drugs.rc,v 2.3 2003/10/12 17:00:34 rali Exp $
# Filter for spams hawking various drugs ...
# $SUBJECT is extracted in defines.rc
# ${a} ... and ${lame_a} ... are defined in defines.rc
misspelled="((p(re|er)scrip|medica)t(${lame_i}o|i${lame_o})n|d${lame_i}scount|xan(nax|axx)|vall?i(am|umm)|(v(${lame_i}a|i${lame_a}|ia[-\.])gra|viagr${lame_a})|phenterm${lame_i}ne|v(icdoin|icod${lame_i}n|${lame_i}codin))"
drugs="(v${i}${a}gr${a}|ph${e}nt${e}rm${i}n${e}|u${l}tr${a}m|${a}mb${i}${e}n|${a}d${i}p${e}x|b${o}ntr${i}${l}|p${a}x${i}${l}|x${a}n${a}x|v${a}${l}${l}?${i}um|pr${o}z${a}c|x${e}n${i}c${a}l|${a}t${i}v${a}n|s${o}m${a}|pr${o}p${e}c${i}${a}|v${i}c${o}d${i}n|v${i}${o}xx|c${a}r${i}s${o}pr${o}d${o}l|l${e}v${i}tr${a})"
otc="(weight loss|diet pill|anti([-$wsp])?(aging|depressant)|pain relief|(block|stop)s? fat|sildenafil)"
:0
* -15^0
* $ 16^1 SUBJECT ?? $misspelled
* $ 16^1 SUBJECT ?? (l${lame_o}se we${lame_i}ght)
* $ 13^1 SUBJECT ?? $drugs
* $ 11^1 SUBJECT ?? $otc
* $ 9^1 SUBJECT ?? (pharmac(y|eutical)|ha${lame_l}f off)
* $ 6^1 SUBJECT ?? (generic|prescri(bed|ption)|med(s|ication))
* $ 5^0 SUBJECT ?? (Receive (your )?[$alpha$wsp]+order)
* $ 4^1 SUBJECT ?? (doctor)
* $ 3^1 SUBJECT ?? (save|low as|(1/2|½|half) (off|price))
* $ 1^1 SUBJECT ?? (order|online|shipped to your door|overnight shipping)
{
# Do whatever you might do with an email that exceeds the
# threshold. I add it to a cumulative score and use that to
# determine spamminess.
}
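As a worked example of the scoring: a weight of w^1 adds w for every
match, while w^0 adds w for the first match only, and the recipe
succeeds when the total is positive. For a Subject of "generic vicodin
online", and assuming $drugs matches once while $misspelled does not
(which depends on definitions not shown here), the total is
-15 + 13 + 6 + 1 = 5, so the block runs. One possible way to carry the
result into a cumulative score is sketched below; SPAMSCORE is a
hypothetical name, and this is not necessarily how the recipe above is
wired up.

```
# Sketch only: SPAMSCORE is a hypothetical variable name
SPAMSCORE=0

:0
# Seed this recipe with the running total so far
* $ ${SPAMSCORE}^0
* $ 13^1 SUBJECT ?? $drugs
* $ 6^1 SUBJECT ?? (generic|prescri(bed|ption)|med(s|ication))
* $ 1^1 SUBJECT ?? (order|online|shipped to your door|overnight shipping)
{ }
# $= expands to the score of the last recipe
SPAMSCORE = $=
```

Later recipes can seed themselves from SPAMSCORE in the same way, and a
final recipe can compare the total against whatever threshold suits you.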
Regards,
Reto
--
Reto Lichtensteiger        | Contrary to popular belief, Unix is user friendly.
rali(_at_)tifosi(_dot_)com | It just happens to be very selective about who it
                           | decides to make friends with.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail