procmail

Re: Subject line

2002-06-05 10:56:16
At 12:14 2002-06-05 -0400, Monah Baki wrote:

:0
* ^Subject: .*(*fun|*bills|*weight|*credit|*casino|*paid|*FREE|*save|*congrats|\
*free|*CREDIT|*Free|*save|*PAMELA|*win|*rent|*online|*sex|*money|*income)
/dev/null

I was wondering if this will work???

To do what?

The regex syntax is hosed: * has to follow a character or the wildcard '.'. And since you already have .* at the beginning, putting * inside each or'd alternative is excessive in any event.

I hope you don't get any messages like "I have a problem with X windows", "what's the function", "saving cycles on complex regexp", etc. Consider how broadly your keywords will apply since they'll match as substrings in other words.
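
A syntactically valid version of the recipe might look like the sketch below. It assumes procmail's default case-insensitive matching (so the FREE/free/Free variants collapse into one entry) and uses the \< and \> word anchors so that "win" no longer fires on "windows". The keyword list is the poster's, deduplicated; the mailbox name is a placeholder, used instead of /dev/null so nothing is lost while testing.

```procmail
# Sketch only: keywords taken from the original recipe, duplicates dropped.
# \< and \> anchor on word edges, so "fun" will not match "function".
# suspect-subject.mbox is a hypothetical folder name.
:0:
* ^Subject:.*\<(fun|bills|weight|credit|casino|paid|free|save|congrats|\
        pamela|win|rent|online|sex|money|income)\>
suspect-subject.mbox
```

Even with word anchors, a single hit on "online" or "money" files the message, which is why a plain keyword match like this is still a blunt instrument.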

See my disclaimer for a link to information on a sandbox configuration - a basic procmail wrapper into which you can include test recipes, then throw messages at them and examine a log -- all while leaving your REAL email safely alone until you're confident that a recipe will work properly. A sandbox will let you answer many questions like this for yourself.
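
For a rough idea of what such a wrapper involves, here is a minimal sketch. It is not the configuration from the disclaimer page; the file names and the scratch directory are assumptions, and the directory has to exist before the first run.

```procmail
# sandbox.rc: hypothetical test wrapper; every path here is an assumption.
# Run it by hand, never from .forward, for example:
#   procmail ./sandbox.rc < one-message.txt
#   formail -s procmail ./sandbox.rc < saved-mail.mbox
SHELL=/bin/sh
MAILDIR=$HOME/procmail-sandbox
DEFAULT=$MAILDIR/fallthrough.mbox
LOGFILE=$MAILDIR/sandbox.log
VERBOSE=yes

# Recipes under test live in their own file.
INCLUDERC=$MAILDIR/test-recipes.rc
```

Because MAILDIR, DEFAULT and LOGFILE all point into the scratch directory, anything the test recipes deliver or log stays out of the real mail spool.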

You should consider using scoring ('man procmailsc'): you can count up points for each time a certain word appears in the subject (or anywhere else for that matter), and if the total exceeds some threshold, you ditch the message. You can carry over scoring from one recipe to another, so you could add up the score for subject keywords, stuff it into a variable, then do other tests and add the subject score to THOSE tests, and if the total exceeds your threshold, you toss the message.
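
The carry-over idea can be sketched roughly as follows. The keywords, weights, variable name SUBJSCORE and mailbox name are all placeholders chosen for illustration, not values from a tuned config.

```procmail
# Sketch: keywords, weights and folder name are placeholders.
SUBJSCORE=0

# Score the subject. The leading 1^0 keeps the total positive so the
# action always runs and $= (the score of this recipe) gets saved.
:0
* 1^0
* 60^0 ^Subject:.*casino
* 60^0 ^Subject:.*lottery
{ SUBJSCORE=$= }

# Start 100 points in the hole, add the saved subject score, then
# score the body. Deliver to the folder only if the sum ends up above 0.
:0:
* -100^0
* $ ${SUBJSCORE}^0
* 50^1 B ?? click\ here
possible-spam.mbox
```

The leading $ on the second condition makes procmail expand the variable before reading the line, so the saved score is applied as a one-time weight.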

A simple example follows (note that for various reasons elsewhere in my procmail config, I extract the subject into a discrete variable, so unless you're doing the same, this won't work as-is for you). There is additional material excluded from this, and this is just one of _many_ spam tests I run. Specific scoring can be adjusted to preference - see the sandbox config I present - use formail to split an existing mailbox and feed the messages through this in a sandbox config, then tweak as necessary.
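
The extraction step itself isn't shown in this post. One common way to get the subject into a variable, offered here as an assumption rather than as the exact recipe used above, is the \/ match operator:

```procmail
# Assumed extraction step (the author's own version is not shown).
# \/ starts capturing into MATCH; the action copies it to SUBJECT.
# Whitespace following the colon stays in the captured value.
SUBJECT=""

:0
* ^Subject:\/.*
{ SUBJECT=$MATCH }
```

With the subject in a variable, conditions of the form "SUBJECT ?? regexp" can hit several times within the same subject line, which is what makes the per-keyword weights accumulate.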

Some keywords and phrases you might normally ditch right away aren't high-scored here merely because on a forum in which I participate, some of the members discuss junkmail, and in doing so, they not infrequently utilize some of the keywords in the subject.

Note particularly the relatively low scoring value for a slew of common keywords at the beginning - these individually won't trigger as spam, but they'll _ADD_ to the total, and they start to add up quickly if some of those words appear multiple times.
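
To make the arithmetic concrete before the full recipe, here is a cut-down illustration. It assumes SUBJECT has already been set, and the three keywords and the folder name are placeholders.

```procmail
# Cut-down illustration; keywords and folder name are placeholders.
# A subject of "affordable insurance package" hits the 25^1.7 condition
# three times:
#   25 + 25*1.7 + 25*1.7*1.7 = 25 + 42.5 + 72.25 = 139.75
# which is enough to overcome the -135 starting credit.
# Two hits (25 + 42.5 = 67.5) are not.
:0:
* -135^0
* 25^1.7 SUBJECT ?? (affordable|insurance|package)
low-value-keywords.mbox
```

An exponent of 0 counts only the first hit, 1 counts every hit equally, and anything above 1 makes each repeat worth more than the last.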

# Start with a negative credit.
:0
* -135^0
* -10000^0 SUBJECT ?? Some-specific-string-used-in-spam-reporting.
* 120^0 SUBJECT ?? [    ][      ]+(\[\(<)?[0-9][0-9][0-9]+(\]\)>)?[     ]*$
* 25^1.7 SUBJECT ?? (hardcore|affordable|better|insurance|adult|picture|\
        gallery|\<new\>|\<works\>|looking|database|report|search engine|\
        library|internet|web(\-|\ |)hosting|domains|\<need\>|\<quick\>|\
        forget|remind|unique|\<play\>|\<pay\>|dominate|\<beat\>|\<heat\>|\
        announcement|\<deep\>|\<call\>|\<pressure\>|worldwide|\<only\>|\
        \<help\>|\<big\>|solution|\<attend\>|invitation|invited|perfect|\
        system|package|consumer|effective|affordable|extension|deadline|\
        anything|anyone|results|potential|traffic|travel|welcome|attract|\
        material|forensic)
* 45^2 SUBJECT ?? (for\ more|b2b|too\ much|for\ the\ price|your\ interest|credibility|your\ homepage|health(\ |\-|)care|is\ now\ live|PR\ package|learn\ how|order\ online|girls)
* 250^2 SUBJECT ?? (\<sex|\<xxx\>|porn|\<gay\>|erotic|orgy|\<hiv\>|\<aids\>|viagra|sperm|\<jiz|\<jism|\<cum\>|orgasm|lesbian|cum\ *shot|get\ it\ up|sex\ drive|lingerie|(adult|nude|live)\ (streaming\ |)(video|feed)|live\ *(show|chat|sex)|get\ off|(adult|date)\ (line|site)|over\ *21|adults\ *only|phone\ *sex)
* 90^2 SUBJECT ?? (hottest|bigger|harder|subliminal|everyday)
* 50^2 SUBJECT ?? (tattoo|ugliest|babe|wiggle|jiggle|\<tight\>|\<ass(hole)?\>|\<huge\>|\<tits\>|\<cock\>|\<wet\>|lust|\<farm\>|suck|swallow|choke|nuts)
* 200^2 SUBJECT ?? (aphrodisiac|pheromone|androstendione|androstenedione|dhea|sexual power|steroid|enlargement|impotency|instant sex appeal)
* 200^2 SUBJECT ?? ((barely\ *legal|nude|wet|young|live|hot|shaved|hairless)*\ *(teen|pussy|cunt|slut|whore))
* 200^2 SUBJECT ?? ((attract\ (and|\&)\ seduce|pick(\-|\ )?up)\ women)
* 200^2 SUBJECT ?? (stronger\ ((and|\&)\ multiple\ )?orgasms)
* 200^2 SUBJECT ?? (dressing\ room|(hidden|voyeur)\ cam(|s|era)|grandmother.*fuck)
* 100^2 SUBJECT ?? (abduction|forced|unwilling|kidnapped|abused|rape|incest|violated)
* 50^2 SUBJECT ?? (toilet|beastiality|(animal|zoo)\ sex|golden\ showers|urine|\<pee\>)
* 50^2 SUBJECT ?? (mischievous|forbidden|outlawed|illegal|havoc|steal|drug|fraud)
* 45^1 SUBJECT ?? (zaprosz|zaproszen|oferta)
* 90^2 SUBJECT ?? (\<free\>|wholesale|today|plus|discount|clearance|gift)
* 60^1 SUBJECT ?? (priority|portal|compete|placement|immigration|attention|alert)
* 60^3 SUBJECT ?? (affiliate|referal|program)
* 100^2 SUBJECT ?? (\<win\>|offshore|\<prize\>)
* 200^2 SUBJECT ?? (You\ Won\ \$)
* 200^2 SUBJECT ?? (casino|lotto|lottery|gambling|betting|playoff|beat\ the\ slots|slot\ *machine)
* 500^2 SUBJECT ?? (casino\ *software)
* 250^2 SUBJECT ?? (\<AD(V|)\>)
* 45^2 SUBJECT ?? (psychic)
* 90^2 SUBJECT ?? (advert|delete|bulk(\-|\ )*email|promotion|call\ now)
* 90^2 SUBJECT ?? (stock|investigat(or|ion|e)|secret|confidential|weapon|Internet\ Spy|background|password)
* 90^2 SUBJECT ?? (stock\ (tip|market|offer))
* 75^1 SUBJECT ?? (\<hot\>|pricing|expand|offer|exciting|revolutionary|important|information|unlimited|limited\ time|easiest|fantastic|ultimate|unlock|affordable|flat\ rate|universal)
* 50^3 SUBJECT ?? (\<save\>|\<slash\>|\%)
* 60^0 SUBJECT ?? (E\ N\ O\ U\ G\ H)
* 45^3 SUBJECT ?? (\<cell(ular|)\>|reception|range)
* 100^1 SUBJECT ?? (india|china|taiwan)
* 50^3 SUBJECT ?? (online|advertising)
* 75^1 SUBJECT ?? (buy(ing|)\ on(-|\ |)line|pre-registration|lowest\ price)
* 500^2 SUBJECT ?? (homebiz|ca\$h|zero\ down|Home(\-|\ )Based\ (Biz|business)|\
        work\ (at|from)\ home|financial\ freedom|downline|mlm)
* 500^2 SUBJECT ?? (You\ Have\ Won|You\ Have\ Been\ Chosen|would\ you\ like\ to|don't\ want\ you\ to\ know|open\ this\ letter|change\ your\ life|\
        easy\ way)
* 200^2 SUBJECT ?? ((toner|printer)\ (supplies|cartridges))
* 100^2 SUBJECT ?? (accept(ing|)\ (credit|checks)|merchant\ account|toner|credit|get\ paid|pay\ you|mortgage)
* 75^3 SUBJECT ?? (targeted\ e(\-)*mail|campaign)
* 100^2 SUBJECT ?? (revenue|lifetime|guaranteed(\ (results|return))|growth\ potential)
* 90^2 SUBJECT ?? (special|sponsor|supplies|cash|improve|cost|increase|reciprocal|UNBELIEVABLE|savings)
* 90^4 SUBJECT ?? (\<invest(ment|or|ing|)\>|business|income|opportunity)
* 90^4 SUBJECT ?? (biz\ op|venture)
* 45^0 SUBJECT ?? (wealth|virtual|value)
* 75^2 SUBJECT ?? (Wait\ Is\ Over|voted \#1)
* 75^2 SUBJECT ?? ((easy|free|earn|extra|)\ *money|need\ cash)
* 75^2 SUBJECT ?? (revenue|expense|profit|\<earn\>|purchas(e|ing)|prospects|expert|powerful|recruiting|contact|survey|partner|positive)
* 200^1 SUBJECT ?? (residual|please\ read|can't\ lose|expand\ your|don\'t\ delete|free\ info|(truly|really)\ works|money\ making|traffic\ builder|(as\ |)seen\ on|Advertising\ that\ works)
* 100^1 SUBJECT ?? (The\ Contrarian|congratulation)
* 120^1 SUBJECT ?? (dollars|million|thousand|\.INFO|\.NAME|\<4\ *U\>)
* 200^1 SUBJECT ?? (make\ (lots\ of\ )*money|debt\ free|out\ of\ debt|great\ credit|credit\ history|pre\ paid\ legal|private\ (and|\&)\ confidential)
* 100^1 SUBJECT ?? (no\ cost|making\ money|email\ addresses|get\ yours|internet\ marketing)
* 250^1 SUBJECT ?? (Weight\ *Loss|lose\ *weight|non(\-|\ |)smoker|homeopathic|all\ natural)
* 150^1 SUBJECT ?? (global\ friends|domain\ extensions|keyword\ analysis|magazine\ subscription)
* 200^1 SUBJECT ?? ((urgent\ *(\&|and)|very)\ *confidential|web\ portal|within\ our\ portal|find\ out|\
        one\ of\ a\ kind|JUST\ RELEASED)
* 100^1 SUBJECT ?? (link\ exchange)
* 100^0.75 SUBJECT ?? (Platinum|eMarketing|e(-|)biz|sales\ (\&|and)\ marketing|easy\ (\&|and)\ safe|for\ your\ clients)
* 120^1 SUBJECT ?? (\,000|0000|\.00|\.95|\.99|\%\ off|mega|on\ *cd(-|)rom)
* 75^1 SUBJECT ?? (monthly|weekly|/mo|/yr|/wk|(per|every|each|paid) (month|year|week|quarter))
* 250^1 SUBJECT ?? ([0-9]+\ *(cpm|(c/|cent(s|)\ *(/|per))\ *(min|mn)))
* 25^1 SUBJECT ?? (cent)
* 120^1 SUBJECT ?? (long\ distance|cable\ TV|satellite|\<dss\>|descrambler)
{
        LOG="SPAM: Subject Scoring match $=$SPAMVER"

        :0:
        |gzip -9fc>>$MAILDIR/spam.gz
}



 Is procmail case sensitive?

No. FTR, this is made quite clear in the manpages if you take the time to read them.
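
If case ever does need to matter, the D flag on the recipe turns it on. A small sketch with placeholder folder names:

```procmail
# With the D flag the whole condition distinguishes case, so only an
# all-caps FREE hits here (and the header must be spelled "Subject:").
:0 D:
* ^Subject:.*\<FREE\>
shouty-free.mbox

# Default behaviour: case is ignored, so this catches free, Free, fREE
# and anything else the first recipe let through.
:0:
* ^Subject:.*\<free\>
any-case-free.mbox
```

This is also why listing FREE, free and Free as separate alternatives in one condition buys nothing without the D flag.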

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
