Re: when plaintext differs from html

2004-01-26 18:33:39
gus wrote:

<> also - to catch these blocks of random words, calculate the average word 
<> size of the spam and see if it's far in excess of the normal (4-5 
<> characters per word).  but only do this in cases where the word count 
<> exceeds a certain number so as to avoid false positives.

This comment from gus triggered an idea.  Here's my first pass at it;
comments and refinements welcomed:

  # $Id$

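  # Cutoff: act when the body holds more than this many letters
  # per whitespace character.
  MULTIPLIER="10"

  # Defaults, so the scoring conditions further down stay well-formed
  # even if one of the counting recipes does not match.
  SPACES="0"
  ALPHA="0"

  # Count whitespace characters in the body.  A weight of 1^1 adds 1
  # for every match, and procmail leaves the total in $=.
  # ($wsp and $alpha are not set here; they must be defined earlier
  # in the rcfile.)
  :0
  * $ 1^1 B ?? [$wsp]
  {
    TMP="$="
    # The command contains shell metacharacters, so procmail hands it
    # to $SHELL, which does the $(( )) arithmetic.
    SPACES=`echo $(( $TMP*$MULTIPLIER ));`
    TMP
  }

  # Count alphabetic characters in the body the same way.
  :0
  * $ 1^1 B ?? [$alpha]
  { ALPHA="$=" }

  # The score is ALPHA - SPACES (already multiplied); it is positive
  # only when letters outnumber MULTIPLIER times the whitespace.
  :0
  * $ -${SPACES}^0
  * $  ${ALPHA}^0
  {

  # Do something

  }

  MULTIPLIER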

I haven't tested this against anything like my full spam and clean
corpora to see if it's broken.  I will be off doing that ...
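
For checking the arithmetic outside procmail, the same comparison can be
sketched in plain shell.  This is only an illustration under stated
assumptions: the function name looks_random and the MINBLANKS floor are
hypothetical (the floor stands in for gus's "only when the word count
exceeds a certain number" caveat, which the recipe does not implement),
and blanks are taken to mean spaces and tabs.

```shell
#!/bin/sh
# Hypothetical helper, not part of the recipe: reads text on stdin and
# reports whether letters outnumber MULTIPLIER times the blanks.
MULTIPLIER=10   # same cutoff as the recipe
MINBLANKS=20    # assumed floor, to avoid judging very short bodies

looks_random() {
  text=$(cat)
  # Count spaces and tabs, then letters; $(( )) normalizes wc's output.
  spaces=$(( $(printf '%s' "$text" | tr -cd ' \t' | wc -c) ))
  alpha=$(( $(printf '%s' "$text" | tr -cd 'A-Za-z' | wc -c) ))
  if [ "$spaces" -lt "$MINBLANKS" ]; then
    echo "short"      # too few words to say anything
  elif [ "$alpha" -gt $(( spaces * MULTIPLIER )) ]; then
    echo "long"       # average word length well above normal
  else
    echo "normal"
  fi
}
```

Feeding a saved message body to it, e.g. `looks_random < body.txt`, prints
one of the three labels; with MULTIPLIER at 10, "long" means roughly more
than ten letters per blank.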

Reto
-- 
R A Lichtensteiger      rali(_at_)tifosi(_dot_)com

 You smug-faced crowds with kindled eye Who cheer when soldier lads march by,
 Sneak home and pray you'll never know The hell where youth and laughter go.
 - Siegfried Sassoon

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail