Re: rule to catch a certain number of characters

2007-05-26 17:31:56
On 26-May-2007, at 14:48, wolfgang wrote:
Why $SPACE$TAB? Doesn't that mean a space followed by a tab?

[^$WS] = "not in the character class $WS", which means NEITHER space
nor TAB, just as [rice] means any one of r, i, c, or e, and not "rice".
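
A rough sketch of how that usually gets set up, in case it helps. The
variable names follow this thread; the Subject test and the folder name
are made up purely for illustration:

  # Assumption: the second character between the quotes on the WS
  # line is a literal TAB (retype it if your mailer mangled it).
  SPACE=" "
  WS=" 	"

  # Hypothetical recipe: [^$WS] matches ONE character that is
  # neither a space nor a TAB, so this catches any Subject that has
  # at least one visible character after optional leading whitespace.
  :0:
  * $ ^Subject:[$WS]*[^$WS]
  has-subject-example

The leading $ on the condition is what makes procmail expand $WS
before the regex is compiled; without it, [$WS] would just be a class
containing the literal characters $, W, and S.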

  :0:
  * ^Content-Type:.*/html
  *   B ?? > 100000
  * $ B ?? $xWS1152.*$*.*<\title>
  spampile

Is <\title> a typo for </title>?

Yes.

Personally, I would stop at 64 non-whitespace characters.

The only drawback I can think of is that $xWS1152.*$*.*<\title> will,
I think, scan the entire line, regardless of length, and scanning
100,000 characters is expensive.

It might be better to search for:

* $ B ?? ()<title>.*$?[$WS]*${xWS64}

Assuming that syntax is correct, I think it should scan only as far as
the first 64 consecutive non-whitespace characters in the title; at
that point the condition matches and processing ends.  Of course, if
64 is too short for comfort, increase it to 128 or 256 or whatever
number you want.
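
For what it's worth, the definition of xWS64 isn't shown anywhere in
this thread, so the following is only my guess at how it would be
built.  As far as I know procmail's regexes have no {64}-style repeat
count, so the usual trick is to multiply the pattern up by hand:

  # Assumes WS already holds a space and a TAB, as discussed above.
  # xWS matches exactly one non-whitespace character.
  xWS="[^$WS]"

  # Each step quadruples the previous one: 4, 16, 64 characters.
  xWS4="$xWS$xWS$xWS$xWS"
  xWS16="$xWS4$xWS4$xWS4$xWS4"
  xWS64="$xWS16$xWS16$xWS16$xWS16"

Going to 256 is one more line of the same shape (four copies of
$xWS64).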

This version also allows for the case where there is no EOL after
the title tag.

This might work?

* 9876543210^0 $ B ?? ()<title>.*$?[$WS]*${xWS64}

I just can't remember if you can combine $ B ?? and scoring like that.

-- 
And, while it was regarded as pretty good evidence of criminality to  
be living in a slum, for some reason owning a whole street of them  
merely got you invited to the very best social occasions.

