At 21:14 2004-03-02 +0000, Alan Clifford wrote:
> I have a 10% rule on that one:
> # 10% of chars are &#
> :0 BD
> * -1^1 .
> * 10^1 [&#]
> action as spam line here
Uh, unless the entire message is encoded this way (well, more like 50% of
it), you're unlikely to get much of a hit here; the technique seems to be
used by spammers mainly for URLs. To top it off, you're flagging & and # as
individual characters rather than as the pair &#, so programming lists are
sure to run into trouble. # is used as a comment char in shell scripts and
Perl, as well as a decorative separator (e.g. in large blocks). & is a
bitwise (and sometimes logical) AND operator in many programming languages.
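To see why single-character matching trips on code, here is a rough Python model of the score Alan's recipe computes. It assumes procmail's `w^1` scoring adds w once per non-overlapping match, that `.` matches any character except newline, and that the recipe fires only when the total is above zero. The function names and the sample script and spam strings are made up for illustration.

```python
import re

def ten_percent_score(body: str) -> int:
    # Models "* -1^1 ." : -1 for every character except newline
    chars = len(re.findall(r".", body))
    # Models "* 10^1 [&#]" : +10 for every '&' or '#' counted singly
    singles = len(re.findall(r"[&#]", body))
    return 10 * singles - chars

def ten_percent_rule_fires(body: str) -> bool:
    # The weighted conditions succeed only if the summed score is positive
    return ten_percent_score(body) > 0

# Hypothetical shell script posted to a list: '#' banners plus an '&&'
banner = "#" * 40 + "\n"
script = banner + "# backup script\n" + banner + "tar czf /tmp/x.tgz /home && echo done\n"

# Hypothetical spam: ordinary text plus one ordinalized URL (decodes to "example")
text_line = "We have a special offer for you this week only.\n"
url_line = "http://&#101;&#120;&#97;&#109;&#112;&#108;&#101;.com/\n"
spam = text_line * 10 + url_line

print(ten_percent_rule_fires(script))  # True
print(ten_percent_rule_fires(spam))    # False
```

The script trips the rule on its comment banners alone, while the spam with a single ordinalized URL scores well below zero.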
I ran a scan against my captured spew from February: several hundred
spams (hey, most of it gets avoided via DNSBLs; otherwise, I'd have >10K
spams a month), but ONLY the following matched the construct (not the
weighting, just ANY match for an HTML ordinal escape, as a character pair):
4:3271 (used to signify bullets in an HTML list)
92:8544 (* an honest-to-goodness-ordinalized spam URL)
1:6308 (furrin character)
139:1687 (* ordinalized random characters of nearly every word in the body
of the message)
3:5200 (ordinals for Unicode characters like ellipses)
1:1963 (x3, ditto)
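For anyone wanting to repeat a scan like this, here is a minimal sketch of the "any `&#` pair" test. It assumes the message bodies are already loaded as Python strings (how the February capture was stored isn't stated), and the function name and sample bodies are hypothetical. An `&` and a `#` that aren't adjacent, as in code, don't count.

```python
def messages_with_ordinal_pairs(bodies):
    """Return (index, pair_count) for each body containing at least one '&#'."""
    hits = []
    for i, body in enumerate(bodies):
        # str.count counts non-overlapping occurrences of the two-character pair
        n = body.count("&#")
        if n:
            hits.append((i, n))
    return hits

bodies = [
    "<li>&#8226; first item</li>",        # entity used as a list bullet
    "if (flags & MASK) # check the bit",  # '&' and '#' present, never adjacent
    "http://&#101;&#120;.com/",           # ordinalized URL
]
print(messages_with_ordinal_pairs(bodies))  # [(0, 1), (2, 2)]
```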
Note that this doesn't include checking my legitimate messages, and the
majority of the above hits are for legitimate use (even if found in spam
messages). This isn't a large enough sample, but judging from
the above, the following would seem to establish a reasonable weighting:
# more than one &# pair per 100 body characters
:0 B
* -1^1 .
* 100^1 &#
{
#action
}
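Here is the same kind of Python model for the pair weighting. It assumes the `.` line carries a negative weight, as in Alan's recipe (with a positive weight on `.` the total can never drop to zero, so every non-empty body would match), and that the recipe scans the body. Under those assumptions it fires once `&#` pairs exceed one per hundred characters. The function name and sample bodies are invented for illustration.

```python
import re

def pair_score(body: str, pair_weight: int = 100) -> int:
    # "* -1^1 ." : -1 per character except newline
    chars = len(re.findall(r".", body))
    # "* 100^1 &#" : +100 per non-overlapping '&#' pair
    pairs = len(re.findall(r"&#", body))
    return pair_weight * pairs - chars

# One bullet entity in an otherwise plain HTML message
bullet = "<li>&#8226; item</li>\n" + "plain text line\n" * 30
# Nearly every word carries an ordinal escape ("Hello world" repeated)
obfuscated = "&#72;ello &#119;orld " * 20

print(pair_score(bullet) > 0)      # False
print(pair_score(obfuscated) > 0)  # True
```

The boundary is exact. A 100-character body with one pair scores 0 and does not fire, while a 99-character body with one pair scores 1 and does.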
Note that besides decimal codes for ASCII, ordinalized codes can be hex (&#x
followed by hex digits), or may specify a Unicode character (an ellipsis, …, for instance).
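To confirm that the hex and Unicode forms are the same escapes underneath, you can use Python's standard `html.unescape`. The regex below is only an illustrative pattern that matches both decimal and hex references. Every form still begins with `&#`, so a pair-based `&#` condition counts all of them.

```python
import html
import re

# Decimal (&#65;) or hexadecimal (&#x41;) numeric character reference
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")

sample = "&#65; &#x41; &#8230; &#x2026;"
refs = NUMERIC_REF.findall(sample)
print(refs)                   # ['&#65;', '&#x41;', '&#8230;', '&#x2026;']
print(html.unescape(sample))  # A A … …
```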
What sort of performance difference do you realize with the D flag here,
when you're not checking anything where case would be significant?
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail