I suspect you are absolutely correct. Still, 4 false positives out of
10,000 ain't bad. The pattern is picking up stuff from a number of "known"
spammers (i.e., domains in the From field that are included in my list of
spammers). And if we can identify a "tool" (as Sean suggests), that'll
work also.
- fleet -
On Tue, 4 Feb 2003, Don Hammond wrote:
On 4 Feb, fleet(_at_)teachout(_dot_)org wrote:
| egrep -i "message-id: [<][0-9]{10}[.][0-9]{4}[@]" file(s)
|
| gives me no false hits. The terminal @ appears to be necessary. All the
| msgid strings begin with 104??? The second part (just before the @) can be
| four or five digits. This also got hits in an old spam folder (Aug 02).
|
I'd bet they all begin with 104 because the first 10 digits are epoch
seconds. I'd further bet all the hits from your 8/02 folder begin with
102, up until ~ 8/22 03:06:40 when it changes to 103. Of course that
would be the time on the sending end, not at your end. I'd *guess* the
part after the dot is a pid and could be 3-5 digits.
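
If the epoch-seconds guess is right, the first field of a matching Message-ID can simply be decoded as a timestamp. Here is a minimal Python sketch of that check. The function name, the sample header, and the 3-5 digit range for the second field are assumptions made for illustration, not something tested against real spam.

```python
import re
from datetime import datetime, timezone

# Same shape as the egrep pattern, but allowing 3-5 digits after the dot
# (an assumption based on the pid guess, not a verified rule).
MSGID_RE = re.compile(r"message-id: <([0-9]{10})\.([0-9]{3,5})@", re.IGNORECASE)

def decode_msgid(header_line):
    """Return (UTC timestamp, second field) for a matching header, else None."""
    m = MSGID_RE.search(header_line)
    if m is None:
        return None
    # Treat the 10-digit field as seconds since the Unix epoch.
    stamp = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return stamp, int(m.group(2))

# Hypothetical header, made up for illustration only.
sample = "Message-ID: <1030000000.1234@example.com>"
print(decode_msgid(sample))
```

The value 1030000000 decodes to 2002-08-22 07:06:40 UTC, which is 03:06:40 in a UTC-4 zone, so that is the moment the prefix rolls over from 102 to 103.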
I don't have a spam collection handy to run it through, but it did
match 4 legitimate messages out of just over 10,000. They were notices
from sellers of ebay "wins", and the part after @ was paypal.com in all
4 cases.
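
One way to spot a cluster of legitimate hits like that is to tally the domain after the @ for everything the pattern matches. The following is only a sketch: the function name and the sample headers are made up, and the pattern is the egrep one with a capture group added for the domain.

```python
import re
from collections import Counter

# The egrep pattern, extended to capture the domain after the @.
HIT_RE = re.compile(r"message-id: <[0-9]{10}\.[0-9]{4}@([^>\s]+)>", re.IGNORECASE)

def domains_of_hits(text):
    """Count the domain part of every Message-ID that matches the pattern."""
    return Counter(d.lower() for d in HIT_RE.findall(text))

# Made-up headers for illustration only.
sample = (
    "Message-ID: <1044300000.1234@paypal.com>\n"
    "Message-ID: <1044300001.5678@PayPal.com>\n"
    "Message-ID: <20030204.abc@example.org>\n"
)
print(domains_of_hits(sample))
```

A domain that dominates the tally in a folder of known-good mail is a candidate for an exception ahead of the spam rule.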
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail