Can you tell me how you measure your false positive rate?
Initially I reviewed the 'sendmail' log for about two weeks.
I had pre-built a whitelist and an SPF fallback/override rule
set (see my old posts) for all important correspondents. I don't
remember seeing any false positives during that initial
high-scrutiny phase.
Now I count something as a false positive only when the sender
of a bounced message calls me on the telephone. If they are too
clueless or unmotivated to comprehend the bounce message and/or
find my phone number on the web site, I didn't want to hear from
them anyway.
And how do you plan to interpret the whois data, given all the
different formats out there?
One of the CPAN 'whois' modules normalizes most of the standard
fields into a nifty structure. This will be useful for checking
the registration date and registrar. The rest I'll do ad hoc.
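As a rough illustration of checking the registration date once the fields are normalized, here is a minimal Python sketch. It assumes a hypothetical normalized record (a plain dict with a "creation_date" key holding a datetime.date); those key names are not the output format of any particular CPAN or Python whois module. The MIN_AGE_DAYS cutoff is likewise a hypothetical value, since the post does not say what age counts as suspicious.

```python
from datetime import date
from typing import Optional

# Hypothetical cutoff: domains younger than this are treated as suspicious.
# The post gives no number; 90 is only a placeholder.
MIN_AGE_DAYS = 90


def domain_age_days(record: dict, today: date) -> Optional[int]:
    """Return the domain's age in days, or None if no creation date is known.

    `record` is a hypothetical normalized whois record; the key name
    "creation_date" is an assumption, not a real module's field name.
    """
    created = record.get("creation_date")
    if created is None:
        return None
    return (today - created).days


def is_newly_registered(record: dict, today: date) -> bool:
    """True only if a creation date is known and is younger than MIN_AGE_DAYS."""
    age = domain_age_days(record, today)
    return age is not None and age < MIN_AGE_DAYS


if __name__ == "__main__":
    rec = {"registrar": "Example Registrar, Inc.", "creation_date": date(2005, 3, 1)}
    print(domain_age_days(rec, date(2005, 3, 31)))      # 30
    print(is_newly_registered(rec, date(2005, 3, 31)))  # True
```

A record with no creation date is deliberately not flagged, so gaps in the parsed whois data fall through to whatever ad hoc checks come next instead of causing a rejection.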
For example, privately registered domains at Network Solutions
have a contact address like this:
aq3zb2cp7bs(_at_)networksolutionsprivateregistration(_dot_)com
That's easy to match with a Perl regex. It's obvious
that spammers will gravitate to ultra-low-cost registrars
like GoDaddy. I don't know ANYONE I want to correspond with who
needs to buy their domain for $9.95 ($3 for bulk-purchased
domains; I'll bet $1 can be negotiated if you're willing to buy
1000+ domains). If one turns up, I can whitelist it after giving
the correspondent a hard time for registering with the bums! A
fairly limited number of registrars fall into this category; I'd
guess ten or twenty at most, which is a manageable size for a
blacklist. Happily, becoming a registrar costs about $50k to
$100k, takes several months, and requires complete disclosure of
one's identity, so it seems rather unlikely that throwaway
registrars will ever become a problem.
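The combination described above (match the private-registration contact, blacklist a short list of cheap registrars, let the whitelist override both) can be sketched roughly as follows. This is written in Python rather than the Perl the post mentions. The regex is built from the single Network Solutions example quoted above, written with a literal @ and . instead of the (_at_)/(_dot_) form, so the assumed local-part alphabet [a-z0-9]+ is a guess. REGISTRAR_BLACKLIST and DOMAIN_WHITELIST are hypothetical placeholders; the post names only GoDaddy. Treating a private registration as grounds for rejection is also an assumption about how the match would be used.

```python
import re

# Built from the one example address in the post; the local-part
# alphabet is an assumption. fullmatch() anchors both ends.
NETSOL_PRIVATE = re.compile(
    r"[a-z0-9]+@networksolutionsprivateregistration\.com", re.IGNORECASE
)

# Hypothetical blacklist: the real list would hold roughly ten to
# twenty low-cost registrars, but only GoDaddy is named in the post.
REGISTRAR_BLACKLIST = {"godaddy"}

# Hypothetical whitelist of sender domains that override every check.
DOMAIN_WHITELIST = {"example.org"}


def is_private_registration(contact: str) -> bool:
    """True if the whois contact looks like a Network Solutions proxy address."""
    return NETSOL_PRIVATE.fullmatch(contact.strip()) is not None


def registrar_blacklisted(registrar: str) -> bool:
    """Case-insensitive substring match of the registrar name against the blacklist."""
    name = registrar.lower()
    return any(entry in name for entry in REGISTRAR_BLACKLIST)


def should_reject(domain: str, registrar: str, contact: str) -> bool:
    """Whitelist wins; otherwise reject on a blacklisted registrar or private registration."""
    if domain.lower() in DOMAIN_WHITELIST:
        return False
    return registrar_blacklisted(registrar) or is_private_registration(contact)


if __name__ == "__main__":
    print(should_reject("spammy.test", "GoDaddy.com, Inc.", "x@spammy.test"))  # True
    print(should_reject("example.org", "GoDaddy.com, Inc.", "x@example.org"))  # False
```

Checking the whitelist first mirrors the post's plan of whitelisting a legitimate correspondent who turns up at a blacklisted registrar, so one entry there is enough to undo a false positive.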
David