Re: How to use SPF to reject spam

Commerco WebMaster wrote:

Mr. Hociung,

Very nice work - a fairly long but entirely worthy read.


Thank you.

At some point it would be interesting to form an aggregated list of scores from various individual reputation scoring data, in a way similar to the way the DShield.Org system works aggregating data on IP problems.

Absolutely. It will be very interesting to compare the various databases, and especially to analyze the mismatched stats (i.e., why does "service A" think that "opportunities.com" is a good guy when "service B" thinks otherwise?).

As regards your remark "the main differentiator of these services will be who they use as rating agents. I'd be perfectly happy with a service that uses cisco.com's feedback, but I don't care for one where Dick and Harry can have a say in." - I have no qualms with any data source, as long as the data source is disclosed and weighted by appropriate metrics.
If a source is unreliable, a data aggregator should adjust their individual metrics to show this and bring that source's net influence in the aggregated trust scoring down. Done properly, Dick and Harry could (and should) have a say if they are legitimate, but would have a near zero say if they are not. It is certainly not to say that I don't trust Cisco (I worked with one of their senior VPs and the wife of another before there ever was a Cisco - they are great professional people and we love their products around here), but I think that any trust system should be open to all parties (not just the biggest ones), weighting each initially the same and allowing the data from each source to determine future weightings given to any given data source.
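To make the weighting idea above concrete, here is a minimal sketch, not anything either poster specified. It assumes each source reports a spam score between 0.0 (ham) and 1.0 (spam) for a domain, that every source starts with the same weight, and that a source loses weight in proportion to how far it sits from the weighted consensus. The names `Source`, `aggregate`, `reweight` and the `rate` parameter are hypothetical, and "disagreement with consensus" is only one possible reliability metric.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    weight: float = 1.0  # assumption: every source starts with the same weight


def aggregate(reports, sources):
    """Weighted mean of per-source spam scores (0.0 = ham, 1.0 = spam)."""
    total_weight = sum(sources[name].weight for name in reports)
    if total_weight == 0:
        raise ValueError("no weighted source reported on this domain")
    weighted_sum = sum(sources[name].weight * score
                       for name, score in reports.items())
    return weighted_sum / total_weight


def reweight(reports, sources, rate=0.5):
    """Shrink the weight of each source by its distance from the consensus."""
    consensus = aggregate(reports, sources)
    for name, score in reports.items():
        error = abs(score - consensus)  # always within [0, 1]
        sources[name].weight *= 1.0 - rate * error
    return consensus


if __name__ == "__main__":
    sources = {n: Source(n) for n in ("cisco", "dick", "harry")}
    # Hypothetical scores for one domain; "harry" keeps contradicting the rest.
    reports = {"cisco": 0.9, "dick": 0.85, "harry": 0.0}
    for _ in range(10):
        consensus = reweight(reports, sources)
    for source in sources.values():
        print(source.name, round(source.weight, 3))
    print("consensus:", round(consensus, 3))
```

In this sketch nobody is excluded up front: a source that keeps disagreeing with the rest simply drifts toward a near-zero say, while sources that agree retain most of their weight.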

Well, I thought that having a short list of trusted agents would be appropriate for three reasons:


1. It is the easiest way (and therefore preferred when possible).

2. Agents with a well-known, generally reputable brand name are a worthwhile feature for the service. If a service lists small shops, it is likely that potential clients will think "shady" and prefer another service with an obviously more serious list.

3. It will be desirable for the list of agents to be very stable over time. If it changes on a weekly basis (as it would when smaller players are used), it will become questionable. Also, when a small player is removed from the list, the database can be considered tainted, on the assumption that the small player was removed for falsifying info. (Technically you can avoid this by keeping separate databases for each reporting agent, and only publishing the aggregate results to the world, but the PR issues may be a lot more difficult to overcome.)
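The parenthetical in reason 3 (separate databases per reporting agent, with only the aggregate published) could look roughly like the sketch below. This is an illustration under my own assumptions: the class name `ReputationStore`, its methods, and the agent names are all hypothetical, and counts are kept as plain spam/ham tallies per domain.

```python
from collections import defaultdict


class ReputationStore:
    """Keeps one private tally per agent; only the aggregate is published."""

    def __init__(self):
        # agent -> domain -> [spam_count, ham_count]
        self._by_agent = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def report(self, agent, domain, spam=0, ham=0):
        counts = self._by_agent[agent][domain]
        counts[0] += spam
        counts[1] += ham

    def drop_agent(self, agent):
        # Removing an agent removes only its own tallies, so the data
        # contributed by the remaining agents stays untouched.
        self._by_agent.pop(agent, None)

    def published_spam_ratio(self, domain):
        spam = ham = 0
        for domains in self._by_agent.values():
            if domain in domains:  # membership test does not create an entry
                s, h = domains[domain]
                spam += s
                ham += h
        total = spam + ham
        return spam / total if total else None


if __name__ == "__main__":
    store = ReputationStore()
    store.report("bigcorp", "example.com", spam=10, ham=90)
    store.report("smallshop", "example.com", spam=100, ham=0)
    print(store.published_spam_ratio("example.com"))
    store.drop_agent("smallshop")
    print(store.published_spam_ratio("example.com"))
```

This only addresses the technical side; it does nothing for the PR problem mentioned above.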

Also, I think you may want to use the feedback of a mail operator that deals with tons of email, such that they have a significant sampling. A small shop does not have the opportunity for a significant sampling. For instance, suppose that in a company of 10, one employee signs up to a small mailing list for jokes run by someone at Hotmail. Later he figures that the jokes are too offensive, and clicks the 'this is spam' button. Eventually he will find all the jokes offensive, and filter the Hotmail stuff altogether. The problem is that suddenly this outfit's opinion of Hotmail is 100% spam. This is only due to an insignificant sample size. On the other hand, at Cisco there are probably at least 3000 employees with Hotmail friends. If 30 of those think of their Hotmail friends as idiots, the spam figure that Cisco will report is 99% ham, 1% spam, a much more significant report, given the much larger sample size.
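One way to put a number on "significant sampling" is a confidence interval around the reported spam fraction. The sketch below uses the Wilson score interval at roughly 95% confidence; that choice is mine, not something proposed in this thread, and the function name `wilson_interval` and the sample counts are illustrative only.

```python
import math


def wilson_interval(spam, total, z=1.96):
    """Wilson score interval for the spam fraction (z=1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("need at least one sample")
    p = spam / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Small shop: a single sample, and it was flagged as spam.
    print("small shop:", wilson_interval(1, 1))
    # Large operator: a 1% spam report over 3000 samples.
    print("large operator:", wilson_interval(30, 3000))
```

The small shop's "100% spam" comes with an interval covering most of the range from 0 to 1, so it says very little, while the large operator's 1% figure is pinned down to within a fraction of a percentage point.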


Also note two important side-effects:

1. If Cisco were to be used as a trusted reporting agent, all existing email lists that pro spammers use will be instantly cleaned to remove Cisco destinations. This is a great incentive for a company like Cisco to participate, as it will save them lots of money that they currently spend on spam-related infrastructure. They will still have to maintain some, but the volume of spam they will have to handle will be noticeably smaller.

2. Due to the effect above, the reputation database will automatically be skewed, as it will no longer contain stats on new spammer domain names.

Ideally, #2 can be avoided if the service requires the reporting agent (Cisco in this case) to operate a honeypot, and include its results in the reports. In light of side effect #1, this means "spend a little to save a lot". The names of the honeypotted results need not be published, but their operation should not be delegated except to a mail operator of equal reputation to Cisco's (or higher, if that is possible).


Radu.