RE: [Asrg] 7. Best Practices - DNSBLs - Article

2003-08-13 12:42:04
At 11:56 AM -0400 2003/08/13, Paul Judge wrote:

 2. As you mentioned, with blacklists you need the list of IP addresses. The
 problem is that the list of IP addresses in the headers will often include
 IPs of internal mail servers that organizations do not wish to reveal. So,
 you often have to reduce this to the set of IP addresses that come before
 the recipient's organization in order to make this data public.

For larger organizations, a message may pass through several different network blocks. I submit that it won't be programmatically possible to detect and eliminate all of them. IMO, the best you can hope to do is to skip the last hop in the "Received:" headers, and anything else on that same network.

And that's assuming that there isn't internally generated spam being sent by one customer of the ISP to another of the same ISP.

        Then there are RFC 1918 network blocks to be considered (or eliminated).
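As a rough sketch of the filtering described above (skip the last hop, skip anything on its network, and skip RFC 1918 space), something like the following might do. Several things here are assumptions rather than anything settled: the function name candidate_ips is made up, IP addresses are assumed to appear as bracketed IPv4 literals in the "Received:" headers (real headers vary a lot), and "same network" is approximated by a /24 around the last hop, which is an arbitrary choice.

```python
import ipaddress
import re
from email import message_from_string

# The three private address blocks defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Assumption: hops show up as bracketed IPv4 literals, e.g. "[203.0.113.9]".
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def candidate_ips(raw_message, same_net_prefix=24):
    """Return the IPs from Received: headers that survive the filters.

    same_net_prefix is a guess at what "the same network" means; a real
    deployment would need the organization's actual netblocks.
    """
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        for text in IP_RE.findall(header):
            try:
                hops.append(ipaddress.ip_address(text))
            except ValueError:
                continue  # not a valid address, e.g. "[999.1.1.1]"
    if not hops:
        return []

    # The topmost Received: header is the most recent (last) hop.
    last_hop = hops[0]
    local_net = ipaddress.ip_network(f"{last_hop}/{same_net_prefix}",
                                     strict=False)

    survivors = []
    for ip in hops[1:]:
        if ip in local_net:
            continue  # same network as the last hop
        if any(ip in net for net in RFC1918):
            continue  # private address space
        survivors.append(ip)
    return survivors
```

This does nothing about the multiple-network-block problem or about spam sent between two customers of the same ISP; it only shows how little the simple case covers.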


I think it might be easier to solve this problem by comparing the "candidate spam" IP addresses against the "candidate ham" IP addresses and seeing whether there are any duplicates. If there are, they get removed from the "candidate spam" list (to try to avoid additional false positives).
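The comparison itself is just a set difference. A minimal sketch, assuming both lists are plain strings in the same textual form (the name split_candidates is invented for illustration):

```python
def split_candidates(spam_ips, ham_ips):
    """Split candidate-spam IPs into (kept, dropped).

    An address is dropped when it also appears among the candidate-ham
    addresses, on the theory that listing it risks false positives.
    """
    ham = set(ham_ips)
    kept, dropped = [], []
    # dict.fromkeys() de-duplicates while preserving first-seen order.
    for ip in dict.fromkeys(spam_ips):
        (dropped if ip in ham else kept).append(ip)
    return kept, dropped
```

Keeping the dropped addresses around, rather than discarding them silently, would make it possible to review how often the two sets overlap.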

 There are many intricacies here. The SpamAssassin guys have experienced them
 and within Spam Archive we've experienced them. It's just not as simple as
 you initially thought. It's far from impossible, but just requires some
 thoughtfulness. That is why I was outlining these three paths as potential
 paths for individuals to spend some time pursuing.

Indeed. Lots of nuances. And we've only started to begin to consider scratching the surface.


Speaking of information sources, it strikes me that we might be able to get the complete archives of relatively large numbers of mailing lists, most of which should either have a high percentage of "ham", or be something that can be processed according to modern anti-spam methods and sorted into "candidate spam" vs. "candidate ham".

For example, I know the listmaster at Apple, and he might be able to help us. Through the mailman mailing list (which Chuq also helps to run), we might be able to get archives of other large sources, especially including any of the lists hosted at python.org. I might also be able to dig up some contacts at AOL for their ListServ box.

        Would any of these information sources be of potential interest?

I mean, we're talking about mailing lists with cumulative hundreds of thousands and maybe even millions of users, which should result in extremely large quantities of messages that could potentially be examined. Indeed, most of them are probably already publicly available via archives; it would just be a matter of getting more convenient access to them.

--
Brad Knowles, <brad(_dot_)knowles(_at_)skynet(_dot_)be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg