Re: [Asrg] 2.c.1 and 6 - The importance of being Earnest


Kee Hinckley wrote:
At 8:22 AM -0200 4/10/03, Kurt Magnusson wrote:

> these URL's and phone numbers and shrunk my indata to ca 1500 entries, but it got far more effective. From ca 20 spam

...........

I agree that it's a good filtering technique. However there are an infinite number of email addresses and web urls. Plus there are

Skip the email addresses; we know they are fake. But if the spam repositories mentioned in the earlier thread "Spam Corpuses" have the full body, we can today block over 90 % of the existing websites and call centers without collecting any new and possibly tainted data. And the ratio of e-mail addresses to URLs/numbers is probably 100000:1, i.e. the data will be much smaller than any mail address database.

Almost every present advertiser would need to register new domains and contract new call centers or change phone numbers. As I wrote, if this needs to happen once or twice, the less economically stable advertisers (sometimes = spammers) will start to wonder whether spam does pay.

many many ways to encode those, making the filtering process time consuming unless it takes place on the end-user machine. It is also a

I looked at the data I have from the past year and there are 4 basic URL types: normal, undisguised; IP numbers (easy to identify); IP numbers with decimal coding (or whatever it is called, also easy to compare); and then the "web disguised" URLs, with web-coded chars, non-readable.

The first 3 types are not a problem; it is just a matter of collecting spams and extracting the URLs for a match. The last type is never used in ordinary e-mail communication, so when a URL with several web/MIME char combinations of %, ;, & and = is encountered (such as the =20 combination), assume spam and dump it on sight. We are not talking web, but e-mail with web addresses in it.
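The four URL types above can be told apart mechanically. The following is only a minimal sketch, not the filter described in this mail: the names `classify_url`, `decimal_host_to_dotted` and the `max_escapes` threshold are hypothetical, and "web coded" is assumed here to mean percent-escapes or numeric HTML entities.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: percent-escapes (%77) and numeric HTML entities (&#119;)
# are what counts as "web coding" in a URL.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}|&#\d+;")
DOTTED_IP = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def classify_url(url, max_escapes=1):
    """Return 'encoded', 'ip', 'decimal_ip' or 'normal' for one URL."""
    # More escapes than the (hypothetical) threshold: treat as disguised.
    if len(ESCAPE.findall(url)) > max_escapes:
        return "encoded"
    host = urlsplit(url).hostname or ""
    if DOTTED_IP.fullmatch(host):
        return "ip"
    if host.isdigit():
        return "decimal_ip"
    return "normal"


def decimal_host_to_dotted(host):
    """Convert a decimal-coded IPv4 host to dotted form for comparison."""
    return str(ipaddress.IPv4Address(int(host)))
```

With this split, the first three types can be normalized and compared against stored entries, while an "encoded" result would be dropped without any lookup.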

reactive solution. Spam gets through until someone adds the new data to the database. And that has to be a manual process.

I agree, that is the biggest disadvantage of the "earnest" method. But how do the present spam lists do it today? Are they really fully automatic?

I haven't done it as of today, but the way I run it I could let my URL/phone pattern filter process extract the data and update my database automatically. I do catch some unknown URLs that way, because I still have email domains used in earlier spams. I have not had one false positive since my last bug correction some time ago, but ca 1 "new" spam a day is not caught.

Just now I have a process based on visual filtering, but it could be based on me forwarding to an extra account that does this filtering and updates my local data.

As you mentioned, on a "global" basis it could be tainted by spammers sending in mails with correct domains, as well as by people annoyed with a net user or a company/organization.
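An automatic update of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the actual filter: the regular expressions, the `tel:` key prefix and the function names `extract_keys` and `filter_and_learn` are all made up for the example, and the database is just an in-memory set.

```python
import re
from urllib.parse import urlsplit

# Assumed patterns: plain http(s) URLs and loosely formatted phone numbers.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_keys(body):
    """Collect URL hosts and normalized phone numbers from a mail body."""
    keys = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            keys.add(host)
    # Remove URLs first so digits inside them are not read as phone numbers.
    rest = URL_RE.sub(" ", body)
    for phone in PHONE_RE.findall(rest):
        keys.add("tel:" + re.sub(r"\D", "", phone))
    return keys


def filter_and_learn(body, known):
    """Return True if the mail matches a known key; then learn its other keys."""
    keys = extract_keys(body)
    if keys & known:
        known |= keys  # automatic update of the local database
        return True
    return False
```

Learning every key from a matched mail is the risky part: a spam that also links to a legitimate site would pull that site into the set, so a real setup would need some review or whitelist step before the update.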


A solution to this is the ISPs.

Their abuse centers could have a spam mailbox that you could send spam-related mail to. The mailbox would do a first sorting against existing data and Bayesian filters to remove the obvious ones, leaving maybe a small number each day to examine visually. They should also do the first sorting with online reporting to the block list maintainers, so that the first report of a new address stops any further manual reviewing at other ISPs.

This function will only be for the ISPs; the rest of us have to wait until the list maintainer releases a new list, once a day? It is also secure, in that it does not allow spammers to taint new data.
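The "first report stops further reviewing" idea can be sketched as follows. Everything here is hypothetical: `SharedBlockList` is an in-memory stand-in for whatever online reporting service the block list maintainers would run, and the Bayesian pre-filter is left out entirely.

```python
class SharedBlockList:
    """In-memory stand-in for the maintainers' online reporting service."""

    def __init__(self):
        self.confirmed = set()  # reviewed and listed
        self.pending = set()    # reported by some ISP, review in progress

    def status(self, key):
        if key in self.confirmed:
            return "confirmed"
        if key in self.pending:
            return "pending"
        return "unknown"

    def report(self, key):
        self.pending.add(key)

    def confirm(self, key):
        self.pending.discard(key)
        self.confirmed.add(key)


def triage(keys, shared):
    """Return only the keys this ISP's abuse desk must review by hand."""
    to_review = []
    for key in keys:
        if shared.status(key) == "unknown":
            shared.report(key)  # first reporter takes the manual review
            to_review.append(key)
    return to_review
```

If ISP A calls `triage` with a new address first, ISP B calling it later with the same address gets an empty review list, which is the saving in manual work the paragraph above is after.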

Spammers frequently include links to legit sites in their spam--you don't want to accidentally blacklist those. Finally, there's a

Yes, I agree, this is not solved, but with the steps above it should be a lesser issue (in a global sense), and there has to be a process where a legit site can contact the list maintainer, get a copy of the spam and explain it, or start legal proceedings against the forger (if identified). As of today, if you can explain the mail or prove its falsity, you get off the hook.

question of how you share this information, and how you trust what gets shared. Spammers could pollute a database with valid URLs, thus

Why do we trust the present blacklists, Verisign or Hotmail? The process is similar to the present blacklisting, but with other data and _hopefully_ some support and financing from the large ISPs/mail suppliers as well as the larger national bodies handling domain naming.

making people less likely to use it. And even the best intentioned users screw up--that's one reason why people don't universally use IP blacklists--too many false positives.

As I wrote, my idea is not perfect; it needs other functions, some outlined above. But, as I found out, it decreases the number of mails to process further.

As I have it today, 50% of my spam comes from Korea, Japan and Taiwan, and I lose 100% of these, since they include ASCII-8 chars that are not used in English, Danish, Norwegian, Swedish, Finnish, the "Baltics", Icelandic, German, French, Greenlandic or any other language I do not know but could get a letter in. I have identified some of these Asian chars and drop such mails before they get to the spam filter.

If you lose 90 %, the rest could be attacked more effectively and forcibly, without extraordinary resources.
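A present-day approximation of this character pre-filter is sketched below. It assumes the mail has already been decoded to Unicode text, which is not how the original 8-bit check worked, and it treats "any letter that is not a Latin letter" as the drop criterion; the function name and the `max_foreign` threshold are hypothetical.

```python
import unicodedata


def has_non_latin_letters(text, max_foreign=0):
    """True if the text holds more than max_foreign letters outside Latin script."""
    foreign = 0
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII is always accepted
        is_letter = unicodedata.category(ch).startswith("L")
        is_latin = unicodedata.name(ch, "").startswith("LATIN")
        if is_letter and not is_latin:
            foreign += 1
    return foreign > max_foreign
```

Accented Nordic, Baltic, German and French letters all carry Unicode names starting with "LATIN", so they pass, while Hangul, kana and CJK ideographs are counted as foreign. The check also flags Greek or Cyrillic, so it only suits a mailbox that never expects mail in those scripts.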

I think the technique is a great addition to a BCP for filtering. But I can't see a way that it can be used universally. (Is anyone doing a BCP for filtering?)


Hope some of the above answers this issue.

Regards Kurt





_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg