ietf-asrg

Re: [Asrg] 2.c.1 and 6 - The importance of being Earnest

2003-04-11 04:32:21

Kee Hinckley wrote:
> At 8:22 AM -0200 4/10/03, Kurt Magnusson wrote:
>> these URLs and phone numbers and shrunk my input data to
>> ca 1500 entries, but it got far more effective. From ca 20 spam
...........

> I agree that it's a good filtering technique. However there are an infinite number of email addresses and web URLs. Plus there are

Skip the e-mail addresses; we know they are fake. But in the spam repositories mentioned in the earlier "Spam Corpuses" thread, if they keep the full message body, we can already block over 90% of the existing websites and call centers without collecting any new and possibly tainted data. And the ratio of e-mail addresses to URLs/phone numbers is probably around 100000:1, i.e. this data will be much smaller than any mail-address database.
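A minimal sketch of the matching step, assuming the blocklists have already been extracted from a spam corpus. The host name and phone number in BLOCKED_HOSTS and BLOCKED_PHONES are hypothetical placeholders, and the regular expressions are deliberately loose; a real filter would need more careful URL and phone-number parsing.

```python
import re

# Hypothetical blocklists; in practice they would be extracted from a spam corpus.
BLOCKED_HOSTS = {"cheap-pills.example", "203.0.113.7"}
BLOCKED_PHONES = {"18005550199"}

# Host part of an http/https URL (stops at /, whitespace, port, query, fragment or quotes).
URL_RE = re.compile(r"https?://([^/\s:?#<>\"']+)", re.IGNORECASE)
# Loose phone pattern: digits with spaces, dots, dashes or parentheses in between.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_hosts(body):
    return {m.group(1).lower().rstrip(".") for m in URL_RE.finditer(body)}


def extract_phones(body):
    # Keep digits only, so "1-800-555-0199" and "1 (800) 555 0199" compare equal.
    return {re.sub(r"\D", "", m.group(0)) for m in PHONE_RE.finditer(body)}


def host_listed(host):
    # Check the host and each parent domain, so www.cheap-pills.example matches too.
    parts = host.split(".")
    return any(".".join(parts[i:]) in BLOCKED_HOSTS for i in range(len(parts)))


def is_listed(body):
    hosts_hit = any(host_listed(h) for h in extract_hosts(body))
    phones_hit = bool(extract_phones(body) & BLOCKED_PHONES)
    return hosts_hit or phones_hit
```

Because only the small set of advertised sites and numbers is stored, the lookup stays cheap compared with keeping a database of sender addresses.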

Almost every current advertiser would need to register new domains and contract new call centers or change phone numbers. As I wrote, if this has to happen once or twice, the less economically stable advertisers (sometimes = spammers) will start to wonder whether spam pays.

> many many ways to encode those, making the filtering process time consuming unless it takes place on the end-user machine. It is also a

I looked at the data I have collected over the past year, and there are 4 basic URL types: normal, undisguised names; IP numbers (easy to identify); IP numbers in decimal coding (or whatever it is called; also easy to compare); and the "web disguised" URLs, with web-encoded, unreadable characters.
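The first three types can be reduced to one comparable form before matching. A minimal sketch, assuming the host part has already been cut out of the URL; classify_host is a hypothetical helper, and it only handles the plain single-decimal-number form of an IP address (not hex or octal variants).

```python
import ipaddress
import re
from urllib.parse import unquote


def classify_host(host):
    """Return (kind, normalized_host) for the four URL types described above."""
    decoded = unquote(host)
    if decoded != host:
        # %-escapes in the host name: the "web disguised" type.
        return "disguised", decoded.lower()
    try:
        # Ordinary dotted IP number, e.g. 203.0.113.7.
        return "ip", str(ipaddress.IPv4Address(host))
    except ipaddress.AddressValueError:
        pass
    if re.fullmatch(r"[0-9]+", host) and int(host) < 2**32:
        # One decimal number is another spelling of an IPv4 address: 3405803783 == 203.0.113.7.
        return "decimal_ip", str(ipaddress.IPv4Address(int(host)))
    return "plain", host.lower()
```

After normalization, a decimal-coded address and its dotted form hit the same blocklist entry.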

The first 3 types are not a problem; it is just a matter of collecting spam and extracting them for matching. The last type is never used in ordinary e-mail communication, so when a URL containing several web/MIME character combinations with %, ;, & and = is encountered (such as the =20 combination), assume spam and dump it on sight. We are not talking about the web here, but about e-mail with web addresses in it.
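The "several encoded combinations" rule could look roughly like this. The threshold of 3 and the choice to ignore the query string are assumptions, not part of the rule as stated: ordinary query strings such as ?id=42 contain = and %, so counting them would risk false positives.

```python
import re

# The combinations mentioned above: %XX (URL escapes), &#NN; or &#xHH; (HTML character
# references) and =XX (quoted-printable, as in =20).
ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}|&#x?[0-9A-Fa-f]+;|=[0-9A-F]{2}")

# Hypothetical reading of "several"; tune it against your own mail.
ESCAPE_THRESHOLD = 3


def looks_disguised(url):
    # Only scheme, host and path are checked: ordinary query strings use = and % legitimately.
    head = url.split("?", 1)[0]
    return len(ESCAPE_RE.findall(head)) >= ESCAPE_THRESHOLD
```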

> reactive solution. Spam gets through until someone adds the new data to the database. And that has to be a manual process.

I agree; that is the biggest disadvantage of the "earnest" method. But how do the present spam lists do it today? Are they really fully automatic?

I haven't done it yet, but since I run it myself I could let my URL/phone pattern filter extract the data and update my database automatically. I do catch some unknown URLs that way, because I still have e-mail domains from earlier spams in my data. I have not had a single false positive since my last bug fix some time ago, but about 1 "new" spam a day is not caught.

Right now my process is based on visual filtering, but it could be based on forwarding mail to an extra account that does this filtering and updates my local data.
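The automatic update could be sketched as below, assuming the forwarded spam arrives as raw RFC 822 message strings. update_database and its allowlist parameter are hypothetical; the allowlist is there because spam often links to legitimate sites as well, and those should never be added.

```python
import email
import re

URL_HOST_RE = re.compile(r"https?://([^/\s:?#<>\"']+)", re.IGNORECASE)


def text_of(msg):
    """Concatenate the decoded text/* parts of a parsed message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the header: fall back to Latin-1.
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def update_database(raw_messages, known_hosts, allowlist=frozenset()):
    """Add URL hosts found in forwarded spam to known_hosts and return the new ones.

    Hosts in allowlist are never added, since spam often links to legitimate sites too.
    """
    new = set()
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        for host in URL_HOST_RE.findall(text_of(msg)):
            host = host.lower()
            if host not in known_hosts and host not in allowlist:
                new.add(host)
    known_hosts |= new
    return new
```

The returned set of new hosts is what would still deserve a quick visual check before it is trusted.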

As you mentioned, on a "global" basis the data could be tainted, both by spammers sending in mail with legitimate domains and by people annoyed with a net user or a company/organization.

One solution to this lies with the ISPs.

Their abuse centers could have a spam mailbox that you could send spam-related mail to. The mailbox would do a first sorting against existing data and Bayesian filters to remove the obvious cases, leaving perhaps a small number each day to examine visually. They should also do this first sorting with online reporting to the blocklist maintainers, so that the first report of a new address stops any further manual reviewing at other ISPs.
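The triage at such an abuse mailbox could be structured roughly as follows. This is a sketch under assumptions: spam_score is a stand-in for whatever Bayesian filter the ISP runs, the 0.99 cut-off is arbitrary, and a blocklist set shared between ISPs models the idea that the first report stops further manual review elsewhere.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Triage:
    blocklist: set  # URLs/numbers already known to the list maintainer (shared)
    spam_score: Callable[[str], float]  # stand-in for a Bayesian filter, returns P(spam)
    threshold: float = 0.99  # hypothetical cut-off for "obvious" spam
    review_queue: list = field(default_factory=list)
    reported: list = field(default_factory=list)

    def handle(self, body, extracted):
        """Sort one reported message; `extracted` holds its URLs/phone numbers."""
        unknown = extracted - self.blocklist
        if not unknown:
            return "known"  # everything already listed: nothing to review
        if self.spam_score(body) >= self.threshold:
            # Obvious spam with new addresses: report them so other ISPs skip manual review.
            self.reported.extend(sorted(unknown))
            self.blocklist |= unknown
            return "reported"
        self.review_queue.append(body)  # leftover for a human to examine
        return "review"
```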

This function would be for the ISPs only; the rest of us would have to wait until the list maintainer releases a new list, perhaps once a day. It is also secure, since it does not allow spammers to taint new data.

> Spammers frequently include links to legit sites in their spam--you don't want to accidentally blacklist those. Finally, there's a

Yes, I agree, this is not solved, but with the steps above it should be a smaller issue (in a global sense). There would also have to be a process where a legitimate site can contact the list maintainer, get a copy of the spam, and either explain it or start legal proceedings against the forger (if identified). As things are today, if you can explain the mail or prove it is forged, you get off the hook.

> question of how you share this information, and how you trust what gets shared. Spammers could pollute a database with valid URLs, thus

Why do we trust the present blacklists, Verisign or Hotmail? The process is similar to present blacklisting, but with different data and, _hopefully_, some support and financing from the large ISPs/mail providers as well as the larger national bodies that handle domain naming.

> making people less likely to use it. And even the best intentioned users screw up--that's one reason why people don't universally use IP blacklists--too many false positives.

As I wrote, my idea is not perfect; it needs other functions, some of them outlined above. But, as I found out, it decreases the amount of mail that has to be processed further.

As things stand today, 50% of my spam comes from Korea, Japan and Taiwan, and I drop 100% of it: it contains 8-bit characters that do not occur in English, Danish, Norwegian, Swedish, Finnish, the Baltic languages, Icelandic, German, French, Greenlandic or any other language I do not know but might receive a letter in. Because I have identified some of these Asian characters, I drop those messages before they reach the spam filter.
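Dropping mail by script can be sketched as a check on Unicode code-point ranges, assuming the message has already been decoded from its 8-bit charset (EUC-KR, Shift_JIS, Big5 and so on) into Unicode. The ranges cover Hangul, Hiragana, Katakana and the main CJK ideograph block only; min_chars is a hypothetical knob for tolerating a stray character.

```python
# Unicode blocks for the scripts in question (Korean, Japanese, Chinese).
CJK_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def has_cjk(text, min_chars=1):
    """True if text contains at least min_chars characters from CJK_RANGES."""
    count = sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES))
    return count >= min_chars
```

Latin letters with diacritics (æ, ø, å, þ, ð, ç and the like) fall outside these ranges, so Nordic, German or French mail is not affected.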

If you drop 90%, the rest can be attacked more effectively and forcibly, without extraordinary resources.

> I think the technique is a great addition to a BCP for filtering. But I can't see a way that it can be used universally. (Is anyone doing a BCP for filtering?)

I hope some of the above answers these issues.

Regards Kurt




_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


