Re: [Asrg] 2.c.1 and 6 - The importance of being Earnest
2003-04-11 04:32:21
Kee Hinckley wrote:
At 8:22 AM -0200 4/10/03, Kurt Magnusson wrote:
> these URL's and phone numbers and shrunk my indata to
> ca 1500 entries, but it got far more effective. From ca 20 spam
...........
I agree that it's a good filtering technique. However there are an infinite
number of email addresses and web urls. Plus there are
Skip the email addresses; we know they are fake. But if the spam
repositories mentioned in the earlier thread "Spam Corpuses" hold the full
message bodies, we can already today block over 90 % of the existing websites
and call centers, without collecting any new and possibly tainted data. And
the ratio of e-mail addresses to URLs/numbers is probably 100000:1, i.e. the
data will be much smaller than any mail address database.
Almost every present advertiser would need to register new domains and
contract new call centers or change phone numbers. As I wrote, if this needs
to happen once or twice, the less economically stable advertisers
(sometimes = spammers) will start to wonder whether spam really pays.
many many ways to encode those, making the filtering process time consuming
unless it takes place on the end-user machine. It is also a
I looked at the data I have from the past year, and there are 4 basic
URL types: normal, undisguised URLs; IP numbers (easy to identify); IP numbers
in decimal coding (or whatever it is called, also easy to compare); and the
"web disguised" URLs, with web-coded characters that are not human-readable.
The first 3 types are not a problem; it is just a matter of collecting spams
and extracting the URLs for a match. The last type is never used in ordinary
e-mail communication, so when a URL with several web/MIME character
combinations of %, ;, & and = (such as the =20 combination) is encountered,
assume spam and dump it on sight. We are not talking about the web, but about
e-mail with web addresses in it.
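As a rough illustration of the four URL types, here is a minimal Python sketch. It is not the filter I run; the labels, the function names and the threshold of two escapes are assumptions made for the example, and counting "=xx" sequences is only a crude heuristic, since an ordinary query string can contain one by accident.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical labels for the four URL types listed above.
PLAIN, DOTTED_IP, DECIMAL_IP, ENCODED = "plain", "dotted-ip", "decimal-ip", "encoded"

# One percent escape (%2e) or one quoted-printable style escape (=20).
_ESCAPE = re.compile(r"[%=][0-9A-Fa-f]{2}")


def classify_url(url: str, max_escapes: int = 2) -> str:
    """Return which of the four URL types `url` most likely is."""
    # Assumption: more than `max_escapes` escapes counts as "disguised".
    if len(_ESCAPE.findall(url)) > max_escapes:
        return ENCODED
    host = urlsplit(url).hostname or ""
    if host.isascii() and host.isdigit():
        return DECIMAL_IP          # e.g. http://3221225985/
    try:
        ipaddress.IPv4Address(host)
        return DOTTED_IP           # e.g. http://192.0.2.1/
    except ValueError:
        return PLAIN


def comparable_host(url: str) -> str:
    """Reduce both IP forms to dotted notation so they compare equal.

    Raises ValueError if a decimal host does not fit in 32 bits.
    """
    host = urlsplit(url).hostname or ""
    if host.isascii() and host.isdigit():
        return str(ipaddress.IPv4Address(int(host)))
    return host
```

With this, the first three types can be matched against stored data after `comparable_host`, while anything classified as `ENCODED` would be dropped without a lookup.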
reactive solution. Spam gets through until someone adds the new
data to the database. And that has to be a manual process.
I agree, that is the biggest disadvantage of the "earnest" method. But how
do the present spam lists do it today? Are they really fully automatic?
I don't do it today, but the way I run it I could let my URL/phone pattern
filter process extract the data and update my database automatically. I do
catch some unknown URLs that way, because I still keep e-mail domains used in
earlier spams. I have not had a single false positive since my last bug
fix some time ago, but about 1 "new" spam a day is not caught.
Right now my process is based on visual filtering, but it could be based
on me forwarding the spam to an extra account that does this filtering and
updates my local data.
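A minimal Python sketch of that automatic update could look as follows. The two patterns, the "tel:" prefix and the function names are assumptions for the example, not my actual filter, and `learn` should only be fed mails already judged to be spam, otherwise legit sites end up in the data.

```python
import re

# Deliberately simple patterns (assumptions for this sketch).
URL_HOST = re.compile(r"https?://([^\s/:\"'<>]+)", re.IGNORECASE)
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_keys(body: str) -> set:
    """Return the URL hosts and phone numbers found in a mail body."""
    hosts = {h.lower() for h in URL_HOST.findall(body)}
    # Keep only the digits so that differently formatted numbers compare equal.
    phones = {"tel:" + re.sub(r"\D", "", p) for p in PHONE.findall(body)}
    return hosts | phones


def is_spam(body: str, blocklist: set) -> bool:
    """True if the body mentions any host or number already in the data."""
    return not extract_keys(body).isdisjoint(blocklist)


def learn(body: str, blocklist: set) -> None:
    """Add every host and number of a confirmed spam to the local data."""
    blocklist |= extract_keys(body)
```

The extra account mentioned above would then just call `learn` on whatever is forwarded to it.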
As you mentioned, on a "global" basis the data could be tainted by spammers
sending in mails with valid domains, as well as by people annoyed with a net
user or a company/organization.
A solution to this is the ISPs.
Their abuse centers could have a spam mailbox that you could send spam-related
mail to. The mailbox would do a first sorting against existing data and
Bayesian filters to remove the obvious ones, leaving maybe a small number each
day to examine visually. They should also do the first sorting with online
reporting to the block list maintainers, so that the first report of a new
address stops any further manual reviewing at other ISPs.
This function would only be for the ISPs; the rest of us would have to wait
until the list maintainer releases a new list (once a day?). It is also
secure, since it does not allow spammers to taint new data.
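The sorting at such an abuse mailbox might be sketched like this in Python. The three labels, the 0.95 threshold and the idea of passing in a ready-made Bayesian score are assumptions for the example; the shared set stands in for the online reporting to the list maintainers.

```python
def triage(keys, known, spam_score, threshold=0.95):
    """Sort one reported mail into 'known', 'obvious' or 'review'.

    keys       -- URL hosts / phone numbers extracted from the report
    known      -- data already shared between ISPs and list maintainers
    spam_score -- score from a Bayesian filter, assumed to lie in 0..1
    """
    if set(keys) & known:
        return "known"      # already reported somewhere: no manual work
    if spam_score >= threshold:
        return "obvious"    # the filter is sure enough on its own
    return "review"         # the small remainder to examine visually


def confirm(keys, known):
    """A reviewer confirmed the report; share its keys with everybody."""
    known.update(keys)
```

Once one ISP has called `confirm`, the same address falls into "known" everywhere else, which is the point of reporting online.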
Spammers frequently include links to legit sites in their spam--you don't
want to accidentally blacklist those. Finally, there's a
Yes, I agree, this is not solved, but with the steps above it should be a
lesser issue (in a global sense), and there has to be a process where a
legit site can contact the list maintainer, get a copy of the spam and
explain it or start legal proceedings against the forger (if identified). As of
today, if you can explain the mail or prove its falsity, you get off the
hook.
question of how you share this information, and how you trust what gets
shared. Spammers could pollute a database with valid URLs, thus
Why do we trust the present blacklists, Verisign or Hotmail? The process is
similar to the present blacklisting, but with other data and _hopefully_
some support and financing from the large ISPs/mail suppliers as well as the
larger national bodies handling domain naming.
making people less likely to use it. And even the best intentioned users
screw up--that's one reason why people don't universally use IP
blacklists--too many false positives.
As I wrote, my idea is not perfect; it needs other functions, some outlined
above. But, as I found out, it decreases the number of mails to process
further.
As I have it today, 50% of my spam comes from Korea, Japan and Taiwan.
These I drop 100%, since they include 8-bit ("ASCII-8") characters not used in
English, Danish, Norwegian, Swedish, Finnish, the Baltic languages, Icelandic,
German, French, Greenlandic, or any other language that I do not know but
could get a letter in. I have identified some of these Asian characters and
drop such mails before they get to the spam filter.
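A small Python sketch of such a pre-filter follows. It is not how I do it today (I match specific 8-bit characters); it assumes the mail body has already been decoded to Unicode, and the cut-off at U+017F (the end of Latin Extended-A, which covers the Nordic and Baltic letters) is a choice made for the example.

```python
def outside_expected_alphabets(text: str, limit: int = 0x017F) -> bool:
    """True if `text` contains a letter beyond the Latin ranges up to `limit`.

    Punctuation and symbols such as the euro sign are ignored; only
    letters count, so Hangul, kana or Han characters trigger the drop.
    """
    return any(ch.isalpha() and ord(ch) > limit for ch in text)
```

A mail for which this returns True would be dropped before it ever reaches the URL/phone matching.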
If you drop 90 %, the rest can be attacked more effectively and
forcibly, without extraordinary resources.
I think the technique is a great addition to a BCP for filtering. But I
can't see a way that it can be used universally. (Is anyone doing a BCP for
filtering?)
Hope some of the above answers this issue.
Regards Kurt
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg