
RE: [Asrg] 2. - Spam Characterization - Possible Measurements (was: RE: Two ways to look at spam)

2003-07-08 12:32:40

On July 8, 2003 at 06:00 paul(_dot_)judge(_at_)ciphertrust(_dot_)com (Paul 
Judge) wrote:
> > This relates to the idea that the only reason spammers can
> > operate effectively is because they exploit thousands of
> > hijacked computers which gives them location mobility (not
> > geographic but in ip space.)
> >
> > If this can be shown to be true via measurement it leads to
> > the conclusion that perhaps the problem with spam is not what
> > leads to this idea of a "consent" framework as originally
> > proposed in this charter, but, instead, shows spam is almost
> > entirely a security problem.

> The consent framework does not contradict that spam is a security problem. I

I agree that "contradict" is far too absolute a term.

However, if someone were driving thru your otherwise quiet neighborhood
at 3AM blasting ads on a loudspeaker, then taking away or disabling that
loudspeaker is a lot of progress, maybe all one really needs.

Sure, if the guy gets out of his car (having lost the loudspeaker) and
insists on banging on doors instead, build a fence with a locked
gate.

But knocking on every door is still pretty ineffective compared to
driving thru with a loudspeaker; maybe so ineffective that he won't
even try it, can't make a living that way.

Analogously, I've really come to believe that it's the amplification
through the use of thousands of hijacked computers which is at the
heart of the spam problem.

The problem is that nickel-and-dime spammers can currently utilize much
more hardware and bandwidth than they could ever possibly produce a
business model for, and use it to harass, I don't know, 100 million
people, each dozens or hundreds of times per day. One has to get one's
mind around the scope of what's going on.

I realize it'd also be nice to be able to have a generalized consent
framework for all kinds of uses. My issue is not one of mutual
exclusivity, only scale.

> believe that the unauthorized use of resources (for sending messages as well
> as those of the recipients) are grounds for that statement. Additionally,
> the potential disruption in availability is another reason for such a view.

> While one path is to focus on the 'security' of the hijacked computers, I
> think that we have learned over the years that they will continue to find
> new means of injecting their messages into the system (relays, dialups,
> hijacked machines, proxies, webforms, free mail services, etc.).

I think if MS insists on continuing to produce such trivially hijacked
OSes, and no general means is installed to counteract that problem
(e.g., monitoring for virus-like behavior with automatic slowdown or
shutoff, or fixing the OS), then yes, the problem is hopeless.
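The "automatic slowdown" idea could be as simple as a per-host token bucket on outbound mail: allow a modest steady sending rate, refuse virus-like bursts. A minimal sketch in Python (the class name, rates, and API are my own illustration, not any existing tool):

```python
import time

class OutboundThrottle:
    """Hypothetical per-host token bucket for outbound messages.

    A legitimate user's mail trickles out well under the refill rate;
    a hijacked box blasting thousands of messages exhausts the bucket
    and gets its sending slowed to the refill rate.
    """

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.burst = burst            # bucket capacity
        self.tokens = float(burst)    # start full
        self.last = time.monotonic()

    def allow(self):
        """Return True if one outbound message may be sent now."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With, say, `OutboundThrottle(rate_per_sec=0.1, burst=50)` a host could send 50 messages at once but then only one every ten seconds, which is plenty for a person and useless for a spam run.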

But thus far I can't even get much recognition (particularly outside
of a few people on this list) that this sort of thing, the hijacking
and amplification, is the real problem, and that it's not analogous to
sticking up a "No Solicitors, Please" sign but rather more like the
creep with the loudspeaker, or someone breaking into cable or satellite
signals.

I think it's fair to say that this generalized problem is not entirely
unique to the internet and technologies have to go through some
changes to prevent widespread abuse of the infrastructure.

For example, blue boxes in the 1970's when the phone system allowed
people to manipulate corporate WATS lines and other long distance
billing services through the use of a few cheap parts from the corner
Radio Shack.

I remember in the late 80's there was some kind of security problem
with the satellite downlink (or feed uplink) for our cable TV system
in Boston and someone would inject prank programming.

Sure you're still left with end-to-end issues of all sorts and they
need to be addressed.

But maybe like telemarketing get rid of the massive fraud and
hijacking and you're mostly left with people who don't hide their
identity and are mostly controllable by more conventional means such
as the FTC's do not call list.

Although one can imagine fancy, intelligent "consent" systems attached
to the phone (and I realize some people have at least simple systems
which utilize caller ID, etc.), most of the problem is taken care of by
some amount of accountability and security, plus things like
do-not-call lists and general social pressures. (Imagine if a company
with an actual name and location began ringing everyone's phone every
hour with a pre-recorded ad; even if it were legal, they couldn't stand
the backlash for long.)

I guess what I'm saying is that ultimately one has to think in terms
of society and productivity etc.

Having everyone managing their own consent software all day is not the
way to run things.

We have to find a way to get rid of 99% of this crap, and then sure if
they want to block their ex or some vendor they regret ever giving
their phone number to once in a while, fine, it happens.

But right now you could be typing in filtering/consent tuning 9-5 and
then go home.

> Therefore, it seems that while some focus on each of these makes sense,
> overall we should maintain the broader view of the problem. For example, you
> mentioned that spammers exhibit an instability in the IP space. Is there a
> way to measure this relative to senders of wanted messages? Can this
> relationship be used to detect new sources of unwanted messages or to
> determine stable sources of wanted messages? This is useful in the
> prevention of unwanted messages and the preservation of wanted messages.
> Such a heuristic could be part of a reputation system.

Take a corpus of messages, say over a month:

Save the following tuple for each msg:

     <src ip, dest mailbox, first use, last use, spam?>

that's first/last use of that src ip / dest mailbox pair.

Obviously the spam flag has to be derived from some other measure,
e.g. whether the message trips SpamAssassin (for some value of "trip",
such as scoring above some threshold.)
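As a concrete sketch of the collection step, here's one way to fold a month of per-message records into those tuples. Timestamps are assumed to be comparable numbers (e.g. Unix time), and all names are illustrative:

```python
def aggregate(messages):
    """Collapse per-message records into per-(src ip, dest mailbox) tuples.

    messages: iterable of (src_ip, dest_mailbox, timestamp, is_spam)
    returns:  dict keyed by (src_ip, dest_mailbox) mapping to
              {"first": first use, "last": last use, "spam": flag}
    """
    table = {}
    for src_ip, mailbox, ts, is_spam in messages:
        key = (src_ip, mailbox)
        rec = table.get(key)
        if rec is None:
            table[key] = {"first": ts, "last": ts, "spam": is_spam}
        else:
            # Widen the first/last-use window for this pair.
            rec["first"] = min(rec["first"], ts)
            rec["last"] = max(rec["last"], ts)
            # Treat a pair as spam if any of its messages tripped the filter.
            rec["spam"] = rec["spam"] or is_spam
    return table
```

Folding the flag with `or` is one arbitrary choice; a real study might instead keep a per-pair spam fraction.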

Now do some descriptive statistics on (last - first) grouped by the
spam? flag; something like a Chi-Square or Student's t ought to be
sufficient, since you're looking for a discrete result: does
(last - first) predict (separate) whether it's spam?

What I think will be more interesting are the descriptive stats
themselves, such as what the means and standard deviations are for
each group.

Or maybe we'd find no such result but that's research.
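The grouping-and-test step above might look like this sketch, which computes the per-group descriptive stats plus a Welch's t statistic on the (last - first) lifetimes. It's stdlib-only and the names are mine; a real analysis would use a proper stats package to get p-values:

```python
import statistics

def lifetime_stats(spam_lifetimes, ham_lifetimes):
    """Describe each group of (last - first) durations and compare them.

    Returns (spam_stats, ham_stats, t) where t is Welch's t statistic:
    a large |t| suggests lifetime separates spam from non-spam sources.
    """
    def describe(xs):
        return {"n": len(xs),
                "mean": statistics.mean(xs),
                "stdev": statistics.stdev(xs) if len(xs) > 1 else 0.0}

    s, h = describe(spam_lifetimes), describe(ham_lifetimes)
    # Welch's t: difference of means over the combined standard error,
    # which doesn't assume the two groups have equal variance.
    se = (s["stdev"] ** 2 / s["n"] + h["stdev"] ** 2 / h["n"]) ** 0.5
    t = (h["mean"] - s["mean"]) / se if se > 0 else float("inf")
    return s, h, t
```

The hypothesis in the text predicts short spam lifetimes and long non-spam lifetimes, i.e. a large positive t here; finding no separation would be an answer too.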

-- 
        -Barry Shein

Software Tool & Die    | bzs(_at_)TheWorld(_dot_)com | http://www.TheWorld.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
The World              | Public Access Internet     | Since 1989     *oo*

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg