Re: [Asrg] Re: bounces, and anti-spam principles
2007-01-25 18:50:52
I'm grouping together responses to several individual
points on this thread.
[comment #1]
In any case, I still contend that simplistic blocking by
IP address or domain name is a very poor approach, and for
a whole variety of reasons.
I will contend that there cannot be a content filter that
can reliably separate spam from non spam.
It doesn't NEED to be 100.000% accurate.
The bulk of mail most people receive comes from people
they are familiar with, and which fits certain patterns.
A given sender (mailing list etc) will typically have a
signature file, for instance. I know that Aunt Matilda is
NOT going to send me an E-mail containing a JavaScript
decryption routine, or an ActiveX enclosure. She also is
not going to send me an executable attachment. If stuff
like that arrives here, it is safe to presume it is NOT
from her, no matter what the From: address says (and even
if it WAS sent from her computer).
If you know what mail from your Yahoogroups AfricanViolets
mail looks like, you could for example specify to look for
that common content in mail claiming to come from that
mailing list.
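A minimal sketch of that kind of per-list content check, in Python. The list address and footer text below are made-up placeholders (not real Yahoogroups values), and the function name is only for illustration:

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical expectations: claimed From: address -> text that genuine
# mail from that sender is known to contain (for example a list footer).
EXPECTED_MARKERS = {
    "africanviolets@yahoogroups.example": "To unsubscribe from AfricanViolets",
}

def matches_expected_content(raw_message: str) -> bool:
    """True only if the claimed sender is known AND the body carries its marker."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    marker = EXPECTED_MARKERS.get(addr.lower())
    if marker is None:
        return False  # no expectations recorded for this sender
    body = msg.get_payload()
    if not isinstance(body, str):
        return False  # multipart mail is not what this list normally sends
    return marker in body
```

A message that merely forges the list's From: address, without the expected footer, fails the check and can be handled as mail from an unknown sender.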
Any non-spam message received by one person would be spam
if it were sent to 10 million harvested addresses.
Sure, and that's why the recipient knowing the sender is
one of the key criteria. Stuff that you might accept from
someone you know and trust might be spam if someone you'd
never heard of sent it to you. It's ABSOLUTELY not enough
to test subject/from/IP/domain.
OTOH, taking the definition of spam as Unsolicited Bulk
Email makes detecting a spamming IP address almost trivial.
I guess that depends on what you call "bulk", and how you
propose to detect it. Again, whatever rule you put into
effect (on a global-type basis) is going to be discovered
by spammers and they will engineer their sending patterns
to avoid violating it. That's why you need a really
narrow and twisty 'gauntlet' they must negotiate, with
DIFFERENT RULES for different recipients, where they don't
know and basically can not figure out what rules they
would have to comply with to get a message through to a
particular person.
That said, there should be a default set of rules which
will get "safe/small" mail through from unknown senders,
as long as it doesn't "look like" spam (again,
SpamAssassin is not perfect, but it's pretty good once
HTML, scripting, and attachment ruses are denied to them
for the purpose).
The trick is to stop accepting mail from that IP address
only until it has cleaned up.
Again, when you have a LOT of users (and possibly MANY
servers) behind a NAT router, denying mail from that IP
address results in simply too much collateral damage.
More to the point, it's a very blunt instrument for the
job, and it's relatively simple to do very much better.
Once the spam is gone there is no need to block the
address unless it has proven to be a repeat offender
without an effective process for shutting the spammers
down.
What about when the flow of spam is interleaved with all
sorts of good/important traffic as well?
[comment #2]
Speaking as an ISP, what's unrealistic in these utopian
end-user filtering only arguments is costs.
Suffice it to say that we can add fast, capable mail
servers and see them flooded in a matter of hours.
I'm not saying that end-user spam filtering is the ONLY
approach that should be used. On the other hand, it is
likely to be the most accurate and least objectionable
from a user standpoint. Plus, it is the most likely to
reject spam in a way that corresponds with how a USER
would decide it's spam. (I will open mail from a friend
with the same subject line that I would discard if it came
from someone I didn't know...)
Pushing all the filtering to the end-user would make
that much worse.
There is a lot of spam which is obvious. That includes
messages which contain links to known-spam-promoted Web
sites (at least in the absence of contradicting factors,
say being from a list discussing spam senders!)
It also includes, for example, messages which are
identical to messages that some number (dozens?
hundreds?) of other recipients at the same ISP have
already reported as being "spam". One would think that
ISPs could locate and perhaps recategorize identical
messages (again, perhaps tempered by a specific recipient
rule) which are still queued and have not yet been
delivered to their remaining customers.
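One way an ISP might do this, sketched in Python under simplifying assumptions: messages count as "identical" when their whitespace-normalized, lower-cased bodies hash to the same value, and the report threshold is an arbitrary illustrative number. Real spam runs often vary each copy, so this only shows the bookkeeping:

```python
import hashlib
from collections import Counter

REPORT_THRESHOLD = 3  # assumed; the right number is a policy decision

def fingerprint(body: str) -> str:
    """Hash of the body with whitespace and letter case normalized."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ReportTracker:
    def __init__(self, threshold: int = REPORT_THRESHOLD):
        self.threshold = threshold
        self.reports = Counter()  # fingerprint -> number of spam reports

    def report_spam(self, body: str) -> None:
        self.reports[fingerprint(body)] += 1

    def is_reported_spam(self, body: str) -> bool:
        return self.reports[fingerprint(body)] >= self.threshold

    def split_queue(self, queued_bodies):
        """Return (deliver, hold) lists for messages still awaiting delivery."""
        deliver, hold = [], []
        for body in queued_bodies:
            (hold if self.is_reported_spam(body) else deliver).append(body)
        return deliver, hold
```

Messages in the hold list would still be subject to any per-recipient rule before being recategorized.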
Yahoo, for all their claims, does a pretty fair job of
only sequestering spam messages, although an awful lot of
obvious spam still curiously slips through their filters.
...and a user should be able to selectively prevent
blocking of mail that otherwise would get blocked.
But let me state again (and this is part of what made me
respond, starting this sub-thread) that it is virtually
NEVER a good idea to send a bounce message
after-SMTP-time, because you can't be sure where to send
it, and most likely you are just harassing another
innocent victim. Far better to just toss the mail. If
you are going to alert anybody, it makes more sense to
offer the offending mail (tagged accordingly) to the
intended recipient so that THEY can make the final
decision on what to do with it.
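As a rough illustration of "tag and offer to the recipient" instead of bouncing, here is a small Python sketch. The header name X-Suspected-Spam and the "[SUSPECT]" subject prefix are invented for this example, not an established convention:

```python
from email import message_from_string

def tag_for_recipient(raw_message: str, reason: str) -> str:
    """Mark a suspicious message for delivery to the intended recipient.

    Nothing is sent back toward the (possibly forged) sender address.
    """
    msg = message_from_string(raw_message)
    msg["X-Suspected-Spam"] = reason      # hypothetical header name
    subject = msg.get("Subject", "")
    del msg["Subject"]                    # no error if the header is absent
    msg["Subject"] = "[SUSPECT] " + subject
    return msg.as_string()
```

The recipient's mail client can then file tagged messages separately, and the final decision stays with the recipient.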
Being able to "slam the phone down" on miscreant IP blocks
at the accept() or helo is much, much, less processing
than going thru the entire SMTP interaction and whatever
it takes to pass processing off to an end-user.
It's true that it costs less, but it's also true that it
blocks a lot of innocent and legitimate mail that might be
originating from the same IP address (NAT router?). There
could be dozens, hundreds, or even thousands of innocent
users affected.
IMHO, such innocent users who found their messages blocked
might have legal recourse against SOMEONE... it's simply
far too blunt an instrument.
Put another way, you can have almost unfiltered access and
near-perfect spam filtering!
Here's how to do it:
Get your own link to the backbone.
Set up your own mail servers etc.
Hire one or more secretaries to pre-screen your email
according to rules you have trained them in.
It might cost a few thousand a month, but surely in the
face of all this expressed urgency about the pitfalls of
centralized filtering it's a small price to pay.
As more and more businesses become dependent on the
Internet, and timely delivery of communications, such a
cavalier attitude is going to lead to business failures at
ISPs who don't realize that this isn't "just a hobby for
computer geeks" anymore.
[comment #3]
Absolutely, and that's a good reason why blocking by
either IP address or domain name is such a bad solution.
A fine-grained whitelist which specifies ALLOWED behavior
on a per-sender basis, on the other hand, can easily allow
or block messages from a given sender ON A
MESSAGE-BY-MESSAGE basis, so that their legitimate
messages get delivered but the (zombie) messages being
sent by their same (infected) machine, using the same mail
servers and same permissions/certifications but which do
not look the way that sender's messages are expected to
look (by the recipient!) are efficiently and accurately
identified and blocked.
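To make the idea concrete, here is a minimal Python sketch of a per-sender whitelist of allowed behavior. The class names, the feature flags, the example address, and the size limits are all assumptions for illustration; extracting the features from a real message is left out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allowed:
    """What one recipient is willing to accept from one sender."""
    html: bool = False
    attachments: bool = False
    executables: bool = False
    max_bytes: int = 20_000  # assumed default size cap

@dataclass(frozen=True)
class Observed:
    """Features of one incoming message, extracted elsewhere."""
    sender: str
    has_html: bool = False
    has_attachment: bool = False
    has_executable: bool = False
    size: int = 0

DEFAULT_RULE = Allowed()  # applies to senders not on the whitelist
WHITELIST = {
    # Hypothetical entry: photos are fine, executables and HTML are not.
    "matilda@example.org": Allowed(attachments=True, max_bytes=2_000_000),
}

def accept(message: Observed, whitelist=WHITELIST) -> bool:
    """Decide message by message, using the rule for the claimed sender."""
    rule = whitelist.get(message.sender.lower(), DEFAULT_RULE)
    if message.has_html and not rule.html:
        return False
    if message.has_attachment and not rule.attachments:
        return False
    if message.has_executable and not rule.executables:
        return False
    return message.size <= rule.max_bytes
```

With these rules, a photo from the whitelisted sender gets through, while an executable sent from the same (perhaps infected) machine is refused.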
So "rehabilitation" isn't even an issue.
So the zombie becomes unable to emit spam, but there's no
incentive to fix it so it's still available to the
botmaster for use as a C&C machine, web/DNS server, and
DDoS participant. I'd prefer that it get uninfected.
Obviously, that is ideal, but the problem is that after
(first!) SMTP time, the (intermediary, or final) recipient
doesn't really know who they ought to notify...!
Notifying the wrong person, or someone who has no control
over the situation, probably does more harm than good.
Again, I don't believe it is possible to prevent unwanted
mail from being injected into the Internet. What
ultimately will stop it is once its likelihood of success
is SO small that it's simply not worth attempting it.
People don't write viruses for Coleco ADAM computers
simply because there are very few of those connected to
the Internet. The chances of the author's creation
encountering a vulnerable system are simply too low.
[comment #4]
"spam" is a slang word, which is often used to describe
*A SUBSET OF* unwanted email. Some legal jurisdictions
have legislation that defines spam very narrowly. If you
insist on blocking "spam", you *WILL* end up spending a
lot of time and money in court cases where...
1) the spammer insists that his spam is "not-spam" because
of some technicality. Expect to see lots of legal "is not
spam; is so; is not; is so; is not" being billed at
lawyers' regular rates. And of course, you can rest
assured that the politicians who enact legislation will
make exemptions for solicitations for campaign
contributions. Any "spam-filters" that block any
"not-spam" *WILL* get hit with cease-and-desist orders
That is one further reason why the RECIPIENT should be the
person to judge what they are and are not willing to
receive, and from whom. Senders basically have no legal
recourse if somebody chooses to delete that sender's mail
from their Inbox, whether they have read it or not.
2) saying that Joe Blow sends spam is equivalent to calling
him a spammer. Watch the defamation (libel/slander)
lawsuits fly.
There have already been such suits against blacklist
management organizations.
However, if you block "unwanted email" rather than
"spam"...
1) spammer says "wahhh, wahhh, wahhh, my 'valuable
information' is 'not-spam'" and you can enthusiastically
agree. The customer still doesn't want it. "Because I
said so" should be sufficient reason.
Right. And the recipient can reasonably set (even
completely arbitrary!) rules to determine what they do and
don't want delivered to their Inbox.
[snip]
Similarly, don't try to define "the S-word" in technical
terms. A bunch of geeks sitting at their keyboards are no
match for a nit-picking lawyer who was the captain of his
class debating team. It's effectively a pro se defense
against high-powered lawyers, and the results are very
predictable. Don't engage in a battle you can't win.
Go with...
- our customer says he doesn't want your emails. No, we
  don't know why he doesn't want your emails.
- the customer is always right; end of story.
Don't give the spammers' lawyers anything to attack.
Bingo.
- I am a customer of clss.net (Aurora Internet)
- they have a modified Qmail that generates 550 SMTP-stage
  rejects (i.e. *NOT* a DSN) based on a
  customer-configurable control file in the customer's
  home directory. There are separate rule files for
  sub-accounts. E.g. I point my domain MX at their
  server. abuse and postmaster are basically unfiltered
  compared to this address.
- step 1 is to declare a whitelist of emails that I accept
  unconditionally
That's good, but I basically want finer control than
that... I want to be able to open up the window (like the
keyway on a lock) to allow the messages in that I expect
from each sender. Even a sender that I would accept an
executable attachment from, I might refuse a message
containing ActiveX or JavaScript.
- I don't want email from residential machines on dynamic
  IP addresses sending direct-to-MX. So I block based on
  dynamic IP DNSbls, regexp filter against rDNS, and
  obviously block email from machines with no rDNS
  whatsoever.
Obviously you can (and should) set the rules however you
want, as recipient. I wouldn't want, for example, my
ISP(s) forcing those same rules on me.
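For readers unfamiliar with the rDNS regexp rule quoted above, a rough Python sketch follows. The keyword list is a guess at common naming habits for dynamic address pools, not a vetted rule set, and the function assumes the reverse-DNS lookup has already been done elsewhere:

```python
import re
from typing import Optional

# Assumed keywords that often appear in hostnames of dynamic/residential
# address pools. A real deployment would tune this list per recipient.
DYNAMIC_RDNS = re.compile(
    r"(^|[.\-])(dyn|dynamic|dhcp|dsl|adsl|pool|ppp|dialup)([.\-\d]|$)",
    re.IGNORECASE,
)

def reject_by_rdns(rdns: Optional[str]) -> bool:
    """True if the connecting host should be refused under this rule."""
    if not rdns:
        return True  # no reverse DNS at all
    return DYNAMIC_RDNS.search(rdns) is not None
```

Because the rule lives in the recipient's own configuration, a user who does want direct mail from such hosts can simply leave it switched off.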
- I don't talk to myself. I don't want email from people
  who lie in their email, by including "waltdnes.org" in
  the HELO or return-path. So I block those emails.
Certainly reasonable!
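A sketch of that "I don't talk to myself" rule in Python. The domain example.org stands in for the recipient's own domain, and the function name is invented for this example:

```python
MY_DOMAIN = "example.org"  # placeholder for the recipient's own domain

def claims_my_domain(helo: str, return_path: str,
                     my_domain: str = MY_DOMAIN) -> bool:
    """True if an outside host uses my domain in HELO or the return-path."""
    def in_domain(host: str) -> bool:
        host = host.strip().strip("<>").rstrip(".").lower()
        return host == my_domain or host.endswith("." + my_domain)

    # Everything after the last "@" of the envelope sender, if any.
    return_path_domain = return_path.rpartition("@")[2]
    return in_domain(helo) or in_domain(return_path_domain)
```

This only makes sense for connections arriving from outside; a recipient who legitimately sends themselves mail through another provider would not enable it.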
- I don't want email from certain countries, so I block
  them, using country-codes in rDNS and return-path
Also reasonable enough, as long as you are setting those
rules for yourself. Personally, I WILL accept
(legitimate) mails from just about any country anywhere
(including particularly countries I have visited, and
that's a list of almost 50 countries). And on my travels,
I have sent E-mails from (say) Beijing. I would be
annoyed if those E-mails had been blocked just because I
happened to have sent them from China.
Again, your Inbox, your rules.
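For completeness, a per-recipient country-code rule might look like the following Python sketch. The codes "xx" and "zz" are deliberate placeholders rather than real country codes, since the choice belongs to each recipient, and the rDNS name is assumed to have been looked up already:

```python
from typing import Optional

BLOCKED_CCTLDS = {"xx", "zz"}  # placeholders; each recipient picks their own

def country_code(host: str) -> Optional[str]:
    """Return the final two-letter label of a hostname, or None."""
    label = host.strip().strip("<>").rstrip(".").rpartition(".")[2].lower()
    return label if len(label) == 2 and label.isalpha() else None

def blocked_by_country(rdns: Optional[str], return_path: str,
                       blocked=BLOCKED_CCTLDS) -> bool:
    """Check the rDNS name and the return-path domain against the list."""
    candidates = [rdns or "", return_path.rpartition("@")[2]]
    return any(country_code(host) in blocked for host in candidates)
```

A recipient who travels, or who corresponds internationally, would keep this list empty.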
Executive summary...
- blocking email, because it meets some technical
  criteria, is easier on the technical side, but
  introduces legal problems
- blocking email, because the customer said so, may be
  harder technically, but avoids legal problems
- any complications on the anti-spam side are outweighed
  by equivalent complications on the spammers' end. ISPs
  will have to enable end users to configure their own
  rules, and everybody's filters and whitelists will be
  slightly different. Imagine how spammers will feel
  knowing that each of several million targets for a
  spam-run has a slightly different defense, that has to
  be overcome in order to deliver the email.
EXACTLY. But also, knowing that all the classical ruses
to avoid spam classification (text as image, embedded
links, attachments, scripting, disguised HTML links, etc
etc) are a priori denied them.... certainly takes a major
bite out of spammers.
And only allowing executable attachments, HTML, and "big"
messages from known/trusted senders basically eliminates
E-mail as a vector for virus/worm propagation, which takes
a big bite out of spambot zombie recruitment. That, all
by itself, is a huge improvement in the spam
detection/blocking situation.
[comment #5]
All I can say is, you are certainly welcome to block any
mail you please, and no cooperation from other MTA
operators is required, nor is any meeting of the IETF.
The only purpose for the IETF involvement is to coordinate
cooperative action. Since the IETF is voluntary, the
action needs to be of benefit to all participants, and
that greatly restricts the field of actions practical for
widespread implementation. But it doesn't in any way
restrict what you as an individual can do.
That's certainly true, and one advantage of fine-grained
recipient blocking is that it doesn't require any great
worldwide consensus, nor any re-engineering of Internet
infrastructure.
What WOULD be helpful, though, would be a recognition by
the IETF that:
a) such fine-grained per-sender by-recipient blocking
(and hopefully augmented by subsequent content scanning)
is an effective and desirable approach to the problem, and
b) in the general case, blocking of all non-whitelisted
E-mails containing HTML, scripting (probably covered under
HTML... is it possible to put in scripting without HTML?),
or attachments is a "best practice". (It is probably a
good idea to suggest including a maximum message size,
too, as a way of preventing "denial of service" attacks by
sending big E-mails to someone which would be expected to
fill their E-mail inbox to overflowing, blocking
subsequent legitimate E-mails).
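A sketch of what the default check in (b) could look like for mail from a non-whitelisted sender, using Python's standard email package. The size limit is an arbitrary illustrative value, and "attachment" is approximated here as any part that carries a filename:

```python
from email import message_from_string

MAX_BYTES = 20_000  # assumed cap for mail from unknown senders

def passes_default_rules(raw_message: str, max_bytes: int = MAX_BYTES) -> bool:
    """Plain text only, no attachments, and below the size limit."""
    if len(raw_message.encode("utf-8", errors="replace")) > max_bytes:
        return False
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers are judged by their parts
        if part.get_content_type() != "text/plain":
            return False  # HTML (and any scripting inside it), images, etc.
        if part.get_filename() is not None:
            return False  # named part, treated as an attachment
    return True
```

Mail that passes would still go on to a content scanner such as SpamAssassin; mail that fails can be refused or set aside unless the recipient has whitelisted that kind of message from that sender.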
That would at least provide a direction forward which
would make for a huge improvement, avoid the legal issues
of blocking e-mails too crudely, and take a big bite out
of spambot zombie recruitment. What's more, (as was
pointed out by another post), having millions of different
target recipients, each with different delivery criteria, is
a far more daunting challenge to spammers.
Since your method requires no cooperation from any other
MTA operator, it doesn't require any endorsement from this
group.
Right, no endorsement is NEEDED, but (like the
introduction of the original IBM PC) it would be nice to
have it recognized as a useful direction. Spammers are
far more likely to be dissuaded from attempting to send
HTML-based or attachment-based spam if it is RECOGNIZED
that it's unlikely to be delivered, rather than it just
disappearing down a black hole somewhere and leaving them
believing that it's still a viable technique.
That is fine - it doesn't make your method illegitimate or
anything like that. But most users wish for a cooperative
anti-spam technique, because they reasonably expect it
will work better, and they reasonably expect many other
MTA operators to cooperate with them.
And, if that's enough to satisfy them, chances are good
that the (cooperative!) "default" case (no HTML, no
attachments, messages < some maximum size, and message
passed by SpamAssassin or similar) would already
constitute a MAJOR improvement over existing spam
blocking. The whitelisting capability mostly just gives
the recipients the opportunity to tweak things further,
opening the keyway to allow more risky mail if they so
desire, or to block stuff they don't want that the ISP's
default scanning would still let through.
This has been true in the past - consider the many DNSBLs
and other activities against spam. When we kept a list of
spamming IP addresses sending to our MTA, we found after 2
weeks that only 1% of the IPs had sent more than one
message. Our subscription to Spamhaus kills about 65% of
incoming messages. That is a victory for cooperation and
it makes us think that more cooperation might be better.
Again, the problem is the degree of collateral damage that
IP-based blocking produces. I consider that to be
unacceptable, and perhaps creating legal liability. Now,
if the USER implements IP-based blocking, that's THEIR
choice and I don't believe any court would rule against
their right to do that. But an ISP is a very different
situation.
It is true that cooperative actions attract lawsuits, but
that is only because it isn't practical to sue an
individual for refusing mail,
Not only is it not practical, but they have the ABSOLUTE
right to read or not read anything given to them
(certainly at least anything delivered by E-mail!).
[comment #6]
[how users configure their whitelist rules]
The problem being that out of the 60,000 seats here,
perhaps less than 10 of them are able to competently
configure a set of rules like what you have.
That's a software implementation issue, not an inherent
problem in the approach. I envision a button to click on
that simply says "allow E-mails like this from the same
sender in the future" and where the software will open the
keyway JUST enough to allow that type of message if seen
again from that sender. How that recognition is
accomplished, whether by something crude like simple
GREP-type scanning, or something brain-damaged like RegEx
pattern matching, or something still more sophisticated
like the pattern matching SNOBOL/SPITBOL offers, or even a
different sort of statistical ranking/rating approach like
content scanners use... will vary from one implementation
to another. The final products will probably use a
combination of techniques.
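As one very crude possibility for such a button, the Python sketch below records which content types and roughly what size a sender has been approved for, and later permits only messages that fit inside that profile. The class name, the "twice the approved size" headroom, and the choice of features are all assumptions; a real product would likely combine this with smarter pattern matching:

```python
from email import message_from_string
from email.utils import parseaddr

def describe(raw_message: str):
    """Return (sender address, set of leaf content types, size in bytes)."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    types = frozenset(
        part.get_content_type() for part in msg.walk()
        if not part.is_multipart()
    )
    return sender.lower(), types, len(raw_message.encode("utf-8"))

class Keyway:
    """Per-recipient record of what each sender has been approved to send."""

    def __init__(self):
        self.allowed = {}  # sender -> (approved content types, max size)

    def allow_like_this(self, raw_message: str) -> None:
        """What the "allow E-mails like this" button would call."""
        sender, types, size = describe(raw_message)
        old_types, old_max = self.allowed.get(sender, (frozenset(), 0))
        # Assumed headroom: permit up to twice the approved message's size.
        self.allowed[sender] = (old_types | types, max(old_max, 2 * size))

    def permits(self, raw_message: str) -> bool:
        sender, types, size = describe(raw_message)
        if sender not in self.allowed:
            return False
        ok_types, max_size = self.allowed[sender]
        return types <= ok_types and size <= max_size
```

Approving one plain-text note from a sender would not, under this sketch, open the keyway for HTML mail or much larger messages from the same address.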
Many of them don't even have a clear notion of what the
concept of "source IP" is, let alone being able to make
reasonable choices of, say, knowing why you'd want to
block dynamic IPs or IPs in Korea.
Again, I consider IP-based blocking to be inherently
flawed, to the point where I consider it a dead-end.
Furthermore, and with complete irony, I'll note that the
only reason I read this thread is that my very own,
personally trained, UA bayesian filtering flung it all in
the junk folder! ;-)
:-)
Yeah, I admit that I usually at least cast a cursory
eyeballing of the Yahoo mail "spam" folder too, rather
than just emptying it. Occasionally I -do- find a
non-spam message there. (Although that happens seldom, as
I almost never give that E-mail address to anybody... It's
almost useful as a "personal honeypot" to see what's being
spammed out, before going to my more usual E-mail accounts
and possibly wondering if that curious E-mail just MIGHT
be legitimate).
We're achieving effectiveness rates in excess of 98% with
our "one set of rules" server based defences. My personal
account, which receives 400-600 emails/day, has 100 or
more spams/day filtered out by the central server
solution. I usually go a week or so between spams that
get past those central filters - I see _many_ more FPs
with my bayesian than I see spam getting through.
There will be FPs, and some spam will get through, probably
regardless of what filtering technique you use. The
important thing is that the RECIPIENT controls that, so
they can decide the rule that determines what gets blocked
and what gets through. That way they don't have to wonder
what SHOULD have been delivered to them and wasn't.
My personally trained bayesian filtering has an absolutely
abysmal track record.
Spammers have gotten good at throwing enough random junk
into E-mails to confuse Bayesian filters.
On the spam aimed at the false positive handling address,
which by design has _no_ filtering, Bayesian has an
effectiveness rate of about 50%. Yuck. No amount of
personal twiddling, custom rules, explicit pattern
matching in my UA is going to make much difference to
that.
Some E-mails are going to get through. But making sure
that they are (a) small, and (b) not "dangerous" at least
reduces the impact of those.
And meanwhile, giving the recipient the ability to at
least not see the SAME kind of stuff over and over again,
if they choose to use those features, demonstrates the
ISP's trying to give the user the tools to reduce the
frustration.
Gordon Peterson
http://personal.terabites.com
1977-2007 Thirty year anniversary of local area
networking
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg