Re: Poll: Should mhonarc.org mail archives hide mail addresses
2004-01-02 01:27:19
Finally, Chuq had a good point about requirements changing over
time. In the future, MHonArc may want to move towards encouraging more
semantic markup.
The problem with this approach is that it won't work with text-based
browsers. Accessibility is something I try to maintain.
Sure it will. Jeffrey Zeldman has a lot of useful information on how to
be accessible and compliant by degrading gracefully. You can start
here: http://www.happycog.com/lectures/access/ to get a first cut on
this. The idea is to build things with XHTML/CSS so that if
certain features aren't supported by a browser, the site does the
"right thing" instead of simply breaking, and does it without building
multiple versions with browser sniffing. And accessible means more than
usable by the sight-limited; it also means working with alternative
browsing tools, like my phone's mini-browser, and with search engines
like Google.
So accessibility is good. CSS/XHTML is good. And since MHonArc gets
used in so many sites where people have to skin an interface onto it, I
think moving to those models is a great idea (and basically a
no-brainer), once you get past a bunch of the myths about those tools.
I first thought of using libgd to have addresses changed into CGI
links that generate an image showing the email address on the fly,
i.e., harvesters would have to use OCR to get the address.
And there's evidence that some harvesters are already experimenting in
that direction. After all, it's only CPU time, and they're infinitely
patient. Even if they only get a 10-15% hit rate on OCR conversions,
that merely means they have to hit the site ten or more times to
collect most of the addresses.
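To put rough numbers on the "just hit the site again" argument: if each pass is an independent attempt with hit rate p, the expected fraction of addresses recovered after n passes is 1 - (1 - p)^n. Independence is an assumption here; it only holds if the images are re-rendered with fresh distortion on each request, since a fixed image would fail OCR the same way every time. A minimal Python sketch (function names are illustrative):

```python
import math


def coverage_after(passes: int, hit_rate: float) -> float:
    """Expected fraction of addresses recovered after `passes` attempts.

    Assumes every pass is an independent attempt that cracks any given
    address with probability `hit_rate`.
    """
    return 1.0 - (1.0 - hit_rate) ** passes


def passes_needed(target: float, hit_rate: float) -> int:
    """Smallest number of passes whose expected coverage reaches `target`."""
    # Solve 1 - (1 - p)^n >= target  =>  n >= log(1 - target) / log(1 - p)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    for p in (0.10, 0.15):
        print(f"hit rate {p:.0%}: 10 passes -> {coverage_after(10, p):.0%} "
              f"coverage, {passes_needed(0.95, p)} passes for 95%")
```

Under that assumption, ten passes at a 10-15% hit rate recover roughly 65-80% of the addresses, and 19-29 passes push past 95%. Slow, but the harvester's CPU time costs the spammer almost nothing.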
That was the ultimate failure of the Slashdot "random" obfuscation
tool: spammers didn't have to break all of them, just enough of them to
get useful data, and then cycle through the site enough times to get
around the versions they didn't crack. It took about a week.
Another alternative is to remove the linking of addresses, and then
use an obfuscation technique like:
earl<!--
-->@<!--
-->example.com
This way the address renders like "earl@example.com" (and can be
copy-n-pasted by readers into their MUA), but a harvester may not
catch it. Of course, a smart harvester that expands entity references
and deletes comment declarations would.
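A minimal Python sketch of why this only stops the lazy harvesters. The function names and the address regex are illustrative assumptions, not anything MHonArc ships: a plain pattern match over the raw HTML misses the comment-split address, but deleting comments and expanding character references first recovers it.

```python
import html
import re

# Simplified address pattern used by both hypothetical harvesters.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Matches HTML comment declarations, including ones spanning lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def obfuscate(local: str, domain: str) -> str:
    """Split an address with empty comments, as in the example above."""
    return f"{local}<!--\n-->@<!--\n-->{domain}"


def naive_harvest(page: str) -> list:
    """Scan the raw HTML for anything shaped like an address."""
    return ADDRESS_RE.findall(page)


def smart_harvest(page: str) -> list:
    """Delete comments and expand entity references, then scan."""
    cleaned = html.unescape(COMMENT_RE.sub("", page))
    return ADDRESS_RE.findall(cleaned)
```

`naive_harvest(obfuscate("earl", "example.com"))` finds nothing, while `smart_harvest` returns `['earl@example.com']`. Swapping `@` and `.` for character references such as `&#64;` and `&#46;` makes no difference to the smart version, because `html.unescape` undoes them.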
Be very wary of "fixes" that merely make the problem more difficult. As
soon as spammers have a financial incentive to crack them, they'll be
cracked. You're basically trying to implement the "I don't have
to outrun the bear, I just have to outrun you" solution, meaning you
make it tough enough to crack that they go harvest someone else's site
instead.
In the case of MHonArc especially, that's a bad design choice. Since so
many sites use MHonArc, any change you make to MHonArc will be a focus
of the spammers to crack. MHonArc doesn't have the option of making it
tough enough for the spammers to go elsewhere. So you risk putting
energy into things that won't fix the problem for long (if at all), and,
worse, might create a false sense of security for developers and users
of the tools.
My suggestion: don't get involved in any "solution" that merely makes
it "harder" or "causes more work", because such solutions only hold as
long as the spammers don't feel it's worth the effort. And if you get
into an arms race with them, you'll lose. So you have to fix things in
ways they can't crack, or you probably shouldn't fix them at all.
Half-measures waste time and energy and give people a sense of comfort
that is worse than doing nothing.
I don't believe any obfuscation setup is safe. Period. Such schemes may
work today, but if they ever get adopted widely enough to annoy the
spammers, they'll be broken. And the spammers keep building huge
farms of zombied machines for delivery (that's what has hosed the
RBLs: the spammers figured out how to get around them by changing
their delivery methods and using stolen system access). If they can use
a machine for zombie delivery of spam, they can use that machine for
computational work, too, so you should assume the spammers have a
roughly infinitely large cluster of machines they can use to throw
cycles at whatever you build. Because they do.
I read a study dated March 2003 showing that simple obfuscation
techniques actually work, but I think (and the study itself states)
that it is likely only a matter of time before spammers adapt.
Most of them are broken now. Basically useless.
Mail-archive.com uses a POST form to obfuscate addresses, but it is
straightforward to customize a harvester to defeat it.
Anything with a large enough data set to warrant the spammers'
attention will get it. MHonArc, sort of by definition, will be high on
their lists.
Obfuscation is a waste of energy. It works only as long as the spammers
don't bother worrying about it. Graphic representations are
non-accessible, crackable (via OCR), and not easily used by end users,
so they not only don't solve the problem, they create new ones.
JavaScript-based and POST-based stuff, ditto: it breaks in all sorts
of systems today (like phone browsers) where people want access to that
data, and it only holds off the spammers until they bother implementing
support for it. Those aren't solutions, just delaying tactics. A bad use
of time.
Since text-only browsers can still read the messages in the archives,
is it okay that they will not have the ability to determine the
author's address if an image-based solution is adopted? Is this an
acceptable limitation weighed against the problem of spam?
I think a "guest" has no claim to sensitive data. I don't
allow "guests" open access to private mailing lists, for instance, and I
see no reason why they should assume they have access to it.
I think it's safe to extend that to data I consider sensitive or
private. Just because we've always been open and that data has been
accessible doesn't mean there's any requirement it remain so. After
all, there was a time when few houses had locks on them, too.
Times change. Not only do we lock doors and windows, we build gated
communities.
I think the only safe way to do this is to make sure that this
sensitive data is simply never in the data stream -- it's edited out
before a user can get to it. If it's not there, it can't be
de-obfuscated, it can't be reconstructed, it can't be
reverse-engineered, because it's not there.
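A minimal sketch of "never in the data stream", assuming a hypothetical filter that runs over message text before any archive page is generated. The regex is a simplification and will miss unusual address forms (quoted local parts, for instance), so a real filter would need more care; the point is that what reaches the reader is a placeholder, not an encoded address.

```python
import re

# Simplified address pattern; unusual but valid addresses may slip past it.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_addresses(text: str, placeholder: str = "[address removed]") -> str:
    """Replace every address-shaped token before the text is rendered.

    Nothing recoverable is left behind: there is no encoding, image, or
    script to reverse, only the placeholder string.
    """
    return ADDRESS_RE.sub(placeholder, text)
```

For example, `redact_addresses("From: Earl <earl@example.com>")` yields `From: Earl <[address removed]>`.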
If people want more access, including to that restricted data, then build
a system that lets them authenticate and be granted access. I think
that's more or less beyond the scope of MHonArc, but strongly related
to it. In a perfect world, however you authenticate yourself to the
mailing list to prove that "you are you" for purposes of posting or
accepting list mail is how you'd authenticate into the archives, too,
which implies this is probably a list-server operation that pulls data
out of MHonArc, not an MHonArc operation, unless you want to start
tightly coupling all of these different pieces together. Which has
advantages and disadvantages...
I'd probably argue against building data stripping into MHonArc,
but perhaps a group of MHonArc folks would be interested in building a
separate-but-equal project (similar to mharc) to handle the
delivery/stripping/authentication piece, with hooks that allow it to
interface with other systems for authentication data, so it could,
perhaps, use Mailman email addresses and passwords, or Sympa user data,
to simplify things for the users a bit.
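One rough shape for that separate project, as a Python sketch: a gateway sits between the archive and the reader, strips addresses by default, and delegates the "you are you" check to a pluggable backend. `AuthBackend`, `InMemoryBackend`, and `ArchiveGateway` are hypothetical names. A real Mailman or Sympa adapter would implement `authenticate()` against that list server's own user data, which this sketch does not attempt.

```python
import hmac
import re
from abc import ABC, abstractmethod
from typing import Dict, Optional

# Same simplified address pattern a stripping filter would use.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class AuthBackend(ABC):
    """Hypothetical hook: any external source of list-member credentials."""

    @abstractmethod
    def authenticate(self, email: str, password: str) -> bool:
        ...


class InMemoryBackend(AuthBackend):
    """Stand-in for a Mailman or Sympa adapter.

    Demo only: it holds plaintext passwords in a dict; a real adapter
    would defer to the list server's own credential store.
    """

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users

    def authenticate(self, email: str, password: str) -> bool:
        stored = self._users.get(email)
        if stored is None:
            return False
        # Constant-time comparison of the UTF-8 encoded passwords.
        return hmac.compare_digest(stored.encode(), password.encode())


class ArchiveGateway:
    """Serves archive text, revealing addresses only to authenticated users."""

    def __init__(self, backend: AuthBackend) -> None:
        self.backend = backend

    def render(self, message: str, email: Optional[str] = None,
               password: Optional[str] = None) -> str:
        if email and password and self.backend.authenticate(email, password):
            return message
        # Guests never receive the addresses at all.
        return ADDRESS_RE.sub("[address removed]", message)
```

With `ArchiveGateway(InMemoryBackend({"member@example.org": "s3cret"}))`, a guest rendering `"From: earl@example.com"` sees `From: [address removed]`, while the same call with the member's credentials returns the original text. Keeping the backend behind a small interface is what would let the same gateway sit in front of different list servers.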