Re: Poll: Should mhonarc.org mail archives hide mail addresses

Finally, Chuq had a good point about requirements changing over
time. In the future, MHonArc may want to move towards encouraging more
semantic markup


The problem with this approach is that it won't work with text-based
browsers.  Accessibility is something I try to maintain,

Sure it will. Jeffrey Zeldman has a lot of useful information on how to be accessible and compliant by degrading gracefully. You can start here: http://www.happycog.com/lectures/access/ to get a first cut on this. The idea is to build things that use XHTML/CSS such that if certain features aren't supported by a browser, the site does the "right thing" instead of simply breaking, and does it without building multiple versions with browser sniffing. And accessible means more than sight-limited, it means alternative browsing tools, like my phone's mini-browser, and search engines like Google.

So accessibility is good. CSS/XHTML is good. And since MHonArc gets used in so many sites where people have to skin an interface onto it, I think moving to those models is a great idea (and basically a no-brainer), once you get past a bunch of the myths about those tools.

I first thought of using libgd to have addresses changed into CGI
links that generate an image on the fly showing the email address.
I.e., harvesters would have to use OCR to get the address.

And there's evidence that some harvesters are experimenting in that direction. After all, it's only CPU time, and they're infinitely patient. Even if they only get a 10-15% hit rate on OCR conversions, that merely means they have to hit the site 10 times to get everything. That was the ultimate failure of the Slashdot "random" obfuscation tool: spammers didn't have to break all of them, just enough of them to get useful data, and then cycle through the site enough times to get around the versions they didn't crack. Took about a week.
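To put rough numbers on the repeated-crawl argument, here is a minimal sketch. It assumes (my assumption, not something measured) that the obfuscation is re-randomized on every request and that each crawl cracks any given address independently with the same hit rate. The function name is made up for illustration.

```python
def expected_coverage(hit_rate: float, passes: int) -> float:
    """Expected fraction of addresses recovered after `passes` crawls.

    Assumes each crawl independently cracks any given address with
    probability `hit_rate`, so an address is missed only if every
    single pass fails on it.
    """
    return 1.0 - (1.0 - hit_rate) ** passes


if __name__ == "__main__":
    # Compare a 10% and a 15% per-pass hit rate over a growing number of crawls.
    for passes in (1, 10, 20, 50):
        low = expected_coverage(0.10, passes)
        high = expected_coverage(0.15, passes)
        print(f"{passes:3d} passes: {low:.3f} (10% rate)  {high:.3f} (15% rate)")
```

Under that assumption the fraction of addresses still hidden shrinks geometrically with each pass, which is why a patient harvester only needs a modest hit rate.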

Another alternative is to remove linking of addresses, and then
using an obfuscation technique like:

  earl<!--
  -->&#64;<!--
  -->example.com

This way the address renders like "earl@example.com" (and can be
copy-n-pasted by readers to their MUA), but a harvester may not
catch it.  Of course, a smart harvester that expands entity references
and deletes comment declarations would.
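For a sense of how little work that "smart harvester" needs, here is a minimal sketch using only the Python standard library. The function name and the address regex are illustrative assumptions, not taken from any real harvester.

```python
import html
import re

# Matches HTML comment declarations, including ones that span lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Deliberately simple address pattern; good enough for illustration.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deobfuscate(markup: str) -> list[str]:
    """Delete comments, expand entity references, then collect addresses."""
    without_comments = COMMENT_RE.sub("", markup)
    expanded = html.unescape(without_comments)
    return ADDRESS_RE.findall(expanded)


if __name__ == "__main__":
    sample = "earl<!--\n  -->&#64;<!--\n  -->example.com"
    print(deobfuscate(sample))
```

Two standard-library calls undo the whole scheme, so it only holds as long as nobody bothers to write them.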

Be very wary of "fixes" that merely make the problem more difficult. As soon as they have a financial incentive to crack them, they'll be cracked. You're basically looking to try to implement the "I don't have to outrun the bear, I just have to outrun you" solution, meaning you make it tough enough to crack that they go harvest someone else's site.

In the case of MHonArc especially, that's a bad design choice. Since so many sites use MHonArc, any change you make to MHonArc will be a focus of the spammers to crack. MHonArc doesn't have the option of making it tough enough for the spammers to go elsewhere. So you risk putting energy into things that won't fix the problem long (if at all), and worse, might create a false sense of security for developers and users of the tools.

My suggestion: don't get involved in any "solution" that merely makes it "harder" or "causes more work", because they only solve things as long as the spammers don't feel it's worth it. And if you get into an arms race with them, you'll lose. So you have to fix things in ways they can't crack, or you probably shouldn't fix them at all. Half-measures waste time and energy and give people a sense of comfort that is worse than doing nothing.

I don't believe any obfuscation setup is safe. Period. They may work today, but if they ever get adopted widely enough to annoy the spammers, they'll be broken. And with their continuing to build huge farms of zombied machines for delivery (which is what's hosed over the RBLs: the spammers have figured out how to hack around them by changing their delivery methods and using stolen system access), if they can use a machine for zombie delivery of spam, they can use that machine for computational work, too. So you should assume the spammers have a roughly infinitely large cluster of machines they can use to throw cycles at whatever you build. Because they do.

I read a study dated March 2003 that showed that simple obfuscation
techniques actually work, but I think (and the study even states)
that it is likely only a matter of time before spammers adapt.


Most of them are broken now. Basically useless.

 Mail-archive.com
uses a POST form to obfuscate addresses, but it is straightforward
to customize a harvester to defeat it.

Anything with a large enough data-set to warrant the spammer's attention will get it. MHonArc, sort of by definition, will be high on their lists.

Obfuscation is a waste of energy. It works only as long as the spammers don't bother worrying about it. Graphic representations are non-accessible, crackable (via OCR) and not easily used by end-users, so they not only don't solve the problem, they create new ones. JavaScript-based and POST-based stuff, ditto -- you break things in all sorts of systems today (like phone browsers) where people want access to that data, and it only holds off the spammers as long as they don't bother implementing it. Those aren't solutions, just delaying tactics. Bad use of time.

 Since text-only
browsers can still read the messages in the archives, is it okay that
they will not have the ability to determine the author's address if
an image-based solution is adopted?  Is this an acceptable limitation
weighed against the problem of spam?

I think a "guest" has no demand on access to sensitive data. I don't allow "guests" open access to private mail lists, for instance, and I see no reason why they should assume they should have access to it.

I think it's safe to extend that to data I consider sensitive or private. Just because we've always been open and that data is accessible doesn't mean there's any requirement it remain so. After all, there was a time in life when few houses had locks on them, too. Times change. Not only do we lock doors and windows, we build gated communities.

I think the only safe way to do this is to make sure that this sensitive data is simply never in the data stream -- it's edited out before a user can get to it. If it's not there, it can't be de-obfuscated, it can't be reconstructed, it can't be reverse-engineered, because it's not there.
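As a rough illustration of "edited out before a user can get to it", here is a minimal sketch that removes addresses from message text before it is ever written to the public archive. The function name, the placeholder string, and the simple address regex are assumptions for the example; this is not an existing MHonArc feature.

```python
import re

# Deliberately simple address pattern; a real filter would need to be stricter
# about headers, quoted text, and unusual address forms.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_addresses(text: str, placeholder: str = "[address removed]") -> str:
    """Replace every address with a fixed placeholder.

    Nothing derived from the address survives in the output, so there is
    nothing left to decode or reconstruct.
    """
    return ADDRESS_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(strip_addresses("From: Earl <earl@example.com>"))
```

The point of the fixed placeholder is that the output carries no information about the original address, unlike any reversible encoding.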

If people want more access, including that restricted data, then build a system to let them authenticate in and be granted access. I think that's more or less beyond the scope of MHonArc, but strongly related to it. In a perfect world, however you authenticate yourself to the mailing list to prove that "you are you" for purposes of posting or accepting list mail is how you'd authenticate into the archives, too, which implies this is probably a list-server operation which pulls data out of MHonArc, not an MHonArc operation, unless you want to start tightly coupling all of these different pieces together. Which has advantages and disadvantages...

I'd probably argue against building data-stripping into MHonArc, but perhaps a group of MHonArc folks would be interested in building a separate-but-equal project (similar to mharc) to handle the delivery/stripping/authentication piece, with hooks that allow it to interface into other systems for authentication data, so it could, perhaps, use Mailman email addresses and passwords, or Sympa user data to simplify things for the users a bit.
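One way such a hook could look, as a minimal sketch: the delivery layer takes a pluggable authenticator callback and only returns unstripped text when that callback accepts the credentials. Every name here is hypothetical, and `demo_authenticator` is a stand-in with made-up credentials; it does not call Mailman, Sympa, or MHonArc.

```python
import re
from typing import Callable

ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical hook type: given (address, password), report whether the
# list server recognizes this person.
Authenticator = Callable[[str, str], bool]


def serve_message(text: str, address: str, password: str,
                  authenticate: Authenticator) -> str:
    """Return the full text to authenticated users, stripped text to guests."""
    if authenticate(address, password):
        return text
    return ADDRESS_RE.sub("[address removed]", text)


def demo_authenticator(address: str, password: str) -> bool:
    # Stand-in for a lookup against a list server's subscriber data.
    # A real backend would check hashed credentials, not a literal pair.
    return (address, password) == ("subscriber@example.org", "s3cret")


if __name__ == "__main__":
    body = "Reply to earl@example.com"
    print(serve_message(body, "subscriber@example.org", "s3cret", demo_authenticator))
    print(serve_message(body, "guest", "", demo_authenticator))
```

Keeping the authenticator as a plain callback is what would let the same delivery code sit in front of different list servers without coupling it tightly to any one of them.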