Re: Poll: Should mail archives hide mail addresses

2004-01-02 18:32:33
On January 1, 2004 at 23:26, Chuq Von Rospach wrote:

The problem with this approach is that it won't work with text-based
browsers.  Accessibility is something I try to maintain,

Sure it will. Jeffrey Zeldman has a lot of useful information on how to 
be accessible and compliant by degrading gracefully. You can start
here: to get a first cut on 
this. The idea is to build things that use XHTML/CSS such that if 
certain features aren't supported by a browser, the site does the 
"right thing" instead of simply breaking, and does it without building 
multiple versions with browser sniffing. And accessible means more than 
sight-limited, it means alternative browsing tools, like my phone's 
mini-browser, and search engines like Google.

You are straying.  The CSS/XHTML approach was brought up as
a way of implementing dynamic obfuscation of addresses.  And as
I have noted, and you emphasize, obfuscation is extremely limited
as spammers will adapt.

So accessibility is good. CSS/XHTML is good. And since MHonArc gets
used in so many sites where people have to skin an interface onto it, I
think moving to those models is a great idea (and basically a
no-brainer), once you get past a bunch of the myths about those tools.

MHonArc is neutral about CSS/XHTML since a user can customize the
layout as they see fit.  I think talking about CSS/XHTML is off-topic
unless someone provides a case of how it can be used to deal with
the harvesting problem.

I first thought of using libgd to have addresses changed into CGI
links that generate an image on the fly showing the email address.
I.e., harvesters would have to use OCR to get the address.

And there's evidence that some harvesters are experimenting in that
direction. After all, it's only CPU time, and they're infinitely
patient. Even if they only get a 10-15% hit rate on OCR conversions,
that merely means that they have to hit the site 10 times to get
everything. That was the ultimate failure of the Slashdot "random"
obfuscation tool: spammers didn't have to break all of them, just
enough of them to get useful data, and then cycle through the site
enough times to get around the versions they didn't crack. It took
about a week.
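
To put rough numbers on the repeated-pass argument, here is a minimal
Python sketch. It assumes each pass over the site is an independent
attempt that succeeds with the same probability (i.e. the obfuscation
is re-randomized on every page load); the function name and the pass
counts are made up for illustration and do not come from any measured
data.

```python
def recovered_fraction(hit_rate: float, passes: int) -> float:
    """Expected fraction of addresses recovered after `passes` crawls.

    Assumes every crawl is an independent attempt with the same
    per-address success probability `hit_rate`.
    """
    # An address stays hidden only if every single pass fails on it.
    return 1.0 - (1.0 - hit_rate) ** passes


if __name__ == "__main__":
    for passes in (1, 10, 20, 40):
        fraction = recovered_fraction(0.10, passes)
        print(f"{passes:3d} passes at 10% hit rate: {fraction:.1%} recovered")
```

Under that independence assumption, ten passes at a 10% hit rate
recover roughly two thirds of the addresses rather than all of them,
but the fraction keeps climbing with every extra pass, which is the
point being made: the cost to the harvester is only more crawling.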

Actually, what some are doing is using people on the Net to do the
work.  I.e., they post the image to a porn site and require people to
solve it before entering.  This is what some are doing to auto-create
Yahoo, Hotmail, and similar types of accounts for sending out spam.

Now, there is always a cost-benefit ratio.  With respect to account
creation, the benefits outweigh the costs.  But doing it for each
email address may not be worth it, especially if the graphics include
techniques that OCR systems cannot deal with.

Another alternative is to remove linking of addresses, and then
use an obfuscation technique like:


This way the address renders like "earl(_at_)example(_dot_)com" (and can be
copy-n-pasted by readers to their MUA), but a harvester may not
catch it.  Of course, a smart harvester that expands entity references
and deletes comment declarations would.
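
As a rough illustration of both halves of that paragraph, here is a
small Python sketch. It is not MHonArc output or configuration: the
markup format, the filler comment, and all function names are
assumptions invented for this example. It builds markup that renders
as "earl(_at_)example(_dot_)com" using numeric character references
and comment declarations, then shows how a harvester that strips
comments and expands references recovers the address anyway.

```python
import html
import re

COMMENT = "<!-- -->"  # hypothetical filler comment declaration


def to_entities(text: str) -> str:
    """Encode every character as a numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscate(address: str) -> str:
    """Hypothetical obfuscator; only the '@' and the domain dots are rewritten."""
    local, _, domain = address.partition("@")
    at = to_entities("(_at_)")
    dot = to_entities("(_dot_)")
    domain_markup = (COMMENT + dot + COMMENT).join(domain.split("."))
    return f"{local}{COMMENT}{at}{COMMENT}{domain_markup}"


def rendered_text(markup: str) -> str:
    """Approximate what a browser shows: drop comments, expand references."""
    without_comments = re.sub(r"<!--.*?-->", "", markup, flags=re.DOTALL)
    return html.unescape(without_comments)


def harvest(markup: str) -> str:
    """A 'smart' harvester: render first, then undo the textual disguise."""
    text = rendered_text(markup)
    return text.replace("(_at_)", "@").replace("(_dot_)", ".")


if __name__ == "__main__":
    markup = obfuscate("earl@example.com")
    print(markup)                 # raw markup as stored in the archive page
    print(rendered_text(markup))  # what a reader sees
    print(harvest(markup))        # what a smart harvester extracts
```

A naive harvester that greps the raw markup for an "@" between two
words finds nothing here, but the `harvest` function above needs only
a few lines to undo it, which is why this only raises the cost rather
than closing the hole.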

Be very wary of "fixes" that merely make the problem more difficult. As
soon as they have a financial incentive to crack them, they'll be
cracked. You're basically looking to try to implement the "I don't have
to outrun the bear, I just have to outrun you" solution, meaning you
make it tough enough to crack that they go harvest someone else's site.

I made the statement about the problems of obfuscation, even
in reference to the above.

One can look at the obfuscation model as similar to deterring
crime.  For example, a professional car thief can steal any car,
but if you make your car more time-consuming to steal, they will
go elsewhere, where the cost is lower.  Also, with certain measures,
you deter amateur thieves.

Obfuscation works on a similar principle.  Of course, if you
become a worthy target, a spammer may take the time to break
any obfuscation technique (with the Slashdot story you provided
as a good example).

In the case of MHonArc especially, that's a bad design choice. Since so
many sites use MHonArc, any change you make to MHonArc will be a focus
of the spammers to crack. MHonArc doesn't have the option of making it
tough enough for the spammers to go elsewhere. So you risk putting
energy into things that won't fix the problem for long (if at all), and
worse, might create a false sense of security for developers and users
of the tools.

Actually, MHonArc allows you to completely hide addresses if you want.
However, there is one slight item that does expose author addresses,
so I personally could create a robot to harvest all author addresses
from an archive despite any resource settings by the archive maintainer.

I hope to fix this gap in a future release (once the Savannah
folks fix some issues with CVS).

My suggestion: don't get involved in any "solution" that merely makes
it "harder" or "causes more work", because they only solve things as
long as the spammers don't feel it's worth it. And if you get into an
arms race with them, you'll lose.

I already stated something similar to this.  Ideally, there is a
solution that does not use obfuscation but allows a human to determine
addresses.  Hence, the image idea.  But even that is technically a form
of obfuscation, with respect to a computer.  But when dealing with
computers, some forms of obfuscation may be sufficient if you start
getting to Turing-level obfuscation.
uses a POST form to obfuscate addresses, but it is straightforward
to customize a harvester to defeat it.

Anything with a large enough data-set to warrant the spammer's
attention will get it. MHonArc, sort of by definition, will be high on
their lists.

Technically, not MHonArc, but list archives.

Obfuscation is a waste of energy. It works only as long as the spammers 
don't bother worrying about it. Graphic representations are 
non-accessible, crackable (via OCR) and not easily used by end-users, 

I'm not talking about end-users.  I am talking about the
list archives, and only those archives.  Now, users may learn some
things in this discussion about what they may want to do with their
archives, but that is it.

The only thing relevant to MHonArc is that it allows users to
apply whatever solutions they want.

I think a "guest" has no demand on access to sensitive data. I don't 
allow "guests" open access to private mail lists, for instance, and I 
see no reason why they should assume they should have access to it.

The lists are not private lists.  MHonArc is an open
source project, and all the lists are intended to be as open as