Re: Poll: Should mail archives hide mail addresses

2004-01-01 22:29:53
On January 1, 2004 at 20:28, Jeff Breidenbach wrote:

Finally, Chuq had a good point about requirements changing over
time. In the future, MHonArc may want to move towards encouraging more
semantic markup (read: in default output) similar to the <DIV> tags
found throughout mharc output. This allows easier hooks into
on-the-fly obfuscators and whatnot, with the added bonus of better CSS
interoperability. I think CSS is finally coming of age (check out, and I am looking forward to moving more and more
layout information from the mhonarc resource file to style sheets.

The problem with this approach is that it won't work with text-based
browsers.  Accessibility is something I try to maintain, therefore
I am reluctant to use measures that mandate particular types of

I first thought of using libgd to have addresses changed into CGI
links that generate an image of the email address on the fly.
I.e., harvesters would have to use OCR to get the address.

This will not work for text-based browsers either, but I thought
it would be kind of nifty.
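
To make the image idea a bit more concrete, here is a rough sketch in
Python of the rewriting half only. Everything named here is an
assumption for illustration: the script path /cgi-bin/addr-image, the
"id" parameter, and the in-memory token store are hypothetical, and
the image rendering itself (the libgd part) is left out.

```python
import re
import secrets

# Deliberately simple address pattern; real-world addresses are messier.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def rewrite_addresses(text, store, script="/cgi-bin/addr-image"):
    """Replace each address in text with an <img> tag pointing at a
    (hypothetical) CGI script, keyed by a random opaque token.

    store is a dict mapping token -> address; only the server side
    would consult it when drawing the image.
    """
    def to_img(match):
        token = secrets.token_hex(8)       # random, reveals nothing
        store[token] = match.group(0)      # address stays server-side
        return ('<img src="%s?id=%s" alt="[address shown as image]">'
                % (script, token))

    return ADDRESS_RE.sub(to_img, text)


if __name__ == "__main__":
    tokens = {}
    print(rewrite_addresses("From: earl@example.com", tokens))
```

The address never appears in the page markup, only in the generated
image. The alt text is all a text-based browser would get, which is
exactly the limitation mentioned above.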

Another alternative is to remove the linking of addresses and then
use an obfuscation technique like:


This way the address renders like "earl@example.com" (and can be
copy-n-pasted by readers to their MUA), but a harvester may not
catch it.  Of course, a smart harvester that expands entity references
and deletes comment declarations would.
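
A minimal sketch of both sides of that trade-off, in Python for
illustration only. The function names and the exact markup (numeric
entity references for "@" and ".", plus empty comment declarations)
are assumptions, not what MHonArc or any particular harvester actually
does.

```python
import html
import re

# Deliberately simple address pattern; real-world addresses are messier.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def obfuscate(address):
    """Hide '@' and '.' behind entity references and empty comments."""
    out = []
    for ch in address:
        if ch == "@":
            out.append("<!-- -->&#64;<!-- -->")
        elif ch == ".":
            out.append("&#46;")
        else:
            out.append(ch)
    return "".join(out)


def naive_harvest(markup):
    """Scan the raw markup, as a simple harvester might."""
    return ADDRESS_RE.findall(markup)


def smart_harvest(markup):
    """Delete comment declarations and expand entities, then scan."""
    text = html.unescape(COMMENT_RE.sub("", markup))
    return ADDRESS_RE.findall(text)


if __name__ == "__main__":
    markup = obfuscate("earl@example.com")
    print(markup)
    print(naive_harvest(markup))
    print(smart_harvest(markup))
```

A browser displays the obfuscated markup as the plain address. The
naive scan finds nothing in it, while the two extra steps in
smart_harvest() are enough to recover the address.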

I read a study dated March 2003 that showed that simple obfuscation
techniques actually work, but I think (and the study even states)
that it is likely only a matter of time before spammers adapt.
Right now, there are so many unobfuscated addresses that spammers are
not yet driven to deal with obfuscation techniques.  However, once
obfuscation is the norm, spammers will adapt.
uses a POST form to obfuscate addresses, but it is straightforward
to customize a harvester to defeat it.

Therefore, for long-term protection, obfuscation does not seem to be
the best method.  The image idea is nice since it is a type of Turing
test, and the image can be generated to give OCR systems trouble.
But people using text-only browsers will not be able to determine
the author addresses of messages.

I think it is valuable for users (mainly ones not subscribed to
any list) to be able to read the archives and have the
ability to contact individual authors directly.  Even I have done
such a thing when scanning archives of other lists.  Since text-only
browsers can still read the messages in the archives, is it okay that
they will not have the ability to determine the author's address if
an image-based solution is adopted?  Is this an acceptable limitation
weighed against the problem of spam?