MHonArc: Anti-Spam Measures (RFC)

MHonArc Users,

Several MHonArc users have posted concerns about address harvesting
of their archives and what can be done to prevent user addresses from
getting harvested.  Some users have been able to use MHonArc resources
to reformat pages so they do not show any addresses.  However, addresses
can still be exposed unless one knows certain details of what MHonArc
does.

For the next release, I am attempting to make modifications so that
address harvesting is not a problem for those concerned.  By default,
MHonArc will do, or support, the following (regardless of resource
settings):

    o   The "<!--X-From: ... -->" comment at the beginning of message
        pages will now be "<!--X-From-R13: ... -->" where the value
        will have been passed through a slight variation of ROT13 to
        prevent the address listed from being harvestable.  Note, the
        mha-dbrecover utility will be modified to know how to read
        "<!--X-From-R13: ... -->" and it will also know about
        "<!--X-From: ... -->" for older pages.
        [* Feature implemented in development version *]

    o   Support for $FROMADDRNAME$ and $FROMADDRDOMAIN$, as mentioned in
        earlier posts.  This is to provide archive admins the ability
        to provide custom mail URLs/links without having the
        address in a form that is harvestable.
        [* Feature implemented in development version *]
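
As a rough illustration of the two default items above, here is a minimal
Python sketch (MHonArc itself is not written in Python).  It uses plain
ROT13, not the "slight variation" mentioned above, since that variation is
not specified here.  The function names are hypothetical and are not part
of MHonArc.

```python
import codecs


def encode_from_comment(from_value):
    """Build an X-From-R13 comment using plain ROT13 (assumed stand-in
    for the unspecified variation)."""
    return "<!--X-From-R13: " + codecs.encode(from_value, "rot13") + " -->"


def decode_from_comment(comment):
    """Read either comment form, so older pages with X-From still work."""
    body = comment.strip()
    for prefix, rotated in (("<!--X-From-R13:", True), ("<!--X-From:", False)):
        if body.startswith(prefix) and body.endswith("-->"):
            value = body[len(prefix):-3].strip()
            # ROT13 is its own inverse, so encoding again decodes.
            return codecs.encode(value, "rot13") if rotated else value
    raise ValueError("not an X-From comment")


def split_address(address):
    """Split an address into the two parts that $FROMADDRNAME$ and
    $FROMADDRDOMAIN$ would presumably carry (hypothetical helper)."""
    name, _, domain = address.rpartition("@")
    return name, domain
```

Note that plain ROT13 only rotates letters, so "@" and "." stay in place
and the result still has the shape of an address.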

Now, the above is not enough to avoid address harvesting, but it is essential
to a complete solution.  In addition to the above, I am currently planning
(with some implementation already done) to have a SPAMMODE resource.  If
active, MHonArc will do the following:

    o   Addresses in message headers and what looks like addresses in
        message bodies will be modified so they are not valid "real"
        addresses but are still usable by people.  For example, the
        address "someuser(_at_)corp(_dot_)foo(_dot_)com" will become
        "someuser(_at_)corp(_dot_)BogusPart(_dot_)foo(_dot_)com" (or
        something similar -- suggestions welcome).  This will allow users
        to still respond to an address by just removing the "BogusPart".
        [* Feature implemented in development version *]

        Note, I did play with the idea of really obscuring the address.
        For example: someuser(_at_)corp(_dot_)foo(_dot_)com -> someuser(_at_)xxxxxxxxxxxx(_dot_)
        But I think it would be nice if the addresses were still usable
        by humans.  Thoughts?

        The modification of addresses in converted message bodies is
        done on the final filtered data.  I.e., the modifications are
        done after the various MIMEFILTERS have already done their
        job.  This is to avoid dependencies on all the filters and
        to avoid the need to pass options to all applicable filters.

    o   The "<LINK REV="made" HREF="mailto:...">" will not show up
        in message pages.  I.e., the default MSGPGBEGIN resource will
        not include the <LINK> tag if SPAMMODE is on.
        [* Feature implemented in development version *]

    o   The main index listing will use only $FROMNAME$ instead of
        $FROM$ by default.  I.e., LITEMPLATE will have the following
        default value if SPAMMODE is active:

            <LiTemplate>
            <LI><STRONG>$SUBJECT$</STRONG>
            <UL><LI><EM>From</EM>: $FROMNAME$</LI></UL>
            </LI>
            </LiTemplate>

        [* Feature *NOT YET* implemented in development version *]
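
To make the address rewriting in the first SPAMMODE item more concrete,
here is a minimal Python sketch of the idea.  It is applied to the final
text, as described above.  The regular expression, the function names,
and the choice to insert the bogus label before the last two domain
labels are all assumptions for illustration; the actual implementation
may differ.

```python
import re

# Loose pattern for "what looks like an address" (an assumption; real
# address syntax is considerably more permissive).
ADDRESS_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})\b"
)


def munge_domain(domain, bogus="BogusPart"):
    """Insert a bogus label before the last two labels of the domain."""
    labels = domain.split(".")
    keep = min(2, len(labels))
    return ".".join(labels[:-keep] + [bogus] + labels[-keep:])


def munge_addresses(text, bogus="BogusPart"):
    """Rewrite every address-like string in already-filtered text."""
    return ADDRESS_RE.sub(
        lambda m: m.group(1) + "@" + munge_domain(m.group(2), bogus),
        text,
    )
```

A reader can still reply by deleting the bogus label by hand, while a
harvester collecting the text as-is gets an address that should not
resolve.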

My goal is to have "mhonarc -spammode ..." provide reasonable protection
from address harvesters without the need for the user to do anything else.

I welcome comments on any of the above,

        --ewh