procmail

Re: setting up an *opt-out* big brother

2003-03-20 13:20:27
At 08:09 2003-03-20 +0000, Nancy McGough did say:
I'm also working on a "Big Brother" scenario but it's for
archiving and users know it's happening and can opt out by using a
header like this

 X-ExampleCom-Archive: no

Uh, I'm unclear - users sending mail _into_ your system can avoid having their mail archived? I don't follow why the sender should be the one making this decision. Plus, the bulk of inbound mail is either from discussion lists, or sent through services where the sender lacks the ability (AOL, web/freemail services), or simply the know-how, to make a _header_ change, or doesn't even know the option exists.

My first thought was to put a recipe like this at the top of
/etc/procmailrc

 :0 c
 * ! ^X-ExampleCom-Archive:.*no
 ! archiver(_at_)example(_dot_)com

Rummage the procmail list archives for a few months back - I've detailed a couple of different approaches to selective participation in /etc/procmailrc - from using unix group membership (obviously something a sysadmin would control), to checking for the existence of a file in the user's home dir, external db queries, or whatever.

:0c
* ? somedblookupprogram $LOGNAME
action


For instance, the condition line could be:

* ? groups $LOGNAME | grep -q \\\<webmail\\\>

which would identify users who are in the webmail group.

Better yet would be to obtain the groups _once_:

GROUPS=`groups $LOGNAME`

:0c
* GROUPS ?? [   ]archive\>
action

and then the archiver account would be set up with its own
~archiver/.procmailrc that will extract data and put it in a
mysql database, deliver the message to appropriate IMAP shared
mailboxes, etc.

Via sendmail, you could just invoke an /etc/procmailrcs/archiver.rc script through an alias, like so:

archiver        "|/usr/bin/procmail -m /etc/procmailrcs/archiver.rc"

(keep in mind that if your archiver isn't invoked via an alias, or directly from within the original procmailrc, you'll need to have exclusionary logic to keep the archiver from forwarding to the archiver).

However, _forwarding_ to an archiver account doesn't seem like the proper approach - the archiving should be invoked directly from the /etc/procmailrc:

:0c
* condition
{
        INCLUDERC=archiver.rc
}

Reasons:

        1. You don't re-run sendmail or procmail (you fork a new copy, but
        that's different).  That means it's faster, and less memory/CPU
        intensive - sure overall, *A* message isn't that much work, but if
        you're going to more than double the CPU work for every message
        coming into the system, it's a concern.

        2. No additional headers (a forward would have an extra Received:
        line and a different From_).

        3. No need for a separate "archiver" user.

        4. You still have access to $LOGNAME as the _original_ recipient
        account.

        5. No forward, no need for X-Loop logic.

        6. No forward, so a BOUNCE is highly unlikely (only if you terminate
        your SQL insertion with a failure).

I'm sure there are a variety of other benefits, but these spring immediately to mind.

But, I'm mulling over the prospect of *loops* and *duplicates*

Dupes, you handle via a formail check, as described in the procmailex manpage. However, you've got to maintain separate databases for the individual recipients. I don't know how you're maintaining the original envelope recipient if you're forwarding the message (i.e. how would you restrict the reader of the message to the original recipient only?), but if invoked from within the _original_ /etc/procmailrc, you've got $LOGNAME right there to pass to your SQL insertion function.

Also, if *I* were storing messages to an SQL database for many users of a busy system, I'd be rather prone to generating a CRC32 or somesuch of the body (separate from the headers) and store single copies of known duplicate bodies (note: not just based on matching CRC32 - use CRC32, size, and From_, and heck, if those match, then COMPARE the current message body against the one already in the DB), with references to each of the recipients, with separate storage for the individual copies of headers.

If users have the ability to _delete_ messages from the archive, you'd delete the user's own copy of the headers, and remove the reference to that user from the body, and if the body's list of references becomes empty, THEN you delete the body. With a shared storage system, you can archive a *LOT* of email for a *LOT* of users without gobbling up disk space anywhere near as quickly as you would if you were storing each message separately.
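
To make the shared-body idea concrete, here's a rough sketch in Python, using sqlite3 as a stand-in for whatever SQL database you actually run. Everything in it is an assumption for illustration only: the table names (bodies, copies), the function names (open_archive, store, delete), and the choice of one row per recipient copy as the "reference" to a body. It isn't a finished archiver, just the bookkeeping described above: CRC32, size and sender nominate a candidate, a byte-for-byte compare confirms it, and a body is dropped only when no recipient copy points at it any more.

```python
import sqlite3
import zlib

# Hypothetical schema: one shared row per distinct body, plus one row per
# recipient holding that recipient's own headers and a pointer to the body.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bodies (
    id     INTEGER PRIMARY KEY,
    crc    INTEGER NOT NULL,
    size   INTEGER NOT NULL,
    sender TEXT    NOT NULL,
    body   BLOB    NOT NULL
);
CREATE INDEX IF NOT EXISTS bodies_key ON bodies (crc, size, sender);
CREATE TABLE IF NOT EXISTS copies (
    id        INTEGER PRIMARY KEY,
    recipient TEXT    NOT NULL,
    headers   TEXT    NOT NULL,
    body_id   INTEGER NOT NULL REFERENCES bodies (id)
);
"""


def open_archive(path=":memory:"):
    """Open (and if needed create) the archive database."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def store(db, recipient, sender, headers, body):
    """Archive one delivery; return the id of the recipient's copy."""
    crc = zlib.crc32(body)
    candidates = db.execute(
        "SELECT id, body FROM bodies WHERE crc = ? AND size = ? AND sender = ?",
        (crc, len(body), sender),
    ).fetchall()
    body_id = None
    # CRC32, size and sender only nominate candidates; the full comparison
    # decides whether the stored body really is the same one.
    for cand_id, cand_body in candidates:
        if bytes(cand_body) == body:
            body_id = cand_id
            break
    if body_id is None:
        body_id = db.execute(
            "INSERT INTO bodies (crc, size, sender, body) VALUES (?, ?, ?, ?)",
            (crc, len(body), sender, body),
        ).lastrowid
    copy_id = db.execute(
        "INSERT INTO copies (recipient, headers, body_id) VALUES (?, ?, ?)",
        (recipient, headers, body_id),
    ).lastrowid
    db.commit()
    return copy_id


def delete(db, copy_id, recipient):
    """Remove one recipient's copy; drop the body once nobody references it."""
    row = db.execute(
        "SELECT body_id FROM copies WHERE id = ? AND recipient = ?",
        (copy_id, recipient),
    ).fetchone()
    if row is None:
        return False  # no such copy, or it belongs to a different user
    body_id = row[0]
    db.execute("DELETE FROM copies WHERE id = ?", (copy_id,))
    remaining = db.execute(
        "SELECT COUNT(*) FROM copies WHERE body_id = ?", (body_id,)
    ).fetchone()[0]
    if remaining == 0:
        db.execute("DELETE FROM bodies WHERE id = ?", (body_id,))
    db.commit()
    return True
```

From a recipe, you'd pipe the message to a script that splits headers from body and calls something like store() with $LOGNAME as the recipient. Two users receiving the same list post then cost you one body row and two small header rows.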

As for loops, as long as the archiver isn't _sending_ email, you're not going to _generate_ any. Of course, if the account which is being archived is subject to looping, all its inbound mail will be copied to the archiver, but there's not a lot you'll be able to do about that.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
