procmail

Re: Duplicates sent from lists

2004-02-18 09:17:32

On 17 Feb 2004, at 20:54, Bart Schaefer wrote:

> On Tue, 17 Feb 2004, LuKreme wrote:
>
> >     { ADDTOCACHE=`echo "$LISTNAME $LISTPOST" >> $LISTCACHE` }
>
> I've often found it annoying that formail won't cache an arbitrary header
> when using -D.  It'd be very nice to be able to say
>
> * ? formail -xList-Post: -D 1024 listpost.cache

Yeah, that WOULD be nice. Looking at the formail man page, though, I noticed this:

       -D maxlen idcache
              Formail will detect if the Message-ID of the current message has
              already been seen using an idcache file of approximately maxlen
              size. If not splitting, it will return success if a duplicate has
              been found. If splitting, it will not output duplicate messages.
              If used in conjunction with -r, formail will look at the mail
              address of the envelope sender instead at the Message-ID.

Would the -r option be useful here, then? Honestly, I can't remember the specification for which address counts as the "envelope sender", and I couldn't find it in formail's man page. In a list post, is the envelope sender the "Return-Path:" (the list) or the "From:" (the user)?
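To make the -D behaviour quoted above concrete (success when the value has already been seen), here is a rough plain-shell stand-in. It is a sketch under stated assumptions, not formail's implementation: the function name seen_before is hypothetical, and the cache is a plain newline-separated text file trimmed to the last N entries rather than formail's own idcache of approximately maxlen bytes.

```shell
#!/bin/sh
# seen_before VALUE CACHE [MAXLINES]   (hypothetical helper, not part of formail)
# Rough analogue of "formail -D": exit 0 if VALUE is already in CACHE
# (a duplicate), otherwise record it and exit 1.
# Assumption: CACHE is a plain text file, one entry per line, capped at
# MAXLINES entries; formail's real idcache uses its own format.
seen_before() {
    sb_value=$1
    sb_cache=$2
    sb_max=${3:-1024}

    [ -f "$sb_cache" ] || : > "$sb_cache"

    # -F fixed string, -x whole line, -q quiet: exact match on one entry.
    if grep -Fxq -- "$sb_value" "$sb_cache"; then
        return 0
    fi

    # New value: append it, then keep only the newest MAXLINES entries.
    printf '%s\n' "$sb_value" >> "$sb_cache"
    tail -n "$sb_max" "$sb_cache" > "$sb_cache.tmp" &&
        mv "$sb_cache.tmp" "$sb_cache"
    return 1
}
```

Unlike the ADDTOCACHE line, which appends on every message, this only appends values it has not seen before, and the tail step keeps the file from growing without bound.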


> You can sort of hack it with:
>
> * ? echo "Message-Id: `formail -xList-Post:`" | \
>         formail -D 1024 listpost.cache
>
> But that's three processes plus a shell to do what one process could
> have done, if only ... Of course, you can extract List-Post with a MATCH
> recipe to avoid one formail and the backticks, but still ...
>
> Anyway, once you have listpost.cache, you can do something like this:
>
> * ? formail -xTo: -xCc: | \
>         fgrep --word-regexp --fixed-strings "$(strings listpost.cache)"

So listpost.cache contains a formail -D style list of addresses? Is there an advantage to doing that over creating a plain text file with one entry per line? I mean, the strings command takes the cache file and returns it in the same format as $LISTCACHE would have anyway, right?
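A quick way to check the "same format" question is to build the same entries in both layouts and compare what a pattern list made from each looks like. This is a sketch with made-up addresses, and it assumes (not confirmed in this thread) that formail -D stores NUL-terminated entries, which is what lets strings(1) list them; tr stands in for strings here so the example does not depend on binutils.

```shell
#!/bin/sh
# Hypothetical cache contents in two layouts. Assumption: formail -D keeps
# NUL-terminated entries, which is why strings(1) can list them.
nul_cache=$(mktemp)
txt_cache=$(mktemp)
printf '%s\0' 'list@example.org' 'other-list@example.net' > "$nul_cache"
printf '%s\n' 'list@example.org' 'other-list@example.net' > "$txt_cache"

# tr stands in for strings here; both yield one entry per line.
# (strings also skips runs shorter than 4 printable characters.)
from_nul=$(tr '\0' '\n' < "$nul_cache")
from_txt=$(cat "$txt_cache")

if [ "$from_nul" = "$from_txt" ]; then
    echo "same pattern list"
fi
rm -f "$nul_cache" "$txt_cache"
```

For fgrep purposes the two end up interchangeable; going by the man page text above, what formail adds is the duplicate check and the approximate size limit on the cache file.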

* $ ? formail -xTo: -xCc: | fgrep --word-regexp --file=$LISTCACHE

That would be equivalent, no? (Given the LISTCACHE recipe I posted previously.)
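One detail matters for that equivalence: fgrep given a bare path uses the path itself as the pattern; it is -f (--file) that reads one pattern per line from a file. The sketch below uses a hypothetical LISTCACHE holding bare addresses (not the "$LISTNAME $LISTPOST" lines from the earlier recipe) and made-up header bodies, and uses grep -F in place of fgrep.

```shell
#!/bin/sh
# Hypothetical LISTCACHE: one bare address (or domain) per line.
LISTCACHE=$(mktemp)
printf '%s\n' 'list@example.org' 'example.net' > "$LISTCACHE"

# Made-up To: and Cc: bodies, as formail -xTo: -xCc: might print them.
headers='Some List <list@example.org>
alice@example.com, bob@example.net'

# With -f, grep reads one fixed-string pattern per line of the file.
with_f=$(printf '%s\n' "$headers" | grep -Fw -f "$LISTCACHE")

# Without -f, the path itself (e.g. /tmp/tmp.XXXX) is the only pattern,
# so nothing in the headers matches.
without_f=$(printf '%s\n' "$headers" | grep -Fw "$LISTCACHE" || true)

printf 'with -f:\n%s\nwithout -f: [%s]\n' "$with_f" "$without_f"
rm -f "$LISTCACHE"
```

Both header lines match with -f. Note how loose --word-regexp is: it only requires a non-word character (anything but a letter, digit or underscore) on each side, so "<" and ">" around an address are fine, but "@" also counts, which is why the bare "example.net" entry matches bob@example.net. It does stop list@example.org from matching inside mylist@example.org.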

Still, formail -x extracts the ENTIRE To: and Cc: headers, and that's no good. We need a clean list of addresses with no additional comments.

 procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
 Procmail <procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>
 Procmail Mailing List <procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>
 Procmail List <procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>

are all possible values that formail -xTo: -xCc: could extract, just for this list.

Still, this method uses formail to manage the cache of list addresses, which is probably enough of an advantage on its own, even if we need a different method of getting "clean" To: and Cc: headers. That could be tricky, since both To: and Cc: can contain multiple addresses.

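# Body of the first To: line, minus leading blanks
:0
* ^To:[ 	]*\/.*
{ TO=$MATCH }

# Body of the first Cc: line, if there is one
:0
* ^Cc:[ 	]*\/.*
{ CC=$MATCH }

# One token per line with commas dropped; set even when there is no Cc:
ExtendedTO=`echo "$TO $CC" | sed 's/,//g' | tr ' ' '\n'`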

This seems to work. It generates a bunch of extra cruft, but since we only use the result as text to fgrep against, I think that's acceptable.
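The shell half of that recipe can be tried outside procmail. In this sketch TO and CC are hypothetical stand-ins for the captured To: and Cc: bodies; the pipeline is the one from the recipe, and the last step shows why the leftover display-name words are harmless when matching with -w.

```shell
#!/bin/sh
# Shell half of the ExtendedTO recipe, run on made-up header bodies.
# Assumption: TO and CC hold what the ^To: and ^Cc: MATCH captures would.
TO='Procmail List <list@example.org>'
CC='alice@example.com, bob@example.net'

# Same pipeline as the recipe: drop commas, then one token per line.
ExtendedTO=$(echo "$TO $CC" | sed 's/,//g' | tr ' ' '\n')
printf '%s\n' "$ExtendedTO"

# The display-name words are the extra cruft. With -w they do no harm:
# a cached address still matches inside <...> because < and > are
# non-word characters.
hit=$(printf '%s\n' "$ExtendedTO" | grep -Fw 'list@example.org')
echo "matched: $hit"
```

This prints five tokens (Procmail, List, <list@example.org>, alice@example.com, bob@example.net), and the match is found on the angle-bracketed token. Since tr only splits on spaces, a tab inside a header would leave two words joined on one line.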




BTW, doesn't fgrep imply --fixed-strings?

       In addition, two variant programs egrep and fgrep are available.
       egrep is the same as grep -E. fgrep is the same as grep -F. zgrep is
       the same as grep -Z. zegrep is the same as grep -EZ. zfgrep is the
       same as grep -FZ.

and

       -F, --fixed-strings
              Interpret PATTERN as a list of fixed strings, separated by
              newlines, any of which is to be matched.

??
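Going by the man page text quoted above, fgrep is grep -F, so adding --fixed-strings to an fgrep call is redundant. The sketch below checks the two properties the thread relies on, using grep -F directly since recent GNU grep releases print an obsolescence warning for fgrep: patterns are taken literally, and a newline-separated pattern string (which is what passing "$(strings listpost.cache)" as one argument produces) acts as a list of alternatives. The addresses are made up.

```shell
#!/bin/sh
# grep -F (which is what fgrep runs) takes patterns literally:
# the '.' below must match a real dot, not any character.
lines='list@example.org
list@exampleXorg'
literal=$(printf '%s\n' "$lines" | grep -F 'list@example.org')
regex=$(printf '%s\n' "$lines" | grep 'list@example.org')

# A pattern containing newlines is a list of alternatives; a line
# matches if it contains any one of them.
patterns='list@example.org
other@example.net'
multi=$(printf '%s\n' 'to: other@example.net' 'to: nobody@example.com' |
    grep -F "$patterns")

printf 'literal: %s\nregex matched %s line(s)\nmulti: %s\n' \
    "$literal" "$(printf '%s\n' "$regex" | wc -l | tr -d ' ')" "$multi"
```

With -F only the real address matches, while the plain regex also accepts list@exampleXorg because "." matches any character there.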


--
"You're an elf and you're going to wear panties like an elf." David Sedaris, Santaland Diaries


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
