
Re: How to extract multiple email addresses in To: header

2005-05-14 02:50:24
On Sat, May 14, 2005 at 03:12:34AM +0200, Dallman Ross wrote:

On Fri, May 13, 2005 at 10:47:08AM -0500, mark david mcCreary wrote:

Can someone point me to a recipe that will extract pure email
addresses (not the text verbiage) from the To and Cc headers?

For example,

To: Joe Schmoo <joe@schmoo.com>, Bob Smith <bob@smith.org>
Cc: Sam Spade <sam@spade.com>

I want to end up with a procmail variable containing

joe@schmoo.com, bob@smith.org, sam@spade.com
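
As a reference point for the result being asked for, here is a minimal
Python sketch (not procmail) that produces the same comma-separated string
with the standard library's header parser. The sample message RAW and the
function name bare_addresses are made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample message built from the example headers above.
RAW = (
    "To: Joe Schmoo <joe@schmoo.com>, Bob Smith <bob@smith.org>\n"
    "Cc: Sam Spade <sam@spade.com>\n"
    "\n"
    "body\n"
)

def bare_addresses(raw_message):
    """Return the To and Cc addresses, without display names, comma-separated."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() yields (display name, address) pairs; keep the address only.
    return ", ".join(addr for _name, addr in getaddresses(fields) if addr)

print(bare_addresses(RAW))
```

This is only a yardstick for checking what a procmail recipe collects.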

Okay, I figured this out in all-procmail.  It's now more than an
hour past my bedtime, so it may well be that this can be further
improved.  But here it is.  Tested, seems to work:

Here are a couple of improvements and cautions that occurred to
me in bed. :-)


 ####################### start rcfile "get-addies.rc" #######################
 :0 
 * $ HDRFLD ?? $\MATCH.*\/[$ADDYSET]+@$AHOST[.][a-zA-Z]+
 {
    STRIPPED_ADDRESSES = "${STRIPPED_ADDRESSES:+$STRIPPED_ADDRESSES, }$MATCH"
    SWITCHRC = $_
 }

It seems to me that if the same address appears twice in a row, this
will get stuck in an endless loop, because the search resumes from the
text of the previous MATCH.  So we'll have to test for that and either
exit or skip past the duplicate, somehow.  If we just want to exit in
that case, we would want to save MATCH in another variable before the
SWITCHRC, then compare it with the first part of the new MATCH after
the above condition.

We could try just skipping over the repeat.  But what if there are
three or more repeats?  So I think we probably should just exit if
that happens.

IOW, it's not a problem for

   foo@bar bar@bar foo@bar baz@bar

but could be for

   foo@bar foo@bar baz@bar

(Comments and other punctuation could be between these words, also.)
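
For comparison, here is a rough Python sketch (not procmail, and not part of
the rcfile) of a scan that resumes from the end offset of the previous match
rather than from its text. Because the offset always moves forward, repeated
addresses cannot stall it. The pattern is deliberately simplified and the
function name scan is made up for illustration.

```python
import re

# Simplified address pattern for illustration; not the rcfile's exact regex.
ADDRESS = re.compile(r"[a-zA-Z0-9.=_+-]+@[a-zA-Z0-9.-]+")

def scan(field):
    """Collect addresses by resuming at the end offset of the previous match."""
    found, pos = [], 0
    while True:
        m = ADDRESS.search(field, pos)
        if m is None:
            return found
        found.append(m.group(0))
        pos = m.end()  # always advances, so repeats cannot stall the loop

print(scan("foo@bar foo@bar baz@bar"))
```

A procmail recipe has no such offset to carry between passes, which is why
the guard discussed above has to compare the text of successive matches.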


 :0
 * HDRFLD ?? .
 { SWITCHRC }


 HOSTCL  = [a-zA-Z0-9-]
 AHOST   = "($HOSTCL+[.])*$HOSTCL+"
 ADDYSET = 'a-zA-Z0-9.=_+-'                # sensible set for address part

I want to try it with an exclusion set rather than an inclusion set,
but haven't done it yet.
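
To see what these three variables add up to, here is an approximate Python
rendering of the combined pattern [$ADDYSET]+@$AHOST[.][a-zA-Z]+.  Python's
re module is not procmail's regex engine, so treat this as a sketch for
experimenting with the character sets, not as a statement of how procmail
matches.  The function name find_addresses is made up for illustration.

```python
import re

# Same building blocks as the rcfile variables, spelled for Python's re module.
HOSTCL = r"[a-zA-Z0-9-]"
AHOST = rf"(?:{HOSTCL}+[.])*{HOSTCL}+"
ADDYSET = r"a-zA-Z0-9.=_+-"

# Rough equivalent of: [$ADDYSET]+@$AHOST[.][a-zA-Z]+
ADDRESS = re.compile(rf"[{ADDYSET}]+@{AHOST}[.][a-zA-Z]+")

def find_addresses(field):
    """Return every address-like substring, in order of appearance."""
    return [m.group(0) for m in ADDRESS.finditer(field)]

print(find_addresses("Joe Schmoo <joe@schmoo.com>, Bob Smith <bob@smith.org>"))
```

Note that the pattern requires a dot followed by letters at the end of the
host, so a shorthand like foo@bar is not matched by it.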


 SPACE   = ' '
 TAB     = '	'                              # a literal tab character between the quotes

 HDRFLDNAME = ${HDRFLDNAME:-To}
 :0
 * $ ^$HDRFLDNAME:.*\/[^$SPACE$TAB].*
 { HDRFLD = $MATCH }

 MATCH
 SWITCHRC = $_


This part's a problem in that if the named header field is missing,
e.g., no To:, we'll loop endlessly: HDRFLD never gets set, so every
pass falls through to the same unconditional SWITCHRC.  So we need to
put the SWITCHRC inside the curly braces:

   HDRFLDNAME = ${HDRFLDNAME:-To}
   :0
   * $ ^$HDRFLDNAME:.*\/[^$SPACE$TAB].*
   {
      HDRFLD = $MATCH
      MATCH
      SWITCHRC = $_
   }

-- 
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail