procmail

Re: How to extract multiple email addresses in To: header

2005-05-20 08:33:54
On Fri, May 20, 2005 at 08:40:11AM -0500, mark david mcCreary
wrote:

> Thank you Dallman for your procmail code to solve this problem !!

You're welcome.  I found it an interesting exercise.

> I've got it working on my machine, and even made some small
> tweaks to create a version that extracts from both the To and Cc
> headers, and returns all email addresses (but not the friendly
> names) in both of those headers, in a comma separated list.
>
> Here's the tweaked version.

Okay, I'll just say that some of your tweaks are not entirely
efficient and miss something about the built-in flexibility I
wrote in.  Let me explain:

   HDRFLDNAME = ${HDRFLDNAME:-To}

From your alterations, I infer that you are not seeing what that
can do.  It means (I'll use "varname" here for the variable name I
chose), "varname keeps its current value unless it is unset or
null, in which case it takes the value 'To'."

Specifically, where you added in:

   HDRFLDNAME

   HDRFLDNAME = ${HDRFLDNAME:-Cc}

I can see that a point implicit in the var:-something syntax
escaped you.  Your changes may work (I don't disbelieve you that
they do, but the result is screwy for me to look at, I don't
particularly want to test it, and I do have my doubts about its
working cleanly as altered).  Still, in light of the explanation of
the syntax above, maybe you will agree that it's not optimal to unset
the var merely in order to then give a new assignment statement saying,
"it's what it is, unless it's unset or null, in which case it's now
'Cc'." :-)

If we were going to go that route, we could just restate the
var in one fell swoop:

   HDRFLDNAME = Cc

and be done with it.  :-)

Btw, you can read the man pages for /bin/sh to see more about
how var:-value and its close siblings work.  (As the procmail man
pages say, it accepts the same syntax.)
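
To make the point concrete, here is a minimal /bin/sh sketch of the
same expansion.  It assumes a POSIX shell; the RESULT_* names are
hypothetical and exist only to hold each outcome.  Note that sh
assignments take no spaces around "=", unlike the rcfile lines above.

```shell
#!/bin/sh
# Illustrative only: how ${VAR:-default} behaves in a POSIX shell.
# The RESULT_* variables are hypothetical names for this demo.

# Case 1: variable is unset, so the default is used.
unset HDRFLDNAME
HDRFLDNAME=${HDRFLDNAME:-To}
RESULT_UNSET=$HDRFLDNAME

# Case 2: variable is set but null, so the default is used again.
HDRFLDNAME=
HDRFLDNAME=${HDRFLDNAME:-To}
RESULT_NULL=$HDRFLDNAME

# Case 3: variable already has a value, so the default is ignored.
HDRFLDNAME=Cc
HDRFLDNAME=${HDRFLDNAME:-To}
RESULT_PRESET=$HDRFLDNAME

# Case 4: unset followed by a default is a roundabout plain assignment.
HDRFLDNAME=To
unset HDRFLDNAME
HDRFLDNAME=${HDRFLDNAME:-Cc}
RESULT_ROUNDABOUT=$HDRFLDNAME

printf '%s %s %s %s\n' "$RESULT_UNSET" "$RESULT_NULL" \
  "$RESULT_PRESET" "$RESULT_ROUNDABOUT"
```

Run as written, this prints "To To Cc Cc".  Case 4 is the
unset-then-default pattern discussed above: it ends up at 'Cc' no
matter what the variable held before, which is exactly what a plain
assignment does.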


Part of the reason I've gone to some length to explain is that
you've presented something I wrote mixed up with something you
wrote, and the casual reader/kibitzer won't be able to tell what's
whose.  Somehow that doesn't appeal to me.  I can already envision
the altered version making the rounds for months on the net and
then someone asking a question about it in the altered form and
wondering why it behaves funnily. :-)  (That kind of thing has
happened before: a buggy early proposal I made to the list made
the rounds like that, while the corrected version sent later the
same day was somehow missed by all those glomming on to the buggy
one.  Months later, I had to explain why the buggy one didn't
work.)

I enjoyed solving the puzzle, and I'm not intending to sound
annoyed here.  But I just want to be careful about what we
are presenting.  Maybe it would help (me, you, and others)
if you commented the changes you make as yours, by way of
cleaner documentation.  Months from now, when you revisit,
you'll be glad you were so professional about annotating the
source.

What I'm coming to by way of advice is this: instead of collecting
the To's contents and then restating the value of the var to Cc and
running that through, do the following:  Skip grabbing the To and
Cc fields inside the recipe-set here.  Get those earlier on,
for example, like so:

   [ Setting for LINEBUF and var definitions for SPACE and TAB
     deleted here, but they would come earlier.]

   WS = "$SPACE$TAB"

   :0
   * $ ^To:.*\/[^$WS].*
   { H_TO = "$MATCH" }

   :0
   * $ ^Cc:.*\/[^$WS].*
   { H_CC = "$MATCH" }

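   # Join with a comma so that the last To address and the first Cc
   # address cannot run together when neither is in angle brackets.
   H_TOCC = "$H_TO, $H_CC"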

Okay, and now you can go on with 

 ####################### start rcfile "get-addies.rc" #######################

  :0
  # below condition will succeed only on first recursion of this file!
  # presumes the first time in we used "INCLUDERC" rather than "SWITCHRC".
  # This lets us recurse but only define the preliminaries the first time
  # through.
  #
  * $! SWITCHRC ?? ^^$_^^
  {
    _ifs        = ${_ifs:-,$SPACE}
    HOSTCL      = [a-zA-Z0-9-]
    AHOST       = ($HOSTCL+[.])*$HOSTCL+
    ADDYCL      = "[^]><)([$SPACE$TAB,;:\"'@]"

    _HOLD       = $H_TOCC
  }


  :0 
  * $ _HOLD ?? ()\/$ADDYCL+@$AHOST[.][a-zA-Z]+
  {
    ADDRESS = $MATCH
    STRIPPED_ADDRESSES = ${STRIPPED_ADDRESSES:+$STRIPPED_ADDRESSES${_ifs}}$ADDRESS

    :0
    * $ _HOLD ?? $\ADDRESS\/.+
    {
      _HOLD = $MATCH
      SWITCHRC = $_
    }

  } _HOLD _ifs ADDRESS               # unset unneeded local vars

  LOG = "Stripped addresses: >$STRIPPED_ADDRESSES<
  "
 ######################## end rcfile "get-addies.rc" ########################


I just tested it. 

Btw, if you want the list delimited other than by comma+space, pre-set
_ifs before the rcfile is first included.  E.g., set it to $NL or '|'.
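
For readers who want to follow the loop's logic outside procmail, here
is a rough /bin/sh sketch of the same idea: find the first address,
append it with the delimiter, drop everything up to and including it,
and repeat.  It is an approximation under assumptions, not a port of
the rcfile: extract_addresses, its arguments, and the sample addresses
are hypothetical; it relies on grep's -o option (a common GNU/BSD
extension, not POSIX); and unlike procmail's matching it is
case-sensitive.

```shell
#!/bin/sh
# Sketch only: mimics the rcfile's find/append/strip/repeat loop.
# extract_addresses is a hypothetical helper, not part of procmail.

# Roughly the same character classes as ADDYCL and AHOST in the rcfile
# ([:space:] stands in for $SPACE$TAB).
ADDYCL="[^][<>()[:space:],;:\"'@]"
AHOST='([a-zA-Z0-9-]+[.])*[a-zA-Z0-9-]+'
PATTERN="${ADDYCL}+@${AHOST}[.][a-zA-Z]+"

# usage: extract_addresses "header text" [delimiter]
extract_addresses() {
  hold=$1
  ifs="${2:-, }"        # default delimiter: comma + space
  stripped=
  while :; do
    # First address left in $hold, if any.
    address=$(printf '%s\n' "$hold" | grep -oE "$PATTERN" | head -n 1)
    [ -n "$address" ] || break
    # Prepend the delimiter only when the list is already non-empty.
    stripped="${stripped:+$stripped$ifs}$address"
    # Keep only what follows the address just taken.
    hold=${hold#*"$address"}
  done
  printf '%s\n' "$stripped"
}

extract_addresses '"Al Smith" <al@example.com>, bob@example.org (Bob)'
extract_addresses 'al@example.com, Carol <carol@mail.example.net>' '|'
```

The first call uses the default delimiter and the second passes '|',
which plays the role of pre-setting _ifs.  The "${stripped:+...}"
expansion is the same trick the rcfile uses for STRIPPED_ADDRESSES: it
expands to nothing while the list is still empty, so no leading
delimiter appears.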

-- 
dman
