RE: Separate incoming mail into 4 categories


Thanks for the advice. I have tried your script and, with the typos
corrected, it works so far for the basic four-category test under a
manual setup. I will let it run on real mail to see whether it sorts
things into the right place.

This is just the beginning of my effort to understand procmail beyond
the most basic level and, to be honest, many parts of the script you
wrote will take me a while to comprehend in full. Such cryptic Unix
writing is not easy for the casual user.

However, maybe someone can check the validity of my reasoning. I think
that instead of working out how to get rid of junk mail, I should
concentrate on the sender and the recipient. Basically, I only need to
pay attention to mail whose sender and recipient are both known to my
system.

Mail that matches neither list goes straight to the bin.

If it matches from_list only, the mail could be for an ex-staff member
of the company.

If it matches to_list only, it could be from a new contact.

Only these two categories need to be examined by an operator, or sorted
further using the subject of the mail.
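
The four outcomes described above can be written down as a small
decision table. This is only a sketch in plain shell, not the recipe
from the thread; the function name classify and the yes/no arguments
are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: choose a category from two yes/no lookups.
#   $1 = "yes" if the sender is in from_list
#   $2 = "yes" if the recipient is in to_list
classify() {
    case "$1:$2" in
        yes:yes) echo match_both ;;      # known sender, known recipient
        yes:*)   echo match_from ;;      # possibly mail for ex-staff
        *:yes)   echo match_to ;;        # possibly a new contact
        *)       echo match_neither ;;   # straight to the bin
    esac
}

classify yes yes
classify yes no
classify no yes
classify no no
```

Run as is, it prints the four category names in the order match_both,
match_from, match_to, match_neither.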

It is easy enough for the computer to generate an updated from_list and
to_list from the company database system.


Cheers,
Kwang-Fuh Lee.




At 22:43 2007-01-04 +0800, DR. Lee - NS3 wrote:

I am trying to develop a simple mechanism to separate incoming mail
into four categories by using a from_list and a to_list. The from_list
contains the fully qualified e-mail addresses of the known senders and
the to_list contains the addresses of the known recipients.

The logic is as follows:


The Procmail mantra, please recite it with me:

         Procmail is not an MTA.

If someone addresses something with a BCC, you're hosed because you won't
see the contents of that header.  If the path to you involved the message
being sent twice (but with the same cleartext addressees), you'll end up
distributing it twice to BOTH sets of recipients.

It certainly appears that you're trying to manage a mailing list in a 
convoluted fashion.

 if it is in the to_list then
   it is forwarded to the match_both mailbox if it also matches from_list,
   otherwise it goes to the match_to mailbox
 if it is not in the to_list but matches from_list, it goes to match_from,
   otherwise it goes to match_neither.

Since the To: address sometimes has <> around it and sometimes does not,
the program checks both forms.


Note that the To: header may contain multiple addresses.  If your recipe
as written encounters such a header, you're only going to grab the first
token.
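
One way to see every address in such a header, rather than only the
first token, is sketched below in plain shell. It is an illustration
under stated assumptions and not a full address parser: it only copes
with quoted display names and <> bracketing, and ignores parenthesised
comments and group syntax. The function name split_addresses and the
example.com addresses are made up.

```shell
#!/bin/sh
# Hypothetical helper: print one bare address per line from the value
# of a To: header passed as $1.
split_addresses() {
    printf '%s\n' "$1" |
        sed -e 's/"[^"]*"//g' |          # drop quoted display names
        tr ',' '\n' |                    # one recipient per line
        sed -e 's/.*<\([^>]*\)>.*/\1/' \
            -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*$//' |     # unwrap <addr>, trim blanks
        grep -v '^$'                     # discard empty lines
}

split_addresses '"Lee, K." <kflee@example.com>, sales@example.com'
```

For the sample header this prints kflee@example.com and
sales@example.com on separate lines. The quoted names are removed
before splitting on commas so that a comma inside a display name does
not produce a bogus recipient.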

This program appears to work superficially, but not exactly; the problem
seems to be with how grep -f works. After using gawk to yank out the mail
address, such as kflee(_at_)penit(_dot_)com, matching it against the list
seems to behave inconsistently.


Check my sandbox config - in there, I have various header address 
extractions.  See if they work for what you're trying to accomplish.

You should try extracting the addresses SEPARATE from the grep, then emit
them to the VERBOSE log with a character delimiter around them to ensure
you're getting the string you believe you're getting.  Since the string is
part of a pipeline, you don't see it as a passed argument anywhere, even if
you're running with VERBOSE logs.  i.e. you THINK you have an address like
"kflee(_at_)penit(_dot_)com" but you very probably do not.

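The value of the delimiters can be seen with a few lines of plain
shell. This is a sketch only; the function name show and the sample
address are made up. Inside a procmailrc, assigning a similar
bracketed string to the LOG variable should append it to the logfile.

```shell
#!/bin/sh
# Hypothetical helper: print NAME=[value] so that stray blanks or
# other unexpected characters at either end of the value are visible.
show() {
    printf '%s=[%s]\n' "$1" "$2"
}

ADDR='kflee@example.com '    # note the trailing space
show ADDR "$ADDR"
```

This prints ADDR=[kflee@example.com ] and the gap before the closing
bracket gives away the trailing space, which an exact comparison
against the list would trip over.
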
There's another very good reason to do the header extractions separately
and assign them to variables: then you're only doing that expensive awk
pipeline ONCE for each of the two headers, instead of TWICE.  Get the from,
massage it and save it.  Get the to, massage it and save it.  Then do your
lookups (also only once each).  File mail based on the saved results of the
two lookup operations.
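
The shape of that sequence (look up once, save, then decide) can be
tried outside procmail. The sketch below is plain shell and makes
several assumptions: the helper name lookup, the temporary list files
and the example.com addresses are invented, and it matches whole
lines with grep -x, which is stricter than a substring match.

```shell
#!/bin/sh
# Hypothetical helper: print the entry of list file $2 that equals $1
# (case-insensitive, fixed string, whole line), or nothing at all.
lookup() {
    grep -i -F -x -e "$1" "$2"
}

# Throwaway lists with made-up addresses, for illustration only.
dir=$(mktemp -d)
printf '%s\n' 'alice@example.com' > "$dir/from_list"
printf '%s\n' 'kflee@example.com' > "$dir/to_list"

# One lookup per header; the results are saved and reused below.
FROM_MATCHED=$(lookup 'Alice@Example.com' "$dir/from_list")
TO_MATCHED=$(lookup 'nobody@example.com' "$dir/to_list")

# File the mail based on the saved results only.
if [ -n "$FROM_MATCHED" ] && [ -n "$TO_MATCHED" ]; then
    FOLDER=match_both
elif [ -n "$FROM_MATCHED" ]; then
    FOLDER=match_from
elif [ -n "$TO_MATCHED" ]; then
    FOLDER=match_to
else
    FOLDER=match_neither
fi

echo "$FOLDER"
rm -rf "$dir"
```

With these sample lists only the sender is known, so the script
prints match_from. Keeping the matched entry in a variable, instead
of only an exit status, means it can also be written to the log as a
diagnostic.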

What follows is a wholly untested rewrite which might help to get you on
the right track:


LOGFILE='/home/kfl_root/log'
VERBOSE=yes
MATCH_TO='match_to(_at_)penit(_dot_)com'
MATCH_FROM='match_from(_at_)penit(_dot_)com' 
MATCH_BOTH='match_both(_at_)penit(_dot_)com'
MATCH_NEITHER='match_neither(_at_)penit(_dot_)com'

# (pulled from my sandbox)
# get the From: address as an address component ONLY (no comments)
:0 h
CLEANFROM=|formail -IReply-To: -rtzxTo:

# You'll want to do some further scrubbing of this (esp if there are
# multiple addresses), but doing this extraction spares you having to
# perform a grep at the start of your pipeline.
:0
* ^To:[         ]*\/[^  ].*
{
         TO=$MATCH
}

# this should strip standard comments, and address bracketing,
# then put each result on a separate line (the TR is for that)
MY_TO=`echo ${TO}| sed -e "s/\"[^\"]*\"//" \
         -e "s/\(<\([^>]*\)>\)/\2/g" -e "s/^[    ,]*//" \
         | tr -s "       ," "\n"`

# (do lookups)
# the file lookups are performed this way because you then have the
# actual match string in the variable, which is a lot more useful as
# a diagnostic.  You can simplify later if you choose.
:0
* ! MY_TO ?? ^^^^
{
         # quoted so that several addresses (one per line) are taken as
         # several fixed-string patterns rather than as file names
         TO_MATCHED=`grep -iF -e "${MY_TO}" to_list`
}

:0
* ! CLEANFROM ?? ^^^^
{
         FROM_MATCHED=`grep -iF -e "${CLEANFROM}" from_list`
}


# (take action)
:0
* ! FROM_MATCHED ?? ^^^^
{
         # from matched
         :0
         * ! TO_MATCHED ?? ^^^^
         ! ${MATCH_BOTH}

         :0
         ! ${MATCH_FROM}
}

:0
* ! TO_MATCHED ?? ^^^^
! ${MATCH_TO}

:0
! ${MATCH_NEITHER}

---
  Sean B. Straw / Professional Software Engineering

  Procmail disclaimer:
<http://www.professional.org/procmail/disclaimer.html>
  Please DO NOT carbon me on list replies.  I'll get my copy from the list.


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

