procmail

RE: Help with redundancy

2007-08-03 16:18:44
Someone wrote me privately.  (I prefer to keep
the discussion on-list if it's pertinent to procmail.)


On 2-Aug-2007, at 16:15, Dallman Ross wrote:
Look:

    :0
    * ^From:.*@\/[a-z0-9_.+-]+
    { FROM = $MATCH }

I prefer:

         :0
         * $ ^From:$WS+\/[^$WS]+
         { FROM=$MATCH }

Okay, I'm answering for two reasons.  One, I didn't
really look carefully at what the OP had coded when
I made a suggestion.  I simply noticed that he was
re-using either the same or similar code a bunch
of times, and I hurriedly copied and pasted it.  I
see that it's not really "From:" that he's wanting
to save, but rather the domain part only.  So
just change the name from FROM to DOMAIN or something.
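
Something along these lines, that is.  (DOMAIN is just
an illustrative name; the pattern is the one copied from
the OP's code, and it should pick up whatever follows
the first @ sign on the From: line.)

    :0
    * ^From:.*@\/[a-z0-9_.+-]+
    { DOMAIN = $MATCH }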

Also, here is how to get the From: address.  Note
that there are potential problems, because sometimes
there is an @ sign in the "comment" part of the name,
and so on.  But we can probably presume the address
will be the same in almost all cases anyway.  Also,
the RFCs actually do allow multiple From: addresses,
I think!  I've never actually seen one in the wild,
though.

Personally, I grab addresses from the From_ header instead,
because that's at least putatively the Envelope-
From.  But there can be legitimate reasons why the From:
address would be a different one.  In any case, here is something:


  NL  = '
'
  SP  = ' '
  # TAB holds a single literal tab character between the quotes
  TAB = '	'
  WS  = $SP$TAB

  ADDRESS = [^$WS]+@[^$WS]+[.][a-z][a-z]+
  FROMADDRESS = "Bad or Missing"

  :0
  * $ 9876543210^0  ^From:.*<\/$ADDRESS>
  * $ 9876543210^0  ^From:.*[(]\/$ADDRESS[)]
  * $ 9876543210^0  ^From:.*[[]\/$ADDRESS]
  * $ 9876543210^0  ^From:.*[$WS]\/$ADDRESS([$WS]|^)
  * $ MATCH ?? ^^\/.*[^]>)$WS]
  { FROMADDRESS = $MATCH }

  LOG = $FROMADDRESS$NL

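The scoring is because we prefer angle brackets, then
parentheses (old-style RFC address), then ordinary brackets,
then whitespace only for the From: address.  Each weight is
larger than procmail's maximum score (2147483647), so the
first weighted condition to match pegs the score at the
maximum and the remaining weighted conditions are skipped.
MATCH therefore comes from the most-preferred form found.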

I tested the above on the several hundred messages in
my current spam folder (rotated daily).  It seems to work
fine.  ("distrib" is an alias of mine.)

 1:02am [~/Mail/spam] 714[0]> sh -c "cat * |\
        formail -s procmail  -m  ../rc 2>&1" | distrib | head
   5 Bad or Missing
   4 MAILER-DAEMON(_at_)mail101(_dot_)store(_dot_)mud(_dot_)yahoo(_dot_)com
   4 macon691(_at_)rdslink(_dot_)ro
   3 MAILER-DAEMON(_at_)overstock(_dot_)com
   3 legbienenzuchtvuf(_at_)bienenzucht(_dot_)com
   3 lifbluecomfig(_at_)bluecom(_dot_)dk
   3 lifbluerosesfig(_at_)blueroses(_dot_)org
   3 nogbabymamanfiw(_at_)babymaman(_dot_)com
   2 KellyLe16(_at_)yahoo(_dot_)com
   2 MAILER-DAEMON(_at_)polarcomm(_dot_)com

FYI, here are the "bad or missing" ones:
 1:02am [~/Mail/spam] 715[0]> frm BadOrMissing
MAILER-DAEMON         Delayed message: You've received a postcard from a Mate!
MAILER-DAEMON         Undelivered Mail Returned to Sender
                      ***SPAM*** Latest info on Events around the Middle East.
                      **Message you sent blocked by our bulk email filter**
                      ** Message blocked **


As for the private emailer's suggestion, it is not
ideal.  Specifically, this condition:

   * $ ^From:$WS+\/[^$WS]+

ought to have been written like this (note the brackets;
without them, the + applies only to the last character
of $WS, the tab):

   * $ ^From:[$WS]+\/[^$WS]+

But even that gives exactly the same result as
this one:

   * $ ^From:.*\/[^$WS]+

That's because procmail matches as little as possible
to the left of the match token (\/) and as much as
possible to the right of it.  The .* before the token
may match anything at all, and procmail will prefer
that it match nothing, if it can get away with that.
But it can't (assuming there's some non-$WS character
in the From: header), because the match itself has to
start on a non-whitespace char, so the .* ends up
swallowing just the leading whitespace.
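
To make that concrete, here is a rough trace for a
made-up header.  (The address and the FIRSTTOKEN name
are invented for illustration; WS and NL are assumed
to be set as in the definitions above.)

   # Header:  From: joe@example.com (Joe)
   #
   #   ^From:    matches "From:"
   #   .*        would like to match nothing, but the next
   #             char is a space, which [^$WS]+ can't take,
   #             so it settles for that one space
   #   [^$WS]+   then grabs as much as it can, stopping at
   #             the next whitespace

   :0
   * $ ^From:.*\/[^$WS]+
   { FIRSTTOKEN = $MATCH }

   LOG = "$FIRSTTOKEN$NL"

That should log joe@example.com, just as the [$WS]+
version would.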

Dallman
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
