Re: howto make shure to get the right Received header

2005-07-30 06:06:47
On Sat, Jul 30, 2005 at 09:04:50AM +0000, Matthias Haeker wrote:

[Dallman Ross wrote:]
 
The list archives cover the topic of finding (a) the top
Received only; (b) the bottom one only; and (c) finding a
specifically numbered one.

to make clear i understood now what i was looking for:

A. Top Received:

 :0
 * ^Received:.*\/[^         ].*
 { RCVD_TOP = $MATCH }



so i changed my recipe
to:

:0
* $ ^Received:.*\/[^$WS].*
{ X_RECEIVED=$MATCH
    :0
     * $ MATCH  ??  [[]\/$DOTQUAD
     { X_SENDER_IP="$MATCH" }
}

This matches the first "Received:" header
(the pattern .* means anything)
and fills the $MATCH variable
with everything from the first non-whitespace character on.

The nested recipe then matches the $DOTQUAD after the "[", if any,
which is not the case if the email was forwarded locally with a
!target action.


The thing is, some lines in the Received chain have the dotted quad
inside brackets, some have it inside parens, and some have it not
inside anything.  We prefer brackets, because that is more or less
a standard (but not always adhered to), and the other ways of
presenting the IP address tend to be less trustworthy or more
likely to have been forged.  This is a general note, and not
so true with the top-most Received, however, which, one hopes,
your server created.  (But pipelined messages (which tend almost
always to be either spam or [solicited] bulk) can have only one
Received header on many systems, so we still need to be careful.)

Anyway, that all is why I wrote (in Virus Snaggers, which you have
been using and from which you got some of these algorithm ideas)
the line as I did:

     :0 # look for an IP address in top Received
      * $ ^Received:[$WS]*from[$WS]+\/[^$WS].+[$WS]by[$WS]
      * $  $GO^0  MATCH  ??  [[]\/$DOTQUAD
      * $  $GO^0  MATCH  ??  ()\/$DOTQUAD
      { TOP_IP = $MATCH }

$GO is defined as an oversaturated integer for scoring,
9876543210, and it is defined in the genvars.rc file that
comes in vsnag, along with some other things such as $DOTQUAD.
I'm just stating this for others reading along; I know you
know that.
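
For anyone reading along without genvars.rc to hand, stand-in
definitions could look like the sketch below.  Only the values of
GO and TRUE are stated in this message; the WS and DOTQUAD patterns
are assumptions for illustration and may well differ from what
vsnag actually ships.

```procmail
# Stand-ins for a few genvars.rc variables.
# These are assumptions for illustration, not the vsnag originals.

# Stated in this message: an oversaturated weight for scoring.
GO=9876543210

# Stated in this message: TRUE is just a dot.
TRUE=.

# Assumed: a space followed by a literal TAB character.
# Retype the TAB if it was turned into spaces in transit.
WS=" 	"

# Assumed: four dot-separated groups of digits; [.] is a literal dot.
DOTQUAD="[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+"
```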

Anyway, the point of having the two condition lines is to
catch the dotted quad in [brackets] preferentially, but if
it's not there, then accept any dotted quad we find in the
MATCH.  I haven't captured the entire Received line here into
MATCH, but only the part that would have the leftmost host
information.  See, if I capture the whole line and there's
no IP address in that part but there is later in the line,
then we fool ourselves.


and

## i am not sure if i really need the second condition,
## but it is true if email is forwarded locally with !target
:0
* $ X_RECEIVED ?? [\(]from[$WS][^$WS]*[\@]\/localhost
* $ X_SENDER_IP ?? ![^$WS]
{ X_IS_LOCAL=$TRUE } 


I don't see why you'd need that, because above would not have
matched on X_SENDER_IP if there was a space.



will MATCH if "localhost", literally not the local ip, is 
behind a @ in the X_RECEIVED and not like:

:0
* $ ^Received:[$WS]*[^$WS]from[$WS]\/[^$WS]*
* $ MATCH ?? [\@]\/.*[^\)]
* $ MATCH ?? localhost
{ X_IS_LOCAL=$TRUE }

I'm unclear on what you're trying to do.  It might be the
English you're using.  Hmm.


what will MATCH on the first Received: in a chain of Received:
containing, again literally, "localhost".

Basically, if there's no IP address to be found, we could look for
localhost.  We could even put it all in the same recipe.  But I
think it is better to divide it up:


      :0
      * $ ^Received:[$WS]*from[$WS]+\/[^$WS].+[$WS]by[$WS]
      * $  $GO^0  MATCH  ??  [[]\/$DOTQUAD
      * $  $GO^0  MATCH  ??  ()\/$DOTQUAD
      { SOMEVAR = $MATCH }

      :0 E   # else
      * MATCH ?? ()\<localhost\>
      { ANOTHERVAR = $TRUE }   # $TRUE is in genvars and is just a dot


Note that I put delimiters around "localhost" to stop a match on
some cute guy's computer that might be named "notalocalhost"
as a joke.


and i had to change my black.rc because X_RECEIVED contains the full
Received: header now

See above for why you might not want to do that.  But on the other
hand, your syntax looks harmless enough.  :-)


from:

:0
* $ X_RECEIVED ?? .*\/($HOST|$X_LOCAL_IP)
{ 
  :0
  * $ X_SENDER_IP ?? !$X_LOCAL_IP
 {  ITS_EVIL=$TRUE  } 
} 

to: 

:0 
* $ X_RECEIVED ?? .*from[$WS]*\/($HOST|$X_LOCAL_IP)
{ 
  :0
  * $ X_SENDER_IP ?? !$X_LOCAL_IP
 {  ITS_EVIL=$TRUE  } 
} 
 

is that correct ?


Well, if you find $HOST then the matching will stop and there
won't be an X_LOCAL_IP to compare it to.  And you might want
to put anchors around the right side, and to quote it.  And,
as Ruud said, it's bad form (though it will work) to put the
bang on the right side of the expression.  And you don't want
NO spaces after the whitespace that followed the word "from".
Also, you want to anchor the right IP address and quote it.
Quote, because otherwise dots become anything.  Anchor, because
otherwise 111.222.000 will "match" 1.222.0.  For example.


    * $! X_SENDER_IP ?? ^^$\X_LOCAL_IP^^
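
To see the difference that quoting and anchoring make, here is a
minimal sketch.  The two addresses and the LOOSE_HIT and STRICT_HIT
variable names are hypothetical, chosen only for illustration.

```procmail
# Hypothetical values, for illustration only.
X_LOCAL_IP=192.0.2.10
X_SENDER_IP=192.0.2.100

# Loose: each "." matches any character, and the pattern may match
# anywhere inside X_SENDER_IP.
:0
* $ X_SENDER_IP ?? $X_LOCAL_IP
{ LOOSE_HIT=yes }

# Strict: $\ quotes the dots and ^^ anchors both ends, so this
# matches only when X_SENDER_IP is exactly the local address.
:0
* $ X_SENDER_IP ?? ^^$\X_LOCAL_IP^^
{ STRICT_HIT=yes }
```

With these values the loose recipe fires, because 192.0.2.10 is a
leading substring of 192.0.2.100, and the strict one does not.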


Since you're using my vsnag plug-in, you actually have at
your disposal an easy tool for bench-testing this stuff.
The vsnag.point-n-shoot.sh Bourne shell script in the tools
archive in the vsnag package will let you run any rcfile
against any message and see the verbose output on the screen.


  % vsnag.point-n-shoot.sh --rcfile somefile message

See the help screen for p-n-s.


Use that or another method to test your syntax, and you'll save
yourself and the list lots of make-work.  :-)
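
As one example of "another method", a throwaway sandbox rcfile can
be fed a saved message directly.  This is only a sketch: the file
names sandbox.rc and message.txt are hypothetical, and the recipe
is a simplified top-Received capture that treats only a space
(not a tab) as whitespace.

```procmail
# sandbox.rc (hypothetical file name)
# Run from the shell as:  procmail ./sandbox.rc < message.txt
# With no LOGFILE set, procmail writes its log to stderr.

# Log each condition tested and each variable assigned.
VERBOSE=yes

# Throw the test message away instead of delivering it.
DEFAULT=/dev/null

# Recipe under test: capture the top Received header from its
# first non-space character onward.
:0
* ^Received:.*\/[^ ].*
{ RCVD_TOP=$MATCH }
```

The verbose log then shows whether the condition matched and what
was assigned, without touching a real mailbox.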


Package available at http://vsnag.spamless.us .

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail