procmail

Re: Removing new lines in MATCH

2000-04-05 13:38:03
* Anthony Thyssen <anthony(_at_)cit(_dot_)gu(_dot_)edu(_dot_)au> [000330 23:11]:

I have a procmail file which looks for a particular string that could be
broken across lines. No problem, it finds the string OK.

When a match is found I want it put into a mail header before it is
delivered to a folder so I can see what was matched...


Example...

TAB  = "        "   # a single tab
SPC  = "[ $TAB]"
SPCL = "($SPC|$)"

# Later...

JUNK = _Junk    # an MH folder (rcvstore used for delivery, for correct
                #               unseen-cache handling)
OR=9876543210   # Ultra large score for OR'ed conditions -- Jari's Tips #8.18

X="$SPCL+"      # word separator

What is wrong, for your proposed use, with either \> or \<, both of
which include non-word characters and newlines?


:0 Bfh
* $ $OR^0 ()\/this${X}is${X}just${X}a${X}test${X}match
* $ $OR^0 ()\/kill${X}me${X}please
* $ $OR^0 ()\/I${X}am${X}only${X}kidding
| formail -i "X-Junk-Match: $MATCH" 
:0 A
| rcvstore +$JUNK

:0 Bfh
* ()\/(\
       this\>+is\>+just\>+a\>+test\>+match|\
       kill\>+me\>+please|\
       i\>+am\>+only\>+kidding\
      )
| formail -i "X-Junk-Match: $MATCH"
:0 A
| rcvstore +$JUNK

\>* can be used when words might be run together. \> also matches
commas, periods, and other punctuation, making these matches much more
flexible in dealing with the language found in spam.
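A rough way to see what the \> version buys, outside procmail. This sh sketch assumes \> behaves like one character from the class [^[:alnum:]_] (a non-word character); grep works line by line, so the newline case that \> also covers is not shown. `matches` is a hypothetical helper name.

```sh
#!/bin/sh
# ASSUMPTION: one character of [^[:alnum:]_] stands in for procmail's \>.
W='[^[:alnum:]_]'

# matches PATTERN TEXT: case-insensitive ERE test, like procmail's default.
matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

plus="kill$W+me$W+please"      # roughly kill\>+me\>+please
star="kill$W*me$W*please"      # roughly kill\>*me\>*please

matches "$plus" 'Kill me, PLEASE!' && echo "plus: punctuation accepted"
matches "$plus" 'killme please'    || echo "plus: run-together words missed"
matches "$star" 'killme please'    && echo "star: run-together words accepted"
```

The + form needs at least one separator between words, while the * form also accepts words that have been run together.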

This works, but...

Problems...

  1/ The header line will only contain the matched string up
     until the first newline.
  2/ Also, formail seems to lowercase all but the first letter (the X).
     I can live with this, but it is annoying.

So I used the shell to remove the extra whitespace. This should be safe
from attack, as only the matched string could be open to shell parsing.
As a shell is being called anyway, I can also combine the two commands:

* .. as above
| formail -i "X-Junk-Match: `echo $MATCH`" | rcvstore +$JUNK

This is horrible and probably not good for all possibilities.
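Roughly what the backquoted `echo $MATCH` does, shown in plain sh under the assumption that the shell ends up expanding $MATCH itself (how procmail pre-expands the action line is not covered here). The second half sketches one way to keep the same whitespace collapsing while switching off pathname expansion, which is the main risk of an unquoted match.

```sh
#!/bin/sh
# Sample value: a match that spans a line break, as MATCH may hold it.
MATCH='kill me
please'

# The unquoted expansion is split on IFS (space, tab, newline) and echo
# rejoins the resulting words with single spaces.
joined=`echo $MATCH`
printf '%s\n' "$joined"

# Unquoted words also undergo pathname expansion, so a matched "*" could
# be replaced by file names. set -f inside the command substitution (which
# runs in a subshell) turns globbing off there only; printf avoids echo's
# possible backslash interpretation.
MATCH='this is
just a * test'
safe=$(set -f; printf '%s ' $MATCH)
safe=${safe% }                 # drop the trailing separator
printf '%s\n' "$safe"
```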


Does anyone have a better way of fixing the newline in MATCH?

Joining an arbitrary number of lines in procmail requires either an
external program (tr, sed, perl...) or a recursive rc. Here is a
recursive rc which does it. Extract the code between the borders and run
it with your local equivalent of
    procmail ./splittertest.rc < /dev/null
to see how it works. Turn verbosity on to see more detail.

This does suffer from the limit on the number of available file
descriptors: it will die after some number of iterations, so a string
with that many lines cannot be split. I typically see 20 or more
iterations before such a failure, so I am careful to send through only
strings with fewer newlines than that.

-=-=-=-=-=-=-=-=-=-  splittertest.rc  -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# First time through, set up environment and test string
  :0
  * ! split ?? yes
  { SHELL=/bin/sh
    MAILDIR=.
    DEFAULT=|
    VERBOSE=no
    NL="
"
    split=yes
    manylines="This is line one
this is line two
this is line three
this is line four and final"
    LOG="Input: $manylines$NL"
    INCLUDERC=$_
  }
# The next section is the splitter. The outer test isn't needed if this
# is placed in a standalone rc file.
# If called as an includerc, start here
  :0
  * split ?? yes
  { # grab next line of input into MATCH and append to output
    :0
    * manylines ?? ^\/.*
    { # choose an appropriate separator, or none. I chose ~.
      oneline=$oneline~$MATCH
    # if there are more lines, replace manylines with remainder and reiterate
      :0
      * manylines ?? ^.*$\/(.*$)+
      { manylines=$MATCH
        INCLUDERC=$_
      }
    }
  }
# The splitter ends here
# Done with example, now show the result
  LOG="${NL}output: $oneline$NL"
# A HOST value that is not this machine's name makes procmail stop here;
# if your machine is named byebye, change the next line
  HOST=byebye
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
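The same peel-off-one-line-and-recurse idea, transcribed into sh only to make the control flow easier to follow; it is not a replacement for the rc. join_lines is a hypothetical name. A shell function call opens no rc file, so the descriptor limit described above does not apply to this version.

```sh
#!/bin/sh
NL='
'

# join_lines TEXT: append the first line of TEXT to $oneline with a "~"
# separator, then recurse on whatever follows the first newline.
join_lines() {
    first=${1%%"$NL"*}         # text up to the first newline
    oneline=$oneline~$first
    case $1 in
        *"$NL"*) join_lines "${1#*"$NL"}" ;;   # more lines: reiterate
    esac
}

manylines="This is line one
this is line two
this is line three
this is line four and final"

oneline=
join_lines "$manylines"
printf 'output: %s\n' "$oneline"
```

As in the rc, oneline starts out empty, so the result begins with a "~".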

To use this in your code, do something like:

:0 B
* ()\/(multiline string\>*1|multiline string\>*2)
{ manylines=$MATCH INCLUDERC=$HOME/path/to/splitter.rc
  :0
  | formail -i "X-Junk-Match: $oneline" | rcvstore +$JUNK
}
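For comparison, the external-program route mentioned earlier: tr rewrites every newline in one pass, however many lines there are, so there is no nesting to run out of descriptors. The "~" separator copies the choice made in splittertest.rc; unlike the recursive rc, this gives no leading "~".

```sh
#!/bin/sh
manylines="This is line one
this is line two
this is line three
this is line four and final"

# printf '%s' adds no trailing newline, so no trailing "~" appears;
# tr then maps each embedded newline to "~".
oneline=$(printf '%s' "$manylines" | tr '\n' '~')
printf 'output: %s\n' "$oneline"
```

Inside an rc file, a backquoted assignment running the same printf | tr pipeline on $MATCH should set oneline directly, at the cost of one extra process per message.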

ASIDE: What procmail badly needs is a regular expression
substitution function, maybe a minimalistic sed `s' function. I mean, it
understands REs, so it should not be too much extra, and it would probably
resolve a lot of existing procmail `weird coding' practices.

There is a procmail-dev list which is appropriate for such discussions.
