procmail

Re: Filtering out unwanted mime attachments

1997-10-16 07:59:55
On Thu, 16 Oct 1997 14:47:05 +0100 (WET DST),
"J. Daniel Smith" <J(_dot_)Daniel(_dot_)Smith(_at_)WriteMe(_dot_)com> wrote:
> > you'll have to grab the separator line from the message header's
> > Content-Type but when you have that, parsing should be fairly
>
> This seems to be difficult, if not impossible in procmail because

(Indeed, but for the general topic of parsing MIME with Procmail, it's
not impossible. I've been passively thinking about solving it in the
general case, but it's a somewhat twisty thing to do. I think it can
be done with Procmail plus sed, though.)

> (as David pointed out) MATCH is greedy, thus the last $boundary
> matches, not the first one.  Resorting to "sed" seems to be the only
> solution...unless there is some way to get procmail to parse/match one
> line at a time.
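
For illustration only, grabbing the separator from a Content-Type value
could be sketched in plain sh plus sed as below. The function name
extract_boundary is hypothetical, and the sketch assumes the header has
already been unfolded onto one line and spells "boundary" in lower
case; it is not taken from any recipe in this thread.

```shell
# Hypothetical helper: print the boundary parameter of a Content-Type value.
# Assumes an unfolded, single-line header and a lower-case "boundary=".
extract_boundary() {
    printf '%s\n' "$1" | sed -n \
        -e 's/.*boundary="\([^"]*\)".*/\1/p' \
        -e t \
        -e 's/.*boundary=\([^;[:space:]]*\).*/\1/p'
}

# Quoted form first, then the unquoted form as a fallback.
extract_boundary 'multipart/mixed; boundary="----=_NextPart_000"'
extract_boundary 'multipart/mixed; boundary=simple-boundary; charset=us-ascii'
```

The delimiter lines in the body are then "--" followed by this value,
which is the string the rest of this message tries to match (or not
match).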

Sneaky idea: run the boundary thru sed which produces as output a
regular expression which matches anything except the boundary line.
  I've been simulating $\VAR in sed for some cases where I wanted to
use the result as a sed regex (can't use $\VAR directly because of the
leading @&$0f!! parens):

    # This is from a recipe which discovers fake IP numbers in 
    #  Received: lines

    # Simulate $\MATCH quoting for sed
    IP=`echo "$MATCH" | sed -e 's/[][().]/\\\&/g'`

    :0hfw    # Disarm the fake Received: line and any after it
    | sed -e "/^Received: .*$IP/,/^$/s/^Received:/X-Fake-&/"
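
To see what the quoting step produces, here is a small stand-alone
sketch in plain sh. The function name quote_for_sed and the sample
address are made up for the demonstration. It uses $( ) instead of
backquotes, so two backslashes are enough in the sed replacement.

```shell
# Hypothetical helper: backslash-escape the same characters as the recipe
# above ([, ], (, ) and .) so the value can be embedded in a sed regex.
quote_for_sed() {
    printf '%s\n' "$1" | sed -e 's/[][().]/\\&/g'
}

IP=$(quote_for_sed '[10.0.0.1]')
printf '%s\n' "$IP"     # prints \[10\.0\.0\.1\]

# Use the quoted value inside a sed address, as the recipe does.
printf 'Received: from x [10.0.0.1]\n' |
    sed -e "/^Received: .*$IP/s/^Received:/X-Fake-&/"
# prints X-Fake-Received: from x [10.0.0.1]
```

One caveat: in sed's basic regular expressions a backslashed
parenthesis opens or closes a group, so a value that actually contains
parentheses may need different handling than this.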

Here's a first stab at doing a sed transform for a non-matching
string; I'm sure it could be improved upon:

    # This is for use back in Procmail itself
    #  but we want to catch non-MATCHes instead of $\MATCHes
    HCTAM=`echo "$MATCH" | sed -e 's/./([^&]|&/g' -e 's/$/./' -eh \
                -e 's/[^(]//g' -e 's/(/)/g' -eH -eg -e 's/\n//'`

This doesn't cope with characters with regex special meaning but I'm
sure it wouldn't be too hard to implement. On the other hand, I
believe the MIME RFCs actually don't permit too much by way of
non-alphabetic characters in the boundary string. Periods are
permitted and should be escaped, what about others?
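
As a worked example of the transform above, here is the same sed
script wrapped in a plain sh function and run on the two-character
string "ab". The function name non_match_regex is hypothetical, and
grep -E stands in for Procmail's egrep-style matching, which is an
assumption made only for this demonstration.

```shell
# Hypothetical wrapper around the non-matching transform shown above.
non_match_regex() {
    printf '%s\n' "$1" |
        sed -e 's/./([^&]|&/g' -e 's/$/./' -e h \
            -e 's/[^(]//g' -e 's/(/)/g' -e H -e g -e 's/\n//'
}

re=$(non_match_regex 'ab')
printf '%s\n' "$re"     # prints ([^a]|a([^b]|b.))

# Try the result against a few lines; only "ab" itself should be rejected.
for line in xy ax abc ab; do
    if printf '%s\n' "$line" | grep -Eq "^$re"; then
        echo "$line: matches the non-boundary regex"
    else
        echo "$line: does not match"
    fi
done
```

On the question of other characters: as far as RFC 2046 goes, a
boundary may contain letters, digits, space and '()+_,-./:=? only. Of
those, the ones with special meaning in an egrep-style regex are
( ) + . and ?, so those are the ones the unbracketed copy of each
character would need to have escaped.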

Going from here to something actually useful is still a long way.

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>