procmail

Re: Filtering out unwanted mime attachments

1997-10-15 12:47:01
David W. Tamkin writes on 15 October 1997 at 10:11:41
Dan Smith wrote,

|      # reduce the body to just the first MIME part (plus a little extra
|      # garbage)
|      :0B
|      * $.*^--${boundary}\/(.|$)+^--${boundary}$
|      { part1=$MATCH }
|      :0wfib

Don't you want an `A' flag there, or perhaps to include this recipe inside
the same braces where part1 is defined?

Yup, did some of that after I posted; bad things happened if part1
wasn't set. :-)
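
For illustration, a sketch of the `A'-flag variant suggested above.  The
echo action is an assumption, since the action line of the quoted recipe
is cut off; the condition is the one from the quoted recipe:

     :0B
     * $.*^--${boundary}\/(.|$)+^--${boundary}$
     { part1=$MATCH }

     # `A': run only if the conditions of the preceding recipe (at this
     # nesting level) matched, so the body is left alone when part1 was
     # never set.  ASSUMPTION: the elided action echoes $part1.
     :0Awfib
     | echo "$part1"

The other option, moving the filter recipe inside the braces next to the
part1 assignment, has the same effect.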

I can't think of a straightforward way to stop extraction at the second
[...]
If we can be sure that only boundary lines will begin with two hyphens
flush left, we don't need sed after all, just echo as Dan intended:

     :0Bbfwi
     * $ ^--$boundary$\/(.?([^-].*)?$)+
     | echo "$MATCH"

I didn't look it up, but I think MIME requires this (the format was
intended to be easy to parse).  Also, for multipart/alternative, as in
this example, taking just the first part is fine (MIME states that the
parts of such a message should be listed in order of increasing
complexity).  But it was your magic regexp that did the trick!
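
To see why that regexp stops where it does, here is a rough walk-through
on a made-up body (the boundary XYZ and the part contents are invented
for illustration only):

     # body of the message, with boundary=XYZ
     #
     #   --XYZ
     #   Content-Type: text/plain
     #
     #   Hello
     #   --XYZ
     #   Content-Type: text/html
     #   ...
     #
     # ^--$boundary$      matches the first boundary line
     # \/                 whatever matches to the right of this is put
     #                    into MATCH
     # (.?([^-].*)?$)+    one or more lines that do not begin with two
     #                    hyphens (empty lines and one-character lines
     #                    are allowed)
     :0Bbfwi
     * $ ^--$boundary$\/(.?([^-].*)?$)+
     | echo "$MATCH"

The second "--XYZ" line begins with two hyphens, so the match cannot
continue past "Hello", and MATCH should hold the lines from
"Content-Type: text/plain" through "Hello".  The same rule means the
match would also stop early at any other line that starts with two
hyphens, such as a "-- " signature separator inside the part.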

FWIW (for what it's worth), I've appended the recipe I have now.

   Dan
------------------- message is author's opinion only ------------------
J. Daniel Smith <DanS(_at_)bristol(_dot_)com>        
http://www.bristol.com/~DanS
Bristol Technology B.V.                   +31 33 450 50 50, ...51 (FAX)
Amersfoort, The Netherlands               {info,jobs}(_at_)bristol(_dot_)com
-----
# ... your experimental recipes here
LINEBUF=100000
:0B
* $> ${LINEBUF}
{
  # message too large
}
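# NOTE: each bracket expression below that appears to hold only blanks
# is meant to contain one space and one tab; the tabs seem to have been
# expanded to spaces somewhere along the way.
:0E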
* ^Mime-Version:
* ^Content-Type:[       ]*multipart/alternative;[       ]*boundary="?\/[^"]+
{
  boundary=$MATCH

  # reduce the body to just the first MIME part
  :0B
  * $ ^--$boundary$\/(.?([^-].*)?$)+
  {
    :0Bbfwi
    | echo "$MATCH"

    # get the Content-Type:
    :0B
    * $.*^Content-Type:[        ]*\/[^  ]+
    {
      content_type=$MATCH
      # and the contents of this MIME part
      :0B
      * $.*^Content-Type:[      ]*${content_type}$\/(.|$)+
      {
        :0Bbfwi
        | echo "$MATCH"

        :0wfi
        | formail -I "Content-Length:" -I "Content-Type: $content_type"
      }
    }
  }
}