procmail

Re: Filtering out unwanted mime attachments

1997-10-16 12:47:36
A few hours ago I wrote:
> Edward J. Sabol writes on 16 October 1997 at 11:46:36:
> > The exact solution is left as an exercise for the reader.
>
> I'll play around with this and see what I come up with...more to

This line-by-line parsing could be useful... The only problem I had was
with blank lines: I couldn't get the regexp right for them, so I used
scoring to fix things up; there's probably a better way.
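
For comparison only (this is Python, not procmail): the blank-line trouble goes away when each line keeps its own newline. Below is a minimal sketch of the same THISLINE/REMAININGLINES split. The function name split_first_line is hypothetical.

```python
def split_first_line(text: str) -> tuple[str, str]:
    """Split text into (first line with its newline, everything after it).

    Because the newline stays attached to the line, a blank first line
    comes back as a lone newline instead of vanishing.
    """
    head, sep, tail = text.partition("\n")
    return head + sep, tail


# Walk a body line by line, the way extract-part.rc recurses over BODYLINES.
body = "first\n\nthird\n"
while body:
    this_line, body = split_first_line(body)
    print(repr(this_line))
```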

Anyway, here's what I think is a decent first pass at reducing a MIME
multipart/alternative message (Netscape's text plus HTML) to just its
first (text) part.  Because of the recursive INCLUDERC, two files are
needed: the main rcfile and $RCDIR/extract-part.rc, both appended below.
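
Outside procmail, the same reduction is easier with a real MIME parser. Here is a minimal sketch using Python's standard email package; the function name keep_first_alternative is hypothetical. Like the rcfiles, it assumes the first alternative is the one to keep. It copies that part's own headers (such as Content-Type and Content-Transfer-Encoding) onto the message.

```python
import email


def keep_first_alternative(raw: str) -> str:
    """Reduce a multipart/alternative message to its first part.

    Any other kind of message is returned unchanged.
    """
    msg = email.message_from_string(raw)
    if msg.get_content_type() != "multipart/alternative" or not msg.is_multipart():
        return raw
    parts = msg.get_payload()
    if not parts:
        return raw
    first = parts[0]
    # Drop the outer MIME headers (and any stale Content-Length) ...
    for name in ("Content-Type", "Content-Transfer-Encoding", "Content-Length"):
        del msg[name]
    # ... and replace them with the first part's own headers.
    for name, value in first.items():
        msg[name] = value
    # The first part's (still encoded) body becomes the whole message body.
    msg.set_payload(first.get_payload())
    return msg.as_string()
```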

Next exercise: extract the Nth part of a MIME multipart message.
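
One possible answer is shown below, again in Python's standard email package rather than procmail. The name nth_part is hypothetical. Parts are counted from 1, and only the top level is considered.

```python
import email


def nth_part(raw: str, n: int) -> str:
    """Return the n-th top-level part (counting from 1) of a multipart message."""
    msg = email.message_from_string(raw)
    if not msg.is_multipart():
        raise ValueError("not a multipart message")
    parts = msg.get_payload()
    if not 1 <= n <= len(parts):
        raise IndexError(f"message has {len(parts)} parts, not {n}")
    # The part is returned with its own MIME headers, ready for further filtering.
    return parts[n - 1].as_string()
```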

So this *was* about procmail after all :-)  My "some-filter..." can be
done in procmail.

   Dan
------------------- message is author's opinion only ------------------
J. Daniel Smith <DanS(_at_)bristol(_dot_)com>        
http://www.bristol.com/~DanS
Bristol Technology B.V.                   +31 33 450 50 50, ...51 (FAX)
Amersfoort, The Netherlands               {info,jobs}(_at_)bristol(_dot_)com
----
TMPDIR=/tmp/$LOGNAME
MAILDIR=$TMPDIR/procmail.out
DEFAULT=$MAILDIR/$LOGNAME
VERBOSE=yeah
SHELL=/bin/sh

:0
* ! ? test -d $TMPDIR || mkdir $TMPDIR
{
  # Bail out if directory didn't exist and couldn't be created
  EXITCODE=127
  HOST
}

:0
* ! ? test -d $MAILDIR || mkdir $MAILDIR
{
  # Ditto
  EXITCODE=127
  HOST
}

# ... your experimental recipes here
RCDIR=$HOME/.procmail/mime

LINEBUF=100000
:0B
* $> ${LINEBUF}
{
  # message larger than LINEBUF: leave it untouched (the E recipe below is skipped)
}
:0E
* ^Mime-Version:
* ^Content-Type:[       ]*multipart/alternative;[       ]*boundary="?\/[^"]+
{
  boundary=$MATCH

  :0B
  * ^^\/(.*$)+
  {
    BODYLINES = $MATCH

    # Start from a clean state for extract-part.rc (unset any leftovers)
    part
    in_boundary
    stop

    INCLUDERC=$RCDIR/extract-part.rc

    :0
    * part ?? .
    {
      :0bfwi
      | echo "$part"

      # get the Content-Type: value up to the end of its line,
      # so that any charset= parameter is kept
      :0B
      * $ ^Content-Type:[       ]*\/[^  ].*
      {
        content_type=$MATCH

        # and the contents of this MIME part
        :0B
        * $ ^Content-Type:[     ]*$\content_type$\/(.|$)*
        {
          :0bfwi
          | echo "$MATCH"

          :0hwfi
          | formail -I "Content-Length:" -I "Content-Type: $content_type"
        }
      }
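      # A part need not have a Content-Type: header (MIME then defaults
      # to text/plain), so just drop the outer multipart Content-Type:
      # and the stale Content-Length:
      :0Ehwfi
      | formail -I "Content-Length:" -I "Content-Type:"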
    }
  }
}

----
# $RCDIR/extract-part.rc (includes itself recursively via INCLUDERC = $_)
# From: "Edward J. Sabol" <sabol(_at_)alderaan(_dot_)gsfc(_dot_)nasa(_dot_)gov>
# To: Procmail Mailing List <procmail(_at_)Informatik(_dot_)RWTH-Aachen(_dot_)DE>
# Date: Thu, 16 Oct 1997 11:46:36 -0400


:0
* BODYLINES ?? ^^(.*$)\/(.*$)+
{ REMAININGLINES = $MATCH }
:0E
{ REMAININGLINES }

:0
* BODYLINES ?? ^^\/.*$
{ THISLINE = $MATCH }
:0E
{ THISLINE }

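# .*$ swallows blank lines, so count the empty lines in BODYLINES and
# in REMAININGLINES: if the counts differ, THISLINE was a blank line and
# its newline has to be put back.  $= is read after each recipe so the
# count is updated even when no empty line matched.
:0
* 1^1 BODYLINES ?? ^$
{ }
b=$=
:0
* 1^1 REMAININGLINES ?? ^$
{ }
r=$=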
# Compare the counts anchored, so that e.g. 12 and 2 don't look equal
:0
* $ b ?? ^^$r^^
{ }
:0E
{ 
  THISLINE="$THISLINE
"
}

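# A delimiter line is never kept: the first one opens the part,
# the second one closes it and sets stop.
:0
* $ THISLINE ?? ^--$\boundary(--)?
{
  THISLINE
  :0
  * in_boundary ?? yes
  {
    in_boundary=no
    stop=yes
    THISLINE
  }
  :0E
  { in_boundary=yes }
}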

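# Now recurse if there are any remaining lines and the first part
# hasn't been closed yet (no need to walk the rest of the body).
:0
* REMAININGLINES ?? .
* ! stop ?? yes
{
  BODYLINES = $REMAININGLINES

  # Collect this line while inside the first part
  :0
  * in_boundary ?? yes
  { part=$part$THISLINE }

  INCLUDERC = $_
}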
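
The recursion in extract-part.rc amounts to a small state machine over the body lines. A Python sketch of the same logic follows, written as a loop instead of recursion. The name extract_first_part is hypothetical. Like the rcfile, it treats any line starting with --boundary as a delimiter.

```python
def extract_first_part(body: str, boundary: str) -> str:
    """Collect the lines between the first and second boundary delimiters."""
    delimiter = "--" + boundary
    part = []
    in_boundary = False
    # keepends=True keeps each newline, so blank lines survive intact.
    for line in body.splitlines(keepends=True):
        if line.startswith(delimiter):  # like ^--$\boundary in the rcfile
            if in_boundary:
                break                   # second delimiter: the part is complete
            in_boundary = True
            continue                    # delimiter lines are never kept
        if in_boundary:
            part.append(line)
    return "".join(part)
```

Strictly, RFC 2046 counts the line break just before a delimiter as part of the delimiter. Like the rcfile, this sketch keeps that line break in the extracted text.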