procmail

Re: Filtering out unwanted mime attachments

1997-10-15 13:14:29
When I wondered,

| >If we can be sure that only boundary lines will begin with two hyphens
| >flush left, we don't need sed after all, just echo as Dan intended:

| >      :0Bbfwi
| >      * $ ^--$boundary$\/(.?([^-].*)?$)+
| >      | echo "$MATCH"

Dan Smith answered,

| I didn't look it up, but I think MIME requires this (the format was
| intended to be easy to parse).  Also, for multipart/alternative and
| this example, the first part is fine (MIME states that such
| messages should be listed in increasingly complex order).  But it was
| your magic regexp that did the trick!

Glad to have helped.  Perhaps we can extend it to accept lines consisting
of nothing but two hyphens, or two hyphens and a space, so that a .sig
separator isn't mistaken for a multipart boundary:

         * $ ^--$boundary$\/(.?([^-].*|- ?)?$)+

| FWIW (for what it's worth), I've appended the recipe I have now.

OK, let's give it a look ...

|   # reduce the body to just the first MIME part
|   :0B
|   * $ ^--$boundary$\/(.?([^-].*)?$)+

I'm thinking now that $\boundary would be a good idea there.  The boundary
might contain regexp magic characters, and $\boundary expands to the value
with those characters escaped.

|     :0Bbfwi
|     | echo "$MATCH"

No need for `B' any more; there are no conditions on the recipe now (they're
taken care of at an outer brace level).
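
With both changes applied ($\boundary in the condition, no `B' on the inner
recipe), plus the .sig allowance from above, the outer part of the recipe
might look something like the sketch below.  It is untested; the braces are
my guess at the nesting, and boundary is assumed to have been set earlier
in the rcfile, as in Dan's recipe:

```
# assumes $boundary was extracted earlier in the rcfile
:0B
* $ ^--$\boundary$\/(.?([^-].*|- ?)?$)+
{
  # no conditions here, so no `B'; replace the body with the first part
  :0bfwi
  | echo "$MATCH"
}
```

The `B' stays on the outer recipe because that is where the condition has
to be tested against the body rather than the head.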

|     * $.*^Content-Type:[      ]*\/[^  ]+

      * $ ^Content-Type:[       ]*\/[^  ]+

The leading .* doesn't help and can slow things down.

|       * $.*^Content-Type:[    ]*${content_type}$\/(.|$)+

        * $ ^Content-Type:[     ]*$\content_type$\/(.|$)+

Same point about the leading .* here.  Also, since I'm not sure what
characters are allowed in the content type, using $\content_type can't hurt.

|         :0Bbfwi
|         | echo "$MATCH"

Again, `B' is superfluous because there are no conditions.

|         :0wfi
|         | formail -I "Content-Length:" -I "Content-Type: $content_type"

That command leaves the body unchanged, so it will be more efficient to add
an `h' flag and filter only the head.
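
A sketch of that last recipe with the flag added, assuming content_type
still holds the value captured earlier in Dan's recipe:

```
# h: pipe only the head through formail; the body is left alone
:0hfwi
| formail -I "Content-Length:" -I "Content-Type: $content_type"
```

Here -I "Content-Length:" with nothing after the colon deletes that field,
and the second -I replaces any existing Content-Type: with the new one.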