W. Mark Herrick <markh(_at_)va(_dot_)rr(_dot_)com> asked a few days ago:
On a related note, is there any way to do the following:
If MIME (or any, for that matter) attachment, then
a. Drop MIME part
b. Autorespond to sender telling them that MIME/any attachment was
dropped
This would be useful.
Well, when it comes to MIME anyway, it can be done in procmail (with
a few helpers like sed), but it can be a little tricky. Here are the
problems, and some hints if you want to proceed anyway.
To start, let's define some terms and clear up just what
you want to do. We'll deal only with part (a) of your request;
part (b), the autoreply, has been covered many times in the list archive
<http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/>, q.v.
The first problem is identifying attachments. "Simple!" you say. "Locate
Content-Disposition: attachment and its variants." That works most of the
time, but not when the phrase is part of a discussion of MIME deletion,
such as this paragraph. Cute trick; let's go on.
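To see that false positive concretely, here is a minimal shell sketch. The sample message and the variable names are made up for illustration; it only contrasts a whole-message search with a header-only search and is not a substitute for the recipes that follow.

```shell
# Hypothetical plain-text message that merely talks about attachments.
msg=$(printf '%s\n' \
  'From: someone@example.invalid' \
  'Subject: how to strip MIME' \
  'Content-Type: text/plain' \
  '' \
  'Just look for' \
  'Content-Disposition: attachment' \
  'and delete that part.')

pattern='^content-disposition:[[:space:]]*attachment'

# Naive test: search headers and body alike.
naive_hits=$(printf '%s\n' "$msg" | grep -ic "$pattern" || true)

# Header-only test: sed stops at the first blank line (end of the headers).
header_hits=$(printf '%s\n' "$msg" | sed '/^$/q' | grep -ic "$pattern" || true)

echo "naive=$naive_hits header_only=$header_hits"
```

The naive search reports a hit even though the message carries no attachment; the header-only search does not. For multipart messages the disposition line legitimately lives in the body, which is why the later recipes insist on a boundary line and part header around it.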
Once we have identified a part, the question is how to go about
deleting it. The excision itself is pretty simple, and I'll provide a
GNU sed filter script to do that part. The more interesting question is:
how do we ensure that what remains is still valid? The good news is that
we don't have to do much at all, although we do have to deal with a few
different situations. From my own limited testing, some MUAs (elm and
pine) have problems with malformed MIME messages. Some (mailx) don't. I
would guess that you want to create few, if any, problems.
So let's examine some cases.
1. What if the whole message is an attachment (not multipart, but
Content-Disposition: attachment in the RFC822 headers)?
2. (Likely) What if there are multiple parts, one of which is an
attachment?
3. What if there is only one part (the attachment) in a MIME multipart
message?
4. What if the attachment is part of a nested part?
5. (Also likely) What if there are multiple attachments?
Here's how we'll deal with each.
1. Detect this with
:0
* ^content-disposition:[ ]*attachment
{ action }
The later recipes add ($|;| | ) after 'attachment'; that ensures the
word isn't part of a filename. I am assuming that RFC822-style comments
don't show up in content-disposition headers (the header is defined in
RFC 2183, not RFC 2045). The disposition-type should follow
the header name directly, per RFC, although you can code this more
defensively if you like.
Removing involves dropping the body and removing the MIME indicia
from the header. We can try to grab the filename to use in our
autoreply, and we can insert a dummy body in the message so that
there is something there when we look at it. We'll also set a flag
for autoreplying that we can check later. Putting these together:
:0
* ^content-disposition:[ ]*attachment
* ^content-.*filename="?\/[^";]+
{ file=$MATCH sendNoMIME=yes
:0 f h w
| formail -imime-v -icontent-d -icontent-t -icontent-m -icontent-b \
-A"X-Munged: removed attachment $file from message"
:0 f b w i
| echo "$file was here. It is gone."
}
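Note: Each [ ] character class in these recipes should hold two
characters, a space and a tab, and the two blank alternatives in
($|;| | ) are likewise a space and a tab. Type the tab literally;
procmail regexps have no \t escape.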
2. (One part in a 2+ part multipart)
This and case 5 are the most likely occurrences. Detect this one
thus:
:0 B ## first, look for a multipart with one or more attachments
* H ?? ^content-type:.*multipart/
* content-disposition:.*attachment($|;| | )
{ sendNoMIME=yes
:0 B ## next, grab the part header for the first attachment
* ^\/--(.+$)+content-disposition:[ ]*attachment.*$(.+$)*
{ partHead=$MATCH file=NameNotFound
:0 ## grab the filename for use in reporting
* partHead ?? filename="?\/[^";]+
{ file=$MATCH }
:0 ## grab the boundary for use by filter
* partHead ?? ^^--\/.+
{ boundary=$MATCH }
:0 ## grab the disposition line for use by filter
* partHead ?? ^\/content-disposition.+
{ cdisp=$MATCH }
:0 f b w
| sed -n -e ":top;\\?^--$boundary?!{p;n;btop;}" \
-e ":hold;\\?^--$boundary--?{p;n;btop;}" \
-e "h;:head;n;\\?$cdisp?brepl;/./{H;bhead;}" \
-e "H;x;p;:print;n;\\?^--$boundary?bhold;p;bprint" \
-e ':repl;x;P;a\
Content-Type: text/plain; charset=US-ASCII\
\
Part dumped\
' -e ":dump;n;\\?^--$boundary?bhold;bdump"
:0 f h w
| formail -A"X-Munged: removed attachment $file from message"
}
}
3. If there was only one part (the attachment), we're OK. The recipe
above has replaced that part with another valid part, so the headers
should all be fine.
4. If the message is nested, the multipart section may not be mentioned
at all in the RFC822 headers of the original message. This means
that the recipe in (2) won't detect it. A simple change handles
that. Replace the first condition line of the recipe (H ??...) with
this new line:
* HB ?? ^content-type:.*multipart/\/[^ ;]+
and it should work on all nested messages.
5. Finally, how do we handle messages with multiple attachments? This
requires a set of recursive recipes. The first recipe, in the
calling rc file, does the initial detection, and the remaining
recipes, in a separate rc file, do the bulk of the work.
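The filename capture used in cases 1 and 2 (whatever matches to the right of \/ lands in $MATCH) can be tried outside procmail. The sketch below approximates it with sed on a hypothetical part header; the sample header and variable names are assumptions for illustration. Note that procmail matches case-insensitively by default, while this sed expression does not.

```shell
# Hypothetical MIME part header, as procmail would hold it in $partHead.
part_head='--boundaryline
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: attachment;
 filename="stage2.txt"
Content-Transfer-Encoding: base64'

# Rough equivalent of:  * partHead ?? filename="?\/[^";]+
# Skip an optional opening quote, then keep everything up to the next
# quote or semicolon.
file=$(printf '%s\n' "$part_head" |
       sed -n 's/.*filename="\{0,1\}\([^";][^";]*\).*/\1/p' | head -n 1)

# Fall back to the same placeholder the recipe in case 2 uses.
file=${file:-NameNotFound}

echo "X-Munged: removed attachment $file from message"
```

With the sample header this reports stage2.txt; a part header without a filename parameter falls through to NameNotFound.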
Putting it all together, this may be what you asked for.
In the main file (usually .procmailrc, YMMV):
:0
* ^content-disposition:.*attachment($|;| | )
* ^content-.*filename="?\/[^";]+
{ file=$MATCH sendNoMIME=yes
:0 f h w
| formail -imime-v -icontent-d -icontent-t -icontent-m -icontent-b
:0 f b w i
| echo "$file was here. It is gone."
}
:0 E B ## otherwise, look for a multipart with one or more attachments
* HB ?? ^content-type:.*multipart/
* content-disposition:.*attachment($|;| | )
{ sendNoMIME=yes INCLUDERC=ReplacePart.rc
:0 f h w
| formail -A"X-Munged: removed attachment(s) $files from message"
}
:0
* sendNoMIME ?? yes
* other autoresponder conditions
{ autoresponder recipes }
And in the rc file ReplacePart.rc
:0 B ## next, grab the part header for the first attachment
* ^\/--(.+$)+content-disposition:.*attachment[; ]*$(.+$)*
{ partHead=$MATCH
:0 ## grab the filename for use in reporting, append to list
* partHead ?? filename="?\/[^";]+
{ files=${files:+$files, }$MATCH }
:0 ## grab the boundary for use by filter
* partHead ?? ^^--\/.+
{ boundary=$MATCH }
:0 ## grab the disposition line for use by filter
* partHead ?? ^\/content-disposition.+
{ cdisp=$MATCH }
:0 f b w ## filter to replace attachment by fixed message
| sed -n -e ":top;\\?^--$boundary?!{p;n;btop;}" \
-e ":hold;\\?^--$boundary--?{p;n;btop;}" \
-e "h;:head;n;\\?$cdisp?brepl;/./{H;bhead;}" \
-e "H;x;p;:print;n;\\?^--$boundary?bhold;p;bprint" \
-e ':repl;x;P;a\
Content-Type: text/plain; charset=US-ASCII\
\
Part dumped\
' -e ":dump;n;\\?^--$boundary?bhold;bdump"
:0 a B ## look for more attachments in body iff previous worked
* content-disposition:.*attachment($|;| | )
{ INCLUDERC=$_ }
}
I leave the eradication of non-MIME attachments as an exercise.
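The ${files:+$files, }$MATCH assignment in ReplacePart.rc builds a comma-separated list without a leading comma. The same expansion exists in a POSIX shell, so its behaviour can be checked there. This is only a sketch; the filenames are hypothetical.

```shell
files=
for name in stage1.txt stage2.txt photo.jpg; do   # hypothetical filenames
  # ${files:+$files, } expands to "<list so far>, " only when the list
  # is already non-empty, so the first name gets no leading comma.
  files="${files:+$files, }$name"
done

echo "X-Munged: removed attachment(s) $files from message"
```

The list only grows if nothing clears the variable between passes, which matters when the rc file includes itself once per attachment.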
Some notes on what these recipes do, how they do it, and what they don't
do, and what they can't do:
1. They haven't been tested in production. The recipe for point 2,
above, has been pasted into a test harness and fed controlled
messages, and works. (Be sure to put tab characters into the right
places, though).
2. The non-multipart recipe (point 1) requires filename=value in a
content header. The multipart recipes assume that filename=value
is present in the MIME part header; if it isn't, the recipe should
still work, but the reporting may look strange.
Note that content-type headers can have comments, and these recipes
make no allowance for them. This means that there is a small chance
that the multipart test at the top of these recipes will suffer a
false positive. C'est la vie.
3. The formail call with all the content headers listed can be
simplified if you know that there aren't any content-length headers,
or are willing to delete one if it is there.
4. The recipe in point 2 works as follows.
. The top test checks whether there is something in the message worth
replacing, that is, something is defined in the head or body as
multipart, and the body has at least one content disposition
header. This doesn't guarantee that there is something there to
replace, but it does say that it's worth looking.
. The next test is the big one. It catches a content-disposition
header along with the MIME part header lines which precede it,
and the boundary line before those. If we can't find something to
match this, the first test was a false positive.
. On a hit, we parse the boundary, disposition header, and filename
for later use. We then pass the boundary and disposition header to
sed for the replacement action.
. Sed is used to filter the message body, replacing each part which
matches this target disposition line. It doesn't look for the
particular part which the procmail regexp isolated, simply any
part which has the same disposition line.
. The sed filter may catch more than one attachment, in which case
the file names are underreported.
5. The sed routine may well need fiddling for your version of sed. Mine
is GNU sed, and this works with it. Many seds don't abide anything
after a label in a -e expression. Many don't want anything before an
opening curly brace. On one sed I tried, I needed 19 expressions
to do this. Part of that is my sed skill, part is the sed version.
Here is a walkthrough of the sed script:
. If we haven't hit a boundary yet, print the current line and go
back for more. (Print everything from the start to the first
boundary, but not the first boundary.)
. If we hit the final boundary, print it out and go back for more.
(Print everything from the final boundary to the end of the
message.)
. We hit a non-terminal boundary. Put it into the hold, clearing out
whatever might be there.
. Process the MIME part head: Get the next line. If it is our target
disposition line, branch to the replace routine. If it isn't a
blank line (it's another head line) append it to the hold and go
back for more head lines. Otherwise (it's a blank line) we've
reached the end of the part header and this one isn't the one
we're after, so
. Append the blank line to the hold, then retrieve and print the
hold, then print every line until we get to another boundary line.
When we get to a boundary, branch back to the boundary tests.
. To replace the target part, first retrieve the hold so we can
print the boundary line. Print just the boundary line. Then append
an appropriate header, a blank line, and a fixed message. Then
loop through more lines until we hit another boundary, and branch
back to the boundary tests.
What this sed script can't do:
. Insert variable data in place of the removed part. I couldn't
find a way to insert the file name or boundary variable, so I
resorted to popping the boundary variable off the top of the hold
and inserting constant text. Blank lines are appended as a space
followed by a backslash, as I learned through experimentation and
rapid hair loss.
What this sed script doesn't do:
. Save the attachment to a file. Actually, you could use sed w
(write) commands to append the mime part header and body to a file
if you want, but then you have to deal with generating unique
names and the (unproven) cost of those individual writes. You also
have to deal with the chance that the script will replace more
than one part, and the complications that introduces for saving
them.
6. The recipes cannot catch defaulted MIME parts (parts with no MIME
part headers after the boundary line). That isn't a problem here,
since attachments require some of those headers.
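For anyone whose sed balks at the script, the state machine described in the walkthrough under note 5 can also be sketched in awk. This is a swapped-in technique, not the filter used in the recipes, and it differs in one respect: it replaces every part whose header carries Content-Disposition: attachment, rather than only parts matching one specific disposition line. The function name, the boundary b1 and the sample body are all hypothetical, and the sketch assumes LF line endings and a boundary that is not a prefix of another boundary.

```shell
# strip_attachments BOUNDARY
# Reads a multipart body on stdin; writes it back with every part whose
# header has "Content-Disposition: attachment" replaced by a short stub.
strip_attachments() {
  awk -v b="--$1" '
    index($0, b) == 1 {                       # a boundary line (literal prefix test)
      if (state == "head") printf "%s", head  # header that never reached a blank line
      if ($0 == (b "--")) { print; state = "tail"; next }   # closing boundary
      bline = $0; head = $0 "\n"; attach = 0; state = "head"
      next
    }
    state == "head" {                         # buffer the part header (the "hold")
      head = head $0 "\n"
      if (tolower($0) ~ /^content-disposition:[ \t]*attachment/) attach = 1
      if ($0 == "") {                         # blank line: the header is complete
        if (attach) {
          print bline
          print "Content-Type: text/plain; charset=US-ASCII"
          print ""
          print "Part dumped"
          state = "drop"
        } else {
          printf "%s", head
          state = "keep"
        }
      }
      next
    }
    state == "drop" { next }                  # swallow the attachment body
    { print }                                 # preamble, kept bodies, epilogue
  '
}

# Hypothetical two-part body with boundary "b1".
out=$(printf '%s\n' \
  'preamble' \
  '--b1' \
  'Content-Type: text/plain' \
  '' \
  'Hello.' \
  '--b1' \
  'Content-Type: application/octet-stream' \
  'Content-Disposition: attachment; filename="x.bin"' \
  '' \
  'AAAA' \
  '--b1--' | strip_attachments b1)
printf '%s\n' "$out"
```

Because the boundary is compared as a literal string, characters such as . or + in it need no escaping, which sidesteps one of the fiddly parts of feeding $boundary to a regexp.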
Now that the question is answered, howzabout some discussion:
First, of course, is the question: "Why?" Why do you want to delete all
attachments? Many MUAs handle them quite well. Is there a problem with
attachments like this one:
--boundaryline
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: attachment;
 filename="stage2.txt"
Content-Transfer-Encoding: base64

QWxnb3JpdGhtOiBDYWVzYXIgQ2lwaGVyDQpPZmZzZXQ6IDE5DQoNCkNpcGhl
cnRleHQ6DQpNSElMWSBMWkEgWkJITCBYQlBaWEJMIE1WWUFCVUhMIEhXV0FQ
QlogSlNIQktQQlogSkhMSkJaIEtQSkFCVCBIWUpIVUJUIExaQQ0KVUxCQVlW
VQ0KDQpQbGFpbnRleHQ6DQpGQUJFUiBFU1QgU1VBRSBRVUlTUVVFIEZPUlRV
TkFFIEFQUFRJVVMgQ0xBVURJVVMgQ0FFQ1VTIERJQ1RVTSBBUkNBTlVNIEVT
VA0KTkVVVFJPTg0K
--boundaryline
If the problem is the size of the messages, you can more easily bounce
all large messages.
Maybe your complaint is about vcards and tnef and the like. Those can be
removed easily, to be sure, and many of us do that. You don't, however,
need a big stick to do that, and this is a big stick.
Anyway, that's how I might go about silencing the MIME, should I need
to. I'm sure that others on the list will have some comments. As a
matter of fact, I wouldn't try to implement this until a few of our
éminences grises have had their say.
--
Rik Kabel Creating tomorrow's legacy systems, today
rik(_at_)netcom(_dot_)com