On Thu, 12 Aug 1999 13:42:22 +0200, Jost Schaper <jost(_at_)schaper(_dot_)org>
wrote:
> I often get MS word (doc-) - attachments, which I don't want to read.
> I would love to have them in RTF-Format, instead.
> Is there a chance to trap them and autoreply, that you don't accept
> DOC-files, but accept RTF instead?

Part of the problem is that there is more than one way to send Word
attachments, and no really foolproof way to positively identify a Word
attachment, short of decoding it to a file, and even then I'm not sure
how to proceed from there.
Here are two recipes which might be enough to get you started. They
are built on the assumption that the Word file is sent using MIME in
an application/octet-stream or application/msword body part with a
name=something.doc parameter. The first is for when the Word doc is
one part in a multipart/mixed message, and the second for when it is
sent all by itself. They will not cover files sent with non-.doc
filenames, or without a filename at all; they won't cover files sent
as anything other than application/octet-stream or
application/msword; and they won't cope with mechanisms other than
MIME (uuencode comes to mind).
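
  # The Word doc is one part of a multipart/mixed message.
  # MIME delimiter lines are "--" followed by the boundary string.
  :0
  * ^Content-Type:\<*multipart/mixed; boundary="?\/[^" ]+
  * B ?? $ ^--$\MATCH($)([ ].*($))*\
  Content-Type:\<*application/(octet-stream|msword);(.*\<)?\
  name="?[^" ]+\.doc
  { ... respond ... }

  # The Word doc is the whole message
  :0
  * ^Content-type:\<*application/(octet-stream|msword);(.*\<)?\
  name="?[^" ]+\.doc
  { ... respond ... }
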
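For the uuencode case mentioned above, a rough and untested sketch
along the same lines might look like this. It assumes the sender's
uuencode wrote the usual "begin MODE FILENAME" line and that the
filename ends in .doc; it makes no attempt to check that what follows
really is Word data.

  # Untested sketch: uuencoded attachment whose name ends in .doc
  :0
  * B ?? ^begin [0-7][0-7][0-7]+ .*\.doc$
  { ... respond ... }
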
I believe there are several other more or less nonstandard
"application" subtypes in common use for Word attachments; I'm
fortunate enough to not have a large corpus of test material for this
(and I'm too lazy to go out and check :-)
Requiring a name=something.doc parameter on the Content-Type is
actually not very smart when the part is already identified as
application/msword; that type is enough by itself. octet-stream, on
the other hand, is a "catch all" MIME type which might contain
anything at all, so there it is a good idea to check for a .doc
filename. The filename could also (or additionally) be in
Content-Disposition, but if you worry about getting this right, check
out some MIME docs for the full scoop.
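
As a rough, untested sketch of the Content-Disposition variant: the
recipe below looks in both header and body for a Content-Disposition
field whose filename parameter ends in .doc, either on the same line
or on the continuation line right after it. It assumes the [ ] bracket
is typed with a space and a tab inside, and that quoted filenames
contain no embedded double quotes.

  # Untested sketch: .doc filename announced in Content-Disposition.
  # The [ ] must contain a space and a tab.
  :0
  * HB ?? ^Content-Disposition:.*($[ ].*)?filename="?[^"]+\.doc
  { ... respond ... }
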
Otherwise, if this recipe doesn't cut it, perhaps you can post a
(heavily trimmed) example to work from. Here's the one I have been
looking at:

  From: xxxxxxxxxxxxxxxxxx
  To: <era(_at_)iki(_dot_)fi>
  Subject: xxxxxxxxxxxxxxxxxx
  Date: Tue, 6 Apr 1999 11:46:36 +0300

  This is a multi-part message in MIME format.

  ------=_NextPart_000_0006_01BE8023.205E1540
  Content-Type: text/plain;
          charset="iso-8859-1"
  Content-Transfer-Encoding: quoted-printable

  xxxxxxxxxxxxxxxxxx

  ------=_NextPart_000_0006_01BE8023.205E1540
  Content-Type: application/msword;
          name="xxxxxxxxxxxxxxxxxx.doc"
  Content-Transfer-Encoding: base64
  Content-Disposition: attachment;
          filename="xxxxxxxxxxxxxxxxxx.doc"

  0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAA ...

  ------=_NextPart_000_0006_01BE8023.205E1540--

If the { ... respond ... } part becomes complicated, a useful trick is
to have a number of identification recipes which simply set a variable
if they detect a Word attachment, and then do the autoresponder part
if the variable got set by any of those recipes.

  :0
  * one way to detect what you want to detect
  { DETECTED=yes }

  :0
  * another way to detect something that matches
  { DETECTED=yes }

  # ... more recipes like those two here

  # Now if DETECTED somehow got set, respond
  :0
  * DETECTED ?? .
  { ... respond ... }

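For the { ... respond ... } part itself, one possibility is the
autoreply pattern from the procmailex manual page, adapted here as an
untested sketch. The address you@example.org and the wording of the
reply are placeholders of mine, not anything from the original
question. The FROM_DAEMON and X-Loop checks are there so that you
don't answer mailing lists, bounces, or your own autoreplies.

  # Untested sketch; replace you@example.org with your own address
  :0 h c
  * DETECTED ?? .
  * ! ^FROM_DAEMON
  * ! ^X-Loop: you@example\.org
  | (formail -r -A"Precedence: junk" \
      -A"X-Loop: you@example.org" ; \
     echo "I cannot read Word attachments."; \
     echo "Please send the document again as RTF.") | $SENDMAIL -t

The c flag makes procmail work on a copy, so the original message
still falls through to your remaining recipes and gets delivered as
usual; drop it if you would rather not keep the message at all.
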
For what it's worth, there's a program called catdoc which can crudely
decode MS Word porridge. It's not exactly a pretty-printer, but it can
often extract enough to enable you to get some idea of what's in the
Word attachment.
Hope this helps,
/* era */
--
Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition