On Fri, 6 Jun 2003, Frank Nørvig wrote:
> Is it possible to extract part of a body, ex. only the
> "Content-Type: text/plain;" part?
I've been mildly interested in this question for a long time and have
finally gotten around to doing something with it.
See <http://www.well.com/user/barts/email/mimepart.txt>. This is a fairly
general pure-procmail (no external shell processes!) MIME part extractor.
It does rely on being able to allocate enough memory to slurp the entire
part into $MATCH and then assign it to a variable, so beware of using it
on really enormous attachments.
You simply load it with:
    INCLUDERC=mimepart.txt
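Given the memory caveat above, one option is to wrap the include in a size
test. This is only a sketch: the 262144-byte threshold is an arbitrary
assumption, and the condition looks at the size of the whole message, not
at the size of the part that would be extracted.

```
# Only attempt extraction when the whole message is under 262144 bytes.
# The limit is an assumed value; tune it to what your system can handle.
:0
* < 262144
{
  INCLUDERC=mimepart.txt
}
```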
By default it extracts the first text/plain part from a multipart
structure up to two levels deep (the top-level message plus one nested
multipart). The raw text of the extracted part is stored in the variable
$BODY_PART; no quoted-printable decoding or anything like that is applied.
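Once $BODY_PART is set, it can be tested like any other variable with the
"??" condition syntax. A minimal sketch follows; the search word and the
folder name "accounting" are hypothetical, and what $BODY_PART holds when no
part matches is not stated here, so the recipe is written to simply fall
through in that case.

```
INCLUDERC=mimepart.txt

# Match against the raw (undecoded) text of the first text/plain part.
# If BODY_PART is empty or unset, the condition fails and nothing happens.
:0
* BODY_PART ?? invoice
accounting
```

Because the part is not decoded, a pattern like this will miss text that is
base64 or quoted-printable encoded in the original message.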
However, you can extract a body part of a different type like so:
    CONTENT_TYPE=text/html
    INCLUDERC=mimepart.txt
In this case the first text/html part is placed in $BODY_PART.
I didn't actually try this, but it should even work to extract the first
part of any text subtype with something like:
    CONTENT_TYPE=text/
    INCLUDERC=mimepart.txt
The limitation is that it can only extract the first matching part that it
finds, so if there are, e.g., three image/gif attachments, you can only
use this to get the first one ... unless you're willing to do some more
work yourself, e.g., to discard (or mung the content-type of) the first
such part before INCLUDERC'ing this again to get the next one.
To make it easier to do that sort of extra processing, the variables
$BOUNDARY (a pattern matching the boundary of the enclosing multipart, if
there was one) and $PART_HEADER (the header of the extracted $BODY_PART,
including the leading boundary) are also set for you.
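As an illustration of what $PART_HEADER might be used for, the sketch below
records the actual content type and checks the transfer encoding of the
extracted part. FOUND_TYPE and ENCODED are hypothetical variable names, and
the patterns assume the header fields appear unfolded at the start of a line.

```
CONTENT_TYPE=text/
INCLUDERC=mimepart.txt

# Capture the full type/subtype of the part that was found, e.g. text/html.
:0
* PART_HEADER ?? ^Content-Type: *\/text/[-a-z0-9.+]+
{ FOUND_TYPE=$MATCH }

# Note whether the raw text in BODY_PART would still need base64 decoding.
:0
* PART_HEADER ?? ^Content-Transfer-Encoding: *base64
{ ENCODED=base64 }
```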
If you also use <http://www.well.com/user/barts/email/mimewrap.txt>, it
should be possible to do things like:
    INCLUDERC=mimewrap.txt
    CONTENT_TYPE=message/rfc822
    INCLUDERC=mimepart.txt
and end up with the entire original message in $BODY_PART. However, I
didn't try that one myself either ... bug reports might result in fixes,
but only might.
With CONTENT_TYPE=multipart/ and a recursive (self-including) INCLUDERC,
it should even be possible to dismantle arbitrarily nested structures.
I'm not going to embark on figuring that one out anytime soon, though.