procmail

Re: Extract Content-Type: text/plain;

2003-06-07 17:21:18
On Fri, 6 Jun 2003, Frank Nørvig wrote:

> Is it possible to extract part of a body, ex. only the
> "Content-Type: text/plain;" part?

I've been mildly interested in this question for a long time and have
finally gotten around to doing something with it.

See <http://www.well.com/user/barts/email/mimepart.txt>.  This is a fairly
general pure-procmail (no external shell processes!) MIME part extractor.
It does rely on being able to allocate enough memory to slurp the entire
part into $MATCH and then assign it to a variable, so beware of using it
on really enormous attachments.

You simply load it with:

INCLUDERC=mimepart.txt
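
Given the memory caveat above, one cautious option is to run the
extractor only on messages below some size.  This is a minimal sketch,
not something taken from mimepart.txt itself; the 256 KB threshold is an
arbitrary assumption, and the "<" condition tests the size of the whole
message rather than of any single part.

```procmail
# Only run the extractor on messages smaller than 256 KB.
# The threshold is arbitrary; pick one that suits your system.
:0
* < 262144
{
  INCLUDERC=mimepart.txt
}
```

Larger messages skip the block entirely, so later recipes should not
assume that the extractor's variables have been set.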

By default it extracts the first text/plain part from a multipart structure
up to two levels deep (the top-level message plus one nested multipart).  The
raw text of the extracted part is stored in the variable $BODY_PART (that
is, there's no quoted-printable decoding or anything like that).
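
Once the include has run, $BODY_PART can be tested like any other
variable with procmail's "VAR ?? regexp" condition.  The sketch below is
only an illustration: the word being matched and the folder name
"newsletters" are made up, and because the part is raw, a
quoted-printable or base64 encoded part would not match as plain text.

```procmail
INCLUDERC=mimepart.txt

# Hypothetical rule: file the message if the extracted text/plain part
# mentions "unsubscribe".  Matching is case-insensitive by default.
:0:
* BODY_PART ?? unsubscribe
newsletters
```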

However, you can extract a body part of a different type like so:

CONTENT_TYPE=text/html
INCLUDERC=mimepart.txt

In this case the first text/html part is placed in $BODY_PART.

I didn't actually try this, but it should even work to extract the first
part of any text subtype with something like:

CONTENT_TYPE=text/
INCLUDERC=mimepart.txt

The limitation is that it can only extract the first matching part that it
finds, so if there are, e.g., three image/gif attachments, you can only
use this to get the first one ... unless you're willing to do some more
work yourself, e.g., to discard (or mung the content-type of) the first
such part before INCLUDERC'ing this again to get the next one.

To make it easier to do that sort of extra processing, the variables
$BOUNDARY (a pattern matching the boundary of the enclosing multipart, if
there was one) and $PART_HEADER (the header of the extracted $BODY_PART,
including the leading boundary) are also set for you.
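
As an example of that sort of extra processing, $PART_HEADER can be
checked for the transfer encoding of the extracted part, since
$BODY_PART itself is left undecoded.  This is a sketch under
assumptions: PART_IS_QP is a hypothetical variable name chosen here, not
something mimepart.txt sets, and the bracket expression contains a space
and a tab.

```procmail
INCLUDERC=mimepart.txt

# Record whether the extracted part is quoted-printable so that later
# recipes can decide whether it needs decoding.
PART_IS_QP=no

:0
* PART_HEADER ?? ^Content-Transfer-Encoding:[ 	]*quoted-printable
{ PART_IS_QP=yes }
```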

If you also use <http://www.well.com/user/barts/email/mimewrap.txt>, it
should be possible to do things like:

INCLUDERC=mimewrap.txt

CONTENT_TYPE=message/rfc822
INCLUDERC=mimepart.txt

and end up with the entire original message in $BODY_PART.  However, I
didn't try that one myself either ... bug reports might result in fixes,
but only might.

With CONTENT_TYPE=multipart/ and a recursive (self-including) INCLUDERC,
it should even be possible to dismantle arbitrarily nested structures.
I'm not going to embark on figuring that one out anytime soon, though.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

