The following script is a first cut at decoding mail messages that have a
single MIME part of Content-Type: text/html. It builds a new message with a
multipart/alternative body, whose text part is derived from the html part.
Further, if the original message body was encoded (e.g., in base64),
it will be decoded into plain text in the resulting message.
I ran this script from within a test directory, as follows:
% procmail -m -p MAILDIR=`pwd` ./decode_html.rc < html_enc_mail.txt
where 'html_enc_mail.txt' is a file containing a single mail message
(including headers) with a message body of Content-Type: text/html.
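For anyone without a suitable message at hand, a test input can be generated
along the following lines. This is only a sketch: the addresses, subject and
HTML body are made up for illustration, and it assumes a base64 utility (as
shipped with GNU coreutils) is installed.

```shell
#!/bin/sh
# Hypothetical test input: a one-part text/html message, base64 encoded.
# The addresses, subject and body are invented for illustration.
html='<html><body><p>Hello, <b>world</b>!</p></body></html>'

{
    # Header lines, terminated by the empty line that separates
    # the header from the body.
    printf '%s\n' \
        'From: sender@example.com' \
        'To: recipient@example.com' \
        'Subject: HTML test message' \
        'MIME-Version: 1.0' \
        'Content-Type: text/html; charset="us-ascii"' \
        'Content-Transfer-Encoding: base64' \
        ''
    # Assumes a base64 utility is available on PATH.
    printf '%s\n' "$html" | base64
} > html_enc_mail.txt
```

The resulting html_enc_mail.txt can then be fed to the procmail command shown
above.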
The script depends upon various utility programs: munpack, lynx, mktemp,
formail, hostid, and date. It uses a few POSIX/Linux-isms in the command
invocations (for example, the %N format of GNU date), so it may need some
tweaking on non-Linux (or older Linux) platforms.
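Before trying the script, it may be worth checking that all of these programs
are actually installed. A minimal sketch follows; the function name
missing_tools is my own invention, and the check only looks for the programs
on PATH, not for the particular options the script uses.

```shell
#!/bin/sh
# Print the name of each listed program that is not found on PATH.
# (missing_tools is a hypothetical helper name, not part of the script.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools munpack lynx mktemp formail hostid date)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing utilities:" $missing
else
    echo "All required utilities were found."
fi
```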
I found munpack here:
source: http://rpmfind.net/linux/contrib/libc6/SRPMS/mpack-1.5-3.src.rpm
binary: http://rpmfind.net/linux/contrib/libc6/i386/mpack-1.5-3.i386.rpm
LOGFILE=`rm -f test.log; echo test.log` # debugging
VERBOSE=yes # debugging
DEFAULT=| # debugging
:0
* ^Content-Type: text/html
{
# Create temp. dir. for unpacking, and 'cd' to it.
OLDDIR=`cd $MAILDIR; pwd`
MAILDIR=`mktemp -q -d /tmp/decode.$$.XXXXXX`
# Set up exception handler to delete the temp dir
# if something unexpected (like a segfault) happens while we're decoding.
OLDTRAP="$TRAP"
TRAP='cd $OLDDIR; rm -rf $MAILDIR; MAILDIR=$OLDDIR; cd $MAILDIR;'"$OLDTRAP"
# Boundary has the form:
# ----=_Part_0000_7f0100_20040121_205801.222159000
# (can be anything that is likely not to occur inside the MIME parts.)
BOUNDARY=`echo '----=_Part_0000_'\`hostid\`\`date +'_%Y%m%d_%H%M%S.%N'\``
#
# Unpack the HTML part, should create a file called 'part1'.
#
:0 ci
|munpack -f -q -t
#
# The following hack works around munpack's refusal to
# unpack a message body unless it is encoded or has a multipart
# attachment. We just copy the body into 'part1', if the
# file wasn't created by munpack.
#
:0 cbr
* ! ? test -e "part1"
part1
#
# Place this long shell script into a variable.
# It is easier this way than having to figure out
# the various quoting rules inside an action.
#
SCRIPT='
# Build the html and text parts.
cp part1 html.part
lynx -stdin -force_html -nolist -dump < html.part > text.part
# Fix up the header. The continuation line of a folded
# header field must begin with whitespace.
formail -I "Content-Type: multipart/alternative;
    boundary=\"$BOUNDARY\"" -X "" \
    -I "Content-Transfer-Encoding: 7bit"
# Emit the body via a sequence of "cat" commands. A blank line
# separates the header of each MIME part from its content.
cat - text.part << EOF
This is a multi-part message in MIME format.
--$BOUNDARY
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 8bit

EOF
cat - html.part << EOF
--$BOUNDARY
Content-Type: text/html;
    charset="us-ascii"
Content-Transfer-Encoding: 8bit

EOF
cat << EOF
--$BOUNDARY--
EOF'
:0 fw
|sh -c "$SCRIPT"
# Remove temporary directory and cd back into the
# original mail directory.
MAILDIR=`cd $OLDDIR; rm -rf $MAILDIR; echo $OLDDIR`
# Restore old exception handler.
TRAP="$OLDTRAP"
}
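The %N (nanoseconds) format used when building BOUNDARY is a GNU date
extension, and hostid is not installed everywhere. Here is a minimal sketch of
a more defensive way to build a similar boundary string from plain sh. The
fallback values (a fixed host string and the shell's process ID) are my own
assumptions, not part of the script above.

```shell
#!/bin/sh
# Host identifier: use hostid if it is present and prints something,
# otherwise fall back to a fixed placeholder (an assumption).
host=$(hostid 2>/dev/null)
[ -n "$host" ] || host=00000000

# Sub-second part: GNU date expands %N to digits. Other implementations
# may print something else, in which case fall back to the process ID.
frac=$(date +%N 2>/dev/null)
case "$frac" in
    ''|*[!0-9]*) frac=$$ ;;
esac

BOUNDARY="----=_Part_0000_${host}_$(date +%Y%m%d_%H%M%S).${frac}"
printf '%s\n' "$BOUNDARY"
```

The same commands could be placed inside the backquotes of the BOUNDARY
assignment in the rc file, though the quoting would need some care there.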