procmail

RE: adding plaintext when only html is received

2004-01-22 09:33:56


The following script is a first cut at decoding mail messages that consist
of a single MIME part with Content-Type: text/html. It builds a new message
whose body is multipart/alternative, with a plain-text part derived from the
HTML part, followed by the HTML itself. If the original message body was
encoded (e.g., in base64), it is decoded, so both parts of the resulting
message are unencoded text.
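To see what the decoding step is expected to yield, here is a minimal sketch in Python. It is a swapped-in technique for illustration only (the recipe itself uses munpack), it assumes a single-part text/html message, and the helper name decode_html_body is hypothetical. get_payload(decode=True) from the standard email package undoes base64 or quoted-printable encoding and returns an unencoded body unchanged, which roughly mirrors munpack plus the recipe's fallback copy.

```python
import email
from email import policy


def decode_html_body(raw: bytes) -> str:
    """Return the decoded body of a single-part text/html message.

    Hypothetical helper for illustration; the recipe uses munpack instead.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    if msg.get_content_type() != "text/html":
        raise ValueError("expected a single text/html part")
    # Undo base64 or quoted-printable transfer encoding; a 7bit/8bit
    # (unencoded) body comes back unchanged, like the recipe's fallback copy.
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


if __name__ == "__main__":
    import base64

    html = "<html><body><p>Hello, <b>world</b>!</p></body></html>\n"
    raw = (
        b"From: sender@example.com\n"
        b"Subject: HTML-only test\n"
        b"Content-Type: text/html; charset=us-ascii\n"
        b"Content-Transfer-Encoding: base64\n"
        b"\n" + base64.encodebytes(html.encode("ascii"))
    )
    print(decode_html_body(raw), end="")
```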

I ran this script from within a test directory, as follows:
  % procmail -m -p MAILDIR=`pwd` ./decode_html.rc < html_enc_mail.txt
where 'html_enc_mail.txt' is a file containing a single mail message
(including headers) with a message body of Content-Type: text/html.
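If no suitable message is at hand, a test file like 'html_enc_mail.txt' can be generated. The Python sketch below is an assumption, not how the original test file was made; the addresses and HTML are placeholders. It writes one single-part text/html message with a base64-encoded body, so both the decoding and the HTML-to-text conversion get exercised.

```python
from email.message import EmailMessage


def make_html_test_message(html: str) -> bytes:
    """Build a single-part text/html message whose body is base64-encoded."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"  # placeholder addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "HTML-only test message"
    # subtype="html" yields Content-Type: text/html; cte="base64" forces the
    # transfer encoding that the recipe is supposed to undo.
    msg.set_content(html, subtype="html", cte="base64")
    return msg.as_bytes()


if __name__ == "__main__":
    sample = "<html><body><p>Hello, <b>world</b>!</p></body></html>\n"
    with open("html_enc_mail.txt", "wb") as out:
        out.write(make_html_test_message(sample))
```

The resulting file can then be fed to the procmail command shown above.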

The script depends on several utility programs: munpack, lynx, mktemp,
formail, hostid, and date. It uses a few POSIX/Linux-isms in the command
invocations (for example, the %N format of GNU date), so it may need some
tweaking on non-Linux (or older Linux) platforms.
I found munpack here:
source: http://rpmfind.net/linux/contrib/libc6/SRPMS/mpack-1.5-3.src.rpm
binary: http://rpmfind.net/linux/contrib/libc6/i386/mpack-1.5-3.i386.rpm
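One of the platform dependencies is the boundary line: date's %N (nanoseconds) format is a GNU extension, and hostid may be missing on some systems. As a sketch of a portable alternative (hypothetical, not the author's method), a boundary with the same ----=_Part_0000_ prefix can be built from a timestamp plus random hex digits. RFC 2046 limits a boundary to 70 characters, and the default form here is 64.

```python
import secrets
import time


def make_boundary(prefix: str = "----=_Part_0000_") -> str:
    """Return a MIME boundary that is very unlikely to occur in any part.

    Hypothetical replacement for the hostid/date pipeline: a local timestamp
    plus 32 random hex digits instead of the host id and nanoseconds.
    """
    stamp = time.strftime("%Y%m%d_%H%M%S")
    boundary = f"{prefix}{stamp}_{secrets.token_hex(16)}"
    # RFC 2046: a boundary is 1 to 70 characters long.
    if len(boundary) > 70:
        raise ValueError("boundary longer than 70 characters")
    return boundary


if __name__ == "__main__":
    print(make_boundary())
```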


LOGFILE=`rm -f test.log; echo test.log` # debugging
VERBOSE=yes                             # debugging
DEFAULT=|                               # debugging

:0
* ^Content-Type: text/html
{
# Create temp. dir. for unpacking, and 'cd' to it.
OLDDIR=`cd $MAILDIR; pwd`
MAILDIR=`mktemp -q -d /tmp/decode.$$.XXXXXX`

# Set up an exit handler that deletes the temp. dir. if procmail terminates
# while we're decoding. (TRAP runs when procmail exits on its own, not when
# it is killed by a signal.)
OLDTRAP="$TRAP"
TRAP='cd "$OLDDIR"; rm -rf "$MAILDIR"; '"$OLDTRAP"

# Boundary has the form:
# ----=_Part_0000_7f0100_20040121_205801.222159000
# (Any string works, as long as it is unlikely to occur inside the MIME parts.)
BOUNDARY=`echo '----=_Part_0000_'\`hostid\`\`date +'_%Y%m%d_%H%M%S.%N'\``

#
# Unpack the HTML part; this should create a file called 'part1'.
#
:0 ci
|munpack -f -q -t
#
# The following hack works around munpack's refusal to
# unpack a message body unless it is encoded or has a multipart
# attachment. We just copy the body into 'part1', if the
# file wasn't created by munpack.
#
:0 cbr
* ! ? test -e "part1"
part1

#
# Place this long shell script into a variable.
# It is easier this way than having to figure out
# the various quoting rules inside an action.
#
SCRIPT='
  # Build the html and text parts.
  cp part1 html.part
  lynx -stdin -force_html -nolist -dump < html.part > text.part
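  # Fix up the header; with -X "", formail prints only the (rewritten) header.
  # The multipart is labelled 8bit to match the 8bit parts inside it.
  formail -I "Content-Type: multipart/alternative;
        boundary=\"$BOUNDARY\"" -X "" \
        -I "Content-Transfer-Encoding: 8bit"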
  # Emit the body via a sequence of "cat" commands.
  cat - text.part << EOF


This is a multi-part message in MIME format.


--$BOUNDARY
Content-Type: text/plain;
        charset="us-ascii"
Content-Transfer-Encoding: 8bit

EOF
  cat - html.part << EOF


--$BOUNDARY
Content-Type: text/html;
        charset="us-ascii"
Content-Transfer-Encoding: 8bit

EOF
  cat << EOF


--$BOUNDARY--

EOF'

:0 fw
|sh -c "$SCRIPT"

# Remove temporary directory and cd back into the
# original mail directory.
MAILDIR=`cd $OLDDIR; rm -rf $MAILDIR; echo $OLDDIR`
# Restore old exception handler.
TRAP="$OLDTRAP"
}
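For comparison, the following Python sketch assembles the same multipart/alternative layout (text part first, HTML part last) from already-decoded HTML, using only the standard library. It is an illustration, not a replacement: html.parser is a crude stand-in for lynx -dump (no list numbering or table layout), the email package picks its own boundary and declares charset utf-8 rather than us-ascii, and html_to_text and build_alternative are hypothetical names.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text; start a new line at common block-level tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Very rough stand-in for `lynx -dump`: one collapsed line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line) + "\n"


def build_alternative(headers: dict, html: str) -> EmailMessage:
    """Build a multipart/alternative message: text/plain first, text/html last."""
    msg = EmailMessage()
    for name, value in headers.items():
        msg[name] = value
    msg.set_content(html_to_text(html), cte="8bit")
    # add_alternative() turns the message into multipart/alternative and
    # appends the HTML as the last (preferred) alternative.
    msg.add_alternative(html, subtype="html", cte="8bit")
    # RFC 2045: a multipart may only be labelled 7bit, 8bit or binary;
    # 8bit matches the parts inside it.
    msg["Content-Transfer-Encoding"] = "8bit"
    return msg


if __name__ == "__main__":
    page = "<html><body><p>Hello, <b>world</b>!</p><p>Bye &amp; thanks</p></body></html>\n"
    print(build_alternative({"From": "sender@example.com", "Subject": "test"}, page).as_string())
```

Putting the HTML part last is deliberate: RFC 2046 orders the parts of a multipart/alternative from least to most faithful, so an HTML-capable reader shows the HTML and any other reader falls back to the text part.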

Attachment: decode_html.rc

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail