procmail

Converting text/html to text/plain

2002-06-16 19:54:09
I spent far too long today figuring out how to convert text/html mail to
text/plain, whether the top-level content type is text/html,
multipart/alternative, multipart/related, or multipart/mixed.  My
specific application is a daily digest email of my blog entries, each of
which is generated in HTML.  Both Yahoo Groups and Mailman can only
digest plaintext messages.  My solution uses procmail: munpack breaks
the message into its parts, and lynx then converts the HTML part to
text, marking each link with a numbered placeholder such as "[1]" and
adding a References section at the bottom.  Here's a sample
downconverted message:

   My message.
   --
   Posted by Dan Kohn to [1]Dan Kohn's Blog at 6/16/2002 7:29:23 PM
   Powered by [2]Blogger Pro

References

   1. http://www.dankohn.com/blog/
   2. http://pro2.blogger.com/
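
To try the lynx step on its own before wiring it into procmail, something
like the following can be run from a shell.  This is only a sketch: it
assumes lynx is installed and on the PATH, the names html_to_text and
sample_html are made up for illustration, and the exact spacing of the
output may differ between lynx versions.

```shell
#!/bin/sh
# html_to_text: hypothetical helper name, not part of the original setup.
# Reads HTML on stdin and writes plain text on stdout, using the same
# lynx flags as the recipes in the .procmailrc.
html_to_text() {
    if ! command -v lynx >/dev/null 2>&1; then
        echo "html_to_text: lynx not found on PATH" >&2
        return 127
    fi
    lynx -dump -force_html -stdin
}

# sample_html: prints a made-up one-link HTML page for experimenting.
sample_html() {
    printf '%s\n' \
        '<html><body>' \
        '<p>Posted to <a href="http://www.dankohn.com/blog/">my blog</a></p>' \
        '</body></html>'
}
```

Running `sample_html | html_to_text` should print the paragraph with the
link marked as "[1]", followed by a References section that lists the URL.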



My .procmailrc:

#Uncomment the following lines and use tail -f procmail.log to debug
#VERBOSE=yes
#LOGFILE=$HOME/procmail.log
#LOGABSTRACT=all


MAILLIST=dankohn-blog@example.com
SUBJ_=`formail -xSubject:`
DIR=$HOME/.procmail


# These messages should be converted from HTML to text
# (converting links to references) and then
# forwarded to $MAILLIST.  Lynx does the trick.

:0w
* ^Subject: \[Add your test here\].*
{


# Message is text/html with no multipart mixed, related, or
# alternative.  Process body with lynx -dump.

:0wb
* ^Content-Type: text/html.*
| lynx -dump -force_html -stdin | \
mutt $MAILLIST -s "${SUBJ_}"


# Message is text/plain with no multipart, so just forward

:0w
* ^Content-Type: text/plain.*
! $MAILLIST


# Message is multipart/alternative.  munpack -t writes each text part
# to $DIR.  The first part (part1) is plain text and is discarded; the
# second part (part2) is HTML and is converted.

:0w
* ^Content-Type: multipart/alternative.*
| munpack -t -C $DIR && \
lynx -dump -force_html $DIR/part2 | \
mutt $MAILLIST -s "${SUBJ_}" && \
rm -f $DIR/*


# Message is multipart/mixed or multipart/related.  The HTML is
# assumed to be the first part (part1).

:0w
* ^Content-Type: multipart.*
| munpack -t -C $DIR && \
lynx -dump -force_html $DIR/part1 | \
mutt $MAILLIST -s "${SUBJ_}" && \
rm -f $DIR/*
}
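
Because the first matching recipe inside the block wins, the order of the
Content-Type tests matters.  To check which branch a saved message would
take without sending any mail, a rough stand-in for the dispatch logic
can be written in plain shell.  This is a sketch, not part of my setup:
the name classify_message is made up, it ignores the Subject test on the
enclosing block, and it assumes Unix line endings and a Content-Type
header whose type sits on the first line of the field.

```shell
#!/bin/sh
# classify_message: hypothetical helper, not part of the original setup.
# Reads one mail message on stdin and prints which branch of the nested
# block would handle it: html, plain, alternative, multipart, or none.
classify_message() {
    # Keep only the header section: everything up to the first blank line.
    headers=$(sed '/^$/q')

    # Test in the same order as the recipes; the first match wins.
    # grep -i mirrors procmail's case-insensitive matching.
    for rule in \
        'html:^Content-Type: text/html' \
        'plain:^Content-Type: text/plain' \
        'alternative:^Content-Type: multipart/alternative' \
        'multipart:^Content-Type: multipart'
    do
        name=${rule%%:*}      # label before the first colon
        pattern=${rule#*:}    # regular expression after the first colon
        if printf '%s\n' "$headers" | grep -iq "$pattern"; then
            echo "$name"
            return 0
        fi
    done
    echo none
}
```

For example, `classify_message < saved-message` (the file name here is a
placeholder) prints "alternative" for a typical HTML-plus-text message,
which means part2 is the part that gets converted.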


# Send all other mail through SpamAssassin, which is awesome.

:0fw
| /usr/bin/spamassassin -Pa -F0

:0      # All messages
! dan(_at_)skymv(_dot_)com    # Forward mail to my non-publicized address

          - dan
--
Dan Kohn <mailto:dan(_at_)dankohn(_dot_)com>
<http://www.dankohn.com/>  <tel:+1-650-327-2600>
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
