Hello,
On 2010-01-30 13:11:46, JW Simpson wrote:
> There are several programs already available that you can call from
> procmail for this:
> w3m -> http://w3m.sourceforge.net/
> w3mmee -> http://pub.ks-and-ks.ne.jp/prog/w3mmee/
> links -> http://links.sourceforge.net/
> elinks -> http://elinks.or.cz/about.html
> html2text -> http://man-wiki.net/index.php/1:html2text
> lynx -> http://man-wiki.net/index.php/1:lynx
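Here is a minimal sketch of calling one of these converters from a shell pipeline. It uses the first converter that is installed. The function name html_to_text is hypothetical. The options are the usual "dump to standard output" switches as I understand them: html2text reads standard input when no file is given, w3m needs -T text/html to know what it is reading, and lynx needs -stdin. The -width option assumes the same html2text as in the recipe further down. Check all of them against the local man pages.

```shell
#!/bin/sh
# Sketch only: turn HTML on stdin into plain text on stdout, using the
# first converter from the list above that is installed.
# html_to_text is a hypothetical name; options differ between versions,
# so check them against the local man pages.
html_to_text() {
    if command -v html2text >/dev/null 2>&1; then
        html2text -width 72               # reads stdin when no file is named
    elif command -v w3m >/dev/null 2>&1; then
        w3m -dump -T text/html            # -T: treat stdin as HTML
    elif command -v lynx >/dev/null 2>&1; then
        lynx -dump -force_html -stdin     # -stdin: read the page from stdin
    else
        echo "html_to_text: no HTML converter found" >&2
        return 1
    fi
}
```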
I am already using html2text, but when you fetch the HTML page you have it
in memory, and then I have to pipe it through three different filters
because the pages are not always in the same format.
If I use your recipe, the pages are unreadable...
First I have to use a sed script to cut everything from the beginning up
to a marker and from a second marker to the end, and to add new
<html>...</html> tags. Only then can I use html2text to get the stuff I want.
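The cleanup described above can run as a single stream, so the page never has to be stored in a procmail variable or a temporary file. Below is a minimal sketch in plain sh. It assumes both markers are simple sed regular expressions that contain no "/" or "|". The function name cut_page is hypothetical.

```shell
#!/bin/sh
# Sketch only: keep the part of an HTML page between two markers and wrap
# it in fresh tags, as one stream (no variable, no temporary file).
# cut_page is a hypothetical name. Both arguments are sed basic regular
# expressions and must not contain "/" or "|".
cut_page() {
    start_mark=$1   # where the wanted part begins
    end_mark=$2     # where the unwanted footer begins
    printf '%s\n' '<html><head></head><body><table><tr><td>'
    # 1) print from the first line matching start_mark to the end,
    # 2) on that first line, drop the text before the marker,
    # 3) delete from the first line matching end_mark to the end.
    sed -n "/${start_mark}/,\$p" |
        sed "1s|.*${start_mark}|${start_mark}|" |
        sed "/${end_mark}/,\$d"
    printf '%s\n' '</td></tr></table></body></html>'
}
```

With the markers from the recipe below, the call would be `wget --quiet -O - "$URL" | cut_page 'APPLICATION NOTE ' '<!-- BEGIN: EE-MAIL -->' | html2text -width 72`. Here $URL stands for the AppNote address.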
The problem is here:
----[ '~/.tdtools-procmail/BUSINESS_firms_maxim' ]----------------------
<snip>
:0
* ^From:.*Application Notes
{
# Eliminate any encodings
TMPVAR1=`mimedecode |grep --max-count=1 "^URL: " |sed 's|^URL: ||' ; :`
# Get the number of the AppNote
TMPVAR2=`echo "${TMPVAR1}" |sed 's|.*an_pk/||' |tr -d ' ' ; :`
# Download the AppNote
TMPVAR3=`wget --user-agent="tdtools-procmail v${TDTP_VERSION}" --quiet -O - ${TMPVAR1} ; :`
# Here I now have an 87 kByte HTML file in memory (according to the error mail)
:0
* ? test -n "${TMPVAR3}"
{
# Get the link to the PDF
TMPVAR4=`echo -e "${TMPVAR3}" |grep 'pdfserv' |head -n1 |sed 's|.*http://pdfserv|http://pdfserv|' |sed 's|\.pdf.*|.pdf|' ; :`
# Eliminate the unused HEADER and attach a new HEAD
# if it fails, use the original and continue
TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/APPLICATION NOTE /,$p' |sed 's|.*APPLICATION NOTE |<html><head></head><body><table><tr><td>APPLICATION NOTE |' ; :`
# Here is the beginning of the message that I want to eliminate...
# Now the USED memory is close to 128 kByte or so
:0
* ? test -z "${TMPVAR5}"
# This test produces the first error and
# the next line is not executed...
{
TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/REFERENCE DESIGN /,$p' |sed 's|.*REFERENCE DESIGN |<html><head></head><body><table><tr><td>REFERENCE DESIGN |' ; :`
}
:0
* ? test -z "${TMPVAR5}"
# This test produces the second error and
# the next line is not executed...
{ TMPVAR5="${TMPVAR3}" }
# Eliminate the unused FOOTER
# if it fails, use the original and continue
TMPVAR9=`echo -e "${TMPVAR5}" |sed '/<!-- BEGIN: EE-MAIL -->/,//d' ; :`
# A new variable, which exceeds the memory usage again
:0
* ? test -z "${TMPVAR9}"
# This test produces the fourth error and
# the next line is not executed...
{ TMPVAR9="${TMPVAR3}" }
# Attach a footer if there is none
# (test the variable, not the message body as :0B would)
:0
* ! TMPVAR9 ?? </html>
{
TMPVAR9="${TMPVAR9}${NL}</table></body></html>"
}
# Convert the HTML page to text/plain
TMPVAR9=`echo "${TMPVAR9}" |html2text -width 72 -nobs -style pretty ; :`
:0fw
* ? test -n "${TMPVAR9}"
| ( formail -i "Subject: [${TMPVAR2}] ${MSG_SUBJECT}" ; \
echo "========================================================================${NL}" ; \
echo "PDF URL: ${TMPVAR4}${NL}" ; \
echo "========================================================================${NL}" ; \
echo "${TMPVAR9}" )
}
:0
.Business.USA.Maxim.App_Notes/
}
<snip>
------------------------------------------------------------------------
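The fallback chain in the recipe above works like this: try the APPLICATION NOTE marker, then REFERENCE DESIGN, then the untouched page; cut the EE-MAIL footer; and append the closing tags if they are missing. The same chain could also live in one external filter. The sketch below shows just one possible arrangement. The page is held in a variable of the shell process rather than of procmail, and no temporary file is written. The name clean_page is hypothetical, and the markers and tags are copied from the recipe.

```shell
#!/bin/sh
# Sketch only: header/footer cleanup from the recipe as one filter.
# Reads the HTML page on stdin, writes the cleaned page on stdout.
# clean_page is a hypothetical name; the markers are those of the recipe.
# Plain sh has no "local", so page, head, body and mark are global.
clean_page() {
    page=$(cat)                                   # whole page, held by this shell
    head='<html><head></head><body><table><tr><td>'
    body=
    # Try the markers in the same order as the recipe.
    for mark in 'APPLICATION NOTE ' 'REFERENCE DESIGN '; do
        # From the first line with the marker to the end; on that first
        # line, replace everything before the marker with the new head.
        body=$(printf '%s\n' "$page" |
            sed -n "/${mark}/,\$p" |
            sed "1s|.*${mark}|${head}${mark}|")
        [ -n "$body" ] && break
    done
    # Neither marker found: use the original page.
    [ -n "$body" ] || body=$page
    # Cut the footer, from its marker to the end of the page.
    body=$(printf '%s\n' "$body" | sed '/<!-- BEGIN: EE-MAIL -->/,$d')
    [ -n "$body" ] || body=$page
    # Close the document if no </html> survived the cut.
    case $body in
        *'</html>'*) ;;
        *) body="$body
</table></body></html>" ;;
    esac
    printf '%s\n' "$body"
}
```

A shell function is not visible to the commands procmail starts, so this would have to go into a small script of its own that ends by calling clean_page. The recipe could then run something like `wget --quiet -O - "${TMPVAR1}" | clean_page_script | html2text -width 72 -nobs -style pretty` inside one backquote, where clean_page_script is a hypothetical name. Procmail would then only store the final text.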
At home I can use temporary files, but not on my Mail-Backup.
Thanks, greetings, and have a nice day/evening
Michelle Konzack
System Administrator
Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant
--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
<http://www.tamay-dogan.net/> Michelle Konzack
<http://www.can4linux.org/> Apt. 917
<http://www.flexray4linux.org/> 50, rue de Soultz
Jabber linux4michelle(_at_)jabber(_dot_)ccc(_dot_)de 67100 Strasbourg/France
IRC #Debian (irc.icq.com) Tel. DE: +49 177 9351947
ICQ #328449886 Tel. FR: +33 6 61925193
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail