On Sat, 2010-01-30 at 23:51 +0100, Michelle Konzack wrote:
Hello,
On 2010-01-30 13:11:46, JW Simpson wrote:
There are several programs already available that you can call from
procmail for this:
w3m -> http://w3m.sourceforge.net/
w3mmee -> http://pub.ks-and-ks.ne.jp/prog/w3mmee/
links -> http://links.sourceforge.net/
elinks -> http://elinks.or.cz/about.html
html2text -> http://man-wiki.net/index.php/1:html2text
lynx -> http://man-wiki.net/index.php/1:lynx
I am already using html2text, but once the HTML page is fetched it sits
in memory, and I then have to pipe it through 3 different filters because
the pages are not always in the same format.
If I use your recipe the pages are unreadable...
First I have to use a sed script to cut from the beginning to a marker and
then from a second marker to the end, and add new <html>...</html> tags.
Only then can I use html2text to get the stuff I want.
The problem is here:
TMPVAR1=`mimedecode |grep --max-count=1 "^URL: " |sed 's|^URL: ||' ; :`
TMPVAR2=`echo "${TMPVAR1}" |sed 's|.*an_pk/||' |tr -d ' ' ; :`
TMPVAR3=`wget --user-agent="tdtools-procmail v${TDTP_VERSION}" --quiet -O - ${TMPVAR1} ; :`
TMPVAR4=`echo -e "${TMPVAR3}" |grep 'pdfserv' |head -n1 |sed 's|.*http://pdfserv|http://pdfserv|' |sed 's|\.pdf.*|.pdf|' ; :`
TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/APPLICATION NOTE /,$p' |sed 's|.*APPLICATION NOTE |<html><head></head><body><table><tr><td>APPLICATION NOTE |' ; :`
TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/REFERENCE DESIGN /,$p' |sed 's|.*REFERENCE DESIGN |<html><head></head><body><table><tr><td>REFERENCE DESIGN |' ; :`
TMPVAR9=`echo -e "${TMPVAR5}" |sed '/<!-- BEGIN: EE-MAIL -->/,//d' ; :`
TMPVAR9="${TMPVAR9}${NL}</table></body></html>"
TMPVAR9=`echo "${TMPVAR9}" |html2text -width 72 -nobs -style pretty ; :`
At home I can use temporary files, but not on my Mail-Backup.
You need to free these variables as you go ...
To free a variable, just name it on a line by itself, with no '=' after it.
You might also want to consider using an external program for the
parsing.
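As a rough sketch of what such an external helper might look like (this is
not taken from either poster's setup), the marker cutting could live in one
small shell function. The name extract_note is made up for illustration; the
two start markers and the end marker are the ones from the recipe quoted
above, and the assumption is that the start marker appears only once per line.

```shell
#!/bin/sh
# Hypothetical helper: extract_note is an illustrative name, not from the thread.
# Reads an HTML page on stdin, keeps the text from the first start marker
# up to (not including) the end marker, and wraps it in a minimal HTML
# skeleton so that html2text can render it.
extract_note() {
    awk '
        # Start copying at the first line containing either start marker.
        !on && match($0, /(APPLICATION NOTE|REFERENCE DESIGN) /) {
            on = 1
            print "<html><head></head><body><table><tr><td>" substr($0, RSTART)
            next
        }
        # Stop at the end marker and discard everything after it.
        on && /<!-- BEGIN: EE-MAIL -->/ { exit }
        # Copy every line between the two markers unchanged.
        on { print }
        # Close the skeleton only if a start marker was actually seen.
        END { if (on) print "</table></body></html>" }
    '
}
```

In a recipe this could then sit in a single pipeline, roughly
wget ... -O - "$URL" | extract_note | html2text -width 72 -nobs -style pretty
(with $URL standing in for the extracted address), so that procmail only
stores the final text rather than several copies of the page.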
Here is another example showing inline usage of awk code, where I have
an extremely large variable TEST, that I eliminate when I no longer need
it:
[john(_at_)bx1]# H:~> cat /home/rln/filters/Common/04_quote_counter.rc
# swa-percentage-quoted.rc
#
# Calculate percentage of quoted material, reject if at or above the limit
#
# Copyright John WS Hibbs, SwaJime's Cove, 2009
# Date: 7-Nov-2009
# Version 1.0.4
#
# 7-Nov-2009: Removed "Bcc:" header from reminder notice.
#
oldVERBOSE=$VERBOSE
VERBOSE=yes
:0 fhw
* ! H ?? ^Content-Type:.*text/plain
{
# | /usr/bin/formail -A "X-swa-percentage-quoted: only text/plain content type is supported by this procmail recipe"
LOG="$_: only text/plain content type is supported by this procmail recipe"
}
:0 E
* H ?? ^Content-Type:.*text/plain
{
LIMIT=80
# This is just to write calculations to the log ... it can be commented out (please don't delete it)
:0
{
# We are going to do some heavy lifting ... make absolutely sure we have enough memory to work in
saved_LINEBUF=$LINEBUF
LINEBUF=876543210
TEST=`/usr/bin/formail -I "" -s /bin/awk 'BEGIN { OM=0; L=0; N=0; Q=0; printf("\n") }
/[Oo]riginal.+[Mm]essage/ { OM = 1; printf("---\t\t\t%s\n",$0); next }
/^([[:space:]]*>?)*(At|On) .* wrote:$/ || /^([[:space:]]*>)*$/ { printf("---\t\t\t%s\n",$0); next }
OM == 1 || /^([[:space:]]*>)+[[:space:]]*[0-9a-z_A-Z]+/ { Q+=length;L+=1+int(length($0)/80) }
OM == 0 && /^[^>]*[0-9a-zA-Z]+/ { N+=length;L+=1+int(length($0)/80) }
{ printf("L=%d\tN=%d\tQ=%d\t%s\n",L,N,Q,$0) }
END { printf("Lines: %d\nNew characters: %d\nQuoted characters: %d\nPercentage Quoted: %d%%\n\n",L,N,Q,100*Q/(N+Q)) }'`
# Recover memory and reset LINEBUF (default was 2048) to avoid hogging memory
TEST
LINEBUF=$saved_LINEBUF
saved_LINEBUF
}
# LNQP returns a string -> "LINECOUNT:NEWTEXT:QUOTED:PERCENTAGE"
LNQP=`/usr/bin/formail -I "" -s /bin/awk 'BEGIN { OM=0; L=0; N=0; Q=0 }
/[Oo]riginal.+[Mm]essage/ { OM=1; next }
/^([[:space:]]*>?)*(At|On) .* wrote:$/ || /^([[:space:]]*>)*$/ { next }
OM == 1 || /^([[:space:]]*>)+[[:space:]]*[0-9a-z_A-Z]+/ { Q+=length; L+=1+int(length($0)/80) }
OM == 0 && /^[^>]*[0-9a-zA-Z]+/ { N+=length; L+=1+int(length($0)/80) }
END { printf("%d:%d:%d:%d",L,N,Q,100*Q/(N+Q)) }'`
:0 fhw
| /usr/bin/formail -A "X-swa-LNQP: $LNQP"
#
# ... rest of file not relevant to thread
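To try the counting logic outside procmail, a cut-down version can be run
from a plain shell. This is only a sketch under a few assumptions: the name
quoted_percent is made up, the "Original Message" handling and the line
count are left out, [ \t] stands in for [[:space:]] so that it also runs
on older awk variants, and a guard is added so that an empty body does not
divide by zero. It prints NEW:QUOTED:PERCENTAGE for a message body on stdin.

```shell
#!/bin/sh
# Hypothetical helper: quoted_percent is an illustrative name.
# Simplified version of the quote counting in the recipe above.
quoted_percent() {
    awk '
        # Skip attribution lines and lines holding only quote markers (or nothing).
        /^([ \t]*>?)*(At|On) .* wrote:$/ || /^([ \t]*>)*$/ { next }
        # Quoted text: one or more ">" markers followed by a word.
        /^([ \t]*>)+[ \t]*[0-9a-z_A-Z]+/ { Q += length($0); next }
        # Any other line containing a letter or digit counts as new text.
        /[0-9a-zA-Z]/ { N += length($0) }
        END {
            # Guard against an empty body so we never divide by zero.
            P = (N + Q) ? int(100 * Q / (N + Q)) : 0
            printf("%d:%d:%d\n", N, Q, P)
        }
    '
}
```

Feeding it a saved message body, for instance with
formail -I "" < message | quoted_percent, should give a quick feel for how
a given LIMIT would behave before the recipe is put into use.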
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail