
Re: How to let procmail use more memory?

2010-01-30 17:41:30
On Sat, 2010-01-30 at 23:51 +0100, Michelle Konzack wrote:
Hello,

On 2010-01-30 13:11:46, JW Simpson wrote:
There are several programs already available that you can call from
procmail for this:
        w3m       ->    http://w3m.sourceforge.net/
        w3mmee    ->    http://pub.ks-and-ks.ne.jp/prog/w3mmee/
        links     ->    http://links.sourceforge.net/
        elinks    ->    http://elinks.or.cz/about.html
        html2text ->    http://man-wiki.net/index.php/1:html2text
        lynx      ->    http://man-wiki.net/index.php/1:lynx

I am already using html2text, but once the HTML page is fetched it sits
in memory, and I then have to pipe it through three different filters
because the pages are not always in the same format.

If I use your recipe the pages are unreadable...

First I have to use a sed script to cut everything from the beginning up
to a marker, then everything from a second marker to the end, and add
new <html>...</html> tags.

Only then can I use html2text to get the parts I want.

The problem is here:

    TMPVAR1=`mimedecode |grep --max-count=1 "^URL: " |sed 's|^URL: ||' ; :`
    TMPVAR2=`echo "${TMPVAR1}" |sed 's|.*an_pk/||' |tr -d ' ' ; :`
    TMPVAR3=`wget --user-agent="tdtools-procmail v${TDTP_VERSION}" --quiet -O - ${TMPVAR1} ; :`
      TMPVAR4=`echo -e "${TMPVAR3}" |grep 'pdfserv' |head -n1 |sed 's|.*http://pdfserv|http://pdfserv|' |sed 's|\.pdf.*|.pdf|' ; :`
      TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/APPLICATION NOTE&nbsp;/,$p' |sed 's|.*APPLICATION NOTE&nbsp;|<html><head></head><body><table><tr><td>APPLICATION NOTE |' ; :`
        TMPVAR5=`echo -e "${TMPVAR3}" |sed -n '/REFERENCE DESIGN&nbsp;/,$p' |sed 's|.*REFERENCE DESIGN&nbsp;|<html><head></head><body><table><tr><td>REFERENCE DESIGN |' ; :`
      TMPVAR9=`echo -e "${TMPVAR5}" |sed '/<!-- BEGIN: EE-MAIL -->/,//d' ; :`
        TMPVAR9="${TMPVAR9}${NL}</table></body></html>"
      TMPVAR9=`echo "${TMPVAR9}" |html2text -width 72 -nobs -style pretty ; :`

At home I can use temporary files, but not on my Mail-Backup.
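
The marker-to-marker cut described above could also be done as one
stream, so that the page is never held in a variable and no temporary
file is needed. What follows is only a rough sketch under assumptions:
the function name extract_section is made up for illustration, the
markers are matched as literal text rather than as sed patterns, and
the start marker itself is kept in the output.

```shell
#!/bin/sh
# Hypothetical helper (not from the original recipe): read HTML on stdin,
# keep the text from the first start marker up to (not including) the
# first end marker, and wrap it in a minimal HTML table skeleton.
# $1 = start marker, $2 = end marker, both taken as literal strings.
# Note: awk -v interprets backslash escapes, so avoid backslashes in markers.
extract_section() {
    awk -v start="$1" -v stop="$2" '
        !inside {
            pos = index($0, start)
            if (pos == 0) next              # still before the start marker
            inside = 1
            print "<html><head></head><body><table><tr><td>"
            $0 = substr($0, pos)            # drop text before the marker
        }
        {
            pos = index($0, stop)
            if (pos > 0) {                  # end marker found: finish up
                if (pos > 1) print substr($0, 1, pos - 1)
                exit
            }
            print
        }
        END { if (inside) print "</td></tr></table></body></html>" }
    '
}

# Possible use inside a pipe (URL is a placeholder variable):
#   wget --quiet -O - "$URL" \
#     | extract_section 'APPLICATION NOTE&nbsp;' '<!-- BEGIN: EE-MAIL -->' \
#     | html2text -width 72 -nobs -style pretty
```

Because every stage reads from a pipe, only the final plain text would
have to be captured by procmail, which should keep the variable small.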

You need to free these variables as you go.
To free a variable, put its name on a line by itself (with no "=").
You might also want to consider using an external program for the
parsing.
Here is another example showing inline use of awk code, where I have
an extremely large variable, TEST, that I eliminate when I no longer
need it:

[john@bx1]# H:~> cat /home/rln/filters/Common/04_quote_counter.rc
# swa-percentage-quoted.rc
#
# Calculate percentage of quoted material, reject if at or above the limit
#
#   Copyright John WS Hibbs, SwaJime's Cove, 2009
#   Date: 7-Nov-2009
#   Version 1.0.4
#   
#   7-Nov-2009: Removed "Bcc:" header from reminder notice.
#

oldVERBOSE=$VERBOSE
VERBOSE=yes

:0 fhw
* ! H ?? ^Content-Type:.*text/plain
{
   # | /usr/bin/formail -A "X-swa-percentage-quoted: only text/plain content type is supported by this procmail recipe"
   LOG="$_: only text/plain content type is supported by this procmail recipe"
}

:0 E
* H ?? ^Content-Type:.*text/plain
{
   LIMIT=80

   # This is just to write calculations to the log ... it can be commented out (please don't delete it)
   :0
   {
      # We are going to do some heavy lifting ... make absolutely sure we have enough memory to work in
      saved_LINEBUF=$LINEBUF
      LINEBUF=876543210
      TEST=`/usr/bin/formail -I "" -s /bin/awk 'BEGIN { OM=0; L=0; N=0; Q=0; printf("\n") }
            /[Oo]riginal.+[Mm]essage/ { OM = 1; sprintf("---\t\t\t%s\n",$0); next }
            /^([[:space:]]*>?)*(At|On) .* wrote:$/ || /^([[:space:]]*>)*$/ { sprintf("---\t\t\t%s\n",$0); next }
            OM == 1 || /^([[:space:]]*>)+[[:space:]]*[0-9a-z_A-Z]+/ { Q+=length; L+=1+int(length($0)/80) }
            OM == 0 && /^[^>]*[0-9a-zA-Z]+/ { N+=length; L+=1+int(length($0)/80) }
            { printf("L=%d\tN=%d\tQ=%d\t%s\n",L,N,Q,$0) }
            END { printf("Lines: %d\nNew characters: %d\nQuoted characters: %d\nPercentage Quoted: %d%%\n\n",L,N,Q,(N+Q ? 100*Q/(N+Q) : 0)) }'`
      # Recover memory and reset LINEBUF (default was 2048) to avoid hogging memory
      TEST
      LINEBUF=$saved_LINEBUF
      saved_LINEBUF
   }

   # LNQP returns a string -> "LINECOUNT:NEWTEXT:QUOTED:PERCENTAGE"
   LNQP=`/usr/bin/formail -I "" -s /bin/awk 'BEGIN { OM=0; L=0; N=0; Q=0 }
         /[Oo]riginal.+[Mm]essage/ { OM=1; next }
         /^([[:space:]]*>?)*(At|On) .* wrote:$/ || /^([[:space:]]*>)*$/ { next }
         OM == 1 || /^([[:space:]]*>)+[[:space:]]*[0-9a-z_A-Z]+/ { Q+=length; L+=1+int(length($0)/80) }
         OM == 0 && /^[^>]*[0-9a-zA-Z]+/ { N+=length; L+=1+int(length($0)/80) }
         END { printf("%d:%d:%d:%d",L,N,Q,(N+Q ? 100*Q/(N+Q) : 0)) }'`
   
   :0 fhw
   | /usr/bin/formail -A "X-swa-LNQP: $LNQP"
#
# ... rest of file not relevant to thread
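
To try the counting idea outside procmail, a much simplified
stand-alone version might look like the sketch below. It is not the
recipe above: the function name quoted_stats is invented here, it
ignores the "Original Message" and "... wrote:" handling, and it
treats any line starting with ">" as quoted. It prints
"NEW:QUOTED:PERCENTAGE" for a message body read on stdin.

```shell
#!/bin/sh
# Hypothetical, simplified quote counter (an illustration only).
# Output format: NEWCHARS:QUOTEDCHARS:PERCENTQUOTED
quoted_stats() {
    awk '
        /^[[:space:]]*(>[[:space:]]*)+$/ { next }                 # bare quote marks: ignore
        /^[[:space:]]*>/                 { q += length($0); next } # quoted text
        /[0-9a-zA-Z]/                    { n += length($0) }       # new text
        END {
            total = n + q
            # Guard against division by zero on an empty body
            printf("%d:%d:%d\n", n, q, (total ? int(100 * q / total) : 0))
        }
    '
}

# Example: feed only the body of a saved message (file name is a placeholder)
#   formail -I "" < message.txt | quoted_stats
```

Running the logic as a separate program like this keeps the large
intermediate text out of procmail's own variables; only the short
result string comes back.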

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail