procmail

horribly slow converting 3.4mb of news to mail

2001-12-15 22:47:08
I suspect something or several things in my scripting are the cause of
the sloth in this process.  Or maybe it is just endemic.

I've set up a process that converts downloaded news to mail and
delivers it to a maildir-style directory (with the usual tmp, new and
cur subdirectories).

The idea here is for my script to cd to the end directory in a
newsgroup path, and process the files it finds there. The paths look
like this example:
   $base_news/comp/unix/questions/NUMBER 

The script tries to generate maildir names from the paths in the news
hierarchy.   That part seems to be working ok.
   $base_news/comp/unix/questions  becomes
   $MAILDIR/comp.unix.questions
(with the maildir style tmp new cur directories under it)
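
For illustration only, the name mapping described above could be
sketched roughly like this. This is not the actual script: the function
name group_maildir is hypothetical, and it creates the tmp/new/cur
directories itself with mkdir, whereas the real setup leaves that to
procmail.

```shell
# Hypothetical helper (not part of the posted script): map a relative
# newsgroup path to a maildir path and make sure tmp/new/cur exist.
group_maildir() {
    # $1 = maildir root, $2 = relative newsgroup path, e.g. comp/unix/questions
    md="$1/$(printf '%s\n' "$2" | tr '/' '.')"
    mkdir -p "$md/tmp" "$md/new" "$md/cur" || return 1
    printf '%s\n' "$md"
}
```

Called as `group_maildir "$HOME/n2md" comp/unix/questions`, it should
print the path ending in comp.unix.questions.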

It all works: the base directories are created if not already present,
and the messages end up in $MAILDIR/news.group/new/MESSAGE, just as
they are supposed to.

However, it is taking something like 10-11 minutes to process 1100
messages.  This is on a 1.2GHz Athlon running Solaris 8 (Intel),
and that is with the news files already on disk.

I suspect I'm doing some huffing and puffing in the script that could
be done in some more economical way.  Here is an actual timed run
against 3.4MB of news messages, some 1100 in all (5-6 were fairly
large, but most were of average size):

$time proc_test
real    11m5.526s
user    2m40.450s
sys     6m38.550s

Isn't that time a little excessive for a 1.2GHz Athlon that is not otherwise busy?
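
To put those numbers in per-message terms, here is a small arithmetic
sketch using only the figures from the timed run above and the rough
count of 1100 messages. The function name timing_summary is made up
for this example.

```shell
# Hypothetical helper: restate the timed run above per message.
timing_summary() {
    awk 'BEGIN {
        msgs = 1100                 # approximate message count from the run
        real = 11*60 + 5.526        # real  11m5.526s
        sys  =  6*60 + 38.550       # sys    6m38.550s
        printf "per message: %.3f s real\n", real / msgs
        printf "sys share of real: %.0f%%\n", 100 * sys / real
    }'
}
timing_summary
```

That works out to roughly 0.6 seconds per message, with about 60% of
the elapsed time spent in system calls rather than in user code.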

Script and .procmailrc included below.

Note: There is a small awk script not shown that adds a `From ' line
and copies the Xref line to an X-Save-Xref: header. (I used awk
because it seemed in my experiments that formail isn't that good at
recognizing the nntp output, even with the -d flag and -m5 set.  I
still got some messages split in odd places).  That doesn't happen
using awk.
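
Since the awk script is not shown, here is a rough stand-in for the
kind of filter described, as a sketch only. It is not the real
proc.awk: the function name add_from_and_xref, the sender "news@host"
and the date format in the From line are all assumptions. On Solaris,
AWK would need to point at nawk, since the old /bin/awk lacks -v.

```shell
# Hypothetical stand-in for the filter described above (not the real proc.awk).
add_from_and_xref() {
    # $1 = host name for the From_ line, $2 = article file
    ${AWK:-awk} -v host="$1" \
        -v stamp="$(LC_ALL=C date '+%a %b %e %H:%M:%S %Y')" '
        BEGIN { in_header = 1 }
        # Emit an mbox-style "From " line before the first article line.
        NR == 1 { print "From news@" host " " stamp }
        # The first blank line ends the header block.
        in_header && /^$/ { in_header = 0 }
        # Keep the Xref header and add a copy named X-Save-Xref.
        in_header && /^Xref:/ {
            print
            sub(/^Xref:/, "X-Save-Xref:")
            print
            next
        }
        { print }
    ' "$2"
}
```

It reads one article per call, so it would slot into a per-file loop
in the same way as the real script.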

cat proc_test:
========================================
#!/bin/ksh
PATH=/usr/local/bin:$PATH:/export/home/reader/scripts
base=$HOME/projects/proc/suck_test
news_source=$base/news
host=$(hostname)
AWK=/bin/nawk
maildir=$HOME/n2md

cd $news_source

## create an array of newsgroup paths by extracting unique paths from the
## news hierarchy
set -A nntp_d $(find $base/news -type f -name '*[0-9]' \
              |$AWK -F"/news/|/[0-9]+$" '
              /[0-9]+$/{print $2}'|uniq)

## Finish creating  newsgroup names in maildir by replacing "/"
## with ".".  Create the correct directories if necessary by passing
## delivery variable to procmail
for source_d in ${nntp_d[@]}; do
    delivery="$HOME/n2md/$(echo $source_d|sed 's:/:.:g')"
    
## Feed the files to awk|procmail by iteration
   cd $news_source/$source_d
   for message in $(ls *[0-9]);do
       $AWK  -f $base/proc.awk  "host=$host" $message |procmail -m \
           DELIVERY=$delivery MAILDIR=$maildir $base/.proc_maildir_test

## Get rid of the source messages
#       rm -f "$message"
   done
done
=========================================

.proc_maildir_test
========================================
## --*-shell-script-*--
PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin
SHELL=/bin/ksh
BASE=$HOME/projects/proc/suck_test
LOGABSTRACT=ALL
ORGMAIL=$BASE/$LOGNAME
DEFAULT=$ORGMAIL
VERBOSE=YES
LOGFILE=$BASE/procmail_maildir.log
LOG="`date +'%b %d %T %w '`
"
TRAP='formail  -XMessage-Id:'

 :0
 $DELIVERY/
========================================
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
