After culling a lot of trash from my list's archives, I packed
the many small files into a much smaller set of larger ones,
so that I could index into them and retrieve articles by subject
reasonably efficiently. I'm worried, though, that there may be
a much more efficient packing algorithm than the one I used,
and I thought I'd poll the list here for ideas.
Here's what I did:
$files contains a list of the small gzipped files being packed.
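Each line of $files is just a pathname, and the list ends with a
literal EOF sentinel line that the script checks for. A made-up
example (the paths here are invented):

    /archive/list/9401.gz
    /archive/list/9402.gz
    EOF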
--------------------
Archiveno=1
cat /dev/null >$digout
while read f                    # one pathname per line
do
    if [ "$f" = EOF ]; then     # sentinel marking the end of the list
        break
    fi
    gzip -dc $f >$digin
    # peel articles off the front of $digin one at a time
    while [ -s $digin ]
    do
        formail -1 -dfs <$digin >>$digout   # first article -> archive
        cat /dev/null >$tmp
        formail +1 -dfs <$digin >>$tmp      # all but the first -> $tmp
        cat $tmp >$digin
        # roll over to a new archive once this one is big enough
        if [ `wc -l <$digout` -gt $Filesize ]; then
            gzip -9c $digout >$archivedir/$Newname$Archiveno.gz
            cat /dev/null >$digout
            Archiveno=`expr $Archiveno + 1`
        fi
    done
done <$files
# flush the last, partial archive
if [ -s $digout ]; then
    gzip -9c $digout >$archivedir/$Newname$Archiveno.gz
fi
--------------------
It's the formail -1 ... formail +1 part that worries me. It's
a nice, straightforward way to peel articles off the front of a
mail folder, but each pass rewrites the entire remaining folder,
so the work grows roughly quadratically with the number of
articles. Is there a significantly more efficient way to do it?
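
One single-pass alternative I've sketched but not tested: let
formail do the splitting itself. If I'm reading the formail(1)
man page right, its -s option starts the rest of the command
line once for every split-off message, with that message on
stdin. Variable names below match the script above; $countfile
is new, a scratch file that keeps the archive counter alive
across the per-message subshells (which is also why everything
has to be exported):
--------------------
countfile=/tmp/archno$$     # holds Archiveno between subshells
export digout archivedir Newname Filesize countfile
echo 1 >$countfile
cat /dev/null >$digout
while read f
do
    if [ "$f" = EOF ]; then
        break
    fi
    # formail starts the sh once per article, article on its stdin
    gzip -dc $f | formail -dfs sh -c '
        cat >>$digout
        if [ `wc -l <$digout` -gt $Filesize ]; then
            gzip -9c $digout >$archivedir/$Newname`cat $countfile`.gz
            cat /dev/null >$digout
            expr `cat $countfile` + 1 >$countfile
        fi'
done <$files
# flush the last, partial archive
if [ -s $digout ]; then
    gzip -9c $digout >$archivedir/$Newname`cat $countfile`.gz
fi
rm -f $countfile
--------------------
That would read each folder once instead of rewriting it once
per article, at the cost of forking an sh per message. Does
that look sane, or is there a better trick?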
Thanks for any advice,
jimo@eskimo.com