On Wed, Jul 23, 2003 at 09:53:34AM -0500, David W. Tamkin wrote:
Jeffrey Parker had,
P>INBOXSIZE=`ls -l ${ORGMAIL} 2>/dev/null | awk '{print $5}'`
Dallman Ross suggested,
R>Why not just
R> INBOXSIZE = `wc -c < $ORGMAIL 2>/dev/null`
Paul Chvostek commented,
C> I think it's debatable whether you generate more load by parsing the
C> directory listing through a two-part pipe versus shovelling the
C> entire mailbox into a smaller pipe. Of the two solutions, I'd go for
C> the directory entry, so that a 20MB mailbox didn't generate 20MB of
C> data being read from the disk every time a message comes in.
OK, but since both approaches invoke $SHELL anyway, I think we can get
the best of both worlds:
:0i
INBOXSIZE=| set -- `ls -l $ORGMAIL` ; echo $5
or if you use anything in the csh family as $SHELL,
:0i
INBOXSIZE=| sh -c 'set -- `ls -l $ORGMAIL` ; echo $5'
I'm not sending stderr to /dev/null, because if $ORGMAIL can't be ls'ed,
I'd like the logfile to tell me!
Paul continued,
C> As I've mentioned before, my solution to this was to write a small C
C> program which is lighter-weight than either of these approaches.
C> It's at http://www.it.ca/software/fsizecompare.c for your viewing
C> pleasure.
I've modified Paul's program just a wee bit to simply print the
filesize (instead of doing a compare, etc.) and run some benchmarks
just for fun:
1 megabyte file on different hardware, different versions of perl:
Benchmark: timing 1000 iterations of fsize, ls_awk, perl_5_00503, perl_5_6_1,
perl_5_8_0, set_ls, wc, wc_awk...
       fsize:  2 secs ( 0.08 usr 0.20 sys +  0.21 cusr 1.93 csys =  2.42 CPU)
               @ 3555.56/s
      set_ls:  3 secs ( 0.12 usr 0.17 sys +  0.16 cusr 3.34 csys =  3.80 CPU)
               @ 3368.42/s
  perl_5_6_1:  9 secs ( 0.12 usr 0.20 sys +  2.17 cusr 6.52 csys =  9.00 CPU)
               @ 3200.00/s
      ls_awk: 10 secs ( 0.13 usr 0.17 sys +  1.03 cusr 7.80 csys =  9.13 CPU)
               @ 3282.05/s
      wc_awk: 10 secs ( 0.09 usr 0.20 sys +  1.36 cusr 8.70 csys = 10.35 CPU)
               @ 3368.42/s
  perl_5_8_0: 10 secs ( 0.12 usr 0.20 sys +  2.88 cusr 6.12 csys =  9.31 CPU)
               @ 3121.95/s
perl_5_00503: 11 secs ( 0.10 usr 0.21 sys +  5.15 cusr 5.53 csys = 10.99 CPU)
               @ 3200.00/s
          wc: 77 secs ( 0.12 usr 0.19 sys + 68.18 cusr 7.05 csys = 75.53 CPU)
               @ 3282.05/s
Legend:
fsize: Paul Chvostek's filesize program (slightly modified as described above)
set_ls: David Tamkin's set/ls/echo pipeline
perl *: <<perl -e 'print -s "$ORGMAIL"'>> with various versions of perl
ls_awk: the ls/awk pipeline (this one was mine once upon a time; it's
on my website, but I'm sure others have thought of it too)
wc_awk: using a wc/awk pipeline
wc: Dallman Ross's wc trick
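
For anyone who wants to try the shell-only contenders side by side,
here is a rough sketch of them as Bourne shell functions. The function
names and the scratch file are made up for illustration and are not
taken from the actual test program; the perl and C variants are left
out. It assumes an ls whose long listing puts the size in the fifth
field, which is the usual layout but not guaranteed everywhere.

```shell
#!/bin/sh
# Hypothetical helper names; each prints the size in bytes of file $1.

# ls_awk: awk picks the fifth field of the long listing.
size_ls_awk() { ls -l "$1" | awk '{print $5}'; }

# set_ls: the shell itself splits the listing into positional
# parameters, so no second process is needed after ls.
size_set_ls() { set -- `ls -l "$1"`; echo "$5"; }

# wc_awk: wc prints "count filename"; awk keeps the count.
size_wc_awk() { wc -c "$1" | awk '{print $1}'; }

# wc: reads the file on stdin; the arithmetic expansion strips the
# leading blanks that some wc implementations print.
size_wc() { echo $(( $(wc -c < "$1") )); }

# Small demo on a scratch file (path is an assumption for the demo).
f=${TMPDIR:-/tmp}/sizedemo.$$
printf 'hello world\n' > "$f"
for fn in size_ls_awk size_set_ls size_wc_awk size_wc; do
    echo "$fn: $($fn "$f")"
done
rm -f "$f"
```

All four should report the same number for the same file; they differ
only in how many processes get started and in whether the file's
contents are read at all.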
Summary:
I ran these tests on 1M, 50M, 500M and 1G files. Using 'wc' in any
fashion is simple, but its runtime increases linearly with the size of
the file being analyzed (wc reads the whole file, as Paul pointed out).
So while the wc_awk test beat several others in the 1M file test, it
tied for last in the 50M tests and did not finish after 1 hour in the
500M test.
Paul's C program is the fastest, usually beating the next shell
contender by one or two wallclock seconds. Surprisingly quick is David
Tamkin's clever Bourne shell 'set' trick; David reminds us of the
inexhaustible Unix creed (adopted by Perl): there's more than one way
to do it (and some ways are better than others).
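
In case the 'set' trick is unfamiliar, here is a minimal sketch of the
mechanism on its own. The listing line is invented sample data and the
function name fifth_field is hypothetical; only the use of 'set --'
followed by $5 comes from David's recipe.

```shell
#!/bin/sh
# A made-up line standing in for real `ls -l` output.
line='-rw------- 1 scott mail 1048576 Jul 23 09:53 /var/mail/scott'

# Print the fifth whitespace-separated word of the string in $1.
# The unquoted $1 is split into words by the shell, and `set --`
# stores those words as the new $1, $2, ...  The "--" keeps the
# leading "-" of the permission string from being read as an option.
fifth_field() {
    set -- $1
    echo "$5"
}

fifth_field "$line"
```

Since 'set' and 'echo' are shell builtins, the only external program
started is ls, which fits with set_ls landing so close to the C
program in the timings.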
If anyone wants the actual test program (in Perl), let me know
offlist.
Scott
--
Scott Wiersdorf
scott(_at_)perlcode(_dot_)org
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail