Re: [Nmh-workers] MH-W intro/help request

2014-12-04 01:44:36
Ken Hornstein writes:

Specifically, I was testing it on a very large folder of approx 100K
messages.  Both "mark ..." and "show N" invocations take more than
1/10th of a second on average, even for extremely short outputs.

Okay, yeah, THAT makes sense.  Pretty much every command calls folder_read(),
and I am 99% sure the problem there is doing a readdir() on that super huge
directory (it's not performing a stat() on every file, though).  Obviously
the output size isn't the problem.  Note that maybe the problem is we do
something stupid with malloc or something else; it might be interesting
to see how long things like "ls" take in that directory (running ls without
stat()ing any files, of course); if it's significantly faster then maybe
we can do better.

It's not faster to do, say, "ls -1 > /dev/null".  Overall performance
is similar.
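
For anyone who wants to repeat the comparison without ls in the way, here is a minimal sketch in C of a readdir()-only scan that performs no stat() calls. The names is_message_name() and count_message_files() are made up for illustration and are not nmh functions; the sketch also assumes that message files are the directory entries whose names are purely decimal digits.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/* Returns 1 if name consists only of decimal digits (assumed here to
 * mark an MH-style message file such as "42"), 0 otherwise. */
int is_message_name(const char *name)
{
    assert(name != NULL);
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: walks the directory with readdir() only (no
 * stat() calls) and counts the numeric entries.  Returns -1 if the
 * directory cannot be opened. */
long count_message_files(const char *dir)
{
    DIR *dp;
    struct dirent *de;
    long count = 0;

    assert(dir != NULL);
    dp = opendir(dir);
    if (dp == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (is_message_name(de->d_name))
            count++;
    }
    closedir(dp);
    return count;
}
```

Timing a call to count_message_files() on the large folder gives a rough lower bound for any command that has to enumerate the whole directory, independent of anything nmh does with malloc afterwards.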


We're kind of in a tough spot here.  Sequences can contain entries for
messages that don't exist; the way that gets resolved is by reading the
directory and removing any files from the sequence list when the folder
data structure is built. mark(1) isn't just reading the sequence file
and printing out the exact line; it's calling seq_print(), which is
the same routine that the sequence routines use to output the sequence
structure.  Getting the sequence list without actually reading the
folder ... well, it's possible, but it would require some surgery.
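
To make the pruning step concrete, here is a rough sketch in C of dropping nonexistent messages from a sequence once the directory has been read. It does not reflect nmh's actual data structures or the internals of folder_read(); the function prune_sequence(), the flat array of message numbers, and the exists[] table are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not nmh code.  seq holds len message numbers.
 * exists[n] is nonzero when a message file named n was seen while
 * reading the directory.  Entries with no matching file are removed
 * in place, order is preserved, and the new length is returned. */
size_t prune_sequence(int *seq, size_t len,
                      const unsigned char *exists, size_t exists_len)
{
    size_t in, out = 0;

    assert(seq != NULL || len == 0);
    assert(exists != NULL || exists_len == 0);
    for (in = 0; in < len; in++) {
        int msg = seq[in];
        if (msg >= 0 && (size_t)msg < exists_len && exists[msg])
            seq[out++] = msg;
    }
    return out;
}
```

The point of the sketch is that the exists[] table can only be filled by enumerating the directory, which is why printing a sequence currently pays the full readdir() cost.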

This seems wrong...  for example, as I make the MIME stuff work
better I'll be extracting many separate components from an email.
I just measured it, and for that kind of large folder, displaying 1
email could easily get to be 1+ seconds of cumulative time, which
would likely make this unacceptable.

Is there any way I can completely avoid the giant folder check?  I
can't think of why it is being done time after time for simple
program invocations that, for example, refer to a specifically
enumerated message.  Obviously *asking* for some relative message
list ID like "last" would need to check the directory to find
which message number that is referring to, but it would be easy to
do that in one step, always referring to the number after that.
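
As a sketch of what I mean for the explicitly numbered case: checking a single message needs only one path lookup, not a directory scan. The code below is an illustration in C, not how nmh works today; message_path() and message_exists() are hypothetical names, and it assumes a message lives at "<folder>/<number>".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: builds "<folder>/<msgnum>" into buf.
 * Returns 0 on success, -1 if the path does not fit in buf. */
int message_path(char *buf, size_t size, const char *folder, int msgnum)
{
    int n;

    assert(buf != NULL && folder != NULL);
    n = snprintf(buf, size, "%s/%d", folder, msgnum);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if the message file exists, 0
 * otherwise.  Uses a single access() call and never reads the
 * directory. */
int message_exists(const char *folder, int msgnum)
{
    char path[4096];

    if (message_path(path, sizeof path, folder, msgnum) != 0)
        return 0;
    return access(path, F_OK) == 0;
}
```

A front end could resolve "last" to a number once with a full scan, then use a check like message_exists() for every later invocation that names that number directly.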

[NOTE: I suspect I'm getting into the "we've talked/fought/etc. about
 this many times before" territory and it may not be worth discussing
 on the list right now ... since this issue is arguably orthogonal
 to what I'm doing]


--
    Erich Stefan Boleyn     <erich(_at_)uruk(_dot_)org>     http://www.uruk.org/
"Reality is truly stranger than fiction; Probably why fiction is so popular"
