nmh-workers

Re: [Nmh-workers] IMAP testing, again

2017-11-09 12:22:20
Hi Ken,

BTW, it turns out it takes on average 0.66 seconds to append a message
to a Gmail mailbox

So refiling the half a dozen emails that are a thread I've just finished
with would be about four seconds (6 × 0.66 s) because that's appending?
(I don't know IMAP.)

(tls-encrypted) => A2 SELECT "Enron"
...
(tls-decrypted) <= * 480832 EXISTS
...
Command (SELECT) execution time: 4.457428 sec

I don't even have an idea how long it would have taken for nmh to do a
readdir() on a directory with that many files.

On my normal slow machine on ext4, reading the current directory of `.',
`..', and files {1..480832} takes

    $ time ls -f >/dev/null

    real    0m1.184s
    user    0m0.599s
    sys     0m0.585s
    $

A remote VM with network-block-device ext3 goes a bit quicker, I think
because of its better CPU.

    real    0m0.903s
    user    0m0.308s
    sys     0m0.592s

Both had the command run recently beforehand, so the directory was
probably cached, but then manipulation of an nmh folder tends to keep
the interesting data in memory anyway.

This is the bare minimum currently needed to work out `last:42', which
is of interest to those who either work at the start or end of a folder,
or on a folder that is kept small.
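
To make the cost concrete, here is a rough sketch of what resolving
`last:42' amounts to at the filesystem level: read every name in the
directory, keep the ones that look like message numbers, and take the
highest few.  This is not nmh's code; `last_msgs' is a made-up helper
name, and it assumes messages are files named by positive integers.

```shell
last_msgs() {
    # $1: folder directory, $2: how many messages to take from the end.
    # ls -f reads the directory unsorted (and includes dotfiles), so the
    # whole listing has to be read before the largest numbers are known.
    ls -f "$1" |
        grep -E '^[1-9][0-9]*$' |   # keep only names that are message numbers
        sort -n |
        tail -n "$2"
}
```

For example, `last_msgs ~/mail/inbox 42' would print the 42 highest
message numbers, one per line.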

Strangely, the speed of adding messages to that mailbox seemed to not
depend on the number of messages in the mailbox

You're thinking it would be O(n log n) or similar so you'd see an
effect?

Not quoting some lines:
Performing a scan equivalent on that many messages kind of bogs down
also:

% imaptest +Enron 'FETCH 1:5000 (FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM TO SUBJECT DATE)] BODY.PEEK[TEXT]<0.80>)' -timestamp
Command (FETCH) execution time: 44.801250 sec
Total elapsed time: 49.410705 sec

Compared to the performance of the Cyrus-SASL archives, that's kind of
disappointing.  But the mailbox is 40x bigger, so maybe that's the
issue.

% imaptest +Enron2 'FETCH 1:* (FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS (FROM TO SUBJECT DATE)] BODY.PEEK[TEXT]<0.80>)' -timestamp
Command (FETCH) execution time: 88.653104 sec
Total elapsed time: 88.880982 sec

I make that

     5,000 from 480,832 is 44.8 s fetch, 49.4 s total.
    10,426 from  10,426 is 88.7 s fetch, 88.9 s total.
        ×2.09              ×1.98         ×1.80

So it looks fairly linear.
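
Another way to check the same thing is to divide each FETCH time by the
number of messages fetched.  A small sketch using the figures quoted
above (`per_msg_ms' is just a name picked for illustration):

```shell
per_msg_ms() {
    # $1: FETCH time in seconds, $2: number of messages fetched.
    # Prints the average cost per message in milliseconds.
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s * 1000 / n }'
}

per_msg_ms 44.801250 5000     # Enron, FETCH 1:5000
per_msg_ms 88.653104 10426    # Enron2, FETCH 1:*
```

Both come out between 8 and 9 ms per message, which fits the cost
depending on how many messages are fetched rather than on how big the
mailbox is.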

- But ... where things are a win is here (on the original "Enron"
folder)

(tls-encrypted) => A3 SEARCH TEXT "corruption"
(tls-decrypted) <= * SEARCH [... whole lot of entries ...]
(tls-decrypted) <= A3 OK SEARCH completed (Success)
Command (SEARCH) execution time: 0.136124 sec

I doubt we could ever achieve that kind of performance on that many
messages, and I guess this makes it clear where Google is putting
their energy.

Google is indexing the emails and the search is consulting that index.
We could do the same, or the user could use one of the existing email
indexers on their ~/mail, or we could integrate one of those if it's
installed.
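
As a toy illustration of the idea only, and nothing like what Google or
a real mail indexer does, an index can be as simple as a sorted file of
"word message-number" lines that is built once and then consulted
instead of the messages.  `build_index' and `lookup' are hypothetical
names, and the sketch assumes one file per message, named by number.

```shell
build_index() {
    # $1: folder directory.  Emits one "word msgnum" line for each
    # distinct lower-cased word in each message, sorted by word.
    for f in "$1"/[0-9]*; do
        n=${f##*/}
        tr -cs '[:alnum:]' '\n' <"$f" |
            tr '[:upper:]' '[:lower:]' |
            grep . |
            sort -u |
            sed "s/\$/ $n/"
    done | sort
}

lookup() {
    # $1: index file, $2: lower-case word.  Prints matching message numbers.
    awk -v w="$2" '$1 == w { print $2 }' "$1"
}
```

Building is slow, since it reads every message, but each later
`lookup idx corruption' only reads the one index file.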

And what are "typical" operations?  Do people really want to scan(1) a
folder with a half-million messages in it? Or do they really want to
run "pick" on it and only look at a few?

They want to scan a few of the many, over and over as they intersperse
other commands: show quite a few, repl some, refile some, comp, and
sometimes pick.  They don't bother picking if it's in the `scan last:40'
that fills the screen.  That's how I think I use them.  I don't run
acct(2), but those that do might like to retrieve the frequencies of
nmh's commands, e.g. with lastcomm(1).

Talking of picking, I still use it when the condition is more complex,
or needs the body, but for idle "I'm sure I talked to Ken about IMAP"
browsing I build a mail-index file occasionally that has paragraphs for
all my older emails that look like

    inbox-15867 from Ken Hornstein <kenh@pobox.com>
    inbox-15867 to nmh-workers@nongnu.org
    inbox-15867 subject [Nmh-workers] IMAP testing, again
    inbox-15867 date 2017-11-09 03:32:48 +0000 Thu
    inbox-15867 message-id <20171109033249.B521FA4F8A@pb-smtp2.pobox.com>
    inbox-15867 list-id "Discussion of nmh development, and help for new users" <nmh-workers.nongnu.org>
    inbox-15867 return-path <nmh-workers-bounces+ralph=inputplus.co.uk@nongnu.org>
    inbox-15867 x-envelope-to ralph@inputplus.co.uk

It's straightforward to awk this, RS="", to print paragraphs that are
`date 2017', and that filters into another awk, grep, etc.  Finally,
into a per-folder pick if required.  :-)  The speed boon, of course, is
reading one file instead of all N.  I also have sequences yYYYY,
e.g. y2015, to make it easy to limit pick's reading of an archive to a
year or few.
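
Roughly, the first stage of such a pipeline might look like the sketch
below.  It isn't the exact command I use; `index_year' and the
`mail-index' file name are stand-ins, and it assumes the paragraph
layout shown above with paragraphs separated by empty lines.

```shell
index_year() {
    # $1: index file, $2: four-digit year.
    # RS="" puts awk in paragraph mode, so each record is one message.
    awk -v year="$2" '
        BEGIN { RS = "" }
        {
            n = split($0, line, "\n")
            for (i = 1; i <= n; i++) {
                split(line[i], f, " ")
                # Looking for a line of the form: <folder>-<msg> date YYYY-...
                if (f[2] == "date" && index(f[3], year "-") == 1) {
                    print f[1]      # e.g. inbox-15867
                    break
                }
            }
        }' "$1"
}
```

`index_year mail-index 2017' prints one folder-message tag per matching
email, ready to be filtered further or split into a folder and a
message number for pick.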

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

-- 
Nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
