nmh-workers

Re: Very large folder

2021-06-05 20:59:51
> Starting in late 2014 I have stopped deleting messages, putting them in a
> directory, +gone, which now contains 465,147 messages and uses about 17
> gigabytes. The bulk of these messages were of transitory or of less interest
> to me. But they include 1,702 messages from my daughter. They were almost all
> of no interest or use to me within a day or two of when she sent them. But she
> recently died (the worst thing by far that's ever happened to me). Now every
> byte she ever wrote is precious to me. So I am glad that I stopped deleting
> messages that I no longer care about.

First off, please accept my sympathies for this unimaginable tragedy.

> So, what is the likelihood of such a bug? Does anybody have any experience
> dealing with such large folders?

I can't think of any _buffer overflows_ that might happen; this isn't
anything out of the ordinary, except that it's a very large number of
messages.  What I think you might bump up against are virtual memory
limits, but even then I suspect you're fine.

There are a number of things that are allocated when a folder is read
(in the function folder_read()).  From what I see, the ones whose size
depends on the number of messages in the folder are:

- The "message number" array, which holds the message number for each
  message.  That's an int, so 4 bytes per message on most platforms.
  But it is free()d after folder_read() is done, which seems ....
  sub-optimal?  Doing better here might be hard, though.  It would certainly
  be more complex.  We could do something smarter about message numbers
  that are contiguous that would cut down on this memory usage a lot.

- The msgstats array, which is ... an array of struct bvector.  A struct
  bvector looks like .. a pointer, a size_t, and two unsigned longs.  Call
  it 32 bytes on a 64 bit platform, maybe?  It looks like we only ever set
  4 flag bits for each message, so we don't use anything more than that
  fixed size, with the exception of sequence membership flags.  If you
  have a lot of sequences in that folder, it's possible you could need
  something more than that (you'd need ... more than 60 sequences in
  a single folder before it affected anything).  It's possible my
  quick math is wrong, but I think that it's probably close.

So by my count, that's about 1.9 MB of memory (465,147 x 4 bytes) that
gets free()d and about 14.9 MB of memory (465,147 x 32 bytes) for that
folder's structure.  Which, in 2021, does not seem like a lot!  MH and
nmh were always a bit casual with memory management since all of the
programs are short-lived, but I think you should be fine.  All of the
calls to malloc() are wrapped using mh_xmalloc() and friends, which call
die() if a call to malloc() fails.

--Ken

