(I'm back from out-of-town and catching up on email)
On April 12, 2007 at 20:34, "Jeff Breidenbach" wrote:
Does this look reasonable to people? Anything obviously
Total Elapsed Time = 7.334524 Seconds
User+System Time = 3.794524 Seconds
%Time ExclSec CumulS #Calls sec/call Csec/c Name
20.3 0.773 1.455 7 0.1104 0.2079 mhonarc::sort_messages
Sorting does not surprise me. MHonArc does not keep a persistent
sorted data structure, so it re-sorts every time new messages are added
(under the assumption that messages may arrive in arbitrary order).
This can definitely be painful if one updates an archive on-the-fly
versus using a queued batch model. In the latter, multiple messages
may be added in a single invocation, avoiding a re-sort for
each message added.
Do you invoke mhonarc for each new message for a list or do you
queue up messages for a given list (over a specified period) before
invoking mhonarc for the list?
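The difference between the two update models can be sketched roughly as
follows. This is an illustration only, not MHonArc code: the function
names and the message layout (a dict with a "time" field) are
hypothetical, and a plain list stands in for the archive.

```python
# Hypothetical sketch, not MHonArc code: a list of dicts stands in
# for the archive, and "time" stands in for the message date.

def add_per_message(archive, new_messages):
    """On-the-fly model: re-sort the whole archive after every message."""
    sorts = 0
    for msg in new_messages:
        archive.append(msg)
        archive.sort(key=lambda m: m["time"])
        sorts += 1
    return sorts

def add_batched(archive, new_messages):
    """Queued model: add everything that was queued, then sort once."""
    archive.extend(new_messages)
    archive.sort(key=lambda m: m["time"])
    return 1

existing = [{"id": i, "time": 1000 + i} for i in range(5)]
queued = [{"id": 100 + i, "time": 990 + 3 * i} for i in range(4)]

on_the_fly = list(existing)
batched = list(existing)
per_message_sorts = add_per_message(on_the_fly, queued)
batched_sorts = add_batched(batched, queued)
```

Both archives end up in the same order; the only difference is how many
times the full sort runs (once per message versus once per batch).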
Note, sorting includes thread sorting, which is the most complicated.
Some speed increase may be possible by disabling SUBJECTTHREADS
(this is mentioned in the Performance Tips doc). However, disabling
SUBJECTTHREADS may have a usability impact for messages that fail
to define the proper reference headers.
For large scale usage, a (robust) persistent data structure is
needed. However, such a structure would require a redesign of
18.6 0.707 0.707 446811 0.0000 0.0000 mhonarc::get_time_from_index
This is due to the Perl 4 legacy code base. The unique index for
each message also contains the date-time stamp applicable for the
It may be possible to add in a new hash to just maintain the date-time
information to avoid the split() operation each time get_time_from_index
is invoked. This will cause an increase in the database size (and
in memory size), but it may be negligible in the grand scheme of
I think when mhonarc was first written (and it was not called mhonarc),
I favored reducing the number of hashes used over performance
gains (performance was not a real issue, since I did not foresee
mhonarc being used at such a large scale).
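The extra-hash idea mentioned above might look something like the sketch
below. It is written in Python rather than Perl and is not taken from the
MHonArc source: the IndexTimes class, the sort_by_time helper, and the
"<time>|<msgnum>" key layout are all assumptions made for illustration.

```python
from functools import cmp_to_key

class IndexTimes:
    """Hypothetical helper; the "<time>|<msgnum>" key layout is assumed."""

    def __init__(self, sep="|"):
        self.sep = sep
        self.splits = 0      # how many times a key has been split
        self.time_of = {}    # the extra hash: index key -> timestamp

    def parse(self, index):
        # Split the key on every call (the current behaviour described above).
        self.splits += 1
        return int(index.split(self.sep, 1)[0])

    def cached(self, index):
        # Split only the first time a key is seen, then reuse the stored value.
        t = self.time_of.get(index)
        if t is None:
            t = self.time_of[index] = self.parse(index)
        return t

def sort_by_time(indexes, lookup):
    # Comparator-based sort: the lookup runs twice per comparison.
    return sorted(indexes, key=cmp_to_key(lambda a, b: lookup(a) - lookup(b)))

indexes = ["1176410040|3", "1176400000|1", "1176405000|2",
           "1176420000|6", "1176415000|4", "1176417000|5"]

plain = IndexTimes()
sorted_plain = sort_by_time(indexes, plain.parse)

memo = IndexTimes()
sorted_memo = sort_by_time(indexes, memo.cached)
```

With the cache, each key is split once no matter how many comparisons the
sort performs; the cost is one more hash entry per message, which is the
memory trade-off described above.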
14.7 0.558 0.558 4805 0.0001 0.0001 MHonArc::RFC822::tokenise
This code is non-trivial since it does full RFC-822 parsing.
Older versions of mhonarc used a simpler parsing routine,
but a more robust routine was required as mhonarc evolved (and
to address bugs in email name and address extraction).
14.4 0.548 2.264 13800 0.0000 0.0002 mhonarc::replace_li_var
Minimizing variable usage in resource files is the main way to
reduce the calls to this routine. However, resource file maintenance
concerns may trump any performance gain.
5.09 0.193 0.193 13037 0.0000 0.0000 mhonarc::compute_msg_pos
This is part of resource variable resolution. See
on how to minimize the performance impact of this routine.
4.77 0.181 0.561 9538 0.0000 0.0001 MHonArc::UTF8::Encode::clip
This actually is more efficient than using the default CHARSETCONVERTERS
I.e., encoding everything to UTF-8 is more efficient (assuming
proper resource settings). In MHonArc's default configuration,
charset conversion can be very costly when dealing with non-ASCII
I discovered this years ago while doing my own profiling tests
on MHonArc, after performance complaints were raised when more
extensive charset routines were added.
4.48 0.170 0.319 1 0.1700 0.3193 mhonarc::get_resources
This loads in the resource file(s).