On October 31, 2006 at 11:58, Jim Barber wrote:
I've gone through the mail archives but can't find a solution to my problem.
I've been using mhonarc to process emails that are being generated by nightly
builds of products within the company I work for.
This has been fine until recently, when the developers enabled more verbose logging of the builds within their tool.
The mail messages being processed are stored in maildir format, and each is sent as an email with an HTML attachment that is to be processed.
Looking at the raw format of the emails, the HTML is encoded using base64.
The MS Visual Build Professional build tool is now producing ridiculously large HTML reports, up to 34 MB in size (typical MS tool, I guess).
Unfortunately I don't have control over that.
I would venture that the HTML filter may be the problem.
The HTML filter is probably not the most efficient, especially since
it attempts to strip all kinds of markup that can be sources of
security problems.
In the past, the comment declaration filtering had to be modified
since the initial Perl regex would gobble up all kinds of memory
on a reasonably-sized HTML entity.
<!-- Leave text/plain attachments as attachments and don't expand them -->
m2h_text_html::filter; allowcomments allowscript
You may want to write your own simplified HTML filter if security
issues with HTML are not a concern (i.e. you trust the source), and
see if this helps with your memory problems.
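As a rough illustration of such a simplified filter (this is a sketch, not MHonArc's actual code: the package name is made up, and the four-argument filter interface with `$data` as a scalar reference is assumed from the MHonArc filter documentation, so verify it against your installed version):

```perl
package my_passthrough_html;

# Hypothetical pass-through filter for TRUSTED HTML only.
# MHonArc is assumed to call filters as filter($fields, $data,
# $isdecode, $args), where $$data is the decoded message body.
sub filter {
    my($fields, $data, $isdecode, $args) = @_;
    # Return the HTML untouched: no stripping of scripts, comments,
    # or other markup, so none of the heavy regex work -- but also
    # none of the security filtering. Use only on trusted input.
    return ($$data);
}

1;
```

You would then register it for text/html via the MIMEFILTERS resource element, pointing at the file containing this package.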
As a test, you could assign m2h_text_plain::filter to text/html
data just to see if memory could be a culprit. Or, even better,
assign m2h_external::filter for text/html since it just saves
the data to a separate file. If you do not get any memory problems
with either test, then m2h_text_html::filter is very likely
the source of the problem.
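For reference, the second test could be set up with something like the following in your resource file (a sketch: the filter-to-file mapping follows the MIMEFILTERS syntax in the MHonArc documentation, so double-check the library filenames against your installation):

```
<!-- Route text/html through the external filter, which just saves
     the data to a separate file instead of converting it.
     For the first test, use m2h_text_plain::filter; mhtxtplain.pl
     instead. -->
<MIMEFilters>
text/html; m2h_external::filter; mhexternal.pl
</MIMEFilters>
```

If memory usage stays flat with this in place, that points squarely at the HTML filter.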