mhonarc-users

Re: Message body size limits? (Bigger Problem)

2011-12-04 01:03:41
On Sun, Dec 4, 2011 at 12:16 AM, Tom Hutchison <tom(_at_)hutch4(_dot_)us> wrote:
Is it possible some malformed email could be causing a parsing error? What I
am getting at is this: if I have 250 emails in a folder, how is it that a run
on the folder writes 260? The extra ten have a blank date and subject, and
sometimes have content and sometimes do not.

Your problem is likely related to the following FAQ entry:

http://www.mhonarc.org/MHonArc/doc/faq/archives.html#split
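
As a rough way to check whether this applies to your folder, here is a small
Python sketch (not part of MHonArc). It assumes the input is an mbox-style
file and that the extra messages come from body lines that begin with
"From " being treated as message separators. The function name and the
envelope pattern are my own illustration; real envelope lines vary, so treat
any hits as candidates to inspect, not as proof.

```python
import re

# Pattern for a typical mbox envelope line, for example:
#   "From user@example.com Sun Dec  4 00:16:00 2011"
# This is an assumed, fairly strict shape; some mailers write variants.
ENVELOPE_RE = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} "
    r"\d{2}:\d{2}:\d{2}(?: \S+)? \d{4}$"
)


def find_suspect_from_lines(mbox_text):
    """Return (line_number, line) pairs for lines that start with 'From '
    but do not look like an mbox envelope line."""
    suspects = []
    for lineno, line in enumerate(mbox_text.splitlines(), start=1):
        if line.startswith("From ") and not ENVELOPE_RE.match(line):
            suspects.append((lineno, line))
    return suspects


if __name__ == "__main__":
    sample = (
        "From tom@example.com Sun Dec  4 00:16:00 2011\n"
        "Subject: test\n"
        "\n"
        "Hello\n"
        "From what I can tell, it works.\n"
    )
    for lineno, line in find_suspect_from_lines(sample):
        print(f"line {lineno}: {line}")
```

Any line this reports is a place where a loose separator pattern could start
a new, mostly empty message. If that turns out to be the cause, a stricter
MSGSEP resource setting is the usual place to look.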

When the parser reads them, is it possible MHonArc is picking up on
malformed reply quotes and thinks they are new emails within the actual
email? So instead of 4 emails in the above example, it thinks there are 6.
Garbage in, garbage out comes to mind.

I did solve the broken HTML, not very efficiently, with Outlook 2010, as it
does allow for a stripping of all HTML code by setting the open email to
“edit” then choosing “plain text” after you edit anything in the body of

IIRC, Outlook allows a text/plain alternative to be generated along
with the HTML part.  You can use the MIMEALTPREFS resource, as noted
in the FAQ, to give higher precedence to text/plain over text/html.
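
For reference, a resource-file fragment along these lines should do it. This
is a minimal sketch from memory of the MIMEALTPREFS documentation, so check it
against the docs for your MHonArc version:

```
<MIMEAltPrefs>
text/plain
text/html
</MIMEAltPrefs>
```

Types listed first are preferred, so with this setting the text/plain part of
a multipart/alternative message is used when one is present.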

the email, even if it is just a carriage return or a space. Close the email
and save on exit and the whole email is rewritten, stripping out all HTML
and resetting the header information to show “text/plain” and whatever you
have the encoding set to. Stripping out all HTML from the emails was the
only way I could think of to solve the unclosed <table> tags in quite a
few emails, which were causing problems with the msgxxx.html pages.

It’s long past time for a standardized header and HTML format for email. If
anything it might secure them more...

text/enriched was created a long time ago to provide enhanced formatting
of email messages, but it faded away when the Web grew and HTML became
a de facto markup format for "enriched" text.

IMO, it is inexcusable for major software/services organizations to generate
such malformed HTML.  Dealing with malicious HTML is one thing, but when
non-malicious-generated HTML is so badly formatted (when it should not
be) it makes the lives of consumers of such content much more difficult.

--ewh