On Jul 25, 2019, at 4:25 PM, Ken Hornstein <kenh@pobox.com> wrote:
Once in a while I download email archives of some mailing list
and unpack them using "inc -file <archive-file>". But more
than once I have seen that inc gets confused and doesn't
unpack the whole thing. The cause seems to be a line starting
with "From " in some message body. Ideally inc should check that
a "From ..." line is immediately followed by header lines and,
if it is not, assume the line is part of the message body.
Ralph answered this, but let me expand a bit.
The job of inc(1) is to incorporate messages from a 'mail drop' into your
MH mailbox. Traditionally it handles mbox-style files and POP (it also
does MMDF, but let us not speak of that).
As you can see from the Wikipedia entry Ralph linked to, all of the
various mbox formats use the same scheme: a line beginning with
"From " is the mailbox delimiter (mboxcl and mboxcl2 use a
Content-Length header; I believe they are officially dead at this
point). The big differences are in quoting rules. Unfortunately, since
we're more or less locked in to the mbox format, in inc(1) at least,
changing that would have some nasty consequences (Ralph gave you an
example of a message that it would break on, but I am sure there are
others). I think your best bet is to preprocess these mailing list
archives so they are valid mbox files.
Thanks, Ralph & Ken. The site from which I downloaded the latest
email archive uses Mailman, so I was a bit surprised. The method
I suggested would let inc handle a larger set of inputs.
While there can still be false positives, the number of body lines
matching

    From ... [0-9]$
    <mail header>:

is likely to be much, much smaller than the number of body lines that
merely start with "From " and end in a digit. Still, I can understand
the reluctance to add this logic to inc.
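
For what it's worth, the same check can be done as a preprocessing step
outside inc. Below is a minimal Python sketch under a few assumptions of
mine: a real delimiter starts with "From " and ends in a digit, the line
after it looks like "Field-Name:", and any other line starting with
"From " gets a ">" prepended (the usual mbox quoting convention). The
function names and both patterns are made up for illustration; none of
this is taken from inc or nmh.

```python
import re

# Assumption: a delimiter line starts with "From " and ends in a digit.
FROM_LINE = re.compile(r"^From .*[0-9]$")
# Assumption: a header line is a field name (printable ASCII except
# space and colon) immediately followed by a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def is_delimiter(line, next_line):
    """Return True if `line` looks like a real mbox delimiter."""
    return bool(FROM_LINE.match(line)) and bool(HEADER_LINE.match(next_line))


def quote_body_from_lines(lines):
    """Prefix ">" to lines starting with "From " that fail the check.

    `lines` is a list of lines without trailing newlines.
    """
    out = []
    for i, line in enumerate(lines):
        # The last line has no successor; treat it as followed by nothing.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("From ") and not is_delimiter(line, next_line):
            out.append(">" + line)
        else:
            out.append(line)
    return out
```

Running an archive through quote_body_from_lines() and writing the
result to a new file should give something "inc -file" can split
correctly, apart from the false positives mentioned above. The quoted
lines keep their leading ">" in the message body unless whatever reads
the file removes it.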
--
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers