
[Nmh-workers] nmh internals: full MIME integration

2014-07-25 14:05:32
Warning: this email will be discussing technical details of nmh
internals and if you haven't paid that much attention to those sorts of
things it will likely go over your head.  The changes discussed here
wouldn't directly affect nmh behavior (although they would make a lot of
things a lot easier to do, which would result in behavior changes).  I
think any behavior changes that came out of this would be positive,
but as we've seen recently it's hard to make everyone happy.

So, in short what I'm proposing is: full MIME integration.  What do I mean
by that?  Well, right now the internal nmh API has a reasonable abstraction
for dealing with folders; you have some higher level APIs to read a folder,
you manipulate these data structures to make changes, and you don't have to
really deal with any of the details of the backend.  Really, from an
API perspective it could (mostly) be stored in IMAP and you wouldn't
have to change anything.  This was always part of the original MH design
and it still works pretty well today.

But the actual message handling is, in the words of Paul Vixie, "a major
letdown".  Every caller basically has to parse the entire message
itself; m_getfld() does some of the work in breaking down headers
and their contents for you, but it still pretty much sucks.  If you
care about any content in the message body, you have to handle that on
your own.  This is why MIME handling is poor: only a few programs
call the MIME parser, which gives us the whole "bolt on" MIME handling.

So my proposal is to get rid of almost all callers of m_getfld()
and replace them with a new function, for which I haven't decided
on a name yet.

This new function would return a Content struct (see h/mhparse.h),
but we'd have to change it a bit.  Right now a call to the MIME parsing
routines ends up slurping in the whole message, but that's not desirable
for a lot of programs (scan, pick).  It seems like parsing all of the
message's headers is generally worthwhile; that (usually) fits within
a single stdio buffer, so doing extra work there shouldn't be a huge
problem.
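
To make the "headers only" idea concrete, here is a rough sketch of a
reader that stops at the blank line separating the header from the body.
Everything in it (struct header_block, read_header_block(), the fixed
limits) is made up for illustration and is not existing nmh code.  It
also assumes no single physical line exceeds the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HEADERS 32
#define MAX_LINE    512

/* Hypothetical container; NOT the real Content struct from h/mhparse.h. */
struct header_block {
    char lines[MAX_HEADERS][MAX_LINE]; /* one unfolded header field per slot */
    int  count;                        /* number of fields stored */
    long body_offset;                  /* file offset where the body begins */
};

/*
 * Read header fields from fp and stop at the first blank line, leaving
 * the body unread.  Returns the number of fields, or -1 if there are
 * more than MAX_HEADERS fields.
 */
int read_header_block(FILE *fp, struct header_block *hb)
{
    char line[MAX_LINE];

    assert(fp != NULL && hb != NULL);
    hb->count = 0;
    hb->body_offset = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strcspn(line, "\r\n");
        line[len] = '\0';               /* strip the line terminator */

        if (len == 0)                   /* blank line: end of the header */
            break;

        if ((line[0] == ' ' || line[0] == '\t') && hb->count > 0) {
            /* Continuation line: append it to the previous field. */
            char *prev = hb->lines[hb->count - 1];
            strncat(prev, line, MAX_LINE - strlen(prev) - 1);
            continue;
        }

        if (hb->count == MAX_HEADERS)
            return -1;
        strcpy(hb->lines[hb->count++], line); /* fits: same buffer size */
    }

    hb->body_offset = ftell(fp);        /* body (if any) starts here */
    return hb->count;
}
```

A program like scan could call this and never touch the body; a later
full parse could seek to body_offset and continue from there.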

The Content struct would be extended to indicate whether or not the
complete message had been parsed; programs that just needed to examine
the header would simply parse out those headers in the message.
Because address parsing is common, we could parse out all of the addresses
as well during header reading.  We could also maintain a list of
headers that contain addresses (right now each program has to
keep that list locally) and make a function/macro to query that.
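
As a sketch of what a shared address-header query might look like: the
function name is_address_header() and the table contents below are my
own assumptions for illustration (the list is not taken from any
existing nmh program and is probably incomplete).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative list of header fields whose bodies contain addresses. */
static const char *const address_headers[] = {
    "From", "Sender", "Reply-To", "To", "Cc", "Bcc",
    "Resent-From", "Resent-Sender", "Resent-To", "Resent-Cc", "Resent-Bcc"
};

/* Case-insensitive comparison of two header field names. */
static int names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return 1 if the named header is in the shared table, 0 otherwise. */
int is_address_header(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof address_headers / sizeof address_headers[0]; i++) {
        if (names_equal(name, address_headers[i]))
            return 1;
    }
    return 0;
}
```

With one table like this in the library, each program would ask
is_address_header("cc") instead of carrying its own copy of the list.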

We could make MIME part iterators (or simply call the selector module I
outlined in an earlier email) and when the iterator reached a MIME part
that hadn't been parsed yet it would then parse the rest of the message.
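
Here is a minimal sketch of the "parsed so far" flag combined with a
lazy iterator.  All names (struct msg_content, part_iter_next(), and so
on) are hypothetical, and parse_body() is only a placeholder that fakes
two parts; the real Content struct and MIME parser look nothing like
this.

```c
#include <assert.h>
#include <stddef.h>

/* How much of the message has been parsed so far (hypothetical). */
enum parse_state { HEADERS_ONLY, FULLY_PARSED };

struct mime_part {
    const char *type;               /* e.g. "text/plain" */
};

/* Stand-in for an extended Content struct; fields are illustrative. */
struct msg_content {
    enum parse_state state;
    struct mime_part parts[4];
    size_t nparts;
    int body_parses;                /* demo counter: times the body was parsed */
};

/* Placeholder for the real body parser: pretends to find two parts. */
static void parse_body(struct msg_content *ct)
{
    ct->parts[0].type = "text/plain";
    ct->parts[1].type = "application/pdf";
    ct->nparts = 2;
    ct->state = FULLY_PARSED;
    ct->body_parses++;
}

struct part_iter {
    struct msg_content *ct;
    size_t next;                    /* index of the next part to return */
};

void part_iter_init(struct part_iter *it, struct msg_content *ct)
{
    assert(it != NULL && ct != NULL);
    it->ct = ct;
    it->next = 0;
}

/* Return the next MIME part, parsing the body on first use; NULL at end. */
const struct mime_part *part_iter_next(struct part_iter *it)
{
    assert(it != NULL);
    if (it->ct->state != FULLY_PARSED)
        parse_body(it->ct);         /* triggered only when parts are needed */
    if (it->next >= it->ct->nparts)
        return NULL;
    return &it->ct->parts[it->next++];
}
```

The point of the design is that a header-only program never calls the
iterator, so it never pays for the body parse, while a MIME-aware
program gets the full parse without asking for it explicitly.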

So in other words, the only way to access message contents would be
through a Content struct.  This would either enable or force (depending
on your point of view, I suppose) all of the utilities to handle MIME.
Obviously the technical details would have to be worked out, but this
would dovetail nicely into the MIME multiplexor model I talked about
earlier (I'm borrowing some terminology originally suggested by Mike
O'Dell).

Comments/thoughts are welcome.

--Ken

_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
