
Re: Idle curiosity -- Time to implement MIME

1993-09-07 16:22:18
> Idle curiosity mind you ...
>
> Those of you who have implemented MIME from scratch please try and answer
> the following question:
>
> How much effort (man months?) did it take to implement a MIME parser?

Just the parsing code, or the whole viewing shebang?  I whipped up a
C++ parser for MIME messages in a couple of weeks in my spare time.
It builds a data structure, and does some mailcap-like lookups to find
programs to pass body parts off to if they aren't understood internally.
The viewing pieces (for Microsoft Windows) took a little longer, and
are still evolving.
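
For the curious, a mailcap-like lookup can be sketched roughly as below.
This is only an illustration, not the code described above: the
MailcapEntry struct, the matches() and find_viewer() helpers, and the
viewer command strings in the usage note are all hypothetical.  It
assumes content types have already been lowercased, and it treats a
pattern ending in "/*" as matching every subtype.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table entry: a content-type pattern and a viewer command.
struct MailcapEntry {
    std::string pattern;  // e.g. "image/gif" or "image/*"
    std::string command;  // e.g. "showgif %s" (made-up command name)
};

// True if `type` equals `pattern`, or `pattern` is "major/*" and `type`
// starts with "major/".  Assumes both strings are already lowercase.
bool matches(const std::string& pattern, const std::string& type) {
    if (pattern == type) return true;
    const std::size_t n = pattern.size();
    if (n < 2 || pattern.compare(n - 2, 2, "/*") != 0) return false;
    // Compare everything up to and including the '/'.
    return type.compare(0, n - 1, pattern, 0, n - 1) == 0;
}

// Return the command of the first matching entry, or "" if none matches.
std::string find_viewer(const std::vector<MailcapEntry>& entries,
                        const std::string& type) {
    for (const MailcapEntry& e : entries)
        if (matches(e.pattern, type)) return e.command;
    return "";
}
```

With entries for "image/gif" and "image/*" in that order, a lookup for
"image/gif" returns the first command, "image/jpeg" falls through to the
wildcard, and "audio/basic" returns the empty string, which the caller
can take to mean "no external viewer known".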

> There must be no policy decisions in the parser on the order of presenting
> the body parts; that is decided by other code.

My code does have a few policy decisions: multipart messages are assumed
to be presented in order.  Multipart/alternative messages have all alternatives
loaded into memory, and then I use some heuristics to determine which one
to display, but leaving open the option for other alternatives to be displayed
later.  But, if you were perverse enough, you could traverse the data
structure in any way you wanted.
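
To make the split between data structure and policy concrete, here is a
minimal sketch.  It is not the actual code: BodyPart, make_part(),
choose_alternative() and collect() are hypothetical names, and the
"heuristic" shown is just the simple rule from the MIME spec (take the
last alternative the viewer can display), standing in for whatever
heuristics a real viewer would use.  All alternatives stay in the tree,
so a different one can still be shown later.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse-tree node: a leaf holds data, a multipart holds children.
struct BodyPart {
    std::string type;  // e.g. "text/plain", "multipart/mixed"
    std::string data;  // decoded content (leaf parts only)
    std::vector<std::unique_ptr<BodyPart>> children;
};

std::unique_ptr<BodyPart> make_part(std::string type, std::string data = "") {
    std::unique_ptr<BodyPart> p(new BodyPart);
    p->type = std::move(type);
    p->data = std::move(data);
    return p;
}

typedef bool (*CanDisplay)(const std::string& type);

// Stand-in policy: the last alternative we can display wins.
const BodyPart* choose_alternative(const BodyPart& alt, CanDisplay ok) {
    for (auto it = alt.children.rbegin(); it != alt.children.rend(); ++it)
        if (ok((*it)->type)) return it->get();
    return nullptr;  // nothing displayable; every alternative is still in memory
}

// Policy lives here, not in the parser: walk the tree in order and
// collect the leaf parts to present.
void collect(const BodyPart& part, CanDisplay ok,
             std::vector<const BodyPart*>& out) {
    if (part.type == "multipart/alternative") {
        if (const BodyPart* best = choose_alternative(part, ok))
            collect(*best, ok, out);
    } else if (part.type.compare(0, 10, "multipart/") == 0) {
        for (const auto& child : part.children) collect(*child, ok, out);
    } else {
        out.push_back(&part);
    }
}
```

Someone who wanted a different presentation order would leave BodyPart
alone and write a different collect().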

> Don't include RichText implementation ..

I haven't got that far yet: working on it now.

C++ is ideally suited to parsing MIME messages if you choose the right
model.  Mine uses a kind of "filter stream": bytes are extracted from
the raw message with one filter, quoted-printable and base64 are stripped
by the next one, the message is split at boundary lines for multipart
body parts in the next, and the final filter handles the particular kind
of body part (text/plain, text/enriched, image/gif, etc).  Just add a
new subclass, and presto, a new filter!  By gluing together filters,
you can do almost any kind of parsing.  E.g., to parse a base64'ed image/gif
inside a multipart/mixed, inside a multipart/alternative, you'd link
together the following filters:

        gif -> base64 -> multipart -> multipart -> read

Each filter calls on the next to get more bytes, and as far as it is concerned
it is just calling a normal system call like "read".  Filters are gradually
added and removed from the list as you move down through the message.