If you didn't want to wire this into metamail, you could just use 
metamail's normal program calling mechanism to handle the 
multipart/reference type; the program called could then call 
metamail recursively to process the "start" body part. 
The problem is the multiplying use of temporary disk space.  Most
mail readers write a temp file to call metamail.  To the extent that
metamail can do things in pipes from there, it is a big win; in cases
like this, though, it is more or less unavoidable to double that disk
usage by copying each part to a separate file.  If the
multipart/reference is handled by a separate intermediate program
instead of directly by metamail, this could effectively TRIPLE the
temporary disk usage.  (I say "could" because, when things are
optimal, some of this can be done with pipelines instead of temp
files.)  No big deal, assuming you clean it up, until you start
getting messages that are bigger than a third of the size of the free
space on your disk.
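As a rough illustration of that arithmetic, here is a back-of-the-envelope model in Python.  It assumes (an assumption, not a measurement) that every stage writing the message or its parts to temp files keeps a full-size copy until the whole chain exits, and that nothing is streamed through pipes:

```python
MB = 1024 * 1024

def peak_temp_usage(message_size: int, copies: int) -> int:
    # Assumption: each stage that writes the message (or its parts) to temp
    # files holds one full-size copy until the whole chain has finished.
    return message_size * copies

def largest_message(free_space: int, copies: int) -> int:
    # Largest message whose copies all fit in the free space at once.
    return free_space // copies

# copies = 1: mail reader's temp file only
# copies = 2: metamail also writes each part to its own file
# copies = 3: a separate multipart/reference program does the same again
for copies in (1, 2, 3):
    print(copies, "copies of a 2 MB message:", peak_temp_usage(2 * MB, copies) // MB, "MB")
```

Under those assumptions a 2 MB message needs about 6 MB of scratch space once three copies coexist, and 30 MB of free space caps messages at 10 MB.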
Maybe I'm compulsive about such things, but I've recently been
sending messages around that are 1 or 2 megabytes in size.  Tripling the
temp space required is definitely suboptimal...
I'm concerned about that too.  There are ways around this, but they're not
simple.  As long as there's a copy of the whole message on random-access
storage, the called programs can access that.  The calling mail reader could
keep track of the start position and length of each component's header and
body, and write *those* to a file, but then the called program would have to
do its own decoding.
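One way the mail reader could record those positions, sketched in Python.  `index_parts` is a hypothetical helper, not part of metamail; it assumes CRLF line endings, does not recurse into nested multiparts, and requires a blank line after every part's header:

```python
from typing import List, Tuple

def index_parts(raw: bytes, boundary: bytes) -> List[Tuple[int, int, int, int]]:
    """Return (header_offset, header_length, body_offset, body_length) for
    each body part of the multipart entity `raw`, as offsets into `raw`.

    Hypothetical helper; simplified: assumes CRLF line endings, does not
    recurse into nested multiparts, and requires a blank line after each
    part's header.
    """
    delim = b"\r\n--" + boundary
    # Prepend CRLF so a delimiter on the very first line matches too;
    # offsets into `data` are shifted back by 2 when recorded.
    data = b"\r\n" + raw
    index = []
    pos = data.find(delim)
    while pos != -1:
        after = pos + len(delim)
        if data[after:after + 2] == b"--":
            break  # close delimiter: no more parts
        line_end = data.find(b"\r\n", after)
        nxt = data.find(delim, after)
        if line_end == -1 or nxt == -1:
            raise ValueError("unterminated multipart body")
        part_start = line_end + 2
        # The header ends at the first blank line; searching from line_end
        # lets an empty header (blank line right after the delimiter) work.
        sep = data.find(b"\r\n\r\n", line_end, nxt)
        if sep == -1:
            raise ValueError("part without a blank line after its header")
        body_start = sep + 4
        index.append((part_start - 2, sep + 2 - part_start,
                      body_start - 2, nxt - body_start))
        pos = nxt
    return index
```

Each tuple could be written as one line of the index file handed to the called program, which would then still have to decode the body itself.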
(How many of the programs in the average mailcap file *already* require the
body part to be stuffed in its own file?)
Basically I figure that if we're going to let body parts reference one
another, we have to pay the extra overhead.  It's similar to what you pay to
get multipart/parallel.  (But is it worth the extra overhead to do things like
appledouble, file transfer, delivery reports?)
Maybe the mailcap file could have a way to specify "don't split up the files
for me; just give me the filename and offset of the first component body part
(after the start component), and the length of the body part, and I'll read it
from there".  (But it would be nice if the called programs didn't have to
parse MIME headers or understand content-transfer-encodings...)
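If a mailcap entry could ask for a filename, offset, and length instead of a split-out file, the called program's side might look like this sketch.  That calling convention is hypothetical (no such mailcap option is assumed to exist), as is `read_part`; note that the program still has to know and undo the content-transfer-encoding, which is exactly the burden mentioned above:

```python
import base64
import quopri

def read_part(path: str, offset: int, length: int, encoding: str = "7bit") -> bytes:
    """Read one body part in place from `path` and undo its transfer encoding.

    Hypothetical calling convention: the mail reader passes the file name,
    the byte offset and length of the body, and its Content-Transfer-Encoding.
    """
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the body; no temp copy
        raw = f.read(length)
    enc = encoding.strip().lower()
    if enc == "base64":
        # Non-alphabet bytes such as CR/LF are discarded before decoding.
        return base64.b64decode(raw)
    if enc == "quoted-printable":
        return quopri.decodestring(raw)
    if enc in ("7bit", "8bit", "binary"):
        return raw
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding}")
```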
Okay, but that makes for two levels of fallback, since you can still 
have a multipart/alternative as the "start" component.  And having a 
boolean option requires the mail reader to display all of the 
components, not just the ones that make sense.  So it seems like a 
cleaner implementation to use the multipart/alternative containing 
a message/external-body (or possibly multiple external body parts) to
indicate which pieces should be displayed for fallback purposes. 
Yes, my only fear is that this puts more of the burden on the sender
than is realistic; pragmatically, the recipients are going to have to
cope with stuff that wasn't so nicely laid out, so it would be nice to
at least define the default semantics for the individual pieces clearly.
Actually, this suggests an idea:  The spec could say that if you don't
recognize the "start" part, you don't show anything.  This would
effectively FORCE composing software to use multipart/alternative if it
wants anything useful to happen for recipients who don't understand the
"ideal" semantics. 
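Under that rule, a reader that implements multipart/reference might pick what to show roughly as follows.  The `Part` type, the `displayable` function, and the `start` index (standing in for the "start" parameter, assumed here to default to the first part) are all hypothetical; multipart/alternative is assumed to list its parts from plainest to richest:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Part:
    # Hypothetical, minimal stand-in for a parsed body part.
    content_type: str
    children: List["Part"] = field(default_factory=list)
    start: int = 0  # stands in for the "start" parameter (assumed default: first part)

def displayable(part: Part, understood: Set[str]) -> Optional[Part]:
    """Return the leaf part to render, or None if nothing should be shown.

    (A real renderer would also need the sibling parts to resolve references.)
    """
    ctype = part.content_type.lower()
    if ctype == "multipart/alternative":
        # Assumed ordering: plainest first, richest last, so try from the end.
        for alt in reversed(part.children):
            chosen = displayable(alt, understood)
            if chosen is not None:
                return chosen
        return None
    if ctype == "multipart/reference":
        # Proposed rule: an unrecognized "start" part means show nothing.
        # The start part may itself be a multipart/alternative (two levels of
        # fallback), which the recursive call handles.
        if not part.children:
            return None
        return displayable(part.children[part.start], understood)
    # Leaf part: show it only if this reader understands its type.
    return part if ctype in understood else None
```

A message composed as a multipart/alternative with a plain-text part followed by the multipart/reference still shows the text to readers that don't understand the start part, while a bare multipart/reference whose start part is unknown shows nothing.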
Yes.  (I thought this was obvious, but the spec could certainly be specific
about this point.)
Keith