Meta data (and codeset announcement is meta data) belongs in the inode
on Unix systems. Both the Mac and OS/2 treat such information
as meta data and put it in the file system, not in the file itself.
...
Walt,
While I agree with the principles you express, I've got a lot of
problems with some of the inferences and implications and even some of
the analogies.
First of all, as you should know, "U**X", "Unix-imitators", and
"Unix-wannabees" have not quite become synonyms for "operating system"
yet or even for "TCP/IP" or "SMTP mail". To a considerable extent,
whether the U**X models are right or wrong, OS/2 (and even recent MSDOS)
and a number of aspects of the Mac file system are similar because they are
attempted imitations, not because they represent independent reasoning.
If you start down the metadata path in a serious way, you end up
needing several layers of "envelopes" that begin to bear a resemblance
to several versions of the "internal", "external", "conceptual" schema
models of the database folks. Knowing that a file is a text stream
might be enough to prevent destroying it with inappropriate tools;
knowing that it is "binary" certainly is not.
If you want to have a look at an extremely tedious exploration of
those issues from a database --and explicit metadata-- standpoint, hunt
up my paper in Rafanelli et al., Statistical and Scientific Database
Management, Springer Lecture Notes in Computer Science No. 339.
One clearly should layer these things somehow, but whether the
layering occurs in-band wrt the file or "in the file system" is often
more a matter of how one layers the applications that will access them
than anything else. It also makes a good foundation for religious wars.
But, while I've got my prejudices too, there are really no extremely
strong arguments for one model over another that don't come down to
aesthetics. That is, at least as long as the applications--the file
manipulators and readers and transformers--are written to be consistent
with whatever the chosen model is.
As for the SMTP mail headers, they are another example of wrong-headed
thinking. They try to be all things to all people. The fundamental
problem is that they confuse the envelope with the letter. If we look
...
No. They are just very old, as these things go, and we have learned a
lot about layering since. That said, the RFC821 envelope (not the
headers) really contains only the information necessary to arrange
delivery: the sender address, the list of recipients, and some bits
needed for handshaking. If you like, you can think of the headers as an
inner envelope, but they are clearly not the message envelope.
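To make the envelope/header distinction concrete, here is a minimal Python sketch. The Envelope class, the addresses, and the mailing-list scenario are all hypothetical and only for illustration; nothing here is taken from RFC821 beyond the idea that the envelope carries a sender and a list of recipients, separately from the headers.

```python
from dataclasses import dataclass
from email import message_from_string
from email.utils import getaddresses
from typing import List


@dataclass
class Envelope:
    """What the transport needs for delivery (hypothetical structure)."""
    mail_from: str        # where delivery problems get reported
    rcpt_to: List[str]    # who actually receives this copy


# The message proper: headers (the "inner envelope") plus body.
RAW_MESSAGE = (
    "From: Walt <walt@host1.example>\n"
    "To: list@host2.example\n"
    "Subject: envelopes\n"
    "\n"
    "Body text.\n"
)

# The envelope is built by the transport, not read out of the message.
# Here a list exploder has rewritten it; the headers are untouched.
envelope = Envelope(
    mail_from="owner-list@host2.example",
    rcpt_to=["john@host3.example", "archive@host3.example"],
)

message = message_from_string(RAW_MESSAGE)
header_recipients = [
    addr for _, addr in getaddresses(message.get_all("To", []))
]
```

In this sketch the header still names the list, while the envelope names the two mailboxes that get this copy. A relay needs only the envelope to do its job.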
The envelope of
computer mail does not have to be human readable at all. It should be
compact and easily interpreted by the routers.
And the X.400 one isn't human readable. There are, however,
advantages in human readable envelopes, and SMTP happened to make that
choice.
Similarly the
"Received" lines should be on the envelope just like the post mark
cancellation, not in the letter.
The "post office cancellation" does not correspond to the trace
fields. Except on registered mail, the post office does not provide
trace information. But, yes, these things, ideally, probably should be
in the envelope. They got put at the beginning of the "inner envelope"
--the mail header-- instead, at least partially to preserve the
objective you listed of having the envelope as uncluttered as possible.
I'm not completely happy with the choice, but why is it so serious?
I don't think anyone who has thought about the Internet mail
environment would claim that the layering is perfect. It does, however,
work and many of us would rather see the mail go through than worry
endlessly about purity.
Incidentally, BITNET (aka RSCS) gets this separation right.
The envelope is the CP TAG information and is entirely separate
from the contents of the letter.
Interesting that you should say this. What BITNET discovered, long
ago, was that 8 character, upper-case-only, user names and host names
really didn't cut it--even within BITNET, much less communicating with
interconnected systems and networks. So they invented an inner
envelope, called BSMTP, that is transported along with the message body.
That inner envelope bears a strong resemblance to the SMTP envelope,
incidentally, and has MAIL FROM and RCPT TO commands in it. The usual
"CP TAG information" turns out to be equivalent to MAILER at HOST1
sending to MAILER at HOST2; in other words, it identifies the MTAs, not
the sender and receiver.
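As a rough illustration of what that inner envelope buys you, here is a Python sketch that pulls the sender and recipients out of a BSMTP-style batch. The sample batch and the function name are hypothetical, the layout (one command per line, DATA ended by a lone ".") is assumed to follow ordinary SMTP, and dot-stuffing and error handling are ignored.

```python
from typing import List, Optional, Tuple


def read_inner_envelope(batch: str) -> Tuple[Optional[str], List[str], str]:
    """Return (sender, recipients, message text) from a command batch."""
    sender = None
    recipients = []
    message_lines = []
    in_data = False
    for line in batch.splitlines():
        if in_data:
            if line == ".":          # a lone dot ends the message
                in_data = False
            else:
                message_lines.append(line)
            continue
        verb = line.upper()
        if verb.startswith("MAIL FROM:"):
            sender = line[len("MAIL FROM:"):].strip().strip("<>")
        elif verb.startswith("RCPT TO:"):
            recipients.append(line[len("RCPT TO:"):].strip().strip("<>"))
        elif verb == "DATA":
            in_data = True
        # Other commands (HELO, QUIT, ...) carry no envelope addresses.
    return sender, recipients, "\n".join(message_lines)


# Hypothetical batch, as it might travel from MAILER at HOST1
# to MAILER at HOST2 inside the file.
SAMPLE_BATCH = "\n".join([
    "HELO HOST1",
    "MAIL FROM:<walt@host1.example>",
    "RCPT TO:<john@host2.example>",
    "RCPT TO:<postmaster@host2.example>",
    "DATA",
    "From: walt@host1.example",
    "To: john@host2.example",
    "",
    "Body text.",
    ".",
    "QUIT",
])

sender, recipients, text = read_inner_envelope(SAMPLE_BATCH)
```

The point of the sketch is that the real sender and recipients come from the batch commands, not from the tag and not from the 822 headers.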
Curiously, this meant that, until your company decided to begin to
speak Internet and BITNET mail and to open things up a bit, mail could
be sent through certain VNET gateways only by using escapes from
the normal BITNET mechanisms--large and complex tables that remember
which sites "speak BITNET" and which ones had to be fed raw
RSCS-over-NJE.
And as part of that same pattern, use of BSMTP is *optional*. If it
is omitted, the poor receiving MTA (whose own name is the only thing in
the tag, remember) has to go decode the 822 headers to try to figure out
which poor sot the mail is destined for and, if necessary, where to send
rejection notices. These are activities that no self-respecting
Internet MTA will ever need to contemplate.
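To see why that is unpleasant, here is a Python sketch of the guessing an MTA is reduced to when there is no envelope. The function name, the choice of header fields, and the sample message are assumptions made for illustration; this is not a description of how any particular BITNET mailer does it.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr
from typing import List, Tuple


def guess_envelope_from_headers(raw: str) -> Tuple[str, List[str]]:
    """Guess (rejection address, recipients) from 822 headers alone."""
    msg = message_from_string(raw)
    fields = (
        msg.get_all("To", [])
        + msg.get_all("Cc", [])
        + msg.get_all("Bcc", [])
    )
    recipients = [addr for _, addr in getaddresses(fields) if addr]
    # Assumption: prefer Sender over From as the place to send rejections.
    _, reject_to = parseaddr(msg.get("Sender") or msg.get("From") or "")
    return reject_to, recipients


# Hypothetical copy of a list message that has already been delivered
# to the list and is now on its way to one subscriber.
LIST_COPY = (
    "From: Walt <walt@host1.example>\n"
    "To: list@host2.example\n"
    "Subject: envelopes\n"
    "\n"
    "Body text.\n"
)

reject_to, recipients = guess_envelope_from_headers(LIST_COPY)
```

For this sample the guess comes out as the list address again, and rejections go to the original author rather than to whoever runs the list. With a real envelope neither question has to be guessed at.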
So much so that it has always
been 8 bit clean because mail is just a file and sent by the
same mechanism that sends binary files.
Sorry. BITNET is "eight bit clean" because mail sits directly on top
of NJE, rather than having an intermediate mail transport protocol. If
the Internet didn't have SMTP, but handled mail by opening TCP circuits
and just sending the stuff somehow, it would be "eight bit clean" too.
And the mail machinery that sits on top of NJE and RSCS--the precise
stuff that compensates for the fact that the envelopes aren't adequate--
is written for an EBCDIC environment, and that *better* be able to
handle 8 bit data. With regard to the "same mechanism", yes, of course.
But that is, in the first instance, because NJE is several OSI layers
lower than SMTP. And it is interesting to note--especially as we try to
redesign Internet mail to make it suitable for transferring files--that
the one real use the BITNET mail system makes of the NJE TAG information
is to use a separate message class for mail so that mail messages can be
distinguished from all of that "same mechanism" stuff.
I think what this all proves, if anything, is that "pure layering" is
a very nice conceptual model. It really does help us think about things
in an intelligent way. But the number of implementations out there that
are actually pure and clean is very small. And the price of true
purity may be a bit too high, even though I think that most of us, if
given the opportunity to do a mail system from scratch, would do it
somewhat differently than 821/822 were done.
--john