Re: (i18n 97) Re: data announcement

Meta data (and codeset announcement is meta data) belongs in the inode
on Unix systems. Both the Mac and OS/2 treat such information
as meta data and put it in the file system, not in the file itself.
...

Walt,
  While I agree with the principles you express, I've got a lot of 
problems with some of the inferences and implications and even some of 
the analogies.
  First of all, as you should know, "U**X", "Unix-imitators", and 
"Unix-wannabees" have not quite become synonyms for "operating system" 
yet or even for "TCP/IP" or "SMTP mail".  To a considerable extent, 
whether the U**X models are right or wrong, OS/2 (and even recent MSDOS) 
and a number of aspects of the Mac file system are similar because they are 
attempted imitations, not because they represent independent reasoning.
  If you start down the metadata path in a serious way, you end up 
needing several layers of "envelopes" that begin to bear a resemblance 
to several versions of the "internal", "external", "conceptual" schema 
models of the database folks.  Knowing that a file is a text stream 
might be enough to prevent destroying it with inappropriate tools; 
knowing that it is "binary" certainly is not.
  If you want to have a look at an extremely tedious exploration of 
those issues from a database --and explicit metadata-- standpoint, hunt 
up my paper in Rafanelli et al., Statistical and Scientific Database 
Management, Springer Lecture Notes in Computer Science No. 339.
  One clearly should layer these things somehow, but whether the 
layering occurs in-band wrt the file or "in the file system" is often 
more a matter of how one layers the applications that will access them 
than anything else.  It also makes a good foundation for religious wars. 
But, while I've got my prejudices too, there are really no extremely 
strong arguments for one model over another that don't come down to 
aesthetics.  That is, at least as long as the applications--the file 
manipulators and readers and transformers--are written to be consistent 
with whatever the chosen model is.

As for the SMTP mail headers, they are another example of wrong-headed
thinking.  They try to be all things to all people.  The fundamental
problem is that they confuse the envelope with the letter.  If we look
...

  No.  They are just very old, as these things go, and we have learned a 
lot about layering since.  That said, the RFC821 envelope (not the
headers) really contains only the information necessary to arrange
delivery: the sender address, the list of recipients, and some bits
needed for handshaking.  If you like, you can think of the headers as an 
inner envelope, but they are clearly not the message envelope.
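
  To make the envelope/header distinction concrete, here is a minimal
Python sketch.  The Envelope class, the smtp_commands helper, and all
addresses are made up for illustration; the sketch only assumes that the
RFC821 envelope carries the sender and recipient list independently of
whatever the RFC822 headers say.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class Envelope:
    """Hypothetical stand-in for the RFC821 envelope: delivery data only."""
    mail_from: str
    rcpt_to: list = field(default_factory=list)


def smtp_commands(env):
    """Render the envelope as the SMTP commands that precede the message."""
    lines = [f"MAIL FROM:<{env.mail_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in env.rcpt_to]
    lines.append("DATA")
    return lines


# The "inner envelope": RFC822 headers travel inside the message itself.
msg = EmailMessage()
msg["From"] = "john@example.org"
msg["To"] = "walt@example.com"
msg["Subject"] = "data announcement"
msg.set_content("Body text.\n")

# The outer envelope is built separately and may name recipients
# that never appear in any header.
env = Envelope("john@example.org",
               ["walt@example.com", "list-archive@example.net"])

for line in smtp_commands(env):
    print(line)
```

  The second recipient appears only in the envelope; nothing in the
headers needs to mention it for delivery to work.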

The envelope of
computer mail does not have to be human readable at all.  It should be
compact and easily interpreted by the routers.

  And the X.400 one isn't human readable.  There are, however, 
advantages in human readable envelopes, and SMTP happened to make that 
choice.

Similarly the
"Received" lines should be on the envelope just like the post mark
cancellation, not in the letter.

   The "post office cancellation" does not correspond to the trace 
fields.  Except on registered mail, the post office does not provide 
trace information.  But, yes, these things, ideally, probably should be 
in the envelope.  They got put at the beginning of the "inner envelope" 
--the mail header-- instead, at least partially to preserve the 
objective you listed of having the envelope as uncluttered as possible.  
I'm not completely happy with the choice, but why is it so serious?
   I don't think anyone who has thought about the Internet mail 
environment would claim that the layering is perfect.  It does, however, 
work and many of us would rather see the mail go through than worry 
endlessly about purity.

Incidentally, BITNET (aka RSCS) gets this separation right.
The envelope is the CP TAG information and is entirely separate
from the contents of the letter.

   Interesting that you should say this.  What BITNET discovered, long 
ago, was that 8 character, upper-case-only, user names and host names 
really didn't cut it--even within BITNET, much less communicating with 
interconnected systems and networks.  So they invented an inner
envelope, called BSMTP, that is transported along with the message body.
That inner envelope bears a strong resemblance to the SMTP envelope,
incidentally, and has MAIL FROM and RCPT TO commands in it.  The usual
"CP TAG information" turns out to be equivalent to MAILER at HOST1
sending to MAILER at HOST2; in other words, it identifies the MTAs, not
the sender and receiver.
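
  As a rough illustration of an inner envelope that travels with the
message body, the Python sketch below wraps a message in SMTP-style
commands and reads them back.  It is a simplification under stated
assumptions: the function names, host names, and addresses are
hypothetical, and the layout does not claim to reproduce the exact
BSMTP format used on BITNET.

```python
def wrap_bsmtp(local_host, mail_from, rcpt_to, message_lines):
    """Prefix a message with a batch-style SMTP envelope (simplified)."""
    batch = [f"HELO {local_host}", f"MAIL FROM:<{mail_from}>"]
    batch += [f"RCPT TO:<{rcpt}>" for rcpt in rcpt_to]
    batch.append("DATA")
    # Dot-stuff body lines so a lone "." can safely end the data section.
    batch += ["." + line if line.startswith(".") else line
              for line in message_lines]
    batch += [".", "QUIT"]
    return batch


def read_envelope(batch):
    """Recover sender and recipients without touching the 822 headers."""
    sender, recipients = None, []
    for line in batch:
        upper = line.upper()
        if upper.startswith("MAIL FROM:"):
            sender = line[len("MAIL FROM:"):].strip().strip("<>")
        elif upper.startswith("RCPT TO:"):
            recipients.append(line[len("RCPT TO:"):].strip().strip("<>"))
        elif upper == "DATA":
            break  # everything after this is message content
    return sender, recipients


batch = wrap_bsmtp("HOST1", "john@host1.example", ["walt@host2.example"],
                   ["Subject: test", "", "Hello.", ".a line starting with a dot"])
sender, recipients = read_envelope(batch)
```

  The receiving MTA in this sketch stops reading at DATA, so it learns
who the mail is for from the inner envelope alone.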
  Curiously, this meant that, until your company decided to begin to
speak Internet and BITNET mail and to open things up a bit, mail could
be sent through certain VNET gateways only by using escapes from
the normal BITNET mechanisms--large and complex tables that remember
which sites "speak BITNET" and which ones had to be fed raw
RSCS-over-NJE. 
   And as part of that same pattern, use of BSMTP is *optional*.  If it 
is omitted, the poor receiving MTA (whose own name is the only thing in 
the tag, remember) has to go decode the 822 headers to try to figure out 
which poor sot the mail is destined for and, if necessary, where to send 
rejection notices.  These are activities that no self-respecting 
Internet MTA will ever need to contemplate.
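
  A sketch of what that header decoding amounts to, using Python's
standard email package.  The function names and addresses are
hypothetical, and the choice of which fields to consult (To, Cc, Bcc for
recipients; Sender, then From, for rejection notices) is an assumption
made for illustration, not a description of any particular BITNET
mailer.

```python
from email import message_from_string
from email.utils import getaddresses


def guess_recipients(raw_message):
    """Guess delivery addresses from the 822 headers alone."""
    msg = message_from_string(raw_message)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields += msg.get_all(name, [])
    return [addr for _, addr in getaddresses(fields) if addr]


def guess_bounce_address(raw_message):
    """Guess where a rejection notice should go."""
    msg = message_from_string(raw_message)
    for name in ("Sender", "From"):
        pairs = getaddresses(msg.get_all(name, []))
        if pairs and pairs[0][1]:
            return pairs[0][1]
    return None


raw = (
    "From: John <john@host1.example>\n"
    "To: walt@host2.example, Other <other@host3.example>\n"
    "Subject: data announcement\n"
    "\n"
    "Body text.\n"
)
recipients = guess_recipients(raw)
bounce_to = guess_bounce_address(raw)
```

  The result is only a guess: the headers may list people who are not
served by this host, or may omit the actual recipient entirely, as with
mailing-list traffic.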

So much so that it has always
been 8 bit clean because mail is just a file and sent by the
same mechanism that sends binary files.

  Sorry.  BITNET is "eight bit clean" because mail sits directly on top 
of NJE, rather than having an intermediate mail transport protocol.  If 
the Internet didn't have SMTP, but handled mail by opening TCP circuits 
and just sending the stuff somehow, it would be "eight bit clean" too.  
And the mail machinery that sits on top of NJE and RSCS--the precise 
stuff that compensates for the fact that the envelopes aren't adequate-- 
is written for an EBCDIC environment, and that *better* be able to 
handle 8 bit data.  With regard to the "same mechanism", yes, of course. 
But that is, in the first instance, because NJE is several OSI layers 
lower than SMTP.  And it is interesting to note--especially as we try to 
redesign Internet mail to make it suitable for transferring files--that 
the one real use the BITNET mail system makes of the NJE TAG information 
is to use a separate message class for mail so that mail messages can be 
distinguished from all of that "same mechanism" stuff.

   I think what this all proves, if anything, is that "pure layering" is 
a very nice conceptual model.  It really does help us think about things 
in an intelligent way.  But the number of implementations out there that 
are genuinely pure and clean is very small.  And the price of true 
purity may be a bit too high, even though I think that most of us, if 
given the opportunity to do a mail system from scratch, would do it 
somewhat differently than 821/822 were done.
    --john