This is getting *very* UNIX-specific, and very email-irrelevant. Can we
please drop "ietf-822@dimacs.rutgers.edu" off of the distribution unless
someone wants to draw it quickly back to email issues that can be
discussed across system platforms.
I found Donn's note interesting, partially because some of it is
historically incorrect.
***begin small tirade***
Various interesting rewritings of history notwithstanding, once upon a
time there was a Multics project. Regardless of its other strengths and
weaknesses, Multics was probably the most intensively "designed" and
critically debated and reviewed general-purpose operating system in the
history of the computer field. UNIX started out, after Bell Labs left
the Multics project, as an attempt to build something Multics-like with
much more modest goals for machine sizes (and costs) and service base.
To understand the fundamental early design decisions behind UNIX, you
really have to see them against a Multics backdrop: some decisions
accepted Multics ideas uncritically (with the qualification that the
UNIX principals and their management had been involved in those
decisions in the first place), and others were made differently
because of "learning" from Multics or in reaction to Multics decisions
that people disagreed with (either in retrospect or because they got
outvoted the first time).
The UNIX file naming model was largely inherited. Someone would have to
go ask the authors, but I'd suspect that not a lot of thought was given
to it at UNIX-time.
Now one of the important doctrines about Multics, from 1964 until after
the Bell Labs departure, was an assumption that it was a design for a
utility and that "users" would never see the base operating system
primitives or their manifestations; they would work in applications
environments of one flavor or another. And those applications
environments would be built separately, by separate groups, and would
not require "installation" into the system in the way that characterizes
several of the systems that Donn mentions. If that is your model, you
need a very flexible naming structure so that applications can layer
arrangements on top of it that suit *their* purposes. You also need to
keep all sorts of assumptions about how applications will be structured
completely out of the operating system, because they will end up either
constraining applications yet to be thought of or providing channels for
security problems.
In OSI language, it means you want extremely clean layering, with
absolutely no assumptions in the lower layers about what the upper
layers are going to look like or do. Multics' layering in this regard
is much cleaner than that of UNIX; in UNIX, the clean layering was a
victim of efficiency constraints and of different assumptions and goals.
***end small tirade***
A long, long time ago, I was principal architect of an applications
system that sat on top of Multics. It used what is being called
"data announcement" this week, and used it down to the level of highly
self-describing files, class operators that could figure out what to do
based on the file descriptions, and similar things. The operating
system layers didn't know about that system at all, which meant that it
had to do its own name space management, maintaining downward-looking
windows for accessing "normal" (undescribed) files when necessary.
Although I agree that data announcement is becoming necessary, putting
it in the file is the wrong answer, because that requires that all programs
be modified to reflect it, and means that some programs dealing with binary
data as "images" will have a fair amount of extra trouble dealing with it.
(It's very nice to have data aligned at zero in a file.)
Donn, I think this is backwards. You have to bind the file
description to the file somehow; how you do it should be buried
in a sufficiently deep layer that you no longer care whether "bound"
is expressed by "in the same file at the beginning", "in the same file
at the end", or "somewhere else". If programs that don't know about
self-describing files start working on them, they are, in general, going
to get very confused and/or screw things up since, for some
purposes, you end up with stuff in the description that is really
sensitive to the data content of the file. If you have programs, or an
operating system, that think "zero" (or the first byte of a file) is
special, and aren't able to virtualize that information by doing a bit
of pointer-offset calculation, then you have an environment in which it
is harder to do these sorts of things: your implementation choices are
either somewhat constrained, or programming-life is going to be harder.
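To make the pointer-offset point concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which the description occupies the first `header_len` bytes of the physical file and the content follows; `ContentView` is a made-up name, not any existing API. Every offset is shifted by the header length, so a program reading through the view still sees the content starting at zero.

```python
import io
from typing import BinaryIO


class ContentView:
    """Read-only view of the content part of a self-describing file.

    Assumed (hypothetical) layout: the first `header_len` bytes of the
    physical file hold the description; the content follows. Callers see
    the first content byte at logical offset 0.
    """

    def __init__(self, raw: BinaryIO, header_len: int) -> None:
        self._raw = raw
        self._header_len = header_len
        self._raw.seek(header_len)  # start positioned at logical offset 0

    def seek(self, offset: int) -> int:
        # Absolute seeks only; refuse to wander back into the description.
        if offset < 0:
            raise ValueError("negative content offset")
        return self._raw.seek(self._header_len + offset) - self._header_len

    def tell(self) -> int:
        return self._raw.tell() - self._header_len

    def read(self, size: int = -1) -> bytes:
        return self._raw.read(size)


# Example: a short text description followed by binary "image" data.
header = b"type: image\n\n"
body = b"\x00\x01\x02\x03"
view = ContentView(io.BytesIO(header + body), len(header))
print(view.read(2))  # b'\x00\x01'
```

The same translation works if the description sits at the end or in another file; only the constructor would change, which is the point about burying the binding in a deep enough layer.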
But Erik's hypothesised fancy GUIs are really not going to care about
old, pre-description files. They can't do anything much with them
besides display icons with question marks (or pictures of turkeys) on
them.
Additionally, simply extending the inode implies something approaching
omniscience on the part of the standardizers; what attributes are needed
and what are "temporary" needs, what are the future needs, and how much
room is "right" for expansion are all rather nasty problems to solve.
Well, there is another way to do it, and that is to dump the inode as
you know it (inadequate model for this sort of thing, as you imply) and
replace it by a tuple that identifies the type of description (the
secret for pulling "what attributes are needed..." out of the operating
system) and the location of the description (the primitive for the
file-content/file-description binding mentioned above). This could be a
file name (if one was very careful about who got hold of "rm" and what
they did with it and had a higher-level "delete" abstraction). It could
also be, e.g., an offset to where the description sat at the *top* of
the physical file if you liked doing "seeks" or the equivalent and
wanted the content at zero. Lots of possibilities, including having a
description type code for "old inode" for when that is necessary.
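A rough Python sketch of the (description type, description location) tuple, under stated assumptions: the type codes, the `ByName`/`ByOffset` location forms, and the `locate` helper are all hypothetical illustrations. In this model the lower layer only stores the tuple; `locate` stands in for a higher-layer routine that decides where to go looking.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class DescrType(Enum):
    # Hypothetical type codes; the lower layer just stores them.
    OLD_INODE = 0   # no description: classic inode attributes only
    FIELD_TEXT = 1  # e.g. a "field name: value" text description


@dataclass(frozen=True)
class ByName:
    path: str  # description kept in a separate, named file


@dataclass(frozen=True)
class ByOffset:
    offset: int  # description kept at this physical offset in the same file


@dataclass(frozen=True)
class DescriptionRef:
    """The (description type, description location) tuple replacing the inode."""
    dtype: DescrType
    location: Optional[Union[ByName, ByOffset]]


def locate(ref: DescriptionRef) -> str:
    # Higher-layer helper: interprets the tuple the lower layer stored.
    if ref.dtype is DescrType.OLD_INODE:
        return "no description: use classic inode attributes"
    if isinstance(ref.location, ByName):
        return f"{ref.dtype.name} description in file {ref.location.path}"
    if isinstance(ref.location, ByOffset):
        return f"{ref.dtype.name} description at offset {ref.location.offset}"
    raise ValueError("a described file needs a description location")


print(locate(DescriptionRef(DescrType.FIELD_TEXT, ByOffset(4096))))
# FIELD_TEXT description at offset 4096
```

Adding a new kind of description means adding a type code, not changing the stored structure, which is how "what attributes are needed" stays out of the operating system.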
The fact that UNIX doesn't have typed files in the sense of this discussion
is NOT an accident, and is NOT the consequence of "sloppy design", but rather
quite intentional and thought out.
Yes. In Multics.
(Too strongly) typed files are more trouble than they're worth!
More to the point, operating systems should not try to read type
information into file names. THAT leads to bad problems. Applications
may be in a different situation.
To me, what makes sense is to develop a convention where if there is typing
information on a file, it is somehow associated with the file by name
and location, in another ordinary file.
Most of these conventions get you into trouble eventually, for the
same reasons that file name => type conventions do. The havoc that can
be created by people running around removing things they don't
understand is the tip of the iceberg, but illustrative.
Yes, I realize that this creates a bookkeeping mess (and thus is a
"research problem")
Nothing serious, really.
but it doesn't
violate the design of UNIX (which is "kiss")
I'd say it is "operating system keeps its hands off" and I think one
could debate how well UNIX succeeds in that. As I suggest above, there
were compromises...
and it doesn't change the interfaces.
Probably does if it is useful, if you are talking about interfaces to
the user. At the file system primitive level, of course not. These are
applications problems.
(Example: for each file <foo.bar> if there exists a
file .,<foo.bar> (leading . to make it invisible, "," to keep it pretty
unique) then that file contains typing information.
Might work. Think about other models, including having a description
directory such that the description for "foo.bar" is always in
./descr/foo.bar (or ./.,descr/foo.bar if your UNIX implementation will
permit that). Makes it easier to write "don't put anything there, don't
remove anything from there" rules. You still need a model to prevent
removing content without removing description.
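The two naming conventions, plus a higher-level "delete" that keeps content and description together, might be sketched in Python like this. The function names are hypothetical, the "descr" directory name is just the example above, and `Path.unlink(missing_ok=True)` needs Python 3.8 or later. Nothing here stops a plain "rm"; it only shows what the higher-level abstraction would have to do.

```python
from pathlib import Path


def sidecar_path(content: Path) -> Path:
    """Leading-".," convention: dir/foo.bar -> dir/.,foo.bar"""
    return content.with_name(".," + content.name)


def descr_dir_path(content: Path, dirname: str = "descr") -> Path:
    """Description-directory convention: dir/foo.bar -> dir/descr/foo.bar"""
    return content.parent / dirname / content.name


def delete_with_description(content: Path, describe_path=descr_dir_path) -> None:
    """Hypothetical higher-level "delete": remove content and description together."""
    descr = describe_path(content)
    # Content goes first, so an interruption leaves at worst an orphaned
    # description (easy to sweep up) rather than content that has silently
    # lost its description.
    content.unlink()
    descr.unlink(missing_ok=True)


print(sidecar_path(Path("dir/foo.bar")).as_posix())    # dir/.,foo.bar
print(descr_dir_path(Path("dir/foo.bar")).as_posix())  # dir/descr/foo.bar
```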
By using a text file
with "field name: value" type of format, it's infinitely extendable without
ever breaking existing programs (if they're written from the beginning to
ignore fields they don't understand).
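The reader side of the quoted "field name: value" proposal is easy to sketch in Python. The field names are hypothetical, and treating names case-insensitively is an assumption, not part of the proposal. Note that this covers only the flat text format; it does not supply the extra description-type layer argued for in the reply.

```python
from typing import Dict, FrozenSet

# Hypothetical field names this particular reader understands.
KNOWN_FIELDS: FrozenSet[str] = frozenset({"content-type", "encoding"})


def parse_description(text: str, known: FrozenSet[str] = KNOWN_FIELDS) -> Dict[str, str]:
    """Parse "field name: value" lines, silently skipping unknown fields."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        name = name.strip().lower()
        if name in known:
            fields[name] = value.strip()
        # Anything else is ignored, so a newer writer can add fields
        # without breaking this older reader.
    return fields


desc = "Content-Type: image/raw\nX-Added-Later: 42\nEncoding: none\n"
print(parse_description(desc))
# {'content-type': 'image/raw', 'encoding': 'none'}
```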
Our experience is that you need one additional layer of abstraction --
the description type information -- and, after that, you can use models
that are both simpler and more efficient.
--john