ietf-822
Re: ps;1 or ps -- problems with the postscript application type

1992-03-30 06:22:39
   While I agree with your conclusion that Postscript should stay and
with most of the reasoning that leads up to it, perhaps there is the
germ of a good idea or two hiding beneath the "it can't be precisely
defined, take it out" reasoning.

   If I'm a user, and I go to send a Postscript file to someone else,
I'm likely to have a very good idea whether I've tried to construct
things using an interchangeable Postscript or whether I've used enough
weird stuff to require "prior agreement" with the recipient.

I disagree. Most users simply use applications that happen to produce
PostScript for them. The number of users that have experimental PostScript
needs is quite small.

PostScript generating applications, however, weigh in at every possible point
in the spectrum. Most do a nice job of it -- certainly the ones that are used
in a wide variety of environments do. Others commit all sorts of sins in the
byte-level structure of their output, relying on the essential stream nature of
the language and the vagaries of the local operating system to handle stream
data. These applications then lose badly when files are moved to systems with a
different (or no) notion of stream data, or when the utilities used to move the
files don't understand stream data (most don't), or whatever.

I think some additional information and characterization of real-world
PostScript problems is appropriate at this point. I support a variety of
PostScript applications and products and have done so for several years, so I
know something about this (more than I'd like to, actually).

Of the PostScript problems I see from day to day, and I see at least one or two
every week and sometimes more, 95% are caused by inexact duplication of
information. People persist in using an old version of Kermit that's broken, or
don't set file characteristics properly, or don't set their FTP options
properly, or whatever. They send files through BITNET with long lines and they
get wrapped promiscuously throughout the document. There are a million
variations on this theme, and they all come up at one time or another.
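
As a rough illustration of how one might screen a file before shipping it through a line-oriented transport, here is a minimal Python sketch that flags lines likely to get wrapped. The function name and the 80-byte default are assumptions chosen for illustration; the right limit depends on the transport in question.

```python
def find_suspect_lines(data: bytes, limit: int = 80):
    """Return (line_number, length) for each line longer than `limit` bytes.

    `limit` is a hypothetical threshold; pick whatever the transport
    you are worried about can carry without wrapping.
    """
    suspects = []
    # bytes.splitlines() treats LF, CR and CRLF as line terminators.
    for number, line in enumerate(data.splitlines(), start=1):
        if len(line) > limit:
            suspects.append((number, len(line)))
    return suspects
```

An empty result does not prove the file will survive, only that line wrapping is not the obvious way for it to die.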

The magnitude of this problem is so overwhelming that there's little point in
even considering other problems before we get this one under control. And MIME
is quite capable of solving this problem for e-mail. It won't solve the problem
overall because often as not the data is corrupted before it is mailed. I don't
think there is any solution to this problem that's effective -- it just has to
be dealt with one case at a time, and solved one case at a time. But until we
can deal with the ramifications it presents in e-mail, I think that we'll spend
a lot of time examining PostScript problems that are simply this one in
disguise.
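
To make the e-mail half of this concrete, here is a small sketch using Python's standard email package (a present-day library, not anything from the MIME draft itself). It wraps the bytes as application/postscript with base64 encoding, so that long lines and odd bytes survive the trip, and then recovers them. The function names are mine and purely illustrative.

```python
import email
from email.mime.application import MIMEApplication


def encode_postscript(data: bytes) -> str:
    """Wrap raw PostScript bytes as an application/postscript body part."""
    # MIMEApplication base64-encodes its payload by default.
    part = MIMEApplication(data, "postscript")
    return part.as_string()


def decode_postscript(text: str) -> bytes:
    """Recover the original bytes from the serialized body part."""
    part = email.message_from_string(text)
    return part.get_payload(decode=True)
```

This only protects the data from the point where it is encoded. If the file was already mangled before it reached the mailer, the mangled version is what arrives, faithfully.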

Next come the problems that I classify as environmental dependencies. By far
the most common is to assume the use of a particular printer, usually the Apple
LaserWriter. There are structures available in a LaserWriter that are not part
of the PostScript standard. In particular, there's this thing called the
statusdict that's used to maintain device-specific information. Far too many
PostScript-generating applications assume the presence of information in this
dictionary that's specific to the LaserWriter. (Some even go so far as to test
the device name and barf if it does not coincide with what they expect. Bleah.)
This is such a problem that most implementors of PostScript interpreters have
been forced to provide LaserWriter-compatible statusdict entries.

Things like the use of color operators are not, properly speaking,
environmental dependencies. Most color operators are supposed to be fully
implemented on all devices regardless of whether or not they support color.
True, the output will look like garbage often as not, but there will be output,
and it will match in gray what was supposed to come out in color.

Usage of fonts that aren't generally available is another interesting problem.
Although there's nothing in the specifications that says you have to have any
particular set of fonts in your PostScript interpreter, the industry has moved
to a position where if you don't have the basic 28 LaserWriter fonts you're not
going to be able to handle much PostScript. And as a matter of fact the basic
Courier font, probably the most commonly used font of all, was placed in the
public domain as part of X11 R5. No PostScript implementation has any excuse
any more for not having this one!

One of the paradoxes of PostScript is that it is often easier to update the
viewer or printer than it is to fix the generating application. Often as not
the generating application is some turnkey thing on a PC or Macintosh and
there's nothing that can be done to change it. Adding additional fonts to a
viewer or printer, on the other hand, is not too difficult in most cases (files
for the viewer, cartridges for the printer), and it is also possible to hack in
stuff so that unknown fonts are translated into some known font so that the
output can be viewed in some format. (This can even be done on printers with
nothing but ROMs in them.) It may not be pretty but it works. This is an
especially popular approach in viewers, and I for one would not consider using
a viewer that did not support this.

Finally, there are the things that are just plain illegal but which happen
anyway. My personal favorite is sticking ASCII 04's (control d) into the data
stream. Control d is used as an end-of-job indicator on LaserWriters and many
other printers that connect using a serial line. However, this is not part of
the PostScript language. It is part of the transmission protocol for sending
PostScript over a serial line (there are other aspects of this protocol I won't
bother to explain here). Putting it into a PostScript file is the same as
putting SMTP commands into a mail message in a visible manner in the UA. Yet
many applications actually do this. This problem is so common most viewers have
chosen to define control d as a command -- a no-operation, to be specific.
GhostScript recently added a 0404 operator, since some application now seems to
think it needs to put in more than one end-of-job character.
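
For those who would rather clean the file than teach the interpreter a new no-op, a minimal Python sketch follows. It assumes the stray control d's sit at the very start or very end of the file, which is where serial-line drivers tend to put them; the function name is hypothetical.

```python
EOT = b"\x04"  # ASCII 04, control d


def strip_end_of_job(data: bytes) -> bytes:
    """Remove control d bytes from the very start and end of a file.

    Bytes in the interior are left alone on purpose: a 0x04 inside a
    string or binary section may be legitimate data.
    """
    return data.strip(EOT)
```

Anything more aggressive, such as deleting every 0x04 in the file, risks corrupting documents that carry binary data.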

These then are the sorts of problems that really arise when you try to exchange
PostScript and have it interoperate. The most common one, formatting garfs, is
solvable only on a case by case basis. The others have been solved for the most
part by adapting PostScript viewers and printers to cope with the vagaries of
PostScript in the real world. None of them as far as I can tell would have been
simplified by parameterizing things. If an "experimental" parameter has a good
use I don't think this is it.

Document structuring conventions are not generally useful either, at least not
as an automatic way to detect incompatibilities. (This is changing as
applications emerge that are more intelligent in these areas.) Say you get some
PostScript and it won't print or view. It is only at this point that you'll
look at the document structuring conventions. Most of the time these are not
useful in practice, since by far the most common problems are due to just plain
illegal stuff. But in the occasional cases where there's usage of some weird
feature, the comments sometimes indicate this.

Note, however, that document structuring conventions are quite useful for
things like paging previewers. They are what let you find the page boundaries,
among other things.
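
A paging previewer's first pass can be sketched in a few lines of Python. This assumes the file actually carries %%Page: comments of the form "%%Page: label ordinal"; files that ignore the conventions will simply yield an empty list. The function name is made up for the example.

```python
def page_offsets(data: bytes):
    """Return (label, ordinal, byte_offset) for each %%Page: comment."""
    pages = []
    offset = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b"%%Page:"):
            fields = line[len(b"%%Page:"):].split()
            # Expect a label followed by a numeric ordinal; skip anything else.
            if len(fields) >= 2 and fields[-1].isdigit():
                label = b" ".join(fields[:-1]).decode("latin-1")
                pages.append((label, int(fields[-1]), offset))
        offset += len(line)
    return pages
```

With the offsets in hand a previewer can seek straight to a page, provided the pages are really independent of one another, which is something the comments promise but cannot enforce.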

The document structuring conventions are also extremely useful in one special
case. Often as not, when I lay my hands on a chunk of PostScript I want to use,
it has been encapsulated/extracted/output/translated/whatever from one
application to another. Sometimes this is done dozens of times. The result is a
very large file that's mostly boilerplate. We often work with logos and such
things, and it is rare when we cannot reduce the PostScript for such a thing to
1/10 of its original size. This would be flatly impossible to do without the
document structuring conventions.

Now, it may not have gotten it right, but I'm likely to know if I've made the
effort.  It seems to me that might be a useful parameter-distinction to
make, so that, e.g., a mail receiver/reader could be able to make a
better guess as to where it could display or whether the thing had to be
routed to something special (this has some of the same tone as the
text/file issue, doesn't it?).

The commenting conventions for PostScript are intended to do precisely this --
tell you what features are used so you know what equipment to use to render the
output.

This is a very elaborate system that serves to identify quite precisely the
special needs of any given document. Among other things, it provides facilities
for listing the fonts needed, the specific features of the interpreter the
document requires, any include files the document wants, and what use of color
is present.

This convention was initially proposed with the very first version of
PostScript. It has continued to evolve as PostScript has evolved. It has
version numbering and other information associated with it so that it is
possible to tell whether the omission of a particular piece of information is
meaningful or not.
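
To give a feel for how a receiver might read this information, here is a simplified Python sketch that pulls the version from the first line and collects the header comments up to %%EndComments. It is an illustration under stated assumptions, not a conforming parser: it splits every value on whitespace (so parenthesized strings are not kept whole), handles only the "%%+" continuation form, and the function name is my own.

```python
def read_header_comments(data: bytes):
    """Return (version, comments) from the structuring-convention header.

    `version` is the text after "%!PS-Adobe-" on the first line, or None.
    `comments` maps each header keyword to its whitespace-split values.
    """
    lines = [raw.decode("latin-1") for raw in data.splitlines()]
    version = None
    comments = {}
    last_key = None
    prefix = "%!PS-Adobe-"
    if lines and lines[0].startswith(prefix):
        parts = lines[0][len(prefix):].split()
        version = parts[0] if parts else None
    for line in lines[1:]:
        # The header ends at %%EndComments or at the first non-%% line.
        if not line.startswith("%%") or line.startswith("%%EndComments"):
            break
        if line.startswith("%%+"):
            # Continuation of the previous comment's value list.
            if last_key is not None:
                comments[last_key].extend(line[3:].split())
            continue
        key, _, value = line[2:].partition(":")
        last_key = key.strip()
        comments[last_key] = value.split()
    return version, comments
```

A version of None tells the reader that the absence of a font list means nothing at all, which is exactly the distinction the version numbering exists to make.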

This is an incredibly elaborate system that has all the features it needs.
(Some would say it is too elaborate, but then the problem is what would you
eliminate?)

Duplicating it in the quite restricted environment of a header is not only
difficult, it is wasteful. And frankly, it reeks of arrogance. I don't
seriously think we can do a better job of this than the designers of the
language already have done.

Much of MIME, at sort of the second level, seems to be designed
around the assumption of letting the user (or originating UA) send
useful information to the receiver (or target UA).  It would seem that
this "intent of interchange" vs "big, complex, preformatted file"
distinction would be consistent with things one might want to tell a
receiving UA.

It is. But it is already taken care of by an existing mechanism that is much
more elaborate than what MIME provides. I see no reason for the duplication.

  So, would it make sense to spread a little sugar on the syntax to
permit senders to distinguish between "this is a Postscript file that is
intended to be enough self-contained that almost anyone with Postscript
capability should be able to read" and "this is a Postscript file that
you are going to need to know my conventions and architecture to read"?

The problem is that this is not a binary state. The state is quite complex. And
as I have remarked elsewhere, the particular binary states proposed here
(level-one-p and experimental-p) are things I see as rather useless in any
case.

   Note that I'm *not* suggesting that we need to define (precisely or
otherwise) just what is in the minimal set.  I think that, to a 90%
level at least, we pretty much know.  The idea is to capture and
transmit sender intent.

If it won't print on a run-of-the-mill LaserWriter, you'll have trouble and
bellyaching, that's for sure. And this actually means you cannot do everything
that's in the books, since not all the stuff in the books is supported on a lot
of printers. For example, try defining an encoding vector for a font for, say,
8859-1 (I'm assuming you're using a level 1 implementation that does not have
the standard encoding vector for this already). If you do it in the way that's
presented in the book (the first and second editions differ, but they'll give
you the same result here) you'll just get an error.

PostScript interoperability is an iterative thing. Applications are written. If
they don't work on important subsets of the hardware base they get rewritten so
they do. Hardware that cannot support what most applications are producing gets
upgraded too. And eventually a common feature set comes into focus. It is not
written down anywhere, of course, but everyone who uses PostScript extensively
is aware of it.

I don't necessarily see this as something that needs to be dealt with
"now": it might be better to advance the document and get a little field
experience as to whether it is as much a problem in practice as people
think it might be.

I don't have any real objection to writing all this down. I'll even help do it.
I just don't want to do it now. We need to collect a lot more data before we
can even start. And to start doing it we need a PostScript type ASAP.

p.s.: I think your characterization that suggests that no revisable form
file or markup file is useable without external information is a bit
exaggerated.  Offline discussion on request.

Actually, I think it was overwhelmingly understated -- I was being kind. More
on this later offline.

                                Ned