Re: Character-set header (was Re: Minutes of the Atlanta 822ext meeting)

1991-09-11 07:22:27
Bill Janssen writes:

Nice attempt!  I like what Greg has written (especially putting
PostScript under image!).  A couple of things:

I like it too, and I have no major problem with putting PostScript under image
or application or whatever (I've said this before).

TEXT PLUS: Requires a text oriented display only. 
    (I understand this to include LaTeX, .nroff, and richtext,

We should be careful here.  I agree about an nroff-restricted set of
troff, and richtext, because both explicitly restrict themselves to
text.  I'm less sure about LaTeX.  Formats like SGML can contain almost
arbitrary data, encoded as text, so I'd (continue to :-) argue that they
can't fall under this category.

I use TeX, LaTeX, and SGML on a daily basis. I maintain hundreds of thousands
of lines of documentation written in all of these markup languages. I NEVER
deal with any of them using anything other than a plain text editor. Indeed, I
am unaware of any mechanism to deal with revising them in any other way. I
occasionally print them out, and to do that I sometimes even use TeX, LaTeX, or
an SGML processor. But I often print out the input document without formatting
it since it is more useful from an editing standpoint.

For example, I've been working on one very large SGML document in particular
recently, mailing it around, diddling and piddling in it, and I don't recall an
instance of presenting it to my SGML processor (which is only installed in one
place, not the one where I work) in over two months. If my mailer supported a
content-type-based "revise this" option (it is close to doing so) I'd select
the same treatment for these subtypes that TEXT would get. I certainly would
not treat them as images. If I want images, I'll convert them to images and
mail the images!

Brian Reid (in his What You See Is What You Deserve talk) says this a lot
better than I do. The point of markup languages is that you can deal with them
effectively without regarding them as images, or application material, or
whatever. They are text. Repeat after me: TEXT. I don't know or care about the
layout details until I enter final production on the document. (I usually never
do this -- it is up to someone else to do final production work.) In the case
of SGML it definitely is not my concern -- I cannot even control how things are
going to be laid out in the SGML I use; formatting is totally external. And
when I use any of these one of the possible output forms is line printer
output. I don't need image production facilities. In fact, all I need is a
simple filter to strip the markup commands. And yes indeed, I have just such a
filter for each of these subtypes, as well as an option to produce lineprinter
output in the formatter itself.
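
To make the "simple filter" idea concrete, here is a minimal sketch in Python.
It is an illustration only, not one of the filters mentioned above: the names
strip_sgml and strip_latex and both regular expressions are assumptions made up
for this example, and real SGML or LaTeX input needs more care (comments,
entities, commands whose arguments should be dropped rather than kept).

```python
import re

# Hypothetical, deliberately crude patterns for illustration only.
_SGML_TAG = re.compile(r"<[^<>]*>")                        # <p>, </em>, <a href="x">
_TEX_COMMAND = re.compile(r"\\[A-Za-z]+\*?(\[[^\]]*\])?")  # \emph, \section*, \foo[opt]


def strip_sgml(text: str) -> str:
    # Remove anything that looks like a tag; keep the text between tags.
    return _SGML_TAG.sub("", text)


def strip_latex(text: str) -> str:
    # Remove control words (with any optional [..] argument), then drop the
    # braces so the argument text itself survives.
    without_commands = _TEX_COMMAND.sub("", text)
    return without_commands.replace("{", "").replace("}", "")


if __name__ == "__main__":
    print(strip_sgml("<p>They are <em>text</em>.</p>"))
    print(strip_latex(r"They are \emph{text}."))
```

Both calls leave only the words a reader would want on a line printer; what
comes out is still just text, and that is the point.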

Now, you claim that you can encode arbitrary data in these formats. Well, it is
not easy to do in any of the variants I use, that's for sure. All of these formats
provide image inclusion facilities, but inline encoding of images is not a
useful thing to do. Sure, I suppose you can do it just like you can stuff a
piece of lineprinter "art" into a regular text mail message, but this is a
nonstandard, wasteful, and generally senseless use of TeX, LaTeX, or SGML. The
image inclusion facilities of these languages are very powerful mostly because
they are external, and use external formats that are good at specifying images!
Inlining such material does great disservice to the model and there is no
reason we should classify such subtypes based on such usage.

This is a show-stopper for me. If TeX, LaTeX, and SGML are moved somewhere else
while NROFF remains, I vote we get rid of text-plus completely since it is
clearly not useful, and its definition is obviously capricious and arbitrary.

Now, I want to revisit PostScript one last time, and then I'm going to give up
on it. Adobe claims that PostScript is evolving into a general markup language,
designed to compete head-to-head with TeX and SGML. I didn't say this, Adobe
did. I don't find this to be completely true at present ;-)

It is certainly possible to write PostScript where only the header is a complex
mess of commands and the rest of the document is plain ordinary text, and in
fact many documents are actually prepared in this way. (Macintosh documents are
the most amusing variant on this theme; there is a header that sort of does
this, but it defines something that's in between PostScript, plain text, and
QuickDraw. In addition, this internal format changes from version to version;
system 7.0 has now moved to a really minimal header and use of much more
PostScript stuff in the body.)

It is also true that you can represent binary stream data directly in
PostScript (it can even be LZW or JPEG compressed -- LZW and JPEG are mandatory
parts of Level II PostScript), and this sort of thing is not what I'd call
revisable or even editable. (How Adobe is going to fit this into being a markup
language is quite beyond me.) There's also a pre-tokenized binary form of
PostScript (this is a Level II gizmo).  You cannot do this sort of stuff with
TeX or SGML. Thus, the grouping of PostScript under text-plus is problematic at
best. But it is where Adobe "positions" it. We can move it to image or
application if it makes people feel better about it.

Similarly for ATK format, and other
mixed-media formats.  Yes, they are represented with text, and
knowledgeable wizards may be able to make useful changes using only a
text editor, but there is no guarantee that the text is not just an
encoding of an audio track or bitmap or whatever.

I don't know what ATK format is, so I have no comment on this.

Perhaps restricted versions of those formats could be defined (similar
to the nroff restriction on troff), and these restricted versions could
be regarded as 'text-plus' formats.

I don't see how you could possibly define such a thing. I don't see any
reason to define such a thing.

A general comment:

Future multi-media mailers are increasingly going to send messages that
are mixed in type, supported by formats that are either designed
explicitly or extended to support the mixed media.

We'll see. I'm an advocate of markup languages on odd days of the month and an
advocate of multimedia formats on even days... I use both, like both, and don't
expect to see either one dominate. But I do agree that we'll see more use of
multimedia as time goes on.

It would be nice if
these were all sent as MULTIPART, but they won't be:  there will be one
body (one document) of a particular format, with various kinds of
"stuff" mixed into it.

As a matter of fact, I had this fight with a bunch of X.400 types about five
years back. I pointed out that the (then current) X.400-1984 standard was
seriously deficient in its ability to deal with this sort of stuff, that the
various defined sorts of bodyparts were all "stuff-homogeneous", that the
future was going to be in the area of "stuff-heterogeneity" (you call it
multimedia), and that as long as the X.400 people wanted to deal with type
conversions, they really ought to get serious about conversions between
collections of homogeneous and heterogeneous entities, since this is where all
the interesting problems are. (If you want a laugh, note that there's a section
reserved "for further study" in X.402(?) for how to convert video to audio.
Maybe some software can describe the painting to me... What a concept.)

The response I got was the tag line to an old joke: "I don't understand;
where's the problem?". In other words, the X.400 folks I was talking with did
not see this issue at all. If people start using ODA, that's cool, we'll define
an ODA bodypart. But nobody seemed to see the close similarity between
multipart mail and heterogeneous formats, and the need to be able to convert
between them. And this is sure reflected in the lack of consideration this
received in X.400-1988. But if you think for about half a second about how a
gateway to, say, a PostScript printer is going to deal with multipart mail, you
might begin to see things a bit differently.
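
A rough sketch of what such a gateway has to decide, written in Python against
its standard email package. Everything here is an assumption for illustration:
the function name flatten_for_printer is made up, the policy (print text parts,
replace everything else with a one-line placeholder) is just one possible
choice, and the package's multipart syntax stands in for whatever the multipart
mechanism ends up looking like.

```python
from email import message_from_string


def flatten_for_printer(raw_message: str) -> str:
    """Turn a multipart message into one text stream for a text-only device."""
    msg = message_from_string(raw_message)
    pieces = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no printable content of their own
        if part.get_content_maintype() == "text":
            pieces.append(part.get_payload())
        else:
            # Hypothetical policy: no conversion is attempted, only a note.
            pieces.append(f"[{part.get_content_type()} part omitted]")
    return "\n".join(pieces)


if __name__ == "__main__":
    raw = (
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XX"\n'
        "\n"
        "--XX\n"
        "Content-Type: text/plain\n"
        "\n"
        "Hello\n"
        "--XX\n"
        "Content-Type: image/gif\n"
        "\n"
        "R0lGOD\n"
        "--XX--\n"
    )
    print(flatten_for_printer(raw))
```

The placeholder branch is exactly where the conversion question lives: a real
gateway would have to turn each non-text part into something the target format
can carry, or decide to drop it.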

The problem is that the Internet is far behind X.400 in infrastructure. We don't
even have the mechanism in place (multipart mail) so we can start studying this
problem. Once we have the basics in place (and we're not going off and defining
conversions until we know what in hell we're doing, unlike X.400) we can look
at these issues a little more closely and see what we want to do about them.
But let's get the infrastructure in place first -- then we can study this
problem and deal with it. I don't think we can deal with it until we
have an operational infrastructure that makes experimentation possible.

If we really want the type field to support the
kinds of information that Greg outlines, there will have to be some way
to provide a list of types, e.g.

In any case, these are application formats, in my opinion, and belong under
application. A key component of a multimedia format is the need for
applications to deal with them.

                                Ned
