Date: Wed, 24 Apr 1991 13:55:52 -0400 (EDT)
From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
Subject: Re: Comments on Draft RFC

Excerpts from mail: 24-Apr-91 Re: Comments on Draft RFC Einar
Stefferud@ics.uci. (1192)

So, do we also have "TeX-iso-10646" and "SCRIBE-iso-10646" ad nauseam?
Where do all these get registered, and who decides what they mean?
Does this group have to stay in session forever to decide these things?

As per my previous message, I think that the established registration
procedures will suffice.  I really doubt that there are going to be a
lot of these types, anyway.

Nathaniel, I strongly disagree with your proposal.  It is wrong from
the perspective of both user agents and country gateways.  Of course,
"right" and "wrong" assume some underlying model of how these things
will be used.  I'll explain how I intend to use this information in our
UAs and gateways, and why your solution is not suitable for us.

Think about who needs to know about the character set, who needs
to know about the "type" of the attachment, and what relationship
there is between the two.

In X.400 land (and at selected gateways in SMTP land) there are
gateway functions that attempt to translate from one character set
to another.  Some of these translations must be done (EBCDIC->ASCII,
ISO10646->LATIN1, etc) for anything to be done with the information
on the other side of the gateway.  These gateways clearly *only care
about the character set*, and couldn't care less about the type
of the attachment.

In your proposal, how should these gateways recognize what character
set a body part is in?  These gateways either need to understand
about character set postfixes ("*-iso-10646" means that the doc is
in 10646 char set?) or the gateways need to know that "tex-iso-10646"
maps to the iso-10646 character set via some table lookup.  The second
case is clearly unworkable -- the gateway needs to know about *every
type in the universe* if it wants to do character set conversion.
The first case moves the complexity into parsing the type name
space instead of reading a separate header field; it also means that
type names are no longer atomic objects; they must be parsed
before they are interpreted.
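
To make the contrast concrete, here is a minimal sketch (in Python,
with hypothetical type and field names -- nothing here is from the
draft) of what a charset-converting gateway has to do under each scheme:

    # Hypothetical sketch only.  Scheme 1: the charset is a suffix baked
    # into the type name, so the gateway must parse every type it sees.
    KNOWN_SUFFIXES = ("-iso-10646", "-iso-2022", "-us-ascii")

    def charset_from_compound_type(type_name):
        for suffix in KNOWN_SUFFIXES:
            if type_name.endswith(suffix):
                return suffix[1:]
        return None                  # unknown name: cannot convert safely

    # Scheme 2: the charset lives in its own header field; the gateway
    # never needs to understand the type name at all.
    def charset_from_header(headers):
        return headers.get("charset")

    print(charset_from_compound_type("tex-iso-10646"))           # iso-10646
    print(charset_from_header({"type": "tex", "charset": "iso-10646"}))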

The user agent has all the problems of the gateway (assuming it
will attempt to translate character sets) and then it has the
UA specific problems.  My model is that the UA will use the type
field to try to figure out what editor (or viewer) to use to
display the body part to the user.  But the type field is no
longer a simple object in your proposal.  Instead it is now
a compound object that must be parsed.
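
The same point from the UA side, again as a hedged sketch with a
made-up viewer table:

    # Hypothetical viewer table.  With an atomic type field, dispatch is
    # a single lookup; with compound names the UA must first guess where
    # the type name ends and the charset suffix begins.
    VIEWERS = {"text": "more", "tex": "xdvi"}

    def pick_viewer(type_field):
        return VIEWERS.get(type_field)           # atomic: one lookup

    def pick_viewer_compound(type_field):
        for known in VIEWERS:                    # compound: parse first
            if type_field == known or type_field.startswith(known + "-"):
                return VIEWERS[known]
        return None

    print(pick_viewer("tex"))                     # xdvi
    print(pick_viewer_compound("tex-iso-10646"))  # xdvi, after parsing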

In case it is not clear yet, I believe that any conceivable benefits
of having one fewer header field are overwhelmed by the cost of
having a "complex" type field that needs to be parsed.

Also, you keep on asserting that there won't be very many of these.
I disagree with that part too.  There are currently three different
multi-charset "standards" available (iso10646, iso2022, unicode), as
well as a plethora of national 8 bit char sets.  Trying to build
compound type names puts us in the position of needing N x M type
names, where N is the actual number of types and M is the number
of character sets.  This is a "bad thing" to inflict on people.
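
The arithmetic is easy to make concrete (illustrative numbers only):

    # Illustrative only: compound names need one registered name per
    # (type, charset) pair; a separate charset field needs N + M names.
    types = ["text", "tex", "scribe", "postscript"]                # N = 4
    charsets = ["us-ascii", "iso-2022", "iso-10646", "unicode"]    # M = 4

    compound = [t + "-" + c for t in types for c in charsets]
    print(len(compound))                # 16 names to register (N x M)
    print(len(types) + len(charsets))   # 8 names with two fields (N + M)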


Do we need to pre-define every possible combination that people will use?

Not if we define things as "building blocks".  Thus, for example, we
don't need to define anything special about compression headers if we
have a compressed-message content-type, because all the other mechanisms
can then work on it recursively.  Similarly for encrypted messages.  It
seems to me that this is a simple and elegant solution that avoids
making the headers even more complex than they already are.
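
A minimal sketch of the recursion being described (hypothetical type
names; body parts modeled as plain dictionaries):

    import zlib

    # A body part is {"type": ..., "body": ...}.  A wrapper part's body
    # is itself a complete body part, so the same unwrapping mechanism
    # applies at every level -- no new header machinery needed.
    def unwrap(part):
        if part["type"] == "compressed-message":
            inner = part["body"]
            return unwrap({"type": inner["type"],
                           "body": zlib.decompress(inner["body"])})
        return part                              # leaf: hand to the UA

    msg = {"type": "compressed-message",
           "body": {"type": "text", "body": zlib.compress(b"hello")}}
    print(unwrap(msg))       # {'type': 'text', 'body': b'hello'}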

OK.  What we haven't asked is "what should be the philosophy of
what we should put into separate header fields?"  My opinion is
that we partition problems based on who will need to interpret
the information, and how hard it will be for UAs and gateways to
interpret this information.

By my model, Content-encoding is clearly proper to put in a header
field.  It has a clearly defined purpose -- to render a body
part capable of being passed through a particular transport.  Implicit
in the design is the ability to make an SMTP++ gateway that can
easily translate an RFC-XXXX message down to something that can be
passed through old-style SMTP.
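
As a sketch of that gateway step (base64 is chosen here only as an
example of a 7-bit-safe encoding; the field name just follows the
Content-encoding naming above):

    import base64

    # Hedged sketch of the SMTP++ -> old-SMTP downgrade: if a body is
    # not 7-bit clean, re-encode it and record that in Content-encoding,
    # leaving the type and charset fields untouched.
    def downgrade_for_7bit(headers, body):
        if all(byte < 0x80 for byte in body):
            return headers, body                 # already passes old SMTP
        new_headers = dict(headers)
        new_headers["content-encoding"] = "base64"
        return new_headers, base64.encodebytes(body)

    hdrs, body = downgrade_for_7bit({"type": "image"}, bytes([0, 255, 128]))
    print(hdrs["content-encoding"])              # base64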

Charset also passes my muster: it identifies the mapping between the
binary values in the body part and conceptual glyphs that one might
paint on the screen.
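
For instance (using two charsets Python happens to ship codecs for):

    # The charset field names exactly this mapping: the same byte value
    # paints different glyphs under different character sets.
    byte = bytes([0xC1])
    print(byte.decode("cp500"))      # 'A'  under EBCDIC (the gateway case)
    print(byte.decode("latin-1"))    # 'Á'  under ISO 8859-1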

But where does "compression" fit in?  An encapsulated message
doesn't seem to hack it, because one probably wants the ability
to compress different body parts in different ways (an image
compression algorithm will be different from a good text compressor).
I think of "compression" in general as a filter function: a black
box to the email UA that you pass the body part through before
using it.  By that model the "filter" doesn't have to be something
that compresses; it could be something that decrypts instead.
Is this general enough to pass muster for its own header field?
It seems so to me, but I feel less strongly about this one.
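
A sketch of that filter model (hypothetical filter names; the XOR
"cipher" is a toy stand-in for real decryption):

    import zlib

    # Hypothetical filter table: each filter is a black box the UA runs
    # the body through before using it.  Decompression and decryption
    # are the same mechanism with different entries.
    FILTERS = {
        "uncompress": zlib.decompress,
        "xor-decrypt": lambda body: bytes(b ^ 0x5A for b in body),  # toy
    }

    def apply_filters(body, chain):
        for name in chain:
            body = FILTERS[name](body)
        return body

    raw = zlib.compress(b"report text")
    print(apply_filters(raw, ["uncompress"]))    # b'report text'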

        Neil Katin
