Re: Massive Content-Type definition ideas & Gopher

1993-06-07 15:05:57
On Mon, 07 Jun 1993 14:08:01 EDT, Keith Moore said:
Seems like all of the existing audio/image/video content-types already
have compression built-in.  So I'm only talking about "generic" compression,
and probably only on top of text/* or application/* types.

Hmm.. Can somebody verify that image/gif, image/jpeg, audio/whatever,
and video/whatever all already have either compression defined and
specified inside the datastream, or type-specific parameters?

However, I *do* agree with the general sentiment that structured
objects will probably have their own type-specific compression.
This leaves us with 3 basic categories to worry about:

text/* - the problem here is that "good compression" may be
language-specific. The glyphs used in US English will most probably
compress well under a different algorithm than text written using
the 16-bit 10646 Chinese code points. In particular, the letter set
'etaionshrdlu' averages about 70% of all the characters used in
English, and I'm positive that 12 Chinese glyphs don't make up 70%
of the average Chinese text.
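
The 'etaionshrdlu' point can be checked against any sample text. The
sketch below is only an illustration: the function name top_k_coverage
and the sample string are made up here, and the result depends entirely
on the text you feed it. It reports what fraction of the letters in a
text is covered by its k most common letters.

```python
from collections import Counter


def top_k_coverage(text: str, k: int = 12) -> float:
    """Fraction of the letters in `text` covered by its k most common letters.

    Uses str.isalpha(), so CJK ideographs count as letters as well.
    """
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    counts = Counter(letters)
    # most_common(k) returns at most k (letter, count) pairs.
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(letters)


if __name__ == "__main__":
    # Hypothetical sample; substitute a real corpus to get a meaningful figure.
    sample = "Structured objects will probably have their own compression."
    print(f"top-12 coverage: {top_k_coverage(sample):.2f}")
```

Running the same function over a large English text and a large Chinese
text would show how differently the two distributions are skewed, which
is what decides how much a frequency-based coder can gain.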

application/octet-stream - Since you have *no* idea what the octet
stream is, your best bet is probably to either punt on compression
or use something that assumes all 256 octet values are equally
likely.  We can't assume anything about the data stream, as opposed to:

application/wombat - We will in the future have wombats such as
'spreadsheet', 'hypertext', 'EDI', 'ASN1', and so forth, which are
aggregate objects composed of sub-items in some proprietary wrapping,
where although each sub-item would be compressible if it were exposed
and accessible, inside the data stream it's not accessible per se.
Would we have to define a separate compressor for each, or wing it with
a generic one?  A compression scheme could win big if it "knew" that
the data was "mixed text and audio blocks, separated thusly"...
(Of course, the OSI bigots among us are probably jumping up and down
yelling "level violation" ;)
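
To make the "win big" case concrete, here is a minimal sketch under
heavy assumptions. The container layout (a list of tagged blocks), the
synthetic ramp "audio", and every function name are hypothetical, and
zlib merely stands in for "a generic compressor". It compares
compressing the blocks blindly with first delta-coding the blocks known
to be 16-bit audio.

```python
import struct
import zlib


def make_audio(n: int, step: int = 3) -> bytes:
    """Synthetic 16-bit little-endian samples forming a slow ramp."""
    return struct.pack(f"<{n}H", *[(i * step) % 65536 for i in range(n)])


def delta_encode(pcm: bytes) -> bytes:
    """Replace each 16-bit sample with its difference from the previous one."""
    n = len(pcm) // 2
    prev, out = 0, []
    for s in struct.unpack(f"<{n}H", pcm):
        out.append((s - prev) % 65536)
        prev = s
    return struct.pack(f"<{n}H", *out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode, showing that the transform loses nothing."""
    n = len(data) // 2
    prev, out = 0, []
    for d in struct.unpack(f"<{n}H", data):
        prev = (prev + d) % 65536
        out.append(prev)
    return struct.pack(f"<{n}H", *out)


def generic_size(blocks) -> int:
    """Compressed size when the payloads are treated as one opaque octet stream."""
    return len(zlib.compress(b"".join(p for _, p in blocks), 9))


def aware_size(blocks) -> int:
    """Compressed size when audio blocks are delta-coded before compression."""
    parts = [delta_encode(p) if kind == "audio" else p for kind, p in blocks]
    return len(zlib.compress(b"".join(parts), 9))


# Hypothetical "mixed text and audio" object.
blocks = [
    ("text", b"The quick brown fox jumps over the lazy dog. " * 20),
    ("audio", make_audio(4000)),
    ("text", b"Pack my box with five dozen liquor jugs. " * 20),
    ("audio", make_audio(4000)),
]

if __name__ == "__main__":
    print("generic:", generic_size(blocks), "type-aware:", aware_size(blocks))
```

The ramp is deliberately friendly to delta coding, so this shows the
best case rather than a typical one. Real audio, and a real proprietary
wrapper that hides the block boundaries, would narrow the gap or remove
it altogether.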

                                Valdis Kletnieks
                                Computer Systems Engineer
                                Virginia Tech