ietf-822

Re: audio compression

1993-01-21 16:25:37
Erik M. van der Poel writes:

I received an audio/basic message once (got to hear Marshall's voice
for the first time! :-), but I was just disgusted about the size
required for the trivial amount of sound (short recording).

Keith Moore replies:

(Actually, since audio/basic uses u-law encoding, it is already somewhat
compressed...a 14-bit sample takes up only 8 bits, though with a loss in
resolution.)

Audio compression has special requirements.  You have to use a form of
"lossy" compression, because the noise in sampled audio frustrates
lossless schemes that look for redundancy.  U-LAW is indeed already
quite a step forward from simple linear encoding (its dynamic range is
much better than that of the 8-bit linear encodings used by PCs and
Macs for low-quality audio).
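To make the 14-bit-into-8-bit point and the dynamic-range comparison
concrete, here is a minimal sketch of G.711-style u-law companding.  It
assumes the common reference layout (bias 0x84, 3-bit segment, 4-bit
mantissa, bit-inverted byte) and 16-bit signed input, as many
implementations use; the function names are hypothetical and this is
not the code of any particular package.

```python
# Sketch of G.711-style mu-law companding on 16-bit signed samples.
# Function names are hypothetical; the bit layout follows the common
# G.711 reference approach (bias 0x84, 3-bit segment, 4-bit mantissa).
BIAS = 0x84     # added so every segment boundary falls on a power of two
CLIP = 32635    # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode a 16-bit signed linear sample as one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit, bit 7 .. bit 14.
    exponent = 7
    while exponent > 0 and not (magnitude & (0x80 << exponent)):
        exponent -= 1
    # Keep the 4 bits just below the leading one.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the byte bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit mu-law byte back to a 16-bit signed linear sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Reconstruct at the midpoint of the quantization interval.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def linear8_roundtrip(sample: int) -> int:
    """For comparison: keep only the top 8 bits, as an 8-bit linear format would."""
    return (sample >> 8) << 8


if __name__ == "__main__":
    for s in (100, 1000, 20000):
        print(s, ulaw_to_linear(linear_to_ulaw(s)), linear8_roundtrip(s))
```

With this sketch a quiet sample of 100 comes back as 104 via u-law but
as 0 via 8-bit linear, while a loud sample of 20000 comes back as 19836
versus 19968: u-law spends its precision where quiet signals need it.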

However, a better scheme exists that has been used for a while now in
several applications (e.g. INRIA's videoconferencing tool IVS, and also
in a "radio" broadcast tool used locally).  Free code for it is
available by anonymous FTP from ftp.cwi.nl.  This so-called ADPCM
scheme uses only 4 bits/sample, i.e. half the space of U-LAW, while the
quality and dynamic range are comparable.  Moreover, it is *fast* (I
believe a Sparc 1+ can compress and decompress 500,000 samples/sec).
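The message doesn't say exactly which ADPCM variant the CWI code
implements.  Purely as an illustration, here is a sketch of the widely
published IMA/DVI 4-bit ADPCM algorithm, assuming the scheme belongs to
that family: each sample becomes a sign bit plus three magnitude bits
relative to an adaptive step size, and the decoder replays the same
state updates as the encoder.  The function names and the nibble
packing order are hypothetical.

```python
import math

# IMA/DVI ADPCM step-size table (89 entries) and step-index adjustments.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8]


def _clamp(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


def adpcm_encode(samples):
    """Encode 16-bit linear samples into 4-bit codes (one int 0..15 per sample)."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        step = STEP_TABLE[index]
        diff = sample - predicted
        sign = 8 if diff < 0 else 0
        diff = abs(diff)
        code, vpdiff = 0, step >> 3
        # Successive approximation of |diff| against step, step/2, step/4.
        for bit in (4, 2, 1):
            if diff >= step:
                code |= bit
                diff -= step
                vpdiff += step
            step >>= 1
        # Track exactly what the decoder will reconstruct.
        predicted = _clamp(predicted - vpdiff if sign else predicted + vpdiff, -32768, 32767)
        index = _clamp(index + INDEX_TABLE[code], 0, 88)
        codes.append(code | sign)
    return codes


def adpcm_decode(codes):
    """Decode 4-bit codes back to 16-bit linear samples, mirroring the encoder state."""
    predicted, index, samples = 0, 0, []
    for code in codes:
        step = STEP_TABLE[index]
        vpdiff = step >> 3
        if code & 4:
            vpdiff += step
        if code & 2:
            vpdiff += step >> 1
        if code & 1:
            vpdiff += step >> 2
        predicted = _clamp(predicted - vpdiff if code & 8 else predicted + vpdiff, -32768, 32767)
        index = _clamp(index + INDEX_TABLE[code & 7], 0, 88)
        samples.append(predicted)
    return samples


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte (high nibble first, an assumed convention)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


if __name__ == "__main__":
    # One second of a 440 Hz tone at 8000 samples/sec.
    tone = [round(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
    codes = adpcm_encode(tone)
    print(len(pack_nibbles(codes)), "bytes for", len(tone), "samples")
```

One second of 8 kHz audio packs into 4000 bytes here, against 8000
bytes of u-law; the per-sample work is a few compares, shifts, and
adds, which is consistent with the speed claim above.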

I would certainly recommend this encoding as a standard, cheap, and
effective audio compression scheme.  Since it is not a general-purpose
data compression scheme, it should probably be considered a new audio
subtype (e.g. audio/ADPCM32, where the 32 means 32 kbit/sec).  To
avoid double quantization, systems supporting 16-bit linear audio need
to convert ADPCM data directly to their native format rather than via
the U-LAW format of audio/basic; for that reason ADPCM can't be treated
as just an alternative encoding of audio/basic.
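The double-quantization argument can be illustrated numerically.  This
sketch uses the continuous u-law curve (mu = 255) with an 8-bit code as
a stand-in for the segmented G.711 table, which it only approximates,
and treats an evenly spaced sweep of 16-bit values as hypothetical
ADPCM decoder output.  It compares the extra error of using that output
directly against routing it through u-law first.

```python
import math

MU = 255.0  # mu-law parameter; the segmented G.711 table approximates this curve


def mulaw_compress(x: float) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def via_ulaw(sample16: int) -> int:
    """Hypothetical indirect path: 16-bit linear -> 8-bit u-law code -> 16-bit linear."""
    code = round(mulaw_compress(sample16 / 32768.0) * 127)  # integer code in -127..127
    return max(-32768, min(32767, round(mulaw_expand(code / 127.0) * 32768.0)))


def direct(sample16: int) -> int:
    """Direct path: the ADPCM decoder's 16-bit output is used as-is."""
    return sample16


if __name__ == "__main__":
    # Stand-in for ADPCM decoder output: an evenly spaced sweep of 16-bit values.
    decoded = range(-32768, 32768, 97)
    print("direct path extra error:", max(abs(direct(s) - s) for s in decoded))
    print("via u-law extra error:  ", max(abs(via_ulaw(s) - s) for s in decoded))
```

The direct path adds nothing, while the detour through an 8-bit u-law
code adds a second rounding whose size grows with signal level, on top
of whatever error ADPCM itself introduced.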

I'm not sure about the legal issues, but I know that this code was
written from scratch after a published example.

There are schemes that may be more standard or give more compression
for speech, but they are immensely expensive to implement (CCITT G.721
and G.723, ADPCM at 32 and 24 kbits/sec) and/or make the speech sound
"mechanical" (US Federal Standard 1016, "CELP", 4800 bits/sec, and
Federal Standard 1015, LPC-10E, 2400 bits/sec).  (I am not an expert
on these.)
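For a sense of scale, the rates mentioned in this message translate
into message sizes as follows.  The audio/basic figure assumes its
8000 Hz, 8-bit u-law format and the 4-bit ADPCM figure assumes the same
sampling rate; the other rates are the ones quoted above.  The sizes
are raw payload, before any MIME transfer encoding.

```python
# Bit rates in bits per second. audio/basic and 4-bit ADPCM assume 8000 Hz
# sampling; the remaining entries are the rates quoted in the message.
RATES_BPS = {
    "audio/basic (u-law)": 8000 * 8,
    "ADPCM, 4 bits/sample": 8000 * 4,
    "CCITT G.721": 32000,
    "CCITT G.723": 24000,
    "FS-1016 CELP": 4800,
    "FS-1015 LPC-10E": 2400,
}


def message_bytes(rate_bps: int, seconds: float) -> int:
    """Raw payload size in bytes for a recording of the given length."""
    return int(rate_bps * seconds) // 8


if __name__ == "__main__":
    basic = RATES_BPS["audio/basic (u-law)"]
    for name, rate in RATES_BPS.items():
        print(f"{name:22s} {rate:6d} bit/s {message_bytes(rate, 10):7d} bytes per 10 s"
              f"  (audio/basic ratio {basic / rate:5.2f}:1)")
```

A 10-second message works out to 80,000 bytes as audio/basic, 40,000
bytes as 4-bit ADPCM, and 6,000 and 3,000 bytes for the two federal
standard vocoders.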

--Guido van Rossum, CWI, Amsterdam 
<Guido(_dot_)van(_dot_)Rossum(_at_)cwi(_dot_)nl>
