18-796
Multimedia Communications: Coding, Systems, and Networking
Prof. Tsuhan Chen tsuhan@ece.cmu.edu
MPEG Audio
Outline
Basics
  Psychoacoustics
  Subband coding
MPEG-1 audio
  Layer I and II
  Layer III
  Frame structure and packetization
MPEG-2 audio
  Multichannel audio
  Backward compatible coding
  Non-backward compatible coding
18-796/Spring 1999/Chen
Digital Audio
                        Telephone Speech  Wideband Speech  Mediumband Audio  Wideband Audio
Frequency Band (Hz)     300~3400          50~7000          10~11000          10~22000
Sampling Rate (kHz)     8                 16               24                48
Bits per Sample         8                 8                16                16
Raw Bitrate (kbits/s)   64                128              384               768
CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
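The raw bitrates above are the product of sampling rate, bits per sample, and number of channels. A minimal sketch of that arithmetic follows; the function name `raw_bitrate_kbps` and the dictionary layout are illustrative choices, not part of the lecture.

```python
def raw_bitrate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kbits/s: rate (kHz) x bits per sample x channels."""
    return sampling_rate_khz * bits_per_sample * channels


# (sampling rate in kHz, bits per sample) taken from the table above
formats = {
    "telephone speech": (8, 8),
    "wideband speech": (16, 8),
    "mediumband audio": (24, 16),
    "wideband audio": (48, 16),
}

for name, (fs_khz, bits) in formats.items():
    print(f"{name}: {raw_bitrate_kbps(fs_khz, bits)} kbits/s")

# CD audio: 44.1 kHz, 16 bits, 2 channels
cd_kbps = raw_bitrate_kbps(44.1, 16, channels=2)
print(f"CD: {cd_kbps / 1000:.3f} Mbits/s")
```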
Psychoacoustics
Threshold in quiet
26 critical bands 0~24 kHz
Frequency masking in the same critical band
Frequency Masking
SMR (Signal-to-Mask Ratio)
Temporal Masking
Post-masking: 50~200 ms
Pre-masking: about 1/10 of the post-masking duration
Subband Coding
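[Figure: M-channel subband coder. Analysis filterbank: filters H1(z), H2(z), ..., HM(z), each followed by downsampling by M and a quantizer Q. Synthesis filterbank: upsampling by M followed by filters F1(z), F2(z), ..., FM(z).]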
Maximal downsampling
Q should be based on signal-to-mask ratio (SMR)
Ear critical bands are not uniform, but logarithmic
The filter bank should match the critical bands
Tree-structure filter bank (to be derived on board)
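One way to base the quantizers on the SMR is a greedy bit allocation: each extra bit buys roughly 6 dB of signal-to-noise ratio, and the next bit goes to the subband whose mask-to-noise ratio (MNR = SNR - SMR) is currently worst. The sketch below assumes the 6.02 dB-per-bit rule and made-up SMR values; the function name `allocate_bits` and the early stop once all noise is masked are illustrative choices, not the standard's tables or procedure.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: give one bit at a time to the subband with the
    lowest mask-to-noise ratio, MNR = SNR - SMR (all in dB)."""
    bits = [0] * len(smr_db)

    def mnr(i):
        # Assumed SNR model: about 6.02 dB per quantizer bit
        return bits[i] * db_per_bit - smr_db[i]

    for _ in range(bit_pool):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)
        if mnr(worst) >= 0:
            break  # quantization noise is below the mask in every subband
        bits[worst] += 1
    return bits


# Hypothetical SMR values (dB) for four subbands and a pool of 10 bits
print(allocate_bits([20, 5, -3, 12], bit_pool=10))  # → [4, 1, 0, 2]
```

A subband with negative SMR (signal already below the masking threshold) receives no bits at all, which is where most of the coding gain comes from.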
Subband Coding vs. DCT
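[Figure: polyphase representation. Delay chain (z^-1) with downsamplers (M), analysis polyphase matrix E(z), synthesis polyphase matrix R(z), upsamplers (M) with the recombining chain.]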
When E(z) = DCT matrix, this becomes DCT
No overlap; blocking artifact
Modified DCT (MDCT)
50% overlap; less blocking artifact
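The 50% overlap can be illustrated with a small MDCT round trip: each block of 2N windowed samples yields only N coefficients, and the time-domain aliasing of one block is cancelled by its neighbour during overlap-add. This is a minimal pure-Python sketch assuming a sine window and a 2/N inverse scaling; the function names are illustrative, and the windows used in Layer III differ in detail.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, N):
    """Forward MDCT: 2N input samples -> N coefficients."""
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, N):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def mdct_roundtrip(x, N):
    """Analyze and resynthesize x with 50% overlapping blocks (hop size N)."""
    if len(x) % N != 0:
        raise ValueError("len(x) must be a multiple of N")
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N  # so every sample is in two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]  # windowed overlap-add
    return out[N:N + len(x)]


x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
y = mdct_roundtrip(x, N=4)
print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error
```

Although the blocks overlap by half, the transform stays critically sampled: N new coefficients are produced for every N new input samples.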
MPEG-1 Audio
ISO/IEC 11172-3 (1988~1991)
First high quality audio compression standard
Sampling rates: 32, 44.1, 48 kHz
CD quality two-channel audio at ~256 kbits/s
CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s
Quality demonstration (MPEG-1 Layer II)
Stereo 44.1 kHz at 64 kbits/s
Stereo 44.1 kHz at 128 kbits/s
Stereo 44.1 kHz at 192 kbits/s
Stereo 44.1 kHz at 256 kbits/s
Encoder Block Diagram
[Figure: ISO/IEC 11172-3 encoder. PCM audio samples (32, 44.1, 48 kHz) -> analysis filterbank -> quantizer and coding -> frame packing -> encoded bitstream. A psychoacoustic model controls the quantizer and coding; ancillary data enters at frame packing.]
Decoder Block Diagram
[Figure: ISO/IEC 11172-3 decoder. Encoded bitstream -> frame unpacking -> reconstruction -> synthesis filterbank -> PCM audio samples (32, 44.1, 48 kHz). Ancillary data is extracted at frame unpacking.]
Layers
Increasing complexity, delay, and quality
Layer I: ~384 kbits/s for perceptually lossless quality (4:1)
Layer II: ~192 kbits/s for perceptually lossless quality (8:1)
Layer III: ~128 kbits/s for perceptually lossless quality (12:1)
(for two channels)
100% perceptual lossless
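The quoted ratios of 4:1, 8:1, and 12:1 are consistent with a reference of 48 kHz, 16-bit, two-channel PCM (1536 kbits/s). That reference is an inference from the numbers, not something the slide states; the short check below, with a hypothetical helper `compression_ratio`, makes the arithmetic explicit.

```python
def compression_ratio(coded_kbps, fs_khz=48, bits=16, channels=2):
    """Ratio of raw PCM bitrate to coded bitrate (reference format assumed)."""
    return fs_khz * bits * channels / coded_kbps


for layer, kbps in (("Layer I", 384), ("Layer II", 192), ("Layer III", 128)):
    print(f"{layer}: {compression_ratio(kbps):.0f}:1")
```

Against CD audio at 44.1 kHz (1411.2 kbits/s) the same bitrates give slightly lower ratios, about 3.7:1, 7.4:1, and 11:1.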
Layer I and II Encoder
[Figure: Layer I/II encoder. Main path: 32-band analysis filterbank (512-tap) -> scaler & quantizer -> mux. Side path: FFT -> masking threshold generator -> dynamic bit allocator -> coder -> mux.]
FFT: 512-pt for Layer I, 1024-pt for Layer II/III
Block-Based Coding
[Figure: the output of each analysis filterbank subband is grouped into blocks of 12 samples. Block: Layer I. Superblock (three blocks of 12): Layer II/III.]
12 samples for Layer I, 36 samples for Layer II/III
Block companding: each block normalized by a scalefactor
For Layer II, up to 3 scalefactors, with 2-bit scalefactor select
Each block/superblock receives one bit allocation
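Block companding can be sketched as follows: pick the smallest scalefactor that covers the block's peak, normalize the 12 samples by it, and quantize the normalized values uniformly with the allocated number of bits. The scalefactor table (63 entries in roughly 2 dB steps), the symmetric mid-tread quantizer, and all function names are simplifying assumptions for illustration; the standard's exact tables and quantizer formulas are not reproduced here.

```python
# Assumed scalefactor table: 63 entries, each about 2 dB below the previous
SCALEFACTORS = [2.0 ** (1 - i / 3) for i in range(63)]


def choose_scalefactor(block):
    """Index of the smallest table entry that still covers the block peak."""
    peak = max(abs(s) for s in block)
    idx = 0
    for i, sf in enumerate(SCALEFACTORS):  # table is in decreasing order
        if sf >= peak:
            idx = i
        else:
            break
    return idx


def compand_block(block, nbits):
    """Normalize by the scalefactor, then quantize to 2**nbits - 1 levels.
    Requires nbits >= 2."""
    idx = choose_scalefactor(block)
    sf = SCALEFACTORS[idx]
    half = (2 ** nbits - 2) // 2  # largest code magnitude
    codes = [max(-half, min(half, round(s / sf * half))) for s in block]
    return idx, codes


def expand_block(idx, codes, nbits):
    """Decoder side: rescale the codes by the transmitted scalefactor."""
    sf = SCALEFACTORS[idx]
    half = (2 ** nbits - 2) // 2
    return [c / half * sf for c in codes]


block = [0.3, -0.1, 0.05, 0.2, -0.25, 0.0, 0.12, -0.3, 0.07, 0.01, -0.02, 0.15]
idx, codes = compand_block(block, nbits=8)
restored = expand_block(idx, codes, nbits=8)
print(idx, max(abs(a - b) for a, b in zip(block, restored)))
```

Because the step size scales with the scalefactor, a quiet block is quantized as finely, relative to its own level, as a loud one with the same bit allocation.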
Layer III Encoder
[Figure: Layer III encoder. Main path: analysis filterbank -> MDCT (6 or 18, with overlap) -> scaler & quantizer -> Huffman coding -> mux. Side path: FFT -> masking threshold generator -> coding -> mux.]
Features in Layer III
Hybrid filterbank
MDCT with filterbank
Long/short window switching
Short for better temporal resolution (to prevent pre-echoes)
Long for better frequency resolution
Nonuniform quantization
Entropy coding
Run-length and Huffman coding
Bit reservoir (buffer)
Frame Structure
Header Info | Side Info | Subband Samples | Aux Data
Header info: sync bits, system info, CRC (cyclic redundancy code)
Side info: bit allocation, scalefactor (and scalefactor select for Layer II and III)
Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III
Packetization: 4-byte header, 184-byte payload
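The subband sample counts fix the frame length: 32 × 12 = 384 PCM samples per Layer I frame and 32 × 36 = 1152 per Layer II/III frame. The sketch below derives frame duration, frame size in bytes, and the number of 184-byte payloads a frame would occupy. The helper names are hypothetical, and the example sticks to 48 kHz because at 44.1 kHz the frame size is not an integer number of bytes.

```python
import math

# PCM samples represented by one frame (32 subbands x samples per subband)
SAMPLES_PER_FRAME = {1: 32 * 12, 2: 32 * 36, 3: 32 * 36}


def frame_duration_ms(layer, fs_hz):
    """Audio time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / fs_hz


def frame_bytes(layer, bitrate_bps, fs_hz):
    """Average frame size in bytes at a constant bitrate."""
    return bitrate_bps * SAMPLES_PER_FRAME[layer] / (8 * fs_hz)


def packets_per_frame(nbytes, payload_bytes=184):
    """Number of fixed-size payloads needed to carry one frame."""
    return math.ceil(nbytes / payload_bytes)


# Layer II, 192 kbits/s, 48 kHz
size = frame_bytes(2, 192_000, 48_000)
print(frame_duration_ms(2, 48_000), size, packets_per_frame(size))
```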
Stereo Redundancy Coding
Four modes: mono, stereo, dual channel (two separate channels), joint stereo
Joint stereo mode
  Human stereo perception > 2 kHz is based on envelope
  Intensity stereo coding > 2 kHz
    Encode (L + R)
    Assign independent left- and right-channel scalefactors
Layer III supports (L + R) and (L - R) coding
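Sum/difference coding can be sketched as a simple invertible transform of the two channels. The 1/sqrt(2) normalization used below is an assumption chosen so that the transform is its own inverse; the function names are illustrative. Intensity stereo, which discards the difference signal above about 2 kHz, is not shown.

```python
import math

SCALE = 1 / math.sqrt(2)  # assumed normalization


def ms_encode(left, right):
    """Convert left/right samples to mid (L + R) and side (L - R) signals."""
    mid = [SCALE * (l + r) for l, r in zip(left, right)]
    side = [SCALE * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side."""
    left = [SCALE * (m + s) for m, s in zip(mid, side)]
    right = [SCALE * (m - s) for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.2, -0.1]
right = [0.5, 0.1, -0.3]
mid, side = ms_encode(left, right)
print(side)  # small when the channels are similar
print(ms_decode(mid, side))
```

When the two channels are highly correlated, the side signal carries little energy and needs few bits, which is the redundancy this mode exploits.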
MPEG-2 Audio
ISO/IEC 13818-3
Allows lower sampling rates
16, 22.05, and 24 kHz: half of the MPEG-1 sampling rates
From wideband speech to mediumband audio
Higher frequency resolution
Layer I, II, and III
Multichannel coding
2~5 channels; surround sound, multilingual, for the visually or hearing impaired
Backward compatible and non-backward compatible coding (13818-7: MPEG-2 AAC)
Multichannel Audio
Configurations (front/surround channels): 2/0 (stereo), 3/0, 3/1, 3/2, 3/2 with woofer (5.1 system)
LFE: low-frequency enhancement (woofer), 15~120 Hz; can be placed anywhere
Compatibility
Forward compatibility
  A new decoder can decode an old bitstream
  Usually simple to achieve
Backward compatibility
  An old decoder can decode a new bitstream, at least partially
  Usually limits the coding efficiency
MPEG-2 Backward Compatible Audio Coding
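[Figure: MPEG-1/2 frame. MPEG-1 header, MPEG-1 data, MPEG-1 ancillary data; the MPEG-2 header and MPEG-2 data are carried in the MPEG-1 ancillary data field.]
[Figure: L, C, R, LS, RS -> matrix -> L0, R0, T3, T4, T5. L0 and R0 go to the MPEG-1 encoder; T3, T4, T5 go to the MPEG-2 extension encoder; both outputs are multiplexed.]
$L_0 = \alpha (L + \beta C + \gamma L_S)$
$R_0 = \alpha (R + \beta C + \gamma R_S)$
$\alpha = \frac{1}{1 + \sqrt{2}},\ \beta = \gamma = \frac{1}{\sqrt{2}}$; or $\alpha = 1,\ \beta = \gamma = 0$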
Backward Compatible Audio Coding (cont.)
[Figure: Matrixing (encoder side): L, C, R, LS, RS -> matrix -> L0, R0 -> MPEG-1 encoder, and T3, T4, T5 -> MPEG-2 extension encoder -> mux. Dematrixing (decoder side): demux -> MPEG-1 decoder (L0, R0) and MPEG-2 extension decoder (T3, T4, T5) -> inverse matrix -> L, C, R, LS, RS.]
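Matrixing and dematrixing can be sketched per sample as below, assuming the downmix L0 = α(L + βC + γLS), R0 = α(R + βC + γRS) with α = 1/(1 + √2) and β = γ = 1/√2. Sending C, LS, and RS unchanged as T3, T4, and T5 is a simplifying assumption for illustration, and the function names are hypothetical.

```python
import math

ALPHA = 1 / (1 + math.sqrt(2))
BETA = GAMMA = 1 / math.sqrt(2)


def matrix(L, C, R, LS, RS, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Encoder side: form the MPEG-1 compatible stereo pair L0/R0.
    T3, T4, T5 are assumed to be C, LS, RS sent unchanged."""
    L0 = alpha * (L + beta * C + gamma * LS)
    R0 = alpha * (R + beta * C + gamma * RS)
    return L0, R0, C, LS, RS


def dematrix(L0, R0, T3, T4, T5, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Decoder side: recover the five channels from L0, R0, T3, T4, T5."""
    C, LS, RS = T3, T4, T5
    L = L0 / alpha - beta * C - gamma * LS
    R = R0 / alpha - beta * C - gamma * RS
    return L, C, R, LS, RS


channels = (0.6, -0.2, 0.3, 0.1, -0.4)  # L, C, R, LS, RS
print(dematrix(*matrix(*channels)))

# With full-scale inputs the downmix stays within full scale
print(matrix(1.0, 1.0, 1.0, 1.0, 1.0)[0])
```

With this choice of α the sum of the weights is one, so L0 and R0 cannot overload when all inputs are at full scale. An MPEG-1 decoder plays L0 and R0 directly and ignores the extension.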
Non Backward Compatible (NBC) Coding
MPEG-2 Advanced Audio Coding (AAC)
ISO/IEC 13818-7 (April 1997)
320~384 kbits/s for 5 channels, 64 kbits/s per channel
NBC at 320 kbits/s as good as BC coding at 640 kbits/s
1~48 audio channels, 0~16 LFEs, 0~16 data streams
Same framework (perceptual subband coding) as MPEG-1, with some enhancements
MPEG-2 AAC
Enhancements
  Preprocessing
  High resolution filterbanks
    1024-line or 128-line MDCT
  Temporal noise shaping (TNS): time-dependent quantization
  Coupling channel
    Intensity multichannel coding
  Backward adaptive prediction in subbands
  M/S stereo coding
  Noiseless coding (entropy coding): Huffman coding
[Figure: MPEG-2 AAC decoder. 13818-7 coded audio stream -> bitstream demultiplex -> noiseless decoding -> inverse quantizer -> scale factors -> M/S -> prediction -> intensity/coupling -> TNS -> filter bank -> gain control -> output time signal. Legend: data, control.]
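[Figure: MPEG-2 AAC encoder. Input time signal -> gain control -> filter bank -> TNS -> intensity/coupling -> prediction -> M/S -> scale factors -> quantizer -> noiseless coding -> bitstream multiplex -> 13818-7 coded audio stream. The perceptual model and the rate/distortion control process (iteration loops over scale factors, quantizer, and noiseless coding) control the chain; prediction uses the quantized spectrum of the previous frame. Legend: data, control.]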
MPEG-2 AAC Profiles
[Figure: Main, Low Complexity, and Scalable Sampling Rate profiles; axis marks at 20 kHz, 18 kHz, 12 kHz, 6 kHz.]
Main profile
Best quality, highest complexity
1024-line or 128-line MDCT
Low-complexity profile
No prediction; temporal noise shaping limited in order
Scalable sampling-rate profile
Scalable output sampling rates and complexity
Uses hybrid filterbanks (like MPEG-1 Layer III)
No prediction, no coupling channel
Simulcast
To achieve backward compatibility at the cost of higher bitrate
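[Figure: simulcast. L, C, R, LS, RS -> MPEG-2 AAC encoder, and L0, R0 -> MPEG-1 encoder; both streams are multiplexed. After demultiplexing, the MPEG-2 AAC decoder outputs L, C, R, LS, RS and the MPEG-1 decoder outputs L0, R0.]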
References
Peter Noll, "MPEG digital audio coding," IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81
D. Pan, "A tutorial on MPEG/audio compression," IEEE Multimedia, vol. 2, no. 2, 1995, pp. 60-74
http://www.mpeg.org/MPEG/audio.html
http://www.cselt.it/mpeg/faq/faq-audio.htm
http://www.tnt.uni-hannover.de/project/mpeg/audio/