Audio Compression Techniques. MUMT 611, January 2005, Assignment 2. Paul Kolesnik
Introduction. Digital audio compression: removal of redundant or otherwise irrelevant information from an audio signal. Audio compression algorithms are often referred to as "audio encoders". Applications: reduces required storage space; reduces required transmission bandwidth.
Audio Compression. Audio signal overview: sampling rate (number of samples per second); bit rate (number of bits per second), where an uncompressed stereo 16-bit 44.1 kHz signal typically has a bit rate of about 1.4 Mbit/s; number of channels (mono / stereo / multichannel). Size is reduced by lowering those values or by data compression / encoding.
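The 1.4 Mbit/s figure follows directly from the three parameters on this slide. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is chosen here for illustration and is not from the slides):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality stereo: 44.1 kHz, 16 bits per sample, 2 channels.
cd_bits_per_second = pcm_bit_rate(44_100, 16, 2)
print(cd_bits_per_second)              # 1411200 bits per second
print(cd_bits_per_second / 1_000_000)  # about 1.41 Mbit/s
```

Note that the unit is megabits per second, not megabytes; in bytes the same stream is roughly 176 kB/s.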
Audio Data Compression. Redundant information: implicit in the remaining information (e.g., an oversampled audio signal). Irrelevant information: perceptually insignificant; cannot be recovered from the remaining information.
Audio Data Compression. Lossless audio compression: removes redundant data; the resulting signal is the same as the original (perfect reconstruction). Lossy audio encoding: removes irrelevant data; the resulting signal is similar to the original.
Audio Data Compression. Audio vs. speech compression techniques: speech compression uses a model of the human vocal tract to compress signals. Audio compression does not use this technique because general audio signals vary far more widely than speech.
Generic Audio Encoder
Generic Audio Encoder. Psychoacoustic model: psychoacoustics is the study of how sounds are perceived by humans. The model uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear, and detects conditions under which different audio signal components mask each other.
Psychoacoustic Model. Signal masking: threshold cut-off; spectral (frequency / simultaneous) masking; temporal masking. Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain.
Signal Masking. Threshold cut-off: the hearing threshold level is a function of frequency. Any frequency component below the threshold will not be perceived by the human ear.
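To make "a function of frequency" concrete, the sketch below evaluates one widely quoted closed-form approximation of the threshold in quiet (usually attributed to Terhardt). The slides do not give a formula, so the choice of this approximation and the function name are assumptions made for illustration only.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute hearing threshold in dB SPL (assumed Terhardt-style fit)."""
    f_khz = freq_hz / 1000.0
    return (
        3.64 * f_khz ** -0.8
        - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
        + 1e-3 * f_khz ** 4
    )


# The ear is most sensitive around 3 to 4 kHz and much less so at the extremes.
for freq in (100, 1000, 3300, 16000):
    print(freq, round(threshold_in_quiet_db(freq), 1))
```

A component whose level falls under this curve at its frequency is a candidate for removal even before any masking by other components is considered.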
Signal Masking. Spectral masking: a frequency component can be partly or fully masked by another component that is close to it in frequency. This shifts the hearing threshold.
Signal Masking. Temporal masking: a quieter sound can be masked by a louder sound if they are close together in time. Sounds that occur shortly before as well as shortly after the volume increase can be masked.
Spectral Analysis. Tasks of spectral analysis: to derive masking thresholds that determine which signal components can be eliminated; to generate a representation of the signal to which the masking thresholds can be applied. Spectral analysis is done through transforms or filter banks.
Spectral Analysis. Transforms: Fast Fourier Transform (FFT); Discrete Cosine Transform (DCT), similar to the Fourier transform but using only cosine basis functions; Modified Discrete Cosine Transform (MDCT), an overlapped and windowed version of the DCT (used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3).
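The "overlapped and windowed" property of the MDCT can be illustrated with a small pure-Python sketch. It uses the textbook MDCT definition with a sine window and 50% overlap; the block size, the zero padding, and all function names are assumptions for this example, and the direct O(N²) sums stand in for the fast algorithms a real codec would use.

```python
import math


def sine_window(m: int) -> list[float]:
    """Sine window of length 2*m; satisfies w[n]^2 + w[n+m]^2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(block: list[float]) -> list[float]:
    """Map 2*m windowed time samples to m coefficients."""
    m = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """Map m coefficients back to 2*m time samples (still containing aliasing)."""
    m = len(coeffs)
    return [
        (2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                        for k in range(m))
        for n in range(2 * m)
    ]


def mdct_round_trip(signal: list[float], m: int) -> list[float]:
    """Analyse and resynthesise a signal whose length is a multiple of m."""
    window = sine_window(m)
    padded = [0.0] * m + list(signal) + [0.0] * m  # every sample lands in two blocks
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):  # hop of m gives 50% overlap
        block = [padded[start + n] * window[n] for n in range(2 * m)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * m):
            output[start + n] += rebuilt[n] * window[n]  # overlap-add cancels aliasing
    return output[m:m + len(signal)]
```

Each block of 2m samples yields only m coefficients, so the overlap costs no extra data; the aliasing this introduces in one block is cancelled by its neighbours during overlap-add.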
Spectral Analysis. Filter banks: blocks of time samples are passed through a set of bandpass filters. Masking thresholds are applied to the resulting frequency subband signals. Polyphase and wavelet banks are the most popular filter structures.
Filter Bank Structures. Polyphase filter bank (used in all of the MPEG-1 encoders): the signal is separated into subbands whose widths are equal over the entire frequency range. The resulting subband signals are downsampled to create shorter signals, which are later reconstructed during the decoding process.
Filter Bank Structures. Wavelet filter bank (used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent): unlike in the polyphase filter bank, the subbands are not of equal width (narrower at lower frequencies, wider at higher frequencies). This allows for better time resolution at high frequencies (e.g., short attacks), but at the expense of frequency resolution.
Noise Allocation. System task: derive the shifted hearing threshold and apply it to the input signal. Anything below the threshold does not need to be transmitted, and any noise below the threshold is irrelevant. Frequency component quantization: a tradeoff between space and noise. The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation.
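The "just enough bits" idea can be sketched with the common rule of thumb that each bit of a uniform quantizer buys roughly 6.02 dB of signal-to-noise ratio. That rule, the constant name, and the function below are assumptions for illustration; the slides do not specify how a particular encoder performs the allocation.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per quantizer bit (assumed rule of thumb)


def bits_needed(signal_db: float, threshold_db: float) -> int:
    """Smallest bit count that keeps quantization noise under the masking threshold."""
    if signal_db <= threshold_db:
        return 0  # component is inaudible, so it need not be transmitted at all
    return math.ceil((signal_db - threshold_db) / DB_PER_BIT)


# Hypothetical per-band levels and shifted thresholds, both in dB.
bands = [(60.0, 30.0), (42.0, 30.0), (20.0, 30.0)]
print([bits_needed(level, threshold) for level, threshold in bands])
```

Bands far above their threshold receive more bits, bands just above it receive few, and bands below it receive none.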
Noise Allocation. Pre-echo: when a single audio block contains silence followed by a loud attack, a pre-echo error occurs, and there will be audible noise in the silent part of the block after decoding. This is mitigated by monitoring the audio data ahead of time at the encoding stage and splitting the audio into shorter blocks wherever pre-echo is likely. This does not completely eliminate pre-echo, but it can make it short enough to be masked by the attack (temporal masking).
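A rough sketch of the block-splitting decision described above: the block is scanned in short segments, and if the energy jumps sharply from one segment to the next, short blocks are used. The energy-ratio test, the threshold value, and all names are hypothetical; actual encoders rely on their own, more elaborate attack detectors.

```python
def segment_energy(samples: list[float]) -> float:
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def choose_block_lengths(block: list[float], short_len: int,
                         ratio_threshold: float = 10.0) -> list[int]:
    """Return [len(block)] for steady audio, or short block lengths if an attack is found."""
    segments = [block[i:i + short_len] for i in range(0, len(block), short_len)]
    energies = [segment_energy(seg) for seg in segments]
    for previous, current in zip(energies, energies[1:]):
        # A sudden energy rise marks a potential pre-echo case.
        if current > ratio_threshold * max(previous, 1e-12):
            return [len(seg) for seg in segments]
    return [len(block)]


silence_then_attack = [0.0] * 8 + [1.0] * 8
steady_tone = [1.0] * 16
print(choose_block_lengths(silence_then_attack, 4))  # short blocks
print(choose_block_lengths(steady_tone, 4))          # one long block
```

With short blocks, quantization noise from the attack is confined to the short block that contains it, so it spreads over far less of the preceding silence.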
Pre-echo Effect
Additional Encoding Techniques. Other encoding techniques are available (as alternatives or in combination): predictive coding; coupling / delta encoding; Huffman encoding.
Additional Encoding Techniques. Predictive coding: often used in speech and image compression. The encoder estimates the expected value of each sample from previous sample values and transmits or stores the difference between the expected and the actual value. The decoder generates the same estimate for each sample and then adjusts it by the difference stored for that sample. Used for additional compression in MPEG-2 AAC.
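The encoder and decoder steps can be sketched with the simplest possible predictor, one that expects each sample to equal the previous one. This first-order predictor and the function names are assumptions for illustration; practical codecs use higher-order or adaptive predictors.

```python
def predict(history: list[int]) -> int:
    """First-order predictor: expect the next sample to repeat the previous one."""
    return history[-1] if history else 0


def encode_residuals(samples: list[int]) -> list[int]:
    """Store only the difference between each sample and its prediction."""
    residuals: list[int] = []
    history: list[int] = []
    for sample in samples:
        residuals.append(sample - predict(history))
        history.append(sample)
    return residuals


def decode_residuals(residuals: list[int]) -> list[int]:
    """Rebuild samples by making the same prediction and adding the stored difference."""
    samples: list[int] = []
    for residual in residuals:
        samples.append(predict(samples) + residual)
    return samples


samples = [100, 102, 105, 105, 103]
residuals = encode_residuals(samples)
print(residuals)                    # [100, 2, 3, 0, -2]
print(decode_residuals(residuals))  # [100, 102, 105, 105, 103]
```

For slowly varying signals the residuals cluster near zero, which is what makes them cheaper to encode than the raw samples.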
Additional Encoding Techniques. Coupling / delta encoding: used where the audio signal consists of two or more channels (stereo or surround sound). Similarities between channels are exploited for compression. The sum and the difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode. This transformation is itself lossless.
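A minimal sketch of a lossless sum/difference transform on integer samples follows. Storing the halved (rounded-down) sum together with the full difference is one common way to keep the transform exactly reversible; this particular formulation and the function names are assumptions, not taken from the slides.

```python
def to_mid_side(left: int, right: int) -> tuple[int, int]:
    """Replace a stereo pair with (halved sum, difference)."""
    mid = (left + right) >> 1  # floor of the average
    side = left - right        # near zero when the channels are similar
    return mid, side


def from_mid_side(mid: int, side: int) -> tuple[int, int]:
    """Exactly recover the original stereo pair."""
    # The parity lost when halving the sum equals the parity of the difference.
    left = mid + ((side + 1) >> 1)
    right = left - side
    return left, right


pairs = [(5, 2), (2, 5), (4, 2), (3, 3), (-3, 2)]
for left, right in pairs:
    assert from_mid_side(*to_mid_side(left, right)) == (left, right)
print([to_mid_side(left, right) for left, right in pairs])
```

When the two channels are nearly identical, the difference values stay small and can be stored with fewer bits than either original channel.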
Additional Encoding Techniques. Huffman coding: an information-theory-based technique. A signal element that recurs often is represented by a shorter code, and the mapping between elements and codes is stored in a look-up table. Implemented using look-up tables in both the encoder and the decoder. Provides substantial lossless compression at the cost of additional computation. Used by MPEG-1 (Layer III) and MPEG-2 AAC.
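The look-up-table idea can be sketched by building a Huffman table from symbol counts and using it in both directions. This is a generic textbook construction with names chosen for the example; MPEG codecs use predefined tables rather than building one per signal.

```python
import heapq
from collections import Counter


def huffman_table(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free code; frequent symbols receive shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, table_a = heapq.heappop(heap)
        count_b, _, table_b = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table_a.items()}
        merged.update({sym: "1" + code for sym, code in table_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols: list[int], table: dict[int, str]) -> str:
    return "".join(table[sym] for sym in symbols)


def decode(bits: str, table: dict[int, str]) -> list[int]:
    inverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            decoded.append(inverse[current])
            current = ""
    return decoded
```

For example, with quantized values `[0, 0, 0, 0, 1, 1, 2, -1]` the most frequent value `0` receives a one-bit code, and `decode(encode(data, table), table)` returns the original list.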
Encoding: Final Stages. Audio data is packed into frames. Frames are stored or transmitted.
Conclusion. HTML bibliography: http://www.music.mcgill.ca/~pkoles. Questions.

Speech Compression


Editor's Notes

  • #2: Hello. Today I will talk about the techniques commonly used for digital audio compression in various audio file formats.
  • #3: I will discuss the difference between redundant and irrelevant information later in the presentation. Whether the audio is stored or transmitted, compression optimizes its size.