Example-Based Audio Editing
Ramin Anushiravani
TechFest - Jan’17
Who are you?
• Bachelor’s in Electrical and Computer Engineering
§ University of Illinois at Urbana-Champaign
§ Thesis: 3D Audio Playback through Two Loudspeakers
• Master’s in Electrical and Computer Engineering
§ University of Illinois at Urbana-Champaign
§ Thesis: Example-Based Audio Editing
• Internships
– Advanced Digital Sciences Center in Singapore (x2)
• 3D Audio Recording through microphone arrays and Playback through two loudspeakers
– GN ReSound
• Spatial Hearing with Hearing Aids
– Adobe
• Acoustics Matching
– AARP
• Recommender Systems
• Now at Dolby
– Patent Engineer for Audio
• MPEG Standards, Audio and Speech Codecs, and Dynamic Range Control
Outline
• Example-Based What?
• Acoustic Matching
– Manual/Automatic Equalization
– Manual/Automatic Noise
– Manual/Automatic Reverberation
• User Study
Example-Based What?
Acoustics Matching
• Equalization
• Background Noise
• Reverberation
How?
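[Figure: block diagram of example-based matching. Signals: $X_{in,wet}$, $X_{ex,wet}$, $X_{in,dry}$, $X_{in,effect}$, $X_{ex,dry}$, $X_{ex,effect}$. Stages: Matching, Reconstruct. $X_{in,dry}$ and $X_{ex,effect}$ lead to the output $Y_{mat}$.]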
Graphic Equalizer
[Figure: iTunes Equalizer setting]
Equalizer Matching
$$ P[k] = \frac{1}{L} \sum_{\tau=1}^{L} \left| \mathrm{STFT}\{x[n]\}(k, \tau) \right|^2 $$
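A minimal sketch of how the average power spectrum P[k] could be computed, assuming a Hann window of 1024 samples with 75% overlap (hop of 256). The `matching_gain` helper is a hypothetical illustration of one way to turn two such spectra into a per-bin equalizer gain; the slides do not state the exact matching rule.

```python
import numpy as np

def average_power_spectrum(x, frame_len=1024, hop=256):
    """P[k] = (1/L) * sum over L frames of |STFT{x}(k, tau)|^2."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for tau in range(n_frames):
        frame = x[tau * hop : tau * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    return power / n_frames

def matching_gain(p_input, p_example, eps=1e-12):
    """Assumed rule: per-bin amplitude gain that moves the input's
    average spectrum toward the example's average spectrum."""
    return np.sqrt((p_example + eps) / (p_input + eps))
```

In use, one would compute P[k] for both the input and the example recording, then apply the resulting gain to each STFT frame of the input before resynthesis.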
Demo
Equalizer Matching
[Figure: log magnitude (dB) versus log-spaced frequency (Hz)]
Noise Matching
Denoising
Spectral Subtraction
$$ Z(\omega) = X(\omega) + D(\omega) $$
• $Z(\omega)$: Fourier transform of the noisy signal in one frame
• $D(\omega)$: additive stationary noise
$$ |\hat{X}(\omega)|^2 = H^2(\omega)\,|Z(\omega)|^2 $$
$$ H(\omega) = \sqrt{1 - \frac{|\hat{D}(\omega)|^2}{|Z(\omega)|^2}} $$
• $|\hat{X}(\omega)|^2$: estimate of the clean power spectrum
• $H(\omega)$: noise suppression factor
• $|\hat{D}(\omega)|^2$: noise profile estimate
(Philipos C. Loizou, Speech Enhancement: Theory and Practice, 2013)
In practice,
• The noise profile is estimated over multiple frequency bands.
• Spectral subtraction fails in low-SNR regions, where it creates musical noise. This artifact is reduced by post-filtering the output of the spectral subtraction.
(Esch and Vary, Efficient Musical Noise Suppression for Speech Enhancement Systems, 2009)
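Spectral Subtraction
$$ y(n) = x(n) + d(n) $$
where y(n) is the noisy signal, x(n) is the clean signal, and d(n) is the noise.
$$ Y(\omega) = X(\omega) + D(\omega) $$
$$ |Y(\omega)|^2 = |X(\omega)|^2 + |D(\omega)|^2 + X(\omega) D^*(\omega) + X^*(\omega) D(\omega) $$
where $X(\omega) D^*(\omega) + X^*(\omega) D(\omega) = 2\,\mathrm{Re}\{X(\omega) D^*(\omega)\}$.
A common assumption in most papers: noise and the clean signal are uncorrelated, so the cross terms vanish on average.
$$ |\hat{X}(\omega)|^2 = H^2(\omega)\,|Y(\omega)|^2 = \begin{cases} |Y(\omega)|^2 - |\hat{D}(\omega)|^2 & \text{if } |Y(\omega)|^2 > |\hat{D}(\omega)|^2 \\ 0 & \text{if } |Y(\omega)|^2 < |\hat{D}(\omega)|^2 \end{cases} $$
$$ H(\omega) = \sqrt{1 - \frac{|\hat{D}(\omega)|^2}{|Y(\omega)|^2}} $$
• $X(\omega)$: Fourier transform over a segment of x(n).
• $D(\omega)$: AWGN. Same over all clean input segments.
• $|\hat{D}(\omega)|^2$: estimated noise PSD.
• In practice H is learned over different frequency bands.
(Philipos C. Loizou, Speech Enhancement: Theory and Practice, 2013)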
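The suppression factor H can be sketched per frame as follows. This is a minimal illustration, assuming the noise PSD is estimated by averaging noise-only frames and that negative differences are clipped to zero; the function names are hypothetical.

```python
import numpy as np

def estimate_noise_power(noise_frames):
    """Average |D(w)|^2 over noise-only STFT frames, shape (n_bins, n_frames)."""
    return np.mean(np.abs(noise_frames) ** 2, axis=1)

def spectral_subtraction_gain(noisy_power, noise_power):
    """H = sqrt(1 - |D_hat|^2 / |Y|^2), clipped at zero where the
    noise estimate exceeds the noisy power."""
    ratio = noise_power / np.maximum(noisy_power, 1e-12)
    return np.sqrt(np.maximum(1.0 - ratio, 0.0))

def denoise_frame(noisy_spectrum, noise_power):
    """Scale one complex STFT frame by H; the noisy phase is kept."""
    gain = spectral_subtraction_gain(np.abs(noisy_spectrum) ** 2, noise_power)
    return gain * noisy_spectrum
```

Bins whose gain is clipped to zero next to bins that survive are what produce the isolated spectral peaks heard as musical noise.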
Musical Noise Reduction
(Esch and Vary, Efficient Musical Noise Suppression for Speech Enhancement Systems, 2009)
Aim: Retain the naturalness of the remaining background noise.
How?
• 1 Detect low SNR frames based on the noisy signal and the estimated clean signal.
• 2 Design a smoothing window based on 1. The lower the SNR, the longer the window.
• 3 Design a post-filter to smooth the low SNR frames, i.e. an FIR low pass filter designed based on 2.
• 4 Element-wise multiply the noise suppression factor by the post-filter.
[Figure: Enhanced Spectral Subtraction, with Step 3 labeled]
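The steps above can be sketched as a gain smoother. This is only loosely modeled on the cited approach: the frame-level power ratio `zeta`, the threshold `zeta_thr`, and the scaling `psi` are illustrative assumptions, and the FIR low-pass is taken to be a moving average across frequency bins.

```python
import numpy as np

def postfilter_gain(gain, noisy_power, zeta_thr=0.4, psi=10):
    """Smooth the suppression gain across frequency in low-SNR frames.

    gain, noisy_power: float arrays of shape (n_bins, n_frames).
    zeta_thr and psi are assumed tuning values, not taken from the slides.
    """
    # Step 1: per-frame ratio of estimated clean power to noisy power.
    clean_power = (gain ** 2) * noisy_power
    zeta = clean_power.sum(axis=0) / np.maximum(noisy_power.sum(axis=0), 1e-12)
    smoothed = np.empty_like(gain)
    for t in range(gain.shape[1]):
        # Step 2: the lower the ratio, the longer the (odd-length) window.
        if zeta[t] >= zeta_thr:
            length = 1
        else:
            length = 2 * int(round((1.0 - zeta[t] / zeta_thr) * psi)) + 1
        # Step 3: moving-average FIR low-pass along the frequency axis
        # (edges are zero-padded by mode="same").
        kernel = np.ones(length) / length
        smoothed[:, t] = np.convolve(gain[:, t], kernel, mode="same")
    return smoothed
```

The smoothed gain would then be multiplied element-wise with the noisy STFT frame, so isolated gain peaks in noise-dominated frames are flattened while high-SNR frames pass through unchanged.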
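SS + Musical Noise Reduction
[Figure: block diagram and spectrograms. Labels: Noisy Input, SNR = 22 dB; G.*H (⊗, element-wise product); Musical Suppression PostFilter; operations .^2 and .^0.5; result annotated "Much Better!"]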
Noise Matching
Demo
Non-Stationary Noise
(Z. Duan, G. J. Mysore, and P. Smaragdis, "Speech enhancement by online non-negative spectrogram decomposition in non-stationary noise environments," in Interspeech Conference, 2012.)
Reverberation
Krannert Center for the Performing Arts, Foellinger Great Hall
Reverberation
Falkland Palace Bottle Dungeon (OpenAir database, www.openairlib.net)
reverb sound = dry sound ⋆ reverb kernel
$$ y(n) = \sum_{k=0}^{N-1} x(k)\, r(n-k) $$
Approximate in the magnitude STFT domain: convolution between time frames of magnitude X and R at each frequency index.
$$ |Y(t_\tau, k)| \approx \sum_{\tau=0}^{L_h-1} |R(\tau, k)|\,|X(t_\tau - \tau, k)| = |X| \star |R| $$
(R. Talmon, I. Cohen, and S. Gannot, “Relative transfer function identification using convolutive transfer function approximation,” IEEE Trans. Audio, Speech, and Language Process, 2009.)
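The magnitude-domain approximation amounts to one convolution along time per frequency bin. A minimal sketch, assuming both magnitude spectrograms are stored as (frequency bins × time frames) arrays and that the full convolution length is kept:

```python
import numpy as np

def reverberate_magnitude(dry_mag, kernel_mag):
    """|Y| ~ |X| * |R|: convolve along time, independently per frequency bin.

    dry_mag:    (n_bins, Lx) magnitude spectrogram of the dry sound
    kernel_mag: (n_bins, Lh) magnitude spectrogram of the reverb kernel
    returns:    (n_bins, Lx + Lh - 1)
    """
    n_bins, lx = dry_mag.shape
    lh = kernel_mag.shape[1]
    out = np.zeros((n_bins, lx + lh - 1))
    for tau in range(lh):
        # Kernel frame tau scales a copy of the dry spectrogram delayed by tau frames.
        out[:, tau:tau + lx] += kernel_mag[:, [tau]] * dry_mag
    return out
```

Because phase is ignored, this is only an approximation of time-domain convolution; the result still needs a phase estimate before it can be inverted to a waveform.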
[Figure: convolution (⋆) with the reverb kernel]
Metrics for Ideal Reverberation
• Energy Decay Curve (EDC)
• Energy Decay Relief: EDC at multiple frequency bands
[Figure: magnitude (dB) versus time]
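As an illustration, the Energy Decay Curve is commonly computed by backward (Schroeder) integration of the squared impulse response. The sketch below assumes that definition and normalizes the curve to 0 dB at time zero; the -120 dB floor is an arbitrary choice to avoid taking the log of zero.

```python
import numpy as np

def energy_decay_curve_db(impulse_response):
    """EDC(t) = energy remaining in the impulse response after time t, in dB."""
    energy = np.asarray(impulse_response, dtype=float) ** 2
    # Reverse cumulative sum: remaining energy from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(np.maximum(edc / edc[0], 1e-12))
```

Applying the same function to band-pass filtered copies of the impulse response gives one curve per frequency band, which is the idea behind the Energy Decay Relief.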
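Reverberation Model
• Time Domain Statistical Model
[Equation: impulse response modeled as b(t) with a decaying envelope], where b(t) is zero-mean Gaussian noise and the decay rate is related to the reverberation time.
• Reverberation time (RT60): the time it takes for the sound level to drop 60 dB below its original level.
Sabine Formula:
[Equation: RT60 in terms of the volume of the enclosure and the effective absorbing area, where the effective absorbing area combines the area of each wall with its absorption coefficient]
Reflection Coefficients:
[Equation: not recoverable from the slide text]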
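For reference, a small sketch of these quantities in their common textbook forms. All three formulas are assumptions about what the slide showed: the Sabine constant 0.161 s/m applies to SI units, the decay rate assumes an amplitude envelope exp(-delta·t), and the reflection coefficient uses the relation sqrt(1 - absorption) that is often paired with the image method.

```python
import math

def sabine_rt60(volume_m3, wall_areas_m2, absorption_coeffs):
    """RT60 ~ 0.161 * V / A, with A = sum(area_i * absorption_i)."""
    effective_area = sum(s * a for s, a in zip(wall_areas_m2, absorption_coeffs))
    return 0.161 * volume_m3 / effective_area

def decay_rate(rt60_s):
    """delta such that the envelope exp(-delta * t) is 60 dB down at t = RT60."""
    return 3.0 * math.log(10.0) / rt60_s

def reflection_coefficient(absorption):
    """Amplitude reflection coefficient for a given energy absorption coefficient."""
    return math.sqrt(1.0 - absorption)
```

For a 5 m × 4 m × 3 m room with an absorption coefficient of 0.2 on every surface, `sabine_rt60(60.0, [20, 20, 15, 15, 12, 12], [0.2] * 6)` gives roughly half a second.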
Image Source Method
[Figure: source, microphone, and the mirror image of the original source; the actual path and the perceived path; an image source produces another image source. (Pictures from: Alex Tu, Reverberation simulation from impulse response using the Image Source Method)]
(Allen, J. and Berkley, D., "Image method for efficiently simulating small-room acoustics," The Journal of the Acoustical Society of America, Vol. 65, No. 4, pp. 943-950, 1979)
[Equation annotations:]
• Parameters that control which image source in which dimension
• Reflection coefficients of the six surfaces in a rectangular room
• Time delay of the considered image source
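A minimal sketch of the idea, limited to the direct path and the six first-order image sources of a rectangular room. It assumes one corner of the room at the origin, a single reflection coefficient `beta` shared by all six surfaces, a speed of sound of 343 m/s, and 1/(4πd) spherical spreading; the full method recurses to higher-order images.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def first_order_images(source, room_dims):
    """Mirror the source across each of the six walls of the room."""
    images = []
    for axis in range(3):
        near = list(source)
        near[axis] = -source[axis]                        # wall at coordinate 0
        far = list(source)
        far[axis] = 2.0 * room_dims[axis] - source[axis]  # wall at coordinate L
        images.append(tuple(near))
        images.append(tuple(far))
    return images

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def first_order_paths(source, mic, room_dims, beta):
    """Return (delay in seconds, gain) for the direct path and six reflections."""
    candidates = [(0, tuple(source))]
    candidates += [(1, img) for img in first_order_images(source, room_dims)]
    paths = []
    for order, position in candidates:
        d = distance(position, mic)
        paths.append((d / SPEED_OF_SOUND, beta ** order / (4.0 * math.pi * d)))
    return paths
```

Placing each gain at its delay on a sample grid gives the early part of a room impulse response.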
Reverberation Matching
[Figure: dereverberation decomposes the input into $A_{dry}$ and $R_a$, and the example into $B_{dry}$ and $R_b$]
Ideal case: perfect decomposition of reverb sounds into dry sounds and reverb kernels.
$$ C = A_{dry} \star R_b \approx \text{result} $$
Running out of letters!
Focus is on decomposing the magnitude spectrograms into magnitude spectrograms.
I took the signals back to the time domain using the phase of the reverberated input.
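Non-Negative Matrix Factorization
$$ X \in \mathbb{R}^{\geq 0,\, m \times n}, \quad W \in \mathbb{R}^{\geq 0,\, m \times k}, \quad H \in \mathbb{R}^{\geq 0,\, k \times n}, \quad \text{where } k < \min(m, n) $$
$$ X \approx \sum_{k=0}^{T-1} W_k H_k $$
$$ \min_{W \geq 0,\, H \geq 0} \|X - WH\|^2 $$
• Applying gradient descent under positive initial conditions for W and H and a ‘clever’ learning rate results in the following multiplicative update rules (Lee and Seung, 1999), where ⊗ and the fraction are element-wise:
$$ H = H \otimes \left( W^T \cdot \frac{X}{W \cdot H} \right) $$
$$ W = W \otimes \left( \frac{X}{W \cdot H} \cdot H^T \right) $$
Normalize W:
$$ W_{mk} = \frac{W_{mk}}{\sum_j W_{jk}} $$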
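The update rules as written on the slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming random positive initialization and a small `eps` to avoid division by zero; note that this ratio form, with column-normalized W, is the variant Lee and Seung pair with a divergence-type objective, so the squared error is not guaranteed to fall at every single step.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative (m, n) matrix X into W (m, k) and H (k, n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    W /= W.sum(axis=0, keepdims=True)          # columns of W sum to 1
    for _ in range(n_iter):
        H *= W.T @ (X / (W @ H + eps))         # H = H (x) W^T . (X / WH)
        W *= (X / (W @ H + eps)) @ H.T         # W = W (x) (X / WH) . H^T
        W /= W.sum(axis=0, keepdims=True)      # normalize W
    return W, H
```

For audio, X would be a magnitude spectrogram, the columns of W spectral templates, and the rows of H their activations over time.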
Why NMF? (Lee and Seung, 1999)
• NMF: visually meaningful. The decomposition can only be positive, which gives a parts-based representation.
• Eigenfaces: statistically meaningful. Eigenfaces are in the direction of the largest variance. Subtraction can occur.
Why NMF?
[Figure: spectrogram X (m frequency bins × n time frames; hann(1024) window, 75% overlap) ≈ W (m × k) times H (k × n), with k = 2 components]
Temporal Failure!
(Adapted from: Paul O’Grady & Barak Pearlmutter, Convolutive NMF with a Sparseness Constraint, MLSP Conference, 2006)
Convolutive NMF
(Adapted from: Paul O’Grady & Barak Pearlmutter, Convolutive NMF with a Sparseness Constraint, MLSP Conference, 2006)
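Convolutive NMF
$$ X \approx \sum_{t=0}^{T-1} W(t) \cdot \overset{t \rightarrow}{H} $$
$$ X \in \mathbb{R}^{\geq 0,\, m \times n}, \quad W(t) \in \mathbb{R}^{\geq 0,\, m \times k}, \quad H \in \mathbb{R}^{\geq 0,\, k \times n} $$
[Figure: X (m × n) approximated by T bases W(t) (m × k), each multiplied by the activations H (k × n) shifted by t, e.g. W(1) with $\overset{1 \rightarrow}{H}$]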
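Convolutive Non-negative Matrix Factorization
$$ Y \approx \hat{X} \star \hat{R}, \qquad \hat{Y} = \hat{X} \star \hat{R} $$
$$ \hat{X} \in \mathbb{R}^{\geq 0,\, L_x \times k}, \quad \hat{R} \in \mathbb{R}^{\geq 0,\, L_h \times k}, \quad Y \text{ and } \hat{Y} \in \mathbb{R}^{\geq 0,\, L_y \times k}, \quad L_y = L_x + L_h - 1 $$
Update Equations:
$$ \hat{R} = \hat{R} \otimes \frac{\hat{X}_t^T \cdot \left\{ \frac{Y}{\hat{Y}} \right\}^{\leftarrow t}}{\hat{X}_t^T \cdot \mathbf{1}} $$
$$ \hat{X}_t = \hat{X}_t \otimes \frac{\frac{Y}{\hat{Y}} \cdot \hat{R}^{T,\, t \rightarrow}}{\mathbf{1} \cdot \hat{R}^{T,\, t \rightarrow}} $$
• ⋆: convolution of non-negative matrices
• t →: shift operator
• $\hat{X}_t$: spectrum at time frame t
• $\mathbf{1}$: matrix of size $L_y \times k$ with all its elements set to 1
(Paul O’Grady & Barak Pearlmutter, Convolutive NMF with a Sparseness Constraint, MLSP Conference, 2006)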
Convolutive NMF
[Figure: results at iterations 1, 2, 3, and 10]
Dereverberation
$$ Y \approx \hat{X} \star \hat{R} $$
• Initialize $\hat{X}$ with positive random values.
• Initialize $\hat{R}$ with positive exponential decays.
• Apply CNMF on Y.
• On each iteration, enforce anti-sparsity on $\hat{R}$: $\hat{R} \rightarrow \hat{R}^{\alpha}, \ \alpha \in [0.85, 0.98]$
I dropped indices and absolute values, but they’re there.
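The procedure above can be sketched as a per-frequency non-negative deconvolution. This is an illustrative reading rather than the exact update equations of the talk: it assumes magnitude spectrograms stored as (frequency bins × time frames), ratio-style multiplicative updates, and an arbitrary decay constant for the initial kernel.

```python
import numpy as np

def convolve_time(X, R):
    """Per-frequency convolution along time: (F, Lx) * (F, Lh) -> (F, Lx + Lh - 1)."""
    F, lx = X.shape
    lh = R.shape[1]
    out = np.zeros((F, lx + lh - 1))
    for tau in range(lh):
        out[:, tau:tau + lx] += R[:, [tau]] * X
    return out

def dereverberate(Y, lh, alpha=0.9, n_iter=100, seed=0, eps=1e-12):
    """Estimate dry magnitudes X and reverb kernel R such that Y ~ X * R."""
    rng = np.random.default_rng(seed)
    F, ly = Y.shape
    lx = ly - lh + 1
    X = rng.random((F, lx)) + eps                              # positive random values
    R = np.tile(np.exp(-np.arange(lh) / (lh / 4.0)), (F, 1))   # positive exponential decays
    for _ in range(n_iter):
        # Kernel update: correlate the ratio Y / Y_hat with the current X.
        ratio = Y / (convolve_time(X, R) + eps)
        num = np.stack([(ratio[:, tau:tau + lx] * X).sum(axis=1)
                        for tau in range(lh)], axis=1)
        R *= num / (X.sum(axis=1, keepdims=True) + eps)
        R = R ** alpha                                         # anti-sparsity on R
        # Dry-signal update: correlate the ratio with the current R.
        ratio = Y / (convolve_time(X, R) + eps)
        num = sum(ratio[:, tau:tau + lx] * R[:, [tau]] for tau in range(lh))
        X *= num / (R.sum(axis=1, keepdims=True) + eps)
    return X, R
```

Raising the kernel to a power below one flattens it on every pass, which discourages the trivial solution where the kernel collapses to a single spike and X simply copies Y.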
Dereverberation
We can do better by using more prior knowledge.
$$ \hat{X} \approx W_c \star H_c $$
• $W_c$: set of dry speech bases (trained offline)
• $H_c$: corresponding activation
$$ Y \approx (W_c \star H_c) \star R^{\Sigma} $$
• $R^{\Sigma}$: average R over multiple frequency bands
Convolution is associative:
$$ Y \approx W_c \star (H_c \star R^{\Sigma}) $$
$$ \hat{Y} \approx W_c \cdot H_r $$
• $H_r$: reverberated activation matrix
(Paris Smaragdis, “Convolutive speech bases and their application to supervised speech separation,” in Speech And Audio Processing. IEEE, 2007)
Demo
[Figure: PSD for kernels (Original, Estimated, Exponentiated); log amplitude versus time (sec)]
[Figure: Estimated bases $W_d$; frequency (Hz) versus component]
[Figure: Reverb Sound and Dereverb Sound spectrograms; frequency (Hz) versus time (sec)]
[Figure: Estimated activation $H_d$ and estimated activation $H_r$; component versus time (sec)]
Reverberation Matching
GUI
User Study
• 40 people
• Beginner to Expert listeners
• 3 Tasks, 3 recordings each
• Ranging from easy to hard
• Match Equalization and Reverberation
User Study
[Figure: ease of use for Reverb, EQ, and EQ+Reverb]
Any Questions?
• Equalization matching
• Noise matching
• Reverberation matching
