Deep Learning | Speaker Identification

DLL for Speaker Identification
Presentation I
Sai Kiran Kadam
Description:
Apply Deep Learning algorithms to Speaker
Identification/Authentication for Cyber Security and IoT

Recent Advances in Deep Learning for Speech
Recognition at Microsoft
Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer,
Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero
 This paper emphasizes the importance of Deep Learning over the traditional methods of
speech recognition.
 DL with MFCC features (cepstral analysis) demonstrated lower speech recognition error,
measured as word error rate (WER), than the GMM-HMM model.
 The more languages used to train the DNN architecture, the lower the WER: 3.5% lower than
a DNN trained on a single language.
 A DNN is a universal learner that handles heterogeneous data from different speech
sources and languages; its hidden layers outperform other methods at this.
 Based on research and experimental results, DNN outperforms the GMM systems.
 Questions:
 What equipment/hardware was used during training in the various experiments?
 How were the WERs tracked and compared? How can DL algorithms be implemented with
different DNNs?
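On the question of how WERs are compared: WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows only that standard definition; the function name `word_error_rate` and the sample sentences are illustrative assumptions, not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Two systems can then be compared by scoring both against the same reference transcripts.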

Unsupervised feature learning for audio classification
using convolutional deep belief networks
Honglak Lee, Yan Largman, Peter Pham, Andrew Y. Ng
 Apply Convolutional Deep Belief Networks (CDBN, a stack of convolutional RBMs, or CRBMs)
to unlabeled TIMIT audio data to show that the learned features correspond to phonemes.
 Pipeline: spectral features (spectrogram) → PCA to lower the dimensionality of the TIMIT
audio data → CDBN.
 CDBN applied to Speaker Recognition → CDBN with more layers outperformed RAW and MFCC features.
 Test Accuracy: CDBN(L1+L2) > CDBN-L1 > CDBN-L2 > MFCC > RAW for Speech, Music
 CDBN achieved higher performance on multiple audio recognition tasks.
 Challenge: Apply CDBN to larger data sets.
 Questions: What Math & Statistics concepts are used to derive the Energy equations of visible
and hidden layers? Do we need to focus on Math?
 Does Visualization of Phonemes involve Signal Processing?
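As a starting point for the energy question, the textbook restricted Boltzmann machine with binary units defines E(v, h) = -b·v - c·h - vᵀWh, and the joint probability is proportional to exp(-E). The sketch below implements that plain RBM form only; it is not the convolutional variant from the paper, and the names `rbm_energy` and `hidden_activation_prob` and all numbers are assumptions for illustration. The math involved is mostly linear algebra and basic probability.

```python
import math
from typing import List, Sequence


def rbm_energy(
    v: Sequence[float],
    h: Sequence[float],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    c: Sequence[float],
) -> float:
    """Energy of a joint configuration (v, h) of a plain binary RBM.

    W[i][j] couples visible unit i to hidden unit j;
    b and c are the visible and hidden biases.
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def hidden_activation_prob(
    v: Sequence[float], W: Sequence[Sequence[float]], c: Sequence[float]
) -> List[float]:
    """P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j]), which follows from the energy."""
    probs = []
    for j in range(len(c)):
        pre_activation = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        probs.append(1.0 / (1.0 + math.exp(-pre_activation)))
    return probs


if __name__ == "__main__":
    # Toy model: two visible units, one hidden unit, made-up weights.
    W = [[0.5], [-0.3]]
    b = [0.1, 0.2]
    c = [0.4]
    print(rbm_energy([1, 0], [1], W, b, c))
    print(hidden_activation_prob([1, 0], W, c))
```

Lower energy means a more probable configuration, which is why training pushes the energy of observed data down.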

An Extensible Speaker Identification Sidekit In Python
Anthony Larcher, Kong Aik Lee, Sylvain Meignier
 SIDEKIT (Speaker IDEntification toolKIT) is 100% Python, tested on several platforms under
Python 2.7 and > 3.4.
 Developed with minimum dependencies to external modules.
 Provides an end-to-end tool-chain plus state-of-the-art methods for speaker identification.
 Design goals: easy installation, easy implementation of algorithms, support for large datasets,
fast computation, compatibility with existing tools, end-to-end speaker recognition, etc.
 SIDEKIT is better than other existing tools like ALIZE (C++), Kaldi (C++), Matlab-SR, etc.
 Has tools to evaluate the Equal Error Rate (EER) and Decision Cost Function (DCF), and to
plot Detection Error Trade-off (DET) curves.
 Uses GMM-HMM based approaches.
 Questions: Can SIDEKIT be designed based on DNN methods?
 Can it be integrated with User Authentication using Keystroke Dynamics?
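The Equal Error Rate mentioned above is the operating point where the false-accept rate equals the false-reject rate. The sketch below computes it in plain Python from two score lists. It does not use SIDEKIT's own API; the function name `equal_error_rate` and the scores are hypothetical, and a trial is assumed to be accepted when its score is at or above the threshold.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    target_scores: Sequence[float], impostor_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold) at the threshold where FAR and FRR are closest.

    FRR: fraction of genuine (target) trials rejected.
    FAR: fraction of impostor trials accepted.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = 0.0
    best_threshold = 0.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2  # midpoint when the rates do not cross exactly
            best_threshold = threshold
    return best_eer, best_threshold


if __name__ == "__main__":
    # Made-up verification scores, for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, impostors))
```

Sweeping the same threshold and recording every (FAR, FRR) pair gives the points of a DET curve.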

Keystroke Dynamics User Authentication Based on
Gaussian Mixture Model and Deep Belief Nets
Yunbin Deng and Yu Zhong
 Keystroke dynamics, a behavioral biometric, offers many advancements in User
Authentication depending on habitual patterns – Digraphs, Trigraphs.
 Uses four approaches for classification and training, namely, Statistical method based on
distances, Neural Nets, Statistical ML and other algorithms.
 Discusses implementation of keystroke dynamics using a 32-mixture GMM-UBM and DBN
methods to reasonably represent impostor data.
 DBN outperforms the GMM and GMM-UBM methods, significantly reducing the EER to
3.5%, a 58% relative error rate reduction (static text).
 Highly accurate authentication is achievable by DBN
 Questions: Can a similar methodology be applied to Speaker Recognition, by identifying
different variations in a person's voice or speech signal?
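To make the digraph idea concrete, the sketch below turns a sequence of key press and release timestamps into per-digraph timing features. The three features (hold time, press-to-press latency, release-to-press latency) are a common choice in keystroke-dynamics work and are an assumption here, not a confirmed description of the paper's feature set; `KeyEvent`, `digraph_features`, and the timestamps are hypothetical.

```python
from typing import Dict, List, NamedTuple, Sequence, Union


class KeyEvent(NamedTuple):
    key: str
    press: float    # time the key went down, in seconds
    release: float  # time the key came up, in seconds


def digraph_features(
    events: Sequence[KeyEvent],
) -> List[Dict[str, Union[str, float]]]:
    """Compute timing features for each pair of consecutive keystrokes."""
    features = []
    for first, second in zip(events, events[1:]):
        features.append(
            {
                "pair": first.key + second.key,
                "hold": first.release - first.press,        # dwell time of the first key
                "down_down": second.press - first.press,    # press-to-press latency
                "up_down": second.press - first.release,    # release-to-press latency
            }
        )
    return features


if __name__ == "__main__":
    # Made-up timings for typing "abc".
    typed = [
        KeyEvent("a", 0.00, 0.10),
        KeyEvent("b", 0.25, 0.32),
        KeyEvent("c", 0.40, 0.50),
    ]
    for row in digraph_features(typed):
        print(row)
```

Feature vectors like these are what a GMM-UBM or DBN would be trained on; an analogous per-frame feature vector (for example MFCCs) plays the same role for speech.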

Effects Of Equipment Variations On
Speaker Recognition Error Rates
Clark D. Shaver MS Thesis
 Clark's work focused on the effect of different microphone gains and variabilities on
Error Rates for Speaker Recognition.
 He used the traditional Gaussian method for evaluating the microphones in the
Recognition task.
 The DNN or DBN method can be implemented in the algorithm Clark worked on.
 Based on the previous slides/papers, applying DBN/DNN to his work could significantly
improve the Error Rates obtained with different microphones.
 SVM can also be applied for classification purposes in Speaker Recognition.

Related Research Papers on DL for Speaker Recognition
 Fred Richardson, Douglas Reynolds, and Najim Dehak, “Deep Neural Network
approaches to Speaker and Language Recognition,” IEEE, Oct 2015.
 Rafael G. C. P. Pinto, Hardy L. C. P. Pinto, and Luiz P. Calôba, “Using Neural
Networks for Automatic Speaker Recognition: A Practical Approach,” IEEE, 1996.
 Sreenivas Sremath Tirumala and Seyed Reza Shahamiri, “A review on Deep Learning
approaches in Speaker Identification,” ICSP, Nov 2016.
 Alex Fandrianto, Ashley Jin, and Aman Neelappa, “Speaker Recognition Using Deep
Belief Networks,” Fall 2012, Stanford University.
 Muhammad Muneeb Saleem, “Deep Learning For Speech Classification And
Speaker Recognition,” Thesis, UT Dallas, Dec 2014.
