Deep Learning for Speaker Identification
Presentation I
Sai Kiran Kadam
Description:
Apply Deep Learning algorithms to Speaker
Identification/Authentication for Cyber Security and IoT
Recent Advances in Deep Learning for Speech
Recognition at Microsoft
Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer,
Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero
• This paper emphasizes the advantages of deep learning over the traditional methods of speech recognition.
• DL with MFCC (cepstral analysis) features achieved lower word error rates (WER) than the GMM-HMM model.
• The more languages used to train the DNN architecture, the lower the WER: 3.5% lower than a DNN trained on a single language.
• A DNN acts as a universal learner: its hidden layers handle heterogeneous data from different speech sources and languages better than other methods.
• Based on research and experimental results, DNN systems outperform GMM systems.
• Questions:
• What equipment/hardware was used for training in the various experiments?
• How were the WERs tracked and compared? How can DL algorithms be implemented with different DNNs?
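On the question of how WERs are tracked and compared: WER is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words, and systems are often compared by relative reduction. The sketch below illustrates that standard definition only; the function names and the example sentences are assumptions for illustration and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical transcripts: one substitution (sat -> sit) and one deletion (the).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Comparing two systems on the same test set then reduces to calling `relative_reduction` on their two WER values.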
Unsupervised feature learning for audio classification
using convolutional deep belief networks
Honglak Lee, Yan Largman, Peter Pham, Andrew Y. Ng
• Applies convolutional deep belief networks (CDBNs), built by stacking convolutional RBMs (CRBMs), to unlabeled TIMIT audio data and shows that the learned features correspond to phonemes.
• Pipeline: TIMIT audio → spectrogram (spectral features) → PCA to lower the dimensionality of the audio data → CDBN.
• CDBN applied to speaker recognition → CDBN with more layers outperformed RAW and MFCC features.
• Test accuracy: CDBN (L1+L2) > CDBN-L1 > CDBN-L2 > MFCC > RAW for speech and music.
• CDBN achieved higher performance on multiple audio recognition tasks.
• Challenge: apply CDBNs to larger data sets.
• Questions: What math and statistics concepts are used to derive the energy equations of the visible and hidden layers? Do we need to focus on the math?
• Does visualization of phonemes involve signal processing?
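The preprocessing described on this slide (spectrogram, then PCA to reduce dimensionality before the feature learner) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the frame length, hop size, number of components, and the random stand-in signal are illustrative choices rather than the paper's settings, the PCA step here also whitens the features, and the CDBN itself is not implemented.

```python
import numpy as np


def spectrogram(signal, frame_len=320, hop=160):
    """Magnitude spectrogram: one row per windowed frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def pca_whiten(features, n_components, eps=1e-8):
    """Project onto the top principal components and scale each to unit variance."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    projection = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return centered @ projection


# Stand-in for one second of 16 kHz audio (random noise, not TIMIT data).
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)

spec = spectrogram(audio)                 # shape: (frames, frequency bins)
reduced = pca_whiten(spec, n_components=20)
print(spec.shape, reduced.shape)
```

The reduced frames would then be the input to whatever unsupervised feature learner is being studied.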
An Extensible Speaker Identification Sidekit In Python
Anthony Larcher, Kong Aik Lee, Sylvain Meignier
• SIDEKIT (Speaker IDEntification toolKIT) is written 100% in Python and tested on several platforms under Python 2.7 and > 3.4.
• Developed with minimal dependencies on external modules.
• Provides an end-to-end tool-chain plus state-of-the-art methods for speaker identification.
• Design goals: easy installation, easy implementation of new algorithms, support for large datasets, fast computation, compatibility with existing tools, and end-to-end speaker recognition.
• SIDEKIT is presented as an improvement over existing tools such as ALIZE (C++), Kaldi (C++), and Matlab-SR.
• Has tools to evaluate the equal error rate (EER) and the decision cost function (DCF), and to plot detection error trade-off (DET) curves.
• Uses GMM-HMM based approaches.
• Questions: Can SIDEKIT be extended with DNN-based methods?
• Can it be integrated with user authentication based on keystroke dynamics?
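The evaluation metrics named above can be computed from two lists of verification scores. The sketch below uses plain Python and does not use SIDEKIT's own API; the function names, the example scores, and the default cost parameters are assumptions for illustration. The cost formula is the usual detection cost, C_miss · P_miss · P_target + C_fa · P_fa · (1 − P_target).

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss rate and false-alarm rate when accepting scores >= threshold."""
    miss = sum(s < threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return miss, false_alarm


def det_points(target_scores, nontarget_scores):
    """(threshold, miss, false_alarm) triples; a DET curve plots miss against false_alarm."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    return [(t, *error_rates(target_scores, nontarget_scores, t)) for t in thresholds]


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where miss and false-alarm rates are closest."""
    threshold, miss, false_alarm = min(
        det_points(target_scores, nontarget_scores),
        key=lambda point: abs(point[1] - point[2]),
    )
    return (miss + false_alarm) / 2, threshold


def detection_cost(miss, false_alarm, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Detection cost; the default parameters are illustrative, not a specific evaluation plan."""
    return c_miss * miss * p_target + c_fa * false_alarm * (1 - p_target)


# Hypothetical scores: higher means "more likely the claimed speaker".
targets = [2.1, 1.7, 0.9, 0.4]
nontargets = [1.2, 0.3, -0.5, -1.0]

eer, eer_threshold = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2f} at threshold {eer_threshold}")
```

With so few scores the EER is only a coarse estimate; real evaluations interpolate between operating points over thousands of trials.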
Keystroke Dynamics User Authentication Based on
Gaussian Mixture Model and Deep Belief Nets
Yunbin Deng and Yu Zhong
• Keystroke dynamics, a behavioral biometric, offers many advances in user authentication by relying on habitual typing patterns such as digraphs and trigraphs.
• Uses four approaches for classification and training: distance-based statistical methods, neural networks, statistical machine learning, and other algorithms.
• Discusses the implementation of keystroke dynamics using a 32-mixture GMM-UBM and DBN methods to reasonably represent an impostor's data.
• The DBN outperforms the GMM and GMM-UBM methods, reducing the EER to 3.5%, a 58% relative error reduction (static text).
• Highly accurate authentication is achievable with a DBN.
• Questions: Can a similar methodology be applied to speaker recognition, by identifying different variations in a person's voice or speech signal?
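To make the digraph features and the GMM-UBM idea concrete, the sketch below extracts press-to-press digraph latencies from key events and scores them with a log-likelihood ratio between a user model and a universal background model (UBM). It is a toy illustration under stated assumptions: the event format, the model parameters, and the function names are hypothetical, and the models have one or two hand-set components rather than the 32 trained mixtures mentioned on the slide.

```python
import math


def digraph_latencies(events):
    """events: (key, press_time, release_time) tuples in typing order.
    Returns the press-to-press latency of each consecutive key pair (digraph)."""
    return [nxt[1] - cur[1] for cur, nxt in zip(events, events[1:])]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def llr_score(x, user_gmm, ubm):
    """Positive when x looks more like the claimed user than the background population."""
    return gmm_log_likelihood(x, *user_gmm) - gmm_log_likelihood(x, *ubm)


# Hypothetical models as (weights, means, variances) over two digraph latencies in seconds.
user_gmm = ([1.0], [[0.15, 0.12]], [[0.001, 0.001]])
ubm = ([0.5, 0.5], [[0.10, 0.10], [0.30, 0.30]], [[0.01, 0.01], [0.01, 0.01]])

# Hypothetical key events for typing "the".
genuine = digraph_latencies([("t", 0.00, 0.08), ("h", 0.15, 0.22), ("e", 0.27, 0.35)])
impostor = [0.30, 0.28]

print(llr_score(genuine, user_gmm, ubm), llr_score(impostor, user_gmm, ubm))
```

An authentication decision would compare the score with a threshold tuned on held-out data; the same scoring scheme carries over to speaker verification when the feature vectors are acoustic frames instead of typing latencies.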
Effects Of Equipment Variations On
Speaker Recognition Error Rates
Clark D. Shaver MS Thesis
• Clark's work focused on the effect of different microphone gains and variabilities on error rates for speaker recognition.
• He used the traditional Gaussian method to evaluate the microphones in the recognition task.
• A DNN or DBN method could be implemented in the algorithm Clark worked on.
• Based on the papers in the previous slides, applying a DBN/DNN to his work could significantly reduce the error rates for the different microphones.
• An SVM can also be applied for classification in speaker recognition.
Related Research Papers on DL for Speaker Recognition
• Fred Richardson, Douglas Reynolds, and Najim Dehak, “Deep Neural Network Approaches to Speaker and Language Recognition,” IEEE, Oct 2015.
• Rafael G. C. P. Pinto, Hardy L. C. P. Pinto, and Luiz P. Calôba, “Using Neural Networks for Automatic Speaker Recognition: A Practical Approach,” IEEE, 1996.
• Sreenivas Sremath Tirumala and Seyed Reza Shahamiri, “A Review on Deep Learning Approaches in Speaker Identification,” ICSP, Nov 2016.
• Alex Fandrianto, Ashley Jin, and Aman Neelappa, “Speaker Recognition Using Deep Belief Networks,” Stanford University, Fall 2012.
• Muhammad Muneeb Saleem, “Deep Learning for Speech Classification and Speaker Recognition,” Thesis, UT Dallas, Dec 2014.
