Team Jarvis 
Final Presentation 
Pragya Agrawal 
Dominic Calabrese 
David Martel 
Nathan Sawicki
Project Goals 
• Design and build a real-time speech recognition system 
• Build it with embedded hardware 
• Use the Source-Filter model of speech and a Support Vector Machine 
classifier to recognize the commands “zero” through “nine” 
• Finished system executes in real time and has GPIO-based actuation 
to demonstrate functional voice recognition
System Architecture
Source-Filter Model of Speech 
• Word characterization should be 
independent of volume, pitch, and duration 
of the word 
• Simplify the speech production model to: 
1. Source - vibration of the vocal cords 
2. Filter - vocal tract (i.e., positioning of 
tongue, mouth, etc.) 
• Accurately modeling the filter provides a 
basis for word recognition[4] 
Broad sweeps of spectrum (formants) result 
from the filter configuration. Rapidly varying 
peaks come from source resonances
All-Pole Filter Coefficients 
• First n filter coefficients can be roughly 
calculated using the first n time shifts of 
the autocorrelation of a signal 
• Levinson-Durbin recursion algorithm 
calculates all-pole filter coefficients from 
autocorrelation 
• We want to capture the spectral envelope, so 
we want ~10 filter coefficients[5] 
Too many coefficients lead to over-fitting of the 
curve
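The recursion named on this slide can be sketched as follows. This is a minimal pure-Python illustration, not the team's C5515 or MATLAB code; the function names `autocorrelation` and `levinson_durbin` are hypothetical, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption of the sketch.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the (unnormalized) autocorrelation of x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Compute all-pole filter coefficients from autocorrelation values.

    r     : autocorrelation values r[0..order]
    order : number of filter coefficients wanted (about 10 on the slide)
    Returns (a, err) with a = [1, a1, ..., a_order] and err the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the previous-order coefficients.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Example: autocorrelation of a first-order process, r[k] = 0.5 ** k.
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For the example input the recursion recovers a first-order model (a1 = -0.5, a2 = 0), which shows how the first n autocorrelation lags determine the first n filter coefficients.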
Cepstral Coefficients 
• Cepstrum is useful in separating the source 
and filter 
• Cepstral coefficients are a very compact 
representation of the spectral envelope and 
are highly uncorrelated 
• Filter coefficients are too sensitive to 
numerical precision 
• Better to transform LP coefficients into 
cepstral coefficients[5] 
Cepstral Analysis on source filter model 
(a) DFT (b) log magnitude of DFT (c) IDFT
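The transformation from LP coefficients to cepstral coefficients can be done with a short recursion instead of the DFT / log / IDFT chain in the figure. The sketch below assumes the all-pole model 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p and ignores the gain term c0; the function name `lpc_to_cepstrum` is hypothetical, and this is not the team's generated code.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients to cepstral coefficients.

    a      : [1, a1, ..., ap], coefficients of A(z) (sign convention assumed)
    n_ceps : number of cepstral coefficients to return
    Returns [c1, ..., c_n_ceps] for the model 1/A(z).
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        # Sum over earlier cepstral terms; a[k] is zero for k > p.
        acc = sum(m * c[m] * a[n - m] for m in range(1, n) if n - m <= p)
        direct = -a[n] if n <= p else 0.0
        c[n] = direct - acc / n
    return c[1:]


# Example: single pole at z = 0.5, i.e. A(z) = 1 - 0.5 z^-1.
ceps = lpc_to_cepstrum([1.0, -0.5], n_ceps=3)
```

For a single pole at 0.5 the cepstrum of 1/A(z) is 0.5^n / n, so the example yields 0.5, 0.125, and about 0.0417, which is a quick sanity check on the recursion.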
Support Vector Machine Learning 
• Support Vector Machine (SVM) is a supervised 
learning algorithm used for classification and 
regression 
• We use a multi-class Support Vector Machine 
• Our algorithm uses the one-against-one method to 
construct k(k-1)/2 classifiers (k = number of 
classes), one SVM for each pair of classes; for the ten digits this gives 45 classifiers 
• LIBSVM, an integrated software package for multi-class 
support vector classification, is used[6]
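The one-against-one bookkeeping can be illustrated without LIBSVM itself. The sketch below only enumerates the class pairs and combines pairwise decisions by majority vote; the helper names `one_vs_one_pairs` and `vote` are hypothetical, and breaking ties toward the lowest label is an assumption of this sketch rather than a statement about LIBSVM's behavior.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """List every unordered pair of classes; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))


def vote(pairwise_winners):
    """Return the class that wins the most pairwise decisions.

    Ties are broken toward the lowest label (an assumption of this sketch).
    """
    counts = Counter(pairwise_winners)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


digits = list(range(10))
pairs = one_vs_one_pairs(digits)
print(len(pairs))  # prints 45, i.e. k(k-1)/2 for k = 10
```

At classification time each of the 45 binary classifiers names one of its two digits, and `vote` picks the digit named most often.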
Library 
• Stored autocorrelation coefficients calculated on the C5515 
• Calculated cepstral coefficients in MATLAB 
• Three male speakers with a combined 1920 recordings 
• 64 instances of each digit for each speaker (3 × 10 × 64 = 1920) 
Confusion matrix, 9 coefficients (each row sums to 192, the recordings of one spoken digit; columns are the classified digit) 
       0    1    2    3    4    5    6    7    8    9 
0    154    0    0    4    0    0   22    6    0    6 
1      0  166    1    1   23    1    0    0    0    0 
2      1    0  168   22    0    0    1    0    0    0 
3     13    0    6  172    0    0    1    0    0    0 
4      1    9    0    0  181    0    0    1    0    0 
5      0    1    0    0    0  190    0    1    0    0 
6      4    0    1    0    0    0  187    0    0    0 
7      1    0    0    0    1    0    0  189    0    1 
8      0    0    1    0    0    0    0    0  191    0 
9      0    0    1    0    0    0    0    2    0  189
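The confusion matrix above can be reduced to an overall accuracy and a per-digit rate. The numbers below are copied from the slide; the function name `summarize` is hypothetical, and treating each row as one spoken digit is an inference from every row summing to 192.

```python
CONFUSION = [
    [154, 0, 0, 4, 0, 0, 22, 6, 0, 6],
    [0, 166, 1, 1, 23, 1, 0, 0, 0, 0],
    [1, 0, 168, 22, 0, 0, 1, 0, 0, 0],
    [13, 0, 6, 172, 0, 0, 1, 0, 0, 0],
    [1, 9, 0, 0, 181, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 190, 0, 1, 0, 0],
    [4, 0, 1, 0, 0, 0, 187, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 189, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 191, 0],
    [0, 0, 1, 0, 0, 0, 0, 2, 0, 189],
]


def summarize(matrix):
    """Return (total, correct, accuracy, per-row rate) for a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_digit = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    return total, correct, correct / total, per_digit


total, correct, accuracy, per_digit = summarize(CONFUSION)
print(f"{correct}/{total} correct, accuracy {accuracy:.4f}")
# prints "1787/1920 correct, accuracy 0.9307"
```

The weakest row is "zero" (154 of 192), which is most often confused with "six".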
Rejected Methods 
• Classification based on correlation of cepstral coefficients 
• Took maximum correlation between new signal and library 
• Not robust to small variations, and not scalable 
• Classification using SVM on CRM database 
• Words cut off early in database or contaminated by other words 
• Recording conditions do not match our method
C5515: Vocalization Identification 
• Implemented word vs. non-word identification 
• Grab a frame of 256 samples, compute the RMS of the frame, and compare it to a threshold 
  • If RMS > threshold: accumulate frame data 
  • Else if RMS < threshold and frames acquired > 3: compute autocorrelation, then transmit data 
  • Else: reset stored data 
• Specific values determined experimentally
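The frame-by-frame logic on this slide can be sketched as a small generator. This is an illustration in Python, not the C5515 firmware: the frame size of 256 and the "more than 3 frames" rule come from the slide, while the names `rms` and `detect_words`, the threshold value in the example, and clearing the buffer after a word is emitted are assumptions.

```python
import math

FRAME_SIZE = 256  # samples per frame (from the slide)
MIN_FRAMES = 3    # a word needs more than this many loud frames (from the slide)


def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_words(samples, threshold):
    """Yield one list of samples per detected vocalization.

    A word still in progress when the input ends is not emitted,
    because the slide's logic only transmits on a quiet frame.
    """
    stored = []
    n_frames = 0
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        if rms(frame) > threshold:
            # Loud frame: keep accumulating.
            stored.extend(frame)
            n_frames += 1
        elif n_frames > MIN_FRAMES:
            # Quiet frame after enough loud ones: hand the word on
            # (on the C5515 this is where autocorrelation and transmit happen).
            yield stored
            stored, n_frames = [], 0
        else:
            # Too short to be a word: discard.
            stored, n_frames = [], 0


# Example: two quiet frames, five loud frames, two quiet frames.
signal = [0] * 512 + [1000] * 1280 + [0] * 512
words = list(detect_words(signal, threshold=100))
```

In the example the five loud frames are returned as a single word of 1280 samples; a burst of three or fewer loud frames would be discarded.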
C5515: UART Transmission 
• Transmit Autocorrelation Coefficients 
• UART is 115200 baud, 8 bit, No 
Parity, 1 stop bit 
• Data is signed 16 bit 
• Bit masking and Reconstruction 
on the Raspberry Pi 
• BlueSmirf Bluetooth-UART Pipes 
• Abstracts wireless transmission 
• Looks like UART to microcontroller 
• Effectively Plug&Play
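Sending signed 16-bit values over an 8-bit UART means splitting each value into two bytes and rebuilding it on the receiver. The sketch below shows the bit masking and reconstruction in Python; the function names are hypothetical, and the byte order (high byte first) is an assumption, since the slide does not state it.

```python
def pack_int16(value):
    """Split a signed 16-bit value into (high, low) bytes using bit masks."""
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit in signed 16 bits")
    unsigned = value & 0xFFFF  # two's-complement bit pattern
    return (unsigned >> 8) & 0xFF, unsigned & 0xFF


def unpack_int16(high, low):
    """Rebuild a signed 16-bit value from two received bytes."""
    unsigned = ((high & 0xFF) << 8) | (low & 0xFF)
    # Bit 15 set means the value is negative in two's complement.
    return unsigned - 0x10000 if unsigned & 0x8000 else unsigned


high, low = pack_int16(-2)
restored = unpack_int16(high, low)
```

The same reconstruction is available in the Python standard library as `int.from_bytes(bytes([high, low]), "big", signed=True)`.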
C5515: Major Challenges Faced 
• Autocorrelation Coefficient Overflow 
• Function generator provided too large a voltage 
• Forced the autocorrelation to overflow 
• Bit-shifting worked temporarily, but reduced data precision, causing poor 
classifier performance and threshold variability 
• Solution: Switched to microphone 
• BlueSmirf Setup 
• Configuring the BlueSmirf requires commands at precise times 
• Solution: Implemented long delay function on C5515
Raspberry Pi: Word Classification 
• Implemented All-pole Model of Speech 
Vocalization for Classification 
• Computes LPC Coefficients from 
Autocorrelation 
• Converts LPC Coefficients into Cepstral 
Coefficients 
• LIBSVM multistage classifier 
• Algorithm written in mixed C/C++ 
• LPC and Cepstral functions codegen’d 
from Matlab 
• Wrapper in hand written code 
• Waits for autocorrelation input from UART
Raspberry Pi: Actuation 
• State Machine implemented 
• Displays infamous EECS 452 Fall 2014 Image on sequence of “452” 
• Displays special Raspberry Pi Image on “314” 
• GPIO array drives LED Binary Counter 
• Capable of implementing more complicated functions 
• Planned for Coffee Machine Actuation, ran out of time 
• Renders graphics using OpenVG Library 
• Displays Startup Image 
• Displays Digit Image on Classification
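The sequence-triggered behavior described on this slide can be sketched as a small state machine that remembers the most recent digits. This is an illustration only: the class name `SequenceDetector` and the action strings are hypothetical, and the actual implementation on the Raspberry Pi (written in C/C++ with OpenVG) is not shown in the slides.

```python
class SequenceDetector:
    """Track recently classified digits and report when a known sequence ends."""

    def __init__(self, sequences):
        # sequences maps a digit string such as "452" to an action name.
        self.sequences = dict(sequences)
        self.max_len = max(len(seq) for seq in self.sequences)
        self.history = ""

    def feed(self, digit):
        """Add one classified digit; return the triggered action or None."""
        self.history = (self.history + str(digit))[-self.max_len:]
        for seq, action in self.sequences.items():
            if self.history.endswith(seq):
                return action
        return None


detector = SequenceDetector({
    "452": "show_eecs452_image",       # hypothetical action name
    "314": "show_raspberry_pi_image",  # hypothetical action name
})
results = [detector.feed(d) for d in (4, 5, 2, 3, 1, 4)]
```

Feeding the digits 4, 5, 2 triggers the first action on the third digit, and 3, 1, 4 then triggers the second; every other digit returns `None`, so the per-digit image display can run independently of the sequence check.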
Raspberry Pi: Major Challenges Faced 
• Initially planned to use Simulink Model to implement code 
• Worked great for algorithm 
• Did not work well for IO 
• S-Functions are tricky to work with 
• Solution 
• Codegen core algorithm 
• Hand write wrapper 
• Matlab Coder Toolbox 
• Converts Matlab code into ANSI C code, with processor specific 
optimizations available 
• Extremely useful for complex algorithms 
• Very finicky to configure properly 
• Solution: Study, study, study
Design Expo Pictures
Design Expo Demonstration
Looking Forward 
• Coffee Machine Actuation 
• Build Better Library 
• More speakers 
• Female speakers 
• Non-Midwestern speakers 
• Investigate Tuning SVM Parameters
Questions / Comments
References 
[1] http://www.spectrumdigital.com/product_info.php?cPath=31&products_id=238 
[2] https://www.sparkfun.com/products/12577 
[3] http://www.adafruit.com/product/1914 
[4] Dutoit, T., Moreau, N., Kroon, P., How is speech processed in a cell phone conversation?, 2009 
[5] Rabiner, L., Schafer, R., Introduction to Digital Speech Processing, 2007 
[6] http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Real-Time Voice Actuation
