Project: Micro-speech Recognition
A command recognizer for the spoken words "Yes" and "No".
Phase 2: Deploy to a Microcontroller

Command Recognizer
Recognizes what people say.
Training: .wav data → FFT → features → train the Command Recognizer model
Inference: .wav data → FFT → features → trained model → "Yes"
https://bit.ly/2XBdE4q
Overall flow of this project:
ADC → PCM → FFT and pre-process → audio spectrum → CNN model → output tensor (silence, unknown, yes, no)
audio_provider → feature_provider (PopulateFeatureData, copied into the input tensor) → Interpreter Invoke() → softmax → RecognizeCommands::ProcessLatestResults → RespondToCommand
The audio features themselves are a two-dimensional array, made up of horizontal slices representing the frequencies at one point in time, stacked on top of each other to form a spectrogram showing how those frequencies changed over time.
How to get audio features? Apply a Fourier transform to the sound to reveal the frequencies it contains.
The magnitude spectrum of the signal. A magnitude spectrogram is a visualization of the frequencies in a sound over time, and can be useful as a feature for neural-network recognition of noise or speech.
Examine the spectrogram "audio images".
The audio spectrum represents the audio features.
You can see how the 30-ms sample window is moved forward by 20 ms each time until it has covered the full one-second sample.
The feature buffer (1 second) combines the results of running the FFT on 49 consecutive 30-ms slices of audio, and this is what is passed into the model. Each FFT row represents a 30-ms sample of audio split into 40 frequency buckets, so the buffer is 49x40.
Number of slices = int((length - window_size) / stride) + 1 = int((1000 - 30) / 20) + 1 = 49
The last slice starts at 48 * 20 = 960 ms and ends at 30 + 48 * 20 = 990 ms, so 49 slices cover the one-second sample.
Each slice is produced by running an FFT across a 30-ms section of the audio sample data.
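The slice-count arithmetic above can be checked with a small sketch (the constants come from the slides; the function names are mine, not the example's API):

```cpp
#include <cassert>

// Number of FFT slices that fit in the window:
// int((length - window_size) / stride) + 1
int SliceCount(int length_ms, int window_ms, int stride_ms) {
  return (length_ms - window_ms) / stride_ms + 1;  // integer division truncates
}

// Start time (ms) of the last slice, given the slice count and stride.
int LastSliceStartMs(int slice_count, int stride_ms) {
  return (slice_count - 1) * stride_ms;
}
```

With a 1000-ms sample, 30-ms window, and 20-ms stride, SliceCount returns 49, and the last slice spans 960-990 ms as stated above.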
Audio Recognition Model (CNN Model)
The CNN model classifies each one-second audio clip (a 40x49 "pixel" spectrogram image) into four categories: silence, unknown, yes, no.
Our Model
Input: (1, 49, 40, 1), type int8 (-128~127) — a 1-second audio spectrogram (49x40), so the input is (1x49x40)x1 byte = 1960 bytes.
Output: (1, 4), type int8 — one score per label: 0 = silence, 1 = unknown, 2 = yes, 3 = no.
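A minimal sketch of how the four int8 output scores map to a label (the label order and tensor shapes follow the slide; `ArgMax` is my helper, not part of the example's API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Label order of the model's (1, 4) int8 output tensor.
const char* kCategoryLabels[4] = {"silence", "unknown", "yes", "no"};

// Input tensor size: (1 x 49 x 40 x 1) int8 values = 1960 bytes.
constexpr int kInputBytes = 1 * 49 * 40 * 1;

// Index of the highest int8 score, i.e. the predicted label.
int ArgMax(const int8_t* scores, size_t count) {
  size_t best = 0;
  for (size_t i = 1; i < count; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return static_cast<int>(best);
}
```

For example, an output tensor of {-40, -100, 90, -50} selects index 2, i.e. "yes".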
Project File Structure
tensorflow/lite/micro/examples/micro_speech
main_functions.cc — the main TensorFlow Lite framework program
recognize_commands.cc — post-processes the inference results
micro_features/model.cc — the TFLite model
XXXX_test.cc — files ending in _test.cc are test programs that can be run on the development host
arduino, sparkfun_edge, zephyr_riscv, ... — these folders hold hardware-specific files; if TARGET=XXX is given at build time, the files in that folder replace the defaults, e.g.:
├── sparkfun_edge
|   ├── command_responder.cc
|   └── audio_provider.cc
├── micro_features
Key functions: GetAudioSamples(), GenerateMicroFeatures()
Project Flow
ADC → PCM → GetAudioSamples() (sparkfun_edge/audio_provider.cc) → GenerateMicroFeatures(), which performs the FFT and returns the audio frequency information → audio spectrum → model input (feature_provider.cc, FeatureProvider::PopulateFeatureData).
The spectrogram covers a 1-second window: kFeatureSliceCount = 49, kFeatureSliceSize = 40, kFeatureElementCount = 49x40.
The Feature Provider
main_functions.cc, feature_provider.cc
The feature provider converts raw audio, obtained from the audio provider, into spectrograms that can be fed into our model. It is called during the main loop.
FeatureProvider::PopulateFeatureData(): fills the feature data with information from audio inputs, and returns how many feature slices were updated.
PopulateFeatureData() (feature_provider.cc)
Each call processes one second of audio data, but the FFT does not need to be recomputed for the whole window every time; it is computed only for the newly arrived audio slices, which saves computation and time.
For each new slice in the 1-second window, it first requests the audio for that slice from the audio provider using GetAudioSamples(), and then calls GenerateMicroFeatures() to perform the FFT and return the audio frequency information.
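The "recompute only the new slices" idea above can be sketched as follows. This is a simplified model of what PopulateFeatureData() does, not the example's actual code: the constants mirror the slides, but the shifting logic is my paraphrase.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kFeatureSliceCount = 49;  // rows in the spectrogram
constexpr int kFeatureSliceSize = 40;   // frequency buckets per row

// Shift the existing rows toward the start of the buffer and return how
// many trailing rows must be recomputed with the FFT (only the new ones).
int UpdateFeatureBuffer(int8_t* feature_data, int new_slices) {
  if (new_slices >= kFeatureSliceCount) return kFeatureSliceCount;  // full recompute
  std::memmove(feature_data,
               feature_data + new_slices * kFeatureSliceSize,
               (kFeatureSliceCount - new_slices) * kFeatureSliceSize);
  return new_slices;  // FFT is run only for these trailing rows
}
```

If 3 new 20-ms slices have arrived, the 46 old rows are kept and moved up, and only 3 FFTs are computed instead of 49.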
feature_provider.cc
1 second window
audio_samples
_size: 512
audio_samples
feature_data_
FFT
feature_provider.cc
micro_features/micro_model_settings.h
The Audio Provider
sparkfun_edge/audio_provider.cc
GetAudioSamples() is expected to return an array of 14-bit pulse-code modulated (PCM) audio data (a 512-sample block taken from the live stream).
Digital audio format: 14-bit PCM (Pulse-Code Modulation), kAudioSampleFrequency = 16 kHz, so the audio sample rate is 16,000 samples/second = 16 samples per 1 ms.
Generating the Sample Rate for the ADC
The ADC trigger frequency is set with a timer (audio_provider.cc):
am_hal_ctimer_period_set(3, AM_HAL_CTIMER_TIMERA, 750, 0);
12 MHz / 750 = 16 kHz (sampling rate)
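The timer arithmetic from the slide can be written out as compile-time constants (a sketch; the constant names are mine):

```cpp
#include <cassert>

// ADC trigger-rate arithmetic from the slide.
constexpr int kTimerClockHz = 12'000'000;  // 12 MHz timer clock
constexpr int kTimerPeriod  = 750;         // value passed to am_hal_ctimer_period_set()
constexpr int kSampleRateHz = kTimerClockHz / kTimerPeriod;   // 16 kHz
constexpr int kSamplesPerMs = kSampleRateHz / 1000;           // 16 samples per ms

static_assert(kSampleRateHz == 16000, "sampling rate should be 16 kHz");
static_assert(kSamplesPerMs == 16, "16 samples per ms at 16 kHz");
```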
Hardware path (audio_provider.cc): MIC0/MIC1 → GPIO29/ADC1 and GPIO11/ADC2 → 14-bit ADC (12 MHz clock) → FIFO → DMA → 32K SRAM.
The ADC is set up in repeat-scan mode, with Timer A3 triggering the ADC periodically; each FIFO entry holds the slot number plus the sampled data.
The microphones feed GPIO29/ADC1 and GPIO11/ADC2. The channel-select bit field specifies which of the analog multiplexer channels will be used for the conversions requested for an individual slot.
When each active slot obtains a sample from the ADC, it is added to the value in its accumulator. All slots write their accumulated results to the FIFO.
sparkfun_edge/audio_provider.cc — buffer flow:
1. ADC data (slot 1 + slot 2) is transferred by DMA into the ping-pong buffers g_ui32ADCSampleBuffer0[kAdcSampleBufferSize] and g_ui32ADCSampleBuffer1[kAdcSampleBufferSize], where kAdcSampleBufferSize = 2 slots * 1024 samples per slot (ui32TargetAddress selects the active buffer).
2. When the ADC interrupt occurs, each entry (ui32Slot, ui32Sample) is copied into the 16,000-sample capture buffer: g_audio_capture_buffer[g_audio_capture_buffer_start] = temp.ui32Sample;
3. GetAudioSamples(int start_ms, int duration_ms) copies duration_ms worth of data (e.g. 30 ms of PCM audio) from g_audio_capture_buffer into the 512-sample g_audio_output_buffer.
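Step 2 above (the interrupt-driven copy into the circular capture buffer) can be modeled with a simplified sketch. This is my paraphrase, not the example's actual ISR: slot decoding is omitted and the function name is hypothetical, but the buffer sizes match the slides.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kAudioCaptureBufferSize = 16000;  // one second at 16 kHz
constexpr int kAdcSampleBufferSize = 2 * 1024;  // 2 slots x 1024 samples per slot

// Simplified model of the ADC-interrupt copy: append one DMA buffer's worth
// of samples to the circular capture buffer, wrapping at the end.
void OnAdcDmaComplete(const int16_t* dma_buffer, int16_t* capture_buffer,
                      int* capture_start) {
  for (int i = 0; i < kAdcSampleBufferSize; ++i) {
    capture_buffer[*capture_start] = dma_buffer[i];
    *capture_start = (*capture_start + 1) % kAudioCaptureBufferSize;  // wrap
  }
}
```

The wrap-around is what makes the capture buffer hold a rolling one-second history of the stream.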
GetAudioSamples() copies the region from start_ms to start_ms + duration_ms out of g_audio_capture_buffer into g_audio_output_buffer[kMaxAudioSampleSize], where kMaxAudioSampleSize = 512 (a power of two) and the capture buffer holds 16,000 samples.
Timestamp calculation: each time the ISR fires, the timestamp is advanced; after 16 ISRs roughly 16 * 1000 = 16,000 samples have been read, i.e. about one second of audio.
One problem: the audio is a live stream, so a given window may capture only part of the word "yes", and a single inference may see an ambiguous fragment rather than the whole word.
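The consumer side of the ring buffer can be sketched in the same simplified style (again my paraphrase of GetAudioSamples(), not the example's code; the constants match the slides):

```cpp
#include <cassert>
#include <cstdint>

constexpr int kAudioSampleFrequency = 16000;    // samples per second
constexpr int kAudioCaptureBufferSize = 16000;  // one-second circular buffer
constexpr int kMaxAudioSampleSize = 512;        // power of two

// Simplified model of GetAudioSamples(): copy [start_ms, start_ms + duration_ms)
// out of the circular capture buffer into the linear output buffer.
// Returns the number of samples copied, or -1 if the request is too large.
int GetAudioSamples(const int16_t* capture_buffer, int start_ms, int duration_ms,
                    int16_t* output_buffer) {
  const int samples_per_ms = kAudioSampleFrequency / 1000;  // 16
  const int start_offset = start_ms * samples_per_ms;
  const int sample_count = duration_ms * samples_per_ms;
  if (sample_count > kMaxAudioSampleSize) return -1;  // would overflow the output
  for (int i = 0; i < sample_count; ++i) {
    output_buffer[i] = capture_buffer[(start_offset + i) % kAudioCaptureBufferSize];
  }
  return sample_count;
}
```

A 30-ms request yields 480 samples, which fits in the 512-sample output buffer; note how a slice near the end of the second wraps back to the start of the capture buffer.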
RecognizeCommands (recognize_commands.cc)
After Interpreter Invoke() produces the output tensor (softmax scores for silence, unknown, yes, no), RecognizeCommands::ProcessLatestResults averages results over time before RespondToCommand is called. Its behavior is controlled by:
- The length of the averaging window (average_window_duration_ms)
- The minimum average score that counts as a detection (detection_threshold)
- The amount of time we'll wait after hearing a command before recognizing a second one (suppression_ms)
- The minimum number of inferences required in the window for a result to count (3)
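The interplay of the four parameters above can be sketched as follows. This is a simplified model of the averaging logic, not the actual RecognizeCommands class: the parameter names follow the slide, but the struct and its defaults are my assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the RecognizeCommands decision: a result counts as a
// detection only when enough inferences fall inside the averaging window,
// their average score clears the threshold, and the suppression period has
// passed since the previous detection.
struct Detector {
  int32_t average_window_duration_ms = 1000;
  int32_t detection_threshold = 200;  // scores assumed quantized to 0-255
  int32_t suppression_ms = 1500;
  int minimum_count = 3;
  int32_t last_detection_ms = -10000;  // far in the past initially

  bool IsDetection(int32_t sum_score, int count, int32_t now_ms) {
    if (count < minimum_count) return false;                        // too few inferences
    if (sum_score / count < detection_threshold) return false;      // weak average
    if (now_ms - last_detection_ms < suppression_ms) return false;  // too soon
    last_detection_ms = now_ms;
    return true;
  }
};
```

Averaging over the window smooths out single-inference spikes on a live stream, and the suppression period keeps one spoken "yes" from being reported several times in a row.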
Hands-on
1. Generate the flash image micro_speech_wire.bin
2. Write the flash image to the board
https://drive.google.com/drive/folders/1FhkMDQ5xZoQS8GLkPZJPoVvT3dD3pk3g
Study tensorflow/lite/micro/examples/micro_speech:
main_functions.cc
feature_provider.cc
recognize_commands.cc
/sparkfun_edge/command_responder.cc
Demo: open a terminal (baud rate: 115200 bps). After powering the SparkFun Edge over USB, the blue LED keeps blinking, indicating that the board is waiting for speech input; the terminal will then print the recognition messages.
TinyML - 4 speech recognition