From Speech to Knowledge
Latest Updates and Experiences in Launching Local Language Tools
Who am I?
Karel Bourgois
• 20+ years in Telecom
• Entrepreneur
• Ecosystem
Le Voice Lab
Voxist voicemail since 2016
Confidential and Proprietary. Copyright (c) by Voxist. All Rights Reserved.
Main Features
✓ Custom greetings
✓ Speech to text
Products
Clients
✓ B2C: tens of thousands of users, over 5% of them paying
✓ B2B: consulting firms, law firms, entrepreneurs…
2022
Business Model
Telcos' voicemail apps have low ratings
Le Voice Lab
[Diagram. Layers: Data (private, public, augmented), Engines, Services. Participants: corporates, labs, government, open source (SA1, SA2, SA3). Paid (€) marketplace.]
Unified voice-related APIs (ASR, TTS, NLP, ...)
Vocal assistants – Emotions – Voice ID – Translation – Subtitles …
APIs in the Cloud & On-premise
Current Features
✓ Transcription in French & English
✓ Punctuation
✓ Speaker separation (diarization)
Coming soon
➢ Spanish, Portuguese, German, Italian
➢ TTS: create your own assistant voices
➢ Real-time translation
Products
Clients
✓ French vocal assistant manufacturer
✓ Le Voice Lab
Distributors
✓ OVH
✓ Eden.ai
Why Now
Traditional ASR approach
This solution splits the ASR optimization problem into 3 components:
Acoustic Model: a neural network transduces signal frames into a sequence of phonemes (triphones), using EM techniques + lattice-free MMI (maximum mutual information)
Phonetic Lexicon: provides the decomposition of words into basic acoustic units
Language Model: an n-gram model that estimates word-sequence probabilities from frequency counts
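To make the lexicon and language model components concrete, here is a minimal sketch in Python. The toy lexicon, its phoneme symbols and the three-sentence corpus are invented for illustration and do not come from the talk; the bigram model uses plain frequency counts with no smoothing, which a production system would add.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> phoneme sequence (symbols are illustrative only).
LEXICON = {
    "call": ["k", "ao", "l"],
    "me": ["m", "iy"],
    "back": ["b", "ae", "k"],
}


def to_phonemes(sentence):
    """Decompose a sentence into basic acoustic units using the lexicon."""
    return [phoneme for word in sentence.split() for phoneme in LEXICON[word]]


def train_bigram(sentences):
    """Estimate P(word | previous word) from raw frequency counts (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


corpus = ["call me back", "call me", "call back"]  # assumed toy training text
lm = train_bigram(corpus)
print(to_phonemes("call me"))
print(lm[("call", "me")])
```

With this toy corpus, "me" follows "call" in two of the three occurrences of "call", so the estimated probability is 2/3.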
Traditional ASR issues
Large annotated datasets required
Traditional ASR requires annotated data for:
1. Acoustic modeling: a large amount of audio with the corresponding text, and even phonemes
2. Lexicon creation: all the ways of pronouncing the same phonemes / words
⇒ This also requires very specific linguistic skills
This is the approach of ASR toolkits such as Kaldi, HTK, Sphinx, Julius and RASR, which were created before E2E solutions were available.
(Kaldi's main contributor, Daniel Povey, now works at Xiaomi in China on a new E2E ASR engine called k2.)
New ASR approaches
End-to-End Neural Networks (E2E)
Predict the sequence of characters directly from speech using a neural network and the differentiable CTC loss
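The slide names the CTC loss, which is a training objective. The sketch below illustrates only the alignment rule that CTC relies on, applied at decoding time: take the most likely label per frame, merge consecutive repeats, then drop the blank label. The alphabet, the blank symbol and the frame probabilities are made up for the example; a real system would get the probabilities from the network.

```python
BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding: best label per frame, merge repeats, remove blanks."""
    best_path = [
        alphabet[max(range(len(frame)), key=frame.__getitem__)]
        for frame in frame_probs
    ]
    decoded = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's
        # label and is not the blank.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return "".join(decoded)


alphabet = [BLANK, "h", "i"]
frames = [
    [0.10, 0.80, 0.10],  # h
    [0.10, 0.70, 0.20],  # h again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # i
    [0.10, 0.10, 0.80],  # i again, merged
]
print(ctc_greedy_decode(frames, alphabet))
```

The blank is what lets the model output a genuinely doubled letter: the frame path h, blank, h decodes to "hh", while h, h decodes to "h".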
Advantages of new ASR approach
Self-Supervised techniques
The idea is to learn the language model directly from speech:
- Much less annotated data is needed
- Fewer specialized linguistic skills
- No phonetic lexicons
Voxist hybrid approach
Self-Supervised & Domain Specific
Lexicon and language model created for the target domain using client data
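The slides do not say how the domain-specific language model is combined with the self-supervised recognizer. One common pattern, shown here purely as an assumption and not as Voxist's method, is to rescore the recognizer's n-best hypotheses with a language model trained on client text. The sentences, scores and the lm_weight value are invented for the example.

```python
import math
from collections import Counter


def train_unigram(domain_sentences):
    """Return a log-probability function from add-one smoothed word counts."""
    counts = Counter(word for s in domain_sentences for word in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab_size))

    return logprob


def rescore(nbest, domain_logprob, lm_weight=0.5):
    """Pick the hypothesis with the best recognizer score plus weighted LM score.

    nbest is a list of (text, recognizer_log_score) pairs.
    """
    def combined(item):
        text, score = item
        return score + lm_weight * sum(domain_logprob(w) for w in text.split())

    return max(nbest, key=combined)[0]


# Hypothetical client data from a law firm.
domain_text = ["the plaintiff filed a motion", "the court denied the motion"]
domain_lm = train_unigram(domain_text)

# Hypothetical recognizer output: the acoustically preferred guess is wrong.
nbest = [
    ("the court denied the ocean", -4.0),
    ("the court denied the motion", -4.2),
]
print(rescore(nbest, domain_lm))
```

Here the domain model has seen "motion" but never "ocean", so the second hypothesis wins despite its slightly lower recognizer score. Setting lm_weight to 0 falls back to the recognizer's own ranking.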
Voxist Results
Model            WER (%) on 40h (GigaSpeech)
Google           18.9
Kaldi            14.9
MS               12.4
Pika             12.3
ESPnet           10.3
WeNet            10.6
Voxist basic     10.2
Voxist hybrid     9.8
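For readers unfamiliar with the metric in the table above: word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, with an invented sentence pair and no text normalization (real evaluations usually normalize case and punctuation first):

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent. Assumes the reference is not empty."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


reference = "please call me back tomorrow"
hypothesis = "please call be back"
print(word_error_rate(reference, hypothesis))
```

In this pair there is one substitution ("me" recognized as "be") and one deletion ("tomorrow"), so the result is 2 errors over 5 reference words, or 40%.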
Voxist tech can also bypass ASR and extract intents directly from speech
Self-supervised learning applied to SLU (spoken language understanding)
Video to text & knowledge management
Current Features
✓ Video indexing and semantic search
✓ Video subtitles
Coming soon
➢ Audio search without ASR
➢ Multimodal sentiment analysis
➢ Auto translate
Products
What Next ?
A Telco Vocal Assistant ?
• ASR + TTS
• Conversational Agent
• Noise reduction / speech enhancement
All in the cloud-native mobile core networks of tomorrow…
Products
Karel Bourgois, Founder
karel@voxist.com
@bourgois
