Latest Updates and Experiences in Launching Local Language Tools, Karel Bourgois

From Speech to Knowledge
Latest Updates and Experiences in Launching Local Language Tools

Karel Bourgois • 20+ years in Telecom
Who am I?
• Entrepreneur
• Ecosystem
Le Voice Lab

Voxist voicemail since 2016
Confidential and Proprietary. Copyright (c) by Voxist. All Rights Reserved.
Main Features
✓ Custom greetings
✓ Speech to text
Products
Clients
✓ B2C: tens of thousands of users, over 5% of them paying
✓ B2B: consulting firms, law firms, entrepreneurs…

Telco voicemail apps have low ratings

[Diagram: Le Voice Lab ecosystem]
Layers: data (private, public, augmented), engines, services
Unified voice-related APIs (ASR, TTS, NLP, ...), open source
Participants: corporates, labs, government; paid (€) marketplace
Use cases: vocal assistants, emotions, voice ID, translation, subtitles…

APIs in the Cloud & On-premise
Current Features
✓ Transcriptions in French & English
✓ Punctuation
✓ Speaker separation (diarization)
Coming soon
➢ Spanish, Portuguese, German, Italian
➢ TTS: create your own assistant voices
➢ Real-time translation
Products
Clients
✓ French vocal assistant manufacturer
✓ Le Voice Lab
Distributors
✓ OVH
✓ Eden.ai

Why now?

Traditional ASR approach
This approach splits the ASR optimization problem into 3 components

• Acoustic model: a neural network transduces signal frames into a sequence of phonemes (triphones), using EM techniques + lattice-free MMI (maximum mutual information)
• Phonetic lexicon: provides the decomposition of words into basic acoustic units
• Language model: an n-gram model, with probabilities estimated from frequencies

Traditional ASR issues
Large annotated datasets required
Traditional ASR requires annotated data for:
1. Acoustic modeling: large amounts of audio with the corresponding transcripts, and even phonemes
2. Lexicon creation: all the ways of pronouncing the same phonemes/words
⇒ This also requires very specific linguistic skills
This is the approach of ASR toolkits such as Kaldi, HTK, Sphinx, Julius and RASR, which were created before E2E solutions were available
(Kaldi's main contributor, Daniel Povey, now works at Xiaomi in China on a new E2E ASR engine called K2)
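Lexicon creation means listing every accepted pronunciation of each word. The sketch below parses a lexicon in the plain-text style of Kaldi's `lexicon.txt` (one pronunciation per line: the word, then its phones). The example entries and ARPAbet-style phones are illustrative assumptions, and `parse_lexicon` is a hypothetical helper, not part of Kaldi.

```python
from collections import defaultdict

# Illustrative entries: one line per pronunciation variant
LEXICON_TXT = """\
EITHER IY DH ER
EITHER AY DH ER
TOMATO T AH M EY T OW
TOMATO T AH M AA T OW
"""

def parse_lexicon(text):
    """Map each word to the list of its pronunciations (each a list of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                        # skip blank lines
        word, phones = fields[0], fields[1:]
        lexicon[word].append(phones)
    return dict(lexicon)
```

Each variant has to be written by hand (or generated and checked by a linguist), which is why this step demands the specialized skills mentioned above.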

New ASR approaches
End-to-End Neural Networks (E2E)
Predict a sequence of characters directly from speech using a neural network and a differentiable CTC loss
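A minimal pure-Python sketch of the CTC loss, for tiny inputs only: the loss is the negative log of the total probability of all frame-level alignments that collapse to the target, computed with the forward (alpha) recursion over the target interleaved with blanks. The blank id 0 and the function names are assumptions for illustration; a brute-force enumeration is included to check the recursion on toy inputs.

```python
import math
from itertools import product

NEG_INF = -math.inf

def _log(p):
    # log(0) would raise ValueError; treat it as log-probability -inf instead
    return math.log(p) if p > 0 else NEG_INF

def _logsumexp(values):
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def ctc_loss(probs, target, blank=0):
    """-log p(target | input) via the CTC forward (alpha) recursion.

    probs:  T x V per-frame output distributions (e.g. softmax outputs)
    target: label ids, without blanks
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # t = 0: a path may start with the leading blank or with the first label
    alpha = [NEG_INF] * S
    alpha[0] = _log(probs[0][ext[0]])
    if S > 1:
        alpha[1] = _log(probs[0][ext[1]])

    for t in range(1, len(probs)):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            incoming = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                incoming.append(alpha[s - 1])      # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                incoming.append(alpha[s - 2])      # skip a blank between distinct labels
            new_alpha[s] = _logsumexp(incoming) + _log(probs[t][ext[s]])
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    final = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -_logsumexp(final)

def collapse(path, blank=0):
    """CTC collapsing: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out

def ctc_loss_bruteforce(probs, target, blank=0):
    """Sum over every alignment explicitly; only feasible for tiny T and V."""
    T, V = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(V), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return -math.log(total) if total > 0 else math.inf
```

Because every step is a sum of products, the loss is differentiable with respect to the network outputs, which is what lets the whole model train end to end. In practice frameworks provide this loss directly (for example `torch.nn.CTCLoss` in PyTorch).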

Advantages of the new ASR approach
Self-supervised techniques
The idea is to learn the language model directly from speech:
- Much less annotated data is needed
- Less specialized linguistic skills
- No phonetic lexicons
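The slide does not say which self-supervised technique is used. One widely used family (for example the contrastive task in wav2vec 2.0) masks parts of the audio and trains the model to pick the true latent representation of a masked frame among distractors sampled from the same utterance, so no transcripts are needed. The sketch below shows only that contrastive loss on plain vectors; the cosine similarity, the temperature of 0.1 and all names are illustrative assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(context, true_target, distractors, temperature=0.1):
    """InfoNCE-style loss: -log softmax probability of the true target.

    context:      model output at a masked position
    true_target:  latent representation of the audio that was masked
    distractors:  latents from other positions (negatives)
    """
    sims = [cosine(context, true_target)] + [cosine(context, d) for d in distractors]
    logits = [s / temperature for s in sims]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return -(logits[0] - log_denominator)
```

The loss is near zero when the context vector points at the true target and away from the distractors, and grows when it points at a distractor instead.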

Voxist hybrid approach
Self-supervised & domain-specific
Lexicon and language model created for the target domain using client data
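The slide does not say how client data is combined with a general-purpose model. Linear interpolation of a general language model with one estimated on in-domain text is a standard option, shown here as an assumption with unigram models for brevity. The weight `lam` and the function names are hypothetical.

```python
from collections import Counter

def unigram_model(text):
    """Maximum-likelihood unigram probabilities from raw counts."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def interpolate(general, domain, lam=0.5):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_general(w).

    lam is a hypothetical mixing weight; in practice it would be tuned
    on held-out text from the target domain.
    """
    vocab = set(general) | set(domain)
    return {w: lam * domain.get(w, 0.0) + (1 - lam) * general.get(w, 0.0) for w in vocab}
```

Domain terms such as "plaintiff" get non-zero probability even though the general model never saw them, while the general vocabulary is kept.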

Voxist Results
Model            WER (%), 40 h of GigaSpeech
Google           18.9
Kaldi            14.9
MS               12.4
Pika             12.3
ESPnet           10.3
WeNet            10.6
Voxist basic     10.2
Voxist hybrid     9.8
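Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the ASR output, divided by the number of reference words. A minimal sketch in Python; benchmark scoring normally also normalizes text (case, punctuation) before this step, which is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                          # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + cost,      # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

Multiply by 100 to compare with the percentages in the table. Because insertions count, WER can exceed 100% on very noisy output.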

Voxist's technology can also bypass ASR and extract intents directly from speech
Self-supervised learning applied to SLU (spoken language understanding)

Video to text & knowledge management
Current Features
✓ Video indexing and semantic search
✓ Video subtitles
Coming soon
➢ Audio search without ASR
➢ Multimodal sentiment analysis
➢ Auto-translate
Products

What next?
A telco vocal assistant?
• ASR + TTS
• Conversational agent
• Noise reduction / speech enhancement
All in the cloud-native mobile core networks of tomorrow…
Products

Karel Bourgois, Founder
karel@voxist.com
@bourgois
