AI in RTC - RTC Korea 2018

cwh.consulting
Artificial Intelligence in
Real Time Communications
(AI in RTC)
RTC Korea
1 November 2018

About Me
Chad Hart
Analyst & Product Consultant
https://cwh.consulting
@chadwallacehart
chad@cwh.consulting
A blog for WebRTC developers: webrtcHacks.com, @webrtcHacks
AI & RTC blog: cogint.ai, @cogintai
WebRTC and ML for Developer Event: November 16, 2018 in San Francisco, krankygeek.com

AI in RTC Research Study
• Authors
  • Chad Hart – cwh.consulting
  • Tsahi Levent-Levi – BlogGeek.me
• Methodology
  • 40+ 1-on-1 vendor interviews
  • ~100 respondent web survey
  • Analysis of 126 companies & all major products
• Output: 147-page report

What is AI in RTC?
[Figure: AI (image) + RTC =]
Image source:
pixabay.com/en/a-i-ai-anatomy-2729782

AI in RTC use case categories
speech analytics
voicebots
RTC optimization
computer vision
Image source:
pixabay.com/en/a-i-ai-anatomy-2729782

Speech Analytics
• Call center agent monitoring
• Transcription
• Translation
• Agent coaching
• Customer engagement

Promise:
machine transcription at human levels
Source: Google I/O 2017 keynote

Reality:
transcription quality is often not so great
Machine Transcription:
My name is a chat heart of you might be
familiar with Dave from a brand or if you
are, a web or to see people I've done
about five years, I'm or so a of an
independent analyst. So I'm mostly do
park management strategy type. For a
product, marketing.
Actual Transcription:
My name is Chad Hart. You might be
familiar with me from a brand -- if you are
WebRTC people; I've done webrtcHacks
now for about five years or so. Outside of
webrtcHacks, I have been an independent
analyst. I mostly do product management
and strategy type work and product
marketing.
https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful

Reality:
transcription quality is often not so great
Sources of error called out on the machine transcription:
• Non-standard spelling
• Industry jargon
• Speech disfluencies
• US-English language assumption
https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
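One common way to put a number on the gap between a machine transcript and a human one is word error rate (WER): the word-level edit distance divided by the number of words in the reference. WER is not mentioned in the talk; the sketch below is only an illustration, and the function name and the shortened sample sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the machine
                curr[j - 1] + 1,     # extra word inserted by the machine
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Shortened versions of the two transcripts on the slide (illustrative only).
actual = "My name is Chad Hart"
machine = "My name is a chat heart"
print(f"WER: {word_error_rate(actual, machine):.2f}")
```

A high WER does not by itself make a transcript useless, which is the point of the next slide.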

Higher-level speech analytics
• Perfect transcription is not needed to
provide useful analysis.
• Higher-level speech analytics systems look
for patterns in speech.
• These patterns can be matched to
business outcomes, such as whether a caller
ended up purchasing or gave a good
customer satisfaction score.
• There are often meaningful patterns
beyond the words that were spoken – like
how fast each party was speaking, or how
often the agent talked compared to the
customer.
• There is also a lot of work going into
looking at caller emotion and sentiment.
Source: CallMiner
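As a rough illustration of the "beyond the words" patterns above, the sketch below computes each party's share of talk time and speaking rate from diarized call segments. It is a minimal example under stated assumptions: the `Segment` structure, the speaker labels, and the sample call are all hypothetical, and this is not how CallMiner or any specific vendor implements it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    speaker: str    # hypothetical labels: "agent" or "customer"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # (possibly imperfect) transcript of the segment


def talk_stats(segments: List[Segment]) -> Dict[str, Dict[str, float]]:
    """Per-speaker talk time, share of total talk time, and words per minute."""
    totals: Dict[str, Dict[str, float]] = {}
    for seg in segments:
        entry = totals.setdefault(seg.speaker, {"seconds": 0.0, "words": 0})
        entry["seconds"] += seg.end_s - seg.start_s
        entry["words"] += len(seg.text.split())
    all_seconds = sum(e["seconds"] for e in totals.values())
    stats = {}
    for speaker, e in totals.items():
        stats[speaker] = {
            "seconds": e["seconds"],
            "talk_share": e["seconds"] / all_seconds if all_seconds else 0.0,
            "words_per_minute": e["words"] / e["seconds"] * 60 if e["seconds"] else 0.0,
        }
    return stats


call = [
    Segment("agent", 0.0, 6.0, "Thanks for calling how can I help you today"),
    Segment("customer", 6.0, 9.0, "I want to upgrade my plan"),
    Segment("agent", 9.0, 15.0, "Sure I can help with that let me pull up your account"),
]
stats = talk_stats(call)
for speaker, s in stats.items():
    print(speaker, f"{s['talk_share']:.0%} of talk time,", f"{s['words_per_minute']:.0f} wpm")
```

Word counts tolerate transcription errors fairly well, which is one reason metrics like these can be useful even when the transcript itself is poor.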

Voicebots – Smart Speakers & Assistants
• IVR replacement
• Starting meetings
• In-call assistance

• Another area we examined was voicebots.
• These include smart speakers like the Google Home, which was recently made available in
South Korea, and AI assistants like Bixby or Siri.
• Building a voicebot is complex. You not only need to transcribe the speech and run
natural language understanding on it, as in speech analytics, but you also need to
generate speech and handle real-time interaction with the customer.
• There is very broad interest in using these voicebots.
• Every telephony device maker is interested in adding a voice user interface to their
products. This is a natural fit since people already “talk” to these devices.
• Typical conference room equipment is already set up with microphone arrays to capture
good-quality audio with minimal noise from a variety of locations throughout the
room.
• However, most companies are just starting to figure out how to use voicebots in their
products.
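The stages described above (transcribe, understand, respond, generate speech) can be sketched as a single conversational turn. This is only a structural illustration: every function here is a hypothetical stub, the "audio" is just UTF-8 text standing in for real speech, and a real voicebot would call speech-to-text, NLU, and text-to-speech services and would have to stream in real time.

```python
from typing import Callable


def handle_turn(audio_in: bytes,
                transcribe: Callable[[bytes], str],
                understand: Callable[[str], str],
                respond: Callable[[str], str],
                synthesize: Callable[[str], bytes]) -> bytes:
    """Run one voicebot turn: speech in, speech out."""
    text = transcribe(audio_in)   # speech-to-text
    intent = understand(text)     # natural language understanding
    reply = respond(intent)       # dialog logic picks a reply
    return synthesize(reply)      # text-to-speech


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


def fake_understand(text: str) -> str:
    words = text.lower().split()
    return "start_meeting" if "start" in words and "meeting" in words else "unknown"


REPLIES = {
    "start_meeting": "Starting your meeting now.",
    "unknown": "Sorry, I did not catch that.",
}


def fake_respond(intent: str) -> str:
    return REPLIES[intent]


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")   # pretend the text is audio


audio_out = handle_turn(b"Please start my meeting", fake_transcribe,
                        fake_understand, fake_respond, fake_synthesize)
print(audio_out.decode("utf-8"))
```

Passing the stages in as functions keeps the turn logic independent of any particular vendor, which matters when most of the complexity sits in the services behind each stage.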

Flattening the IVR:
humans don’t speak in menus
https://cogint.ai/dialogflow-phone-bot/
[Figure: Traditional IVR Menu vs. Voicebot, plotted against time. The traditional IVR is a multi-level tree of Menu/DTMF steps that each lead to Responses or to further menus. The voicebot maps a single Utterance directly to one of ten Intent/Response pairs.]
10 potential responses in an IVR menu hierarchy vs. a voicebot

Flattening the IVR:
humans don’t speak in menus
• One major area where voicebots will have an impact is in IVRs.
• Traditional IVRs were designed for DTMF input and are usually set up with multiple
levels of menus.
• Because people cannot remember more than a few menu options at a time, you
cannot put too many options in each menu.
• As a result, to fit many options, you need to have a complex menu with many
layers.
• Users hate these menus because they are difficult to navigate and take too long.
• Voicebots help to flatten the IVR into just a few layers.
• Rather than navigating a complex menu, users can just say what they want and use
natural language to get the information they need.
• This is good for call centers too because users are more likely to stay in the IVR
instead of immediately dropping out to an operator.
https://cogint.ai/dialogflow-phone-bot/
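To make the "flattening" idea concrete, here is a small sketch that contrasts a nested DTMF menu with a flat intent lookup. The menu contents and keywords are invented for illustration, and the keyword overlap matcher is only a stand-in for a real natural language understanding service.

```python
from typing import Optional

# Hypothetical nested IVR menu: each keypress moves one level down the tree.
IVR_MENU = {
    "1": {"1": "checking balance", "2": "savings balance"},
    "2": {"1": {"1": "report lost card", "2": "order new card"},
          "2": "card limits"},
}


def navigate_ivr(menu: dict, keypresses: str) -> str:
    """Follow a sequence of DTMF digits down the menu tree to a response."""
    node = menu
    for key in keypresses:
        node = node[key]
    if isinstance(node, dict):
        raise ValueError("more keypresses needed to reach a response")
    return node


# Flat alternative: every response is one utterance away.
INTENT_KEYWORDS = {
    "checking balance": {"checking"},
    "savings balance": {"savings"},
    "report lost card": {"lost", "stolen"},
    "order new card": {"new", "replacement"},
    "card limits": {"limit", "limits"},
}


def match_intent(utterance: str) -> Optional[str]:
    """Pick the intent whose keywords overlap most with the utterance."""
    words = set(utterance.lower().split())
    best, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best


print(navigate_ivr(IVR_MENU, "211"))   # three menu levels and three keypresses
print(match_intent("I lost my card"))  # one utterance
```

In the menu version the caller has to listen to three prompts before reaching the response; in the flat version the same response is reached in a single step.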

New voicebots: consumer ⇨ business
[Figure: Notable Consumer Voicebot Market Milestones. Source: Kranky Geek Research, krankygeek.com/research]

New voicebot technology threatens IVRs
[Figure: ability to offload human tasks (y-axis) over time (x-axis), with "today" marked]

Computer vision
• Funny hats
• Face detection
• Gestures
• Object detection
• Emotion analysis

Object detection over WebRTC with TensorFlow
Blog post:
https://webrtchacks.com/webrtc-cv-tensorflow/
Demo video: https://youtu.be/vzTXW0hGINM
• Using open source libraries and existing work,
it is relatively simple to set up your own server
and process real-time video, even without a
PhD in computer vision.
• Here is an example of a server I set up to do
real-time analysis of a WebRTC stream.

Object detection over WebRTC with TensorFlow – example
architecture
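https://webrtchacks.com/webrtc-cv-tensorflow/
[Figure: a Browser (index.html, local.js, objDetect.js) talks to a Flask Server that wraps TensorFlow Object Detection. The browser sends a GET for web assets and the server returns them; the browser then sends a POST with an image and the server returns object details.]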
• This is just a very basic example that uses an
HTTP POST to send several images per
second to a cloud-based server for
processing.
• As you saw in the video, there can be a little
bit of lag.
• Using a GPU-accelerated server, or even
something like Google’s TPUs, which were
specifically designed to accelerate heavy
machine learning graphs, would have helped.
• But ultimately, streaming high-quality
images to a remote server will always have
its limits.
• Wouldn’t it be nice if you could do the heavy
processing locally with hardware
acceleration, just like you can hardware
accelerate codecs like H.264?
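The server side of the "POST with image, get object details back" exchange can be sketched as a plain function. This is a simplified illustration, not the code from the blog post: the detector is a hard-coded stub standing in for a real model, the JSON field names are assumptions, and in a real deployment a web framework route (Flask in the architecture above) would call a function like this with the uploaded image bytes.

```python
import json
from typing import Callable, Dict, List


def fake_detect(image_bytes: bytes) -> List[Dict]:
    """Hypothetical stand-in for a real object detection model.

    Returns fixed detections with normalized box coordinates (0.0 to 1.0).
    """
    return [
        {"class_name": "person", "score": 0.92,
         "x": 0.10, "y": 0.20, "width": 0.50, "height": 0.70},
        {"class_name": "chair", "score": 0.31,
         "x": 0.60, "y": 0.55, "width": 0.25, "height": 0.40},
    ]


def handle_image_post(image_bytes: bytes,
                      threshold: float = 0.5,
                      detector: Callable[[bytes], List[Dict]] = fake_detect) -> str:
    """Turn one posted image into a JSON string of object details."""
    if not image_bytes:
        return json.dumps({"error": "empty image"})
    # Keep only detections the model is reasonably confident about.
    objects = [o for o in detector(image_bytes) if o["score"] >= threshold]
    return json.dumps({
        "threshold": threshold,
        "num_objects": len(objects),
        "objects": objects,
    })


# The bytes here are a placeholder, not a real JPEG.
response = handle_image_post(b"\xff\xd8placeholder-image-bytes")
print(response)
```

Because every frame is a separate request and response, round-trip time and inference time add up on each image, which is where the lag mentioned above comes from.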

ML processing moving to the edge,
with faster, local processing
• That’s exactly what you can do with some new chipsets from vendors like
Intel.
• This is an example of a kit from Google called the AIY Vision Kit that
includes the Intel Movidius processor.
• The Movidius is designed to run deep neural networks locally and is
especially well-suited to low-power computer vision applications.
• This kit runs on a tiny, single-core Raspberry Pi Zero with only 512MB of RAM.
• Google used to sell just the Vision Bonnet add-on board with the chip for $45.
Now you can buy the complete kit with the Raspberry Pi for $90 in the US.
• Note that Amazon also has a computer vision kit it calls DeepLens. That
runs on something more like an Intel NUC mini-PC and costs $250.

ML processing moving to the edge,
with faster, local processing
https://webrtchacks.com/aiy-vision-kit-uv4l-web-server/

Improvements with edge hardware (demonstration)
• Let’s look at this in action.
• This all runs locally on the Pi.
• So in this case, I am doing the computer
vision processing locally while sending the
stream and annotations to a remote viewer.
Blog post:
https://webrtchacks.com/aiy-vision-kit-uv4l-web-server
Video:
https://youtu.be/h0O18R1rI9U

Fun use cases with native mobile libraries
• With new native mobile libraries like
Apple’s Core ML and Google’s ML Kit,
adding computer vision to a mobile app
is relatively simple.
• Some of the engineers at Houseparty
wrote a blog post demonstrating how
to do smile detection.
• Similar libraries are available that
detect facial boundaries and let you
put hats, sunglasses, beards, and other
silly masks on people – I am sure you
have seen some of these!
• Similar techniques can be used in a
business context to blur out
backgrounds for remote workers who
call into a video conference.
https://webrtchacks.com/ml-kit-smile-detection/

ML Kit CPU consumption: high framerates are not practical (without
special hardware)
[Figure: CPU usage (%) for different framerates processed by ML Kit]
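One simple way to keep CPU usage in check is to run detection on only a subset of the captured frames. The sketch below picks which frame indices to process so that a camera capturing at one rate feeds the detector at a lower rate. It is a generic illustration written in Python for consistency with the other examples, not ML Kit code, and the function name and rates are assumptions.

```python
from typing import List


def select_frames(capture_fps: int, target_fps: int, num_frames: int) -> List[int]:
    """Indices of captured frames to send to the detector.

    Uses integer credit accounting so that, on average, target_fps out of
    every capture_fps frames are processed, evenly spaced.
    """
    if capture_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    target_fps = min(target_fps, capture_fps)  # cannot process more than is captured
    credit = capture_fps - target_fps          # ensures the first frame is processed
    selected = []
    for index in range(num_frames):
        credit += target_fps
        if credit >= capture_fps:
            credit -= capture_fps
            selected.append(index)
    return selected


# One second of 30 fps video, with detection limited to 5 frames per second.
print(select_frames(capture_fps=30, target_fps=5, num_frames=30))
```

The video stream itself can still be sent at the full frame rate; only the analysis is throttled.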

Resource consumption
ML Kit is small compared to WebRTC

WebRTC CV is coming to the browser
https://w3c.github.io/webrtc-nv-use-cases/#funnyhats*
* This is from a W3C document examining use cases for the next version of WebRTC.

RTC optimization
• Noise suppression
• Echo cancellation
• Error correction
• Route optimization

Mozilla RNNoise – real time, low-power noise suppression with
deep learning
• One example is a research project
from Mozilla that uses deep learning
to provide better real-time noise
suppression.
• This is designed for low-power
devices and does not require any
specialized hardware.
• We do not have time now, but you can
go to that link and try some demos.
• Unfortunately, this was just a research
project, but it gives you some idea of
what could be done in this and other
areas.
https://people.xiph.org/~jm/demo/rnnoise/

Special discount
for RTC Korea
Use code RTC-KOREA
until November 7
for $1000.00 off
purchase at
krankygeek.com/research
or email me


Editor's Notes