Cloud Robotics for Building
Conversational Robots
Komei Sugiura
National Institute of Information and Communications Tech., Japan
Beyond the Language Barrier:
NICT’s free software and cloud services
1. Speech to speech translation system: VoiceTra (2010)
>1M downloads.
High performance in translation to/from Asian languages
2. MCML Speech interaction SDK (2013)
The SDK enables users to build WFST-based multilingual dialogue systems.
3. Smartphone dialogue apps (2011)
Spoken dialogues and recommendation in tourist
guidance domains
4. Cloud robotics platform rospeex (2013)
40K unique users. Top-level quality for dialogue-oriented TTS in Japanese.
[New] Automatic captioning SDK for developers
http://www2.nict.go.jp/astrec-ast/mcml-sdk/index_en.html
Free of charge, but authentication required
Video
Motivation:
How can we build communicative robots to help people?
Smartphones and other consumer devices
Speech interfaces benefit consumers
cf. Market size of speech recognition
¥88B@2013→¥170B@2018 (€1.5B)*
Show me today’s
schedule
* Estimation by NEDO, TSC Foresight Vol.8, 2015
Sushi restaurants
around here
Benefit for
QA/search
GPS Contacts Other context
info.
Current communication with robots
Insufficient benefit to consumers
??
??Throw
them away.
Is there any milk
in the fridge?
• Bad recognition accuracy
• User needs to specify [what,
where, how] as well as start/end
conditions
ROSPEEX:
A CLOUD ROBOTICS PLATFORM FOR
MULTILINGUAL SPOKEN DIALOGUES
Background: Speech recognition/synthesis is a bottleneck
for reducing cost in human-robot interactions
• Synthesized speech sounds
monotonous and unfriendly
• Speech recognition does not work
as well as expected
XIMERA 3
(Text-reading)
Voice
talent
Target = Interactions with service robots
Rospeex:
A cloud robotics platform for multilingual spoken dialogues
• >40,000 unique users have used rospeex
• WER = 7.9% (accuracy = 92.1%) on IWSLT tst2011 (1st place in IWSLT 2012, 2013, and 2014)
• Top-level quality dialogue-oriented TTS
Python & C++ samples
are available
rospeex Search
* Free of charge for research
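The WER figure above is the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words; the quoted accuracy is 1 minus WER. A minimal Python sketch of that arithmetic follows. The function name and the example sentences are illustrative assumptions, not part of rospeex or the IWSLT data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of seven reference words
wer = word_error_rate("is there any milk in the fridge",
                      "is there any milk in fridge")
accuracy = 1.0 - wer
```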
Rospeex’s positioning in robot dialogue quadrants
Cloud APIs
(Google, Microsoft, IBM,
NTT docomo, Wit.ai,…)
Free software
Commercial software
OpenHRI,
PocketSphinx, Festival
Cloud-based
Stand-alone
Robot middleware-compatible
Incompatible
Does not work with very low-spec PCs
Robotics-specific logs are lost
Authentication
Low quality
Expensive
Distribution of rospeex users
rospeex applications (40k unique users)
Conversational agents in elderly care
facilities, service robots, humanoids,
dialogue agents, speech interfaces for car
navigation systems or smart home devices,
…
Analysis: TTS requests depend heavily on individuals
• Question: Do developers reuse the same sentences for TTS? If so, we can
speed up responses by introducing a local cache.
Cache hit
Cache miss
• Analysis on top 88 users
– New requests = 50.4% on average
– An individual uses max. 200 unique sentences
Without a cloud platform, we
cannot conduct large-scale
analysis of robot developers
Introducing a cache will
reduce communication time
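The caching idea can be sketched as a small local store keyed by the request text and language. This is an illustration only: `synthesize` stands in for whatever remote TTS call is used (it is not the rospeex API), and the class, method, and parameter names are hypothetical.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """In-memory cache in front of a remote TTS call (hypothetical wrapper)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]):
        # Assumed signature: (text, language) -> audio bytes
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str, language: str) -> str:
        # Same text in a different language must not share an entry
        return hashlib.sha256(f"{language}\x00{text}".encode("utf-8")).hexdigest()

    def say(self, text: str, language: str = "ja") -> bytes:
        key = self._key(text, language)
        if key in self._store:
            self.hits += 1      # cache hit: no network round trip
        else:
            self.misses += 1    # cache miss: call the remote service once
            self._store[key] = self._synthesize(text, language)
        return self._store[key]

    def new_request_rate(self) -> float:
        """Fraction of requests that had to go to the remote service."""
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

With the 50.4% new-request rate reported above, roughly half of the requests could be answered from such a cache without contacting the server.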
MULTIMODAL SPOKEN DIALOGUES
WITH ROBOTS
Multimodal language understanding
Kollar+ 2010
HRI 2010 Best Paper
• Input: Text, LRF, Image
• Output: path planning
• E.g. “Go down the hallway”
Iwahashi &
Sugiura+ 2010
• Input: Image and speech
• Output: object manipulation
• E.g. “Place-on Elmo”
Visual QA [2015-]
• Input: Image and question
• Output: Answer
• E.g. “How many elephants are there?” -> “2”
Video
LCore: Multimodal Robot Language Acquisition
[Iwahashi, Sugiura, et al 2010]
Key features
• Fully grounded vocabulary
• Imitation learning
• Incremental & interactive learning
• Language independent
• Learning when to ask questions
HMM “Place-on”: Place X on Y
Imitation learning for spoken language understanding:
Re-ranking hypotheses using planned trajectories’ likelihood
• Transformation of reference-point-dependent HMMs*
– Input: verb ID, object ID(s)
e.g. <place-on, Object 1, Object 3>
– Transforms HMM from intrinsic coordinate system into world
coordinate system
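One way to picture the coordinate transformation is to map the Gaussian of a single HMM state from the intrinsic frame into the world frame with a rotation and a translation. The sketch below assumes a 2-D position-only Gaussian and a frame anchored at the reference object; the function name, the arguments, and the choice of a rigid transform are assumptions for illustration, not the published formulation.

```python
import numpy as np


def to_world(mean, cov, origin, angle):
    """Map a 2-D Gaussian (mean, cov) from an intrinsic frame to the world frame.

    origin: world position of the intrinsic frame's origin
            (e.g. the reference object), assumed known
    angle:  rotation of the intrinsic frame relative to the world frame [rad]
    """
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s],
                  [s,  c]])
    # Means are rotated and shifted; covariances are only rotated
    world_mean = R @ np.asarray(mean, dtype=float) + np.asarray(origin, dtype=float)
    world_cov = R @ np.asarray(cov, dtype=float) @ R.T
    return world_mean, world_cov


# Intrinsic frame rotated by 90 degrees and anchored at (2, 3) in the world
m, S = to_world([1.0, 0.0], np.diag([4.0, 1.0]), [2.0, 3.0], np.pi / 2)
```

Velocity and acceleration components would be rotated in the same way but not translated, since they are differences of positions.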
HMM “Place-on”
World CS
Situation
Place X on Y
* Sugiura et al, IROS 2011 RoboCup Best Paper
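The re-ranking step can be sketched as adding a trajectory term to each recognition hypothesis and sorting by the combined score. The data structure, the log-linear combination, and the `weight` parameter below are illustrative assumptions; `trajectory_log_likelihood` is a placeholder for whatever scores the planned trajectory under the transformed HMM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    verb: str                  # e.g. "place-on"
    objects: Tuple[int, ...]   # e.g. (1, 3) for <place-on, Object 1, Object 3>
    speech_log_prob: float     # score from the speech recognizer


def rerank(hypotheses: List[Hypothesis],
           trajectory_log_likelihood: Callable[[str, Tuple[int, ...]], float],
           weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by speech score plus weighted trajectory log-likelihood."""
    def score(h: Hypothesis) -> float:
        return h.speech_log_prob + weight * trajectory_log_likelihood(h.verb, h.objects)
    return sorted(hypotheses, key=score, reverse=True)


# Toy scorer: placing Object 1 on Object 3 is far more plausible in this scene
def toy_likelihood(verb: str, objects: Tuple[int, ...]) -> float:
    return -1.0 if objects == (1, 3) else -10.0


candidates = [Hypothesis("place-on", (1, 2), -2.0),
              Hypothesis("place-on", (1, 3), -3.0)]
best = rerank(candidates, toy_likelihood)[0]
```

In this toy scene the recognizer slightly prefers the first hypothesis, but the trajectory term moves the second one to the top.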
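HMM-based trajectory generation using dynamic features*
$q$: state sequence
$\lambda$: HMM parameters
$\xi$: time series of (position, velocity, acceleration)
Maximum likelihood trajectory:
$\hat{\xi} = \arg\max_{\xi} P(\xi \mid q, \lambda)$, which with $\xi = W x$ gives $\hat{x} = (W^{\top} \Sigma^{-1} W)^{-1} W^{\top} \Sigma^{-1} \mu$
*Tokuda, K. et al., “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000
$\mu$: vector of mean vectors
$\Sigma$: matrix of covariance matrices of each output probability density function (OPDF)
$W$: matrix of coefficients in difference approximation
$x$: time series of position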
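A minimal NumPy sketch of maximum likelihood trajectory generation in the style of Tokuda et al. follows, for a 1-D trajectory with diagonal covariances. It solves (W^T Σ^-1 W) x = W^T Σ^-1 μ for the position sequence x, where μ and Σ hold the per-frame Gaussian parameters selected by the state sequence. The window coefficients, the clamping at the sequence ends, and the function names are assumptions made for illustration.

```python
import numpy as np


def build_window_matrix(T: int) -> np.ndarray:
    """Stack position, velocity and acceleration windows into a (3T x T) matrix W."""
    windows = [
        {0: 1.0},                      # position
        {-1: -0.5, 1: 0.5},            # velocity: central difference
        {-1: 1.0, 0: -2.0, 1: 1.0},    # acceleration: second difference
    ]
    W = np.zeros((3 * T, T))
    for t in range(T):
        for d, window in enumerate(windows):
            for offset, coeff in window.items():
                k = min(max(t + offset, 0), T - 1)   # clamp at the sequence ends
                W[3 * t + d, k] += coeff
    return W


def ml_trajectory(means, variances) -> np.ndarray:
    """Most likely position sequence given per-frame Gaussians.

    means, variances: arrays of shape (T, 3) holding, for each frame, the mean
    and variance of (position, velocity, acceleration) of the active HMM state.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    T = means.shape[0]
    W = build_window_matrix(T)
    mu = means.reshape(-1)                      # frame-major, matches the rows of W
    precision = 1.0 / variances.reshape(-1)     # diagonal of Sigma^-1
    A = W.T @ (precision[:, None] * W)          # W^T Sigma^-1 W
    b = W.T @ (precision * mu)                  # W^T Sigma^-1 mu
    return np.linalg.solve(A, b)


# Consistency check: statistics derived from a known trajectory recover it
x_true = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
W_demo = build_window_matrix(len(x_true))
x_hat = ml_trajectory((W_demo @ x_true).reshape(-1, 3), np.ones((len(x_true), 3)))
```

Because the velocity and acceleration rows tie neighbouring frames together, the solution is a smooth trajectory rather than a step-wise sequence of state means.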
ROBOCUP@HOME
BUILDING DOMESTIC SERVICE ROBOTS
RoboCup@Home: Benchmark tests for domestic robots
• RoboCup@Home: The largest competition for domestic robots
– One of the major RoboCup leagues
– Focuses on human-robot interaction and mobile manipulation
– Robots are evaluated by 8 standardized and 3 demonstration tasks
• Scientific challenges
– Navigation in unknown environments (e.g. real shop), handling
everyday objects, spoken dialogues in very noisy environments, …
RoboCup@Home Standard Platform Leagues start in 2017
• Many teams need low-cost standardized platforms
• Companies have seen NAO’s success after it was selected as the soccer Standard Platform
(Softbank bought Aldebaran for 100M USD)
Toyota HSR
• Main use case = partner robot for those who need care
• Lease-based
Softbank Pepper
• Already deployed in restaurants and shops
• Very low price
Both compatible with ROS
CFPs for HSR/Pepper users will be open soon
Summary
• Data-driven approaches
• Multimodal spoken dialogue with robots
• RoboCup and domestic service robots
• …and we’re hiring!

20161014IROS_WS
