SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3033
Language-Based Actions for Self-Driving Robot
Dhruv Solanki1, Mohit Tiwari2, Aditya Vartak3, Saurabh Wanivadekar4, Prof. Naina Kaushik5
1,2,3,4Student, Dept. Of Computer Engineering, Rajiv Gandhi Institute of Technology, Mumbai, Maharashtra, India
5Professor, Dept. Of Computer Engineering, Rajiv Gandhi Institute of Technology, Mumbai, Maharashtra, India
Key Words: Artificial Intelligence, Natural Language
Human-Robot interaction, Robot Simulation, Sequence
Modeling, Path finding.
1.INTRODUCTION
An autonomous robot which act and behave with higher
degree of independence. In today’s changing world it is
necessary for an autonomous robot to work on the
Natural Language commands of Humans. Autonomous
robots are desirable in fields such as space travel,
household work like cleaning, waste water treatment and
delivering goods and services. A fully autonomous robot
can gain information about environment, work for
extended period without human intervention, move
either all or part of itself throughout its operating
environment without human assistance, avoid situations
that are harmful to people, property, or itself unless those
are part of its design specification, an autonomous robot
may also learn or gain new knowledge like adjusting for
new methods of accomplishing its tasks or adapting to
changing surroundings. Barrett et al [1] proposed a
solution for robot navigation using Natural Language and
our work is inspired from them. We are going to simulate
the framework using Unity Game Engine [2].
Data flow diagram of the figure is given in Fig-1. In this
framework the meaning of a sentence that describe path
driven by a robot through an environment containing
number of objects. An example of such sentence is “Go to
the fire which is behind the box and in front of the stool
then go towards the table”. Such sentences are sequence
of description in terms of objects in the environment.
Nouns in the descriptions indicate the class of objects
involved, such as box, stool, table and fire. Prepositions,
such as “in front of” will describe the changing position of
robot in time(e.g. in front of the stool),as well as to
describe the relative positions of objects in the
environment (e.g. the fire which is behind the box).We use
Spacy[3] to process and apply Natural Language
processing over the sentences to extract nouns and
preposition phrases and their target and referent nouns.
Sequence Modeling uses large sets of hidden Markov
models (HMMs), one for each path sentence pair in
training. During training, robot learns meanings of words
and phrases. These learned meanings are used in
Cognizance phase where the robot drives according to the
input sentences. The Implementation details and Results
are explained in the following sections.
Fig-1: Data Flow diagram
2. CREATING VIRTUAL ENVIRONMENT AND DATA
ENGINEERING
2.1 Need for Virtual Environment
Before any system is to be implemented in hardware, it is
always a best practice to simulate the system in possible
environments to gauge the readiness of deployment in
hardware. It is also a safe way of testing the performance
and is economical for the organizations on possible failed
projects. This is widely followed particularly in the field of
Robotics. Hence, we decided to create a virtual
environment and robot simulator initially. We chose Unity
Game Engine for this task because it is flexible enough for
creating virtual environments, collecting data and also
supports navigation tools like NavMesh for A* path
finding.
2.2 Environment Creation
We created the environment in Unity consisting of a floor
or terrain and then placing prefabs of different objects
like cone, stool, table etc. on the floor. The Prefab Asset is
a template from which you can create new Prefab objects
in the Unity Scene. Then we randomized the initializing
positions. We then created a robot in the environment
----------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - We present a framework to automatically learn
to drive a mobile robot under natural-language command.
This framework contains Sequence Modeling (to learn the
meaning of sentence and based on sentence try to locate
the nouns in environment and identify the associated
prepositions) and Cognizance (to generate path and move
on that path to accomplish target navigational goal along
with an activity in the scenario. This activity can be
extinguishing fire, moving objects in the environment etc.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3034
with a view from its view port. Thus, we created an
environment with proper objects placed randomly on the
floor and a robot which can move around the
environment using controls on the host.
2.3 Data Collection
We also set up scripts to make sure raw data i.e.
Floorplan, Path and Sentences are to be collected for
every run of the environment. The coordinates of objects
along with label is collected as floorplan data. The
sentences entered in input box is collected as sentences
data. We would also move the robot in Unity environment
according to entered sentence to capture the path of the
robot and thus collect the path data too. We collect total of
3 sentences per path driven by robot. All the three data
files of floorplan, sentence and path is passed to server
over a network for data preprocessing. The server would
have all the framework required for the robot navigation.
2.4 Data Preprocessing
After we obtain raw data from the simulated environment
on Unity Engine, we need to make this data suitable to be
used and processed for training of the robot. Hence, we
do the following steps for the 3 different raw data –
sentences, floorplan and path. For Path Data, the
collection of path coordinates needs to be preprocessed to
remove repeated coordinates. Coordinates may get
repeated in consecutive occurrences if robot hasn’t moved
in the frame of time. This allows us to reduce redundancy
of data. For Floorplan, since the floorplan is the
distribution of objects in the environment, each floorplan
object is captured as a tuple of its location in x-y plane
and its class i.e. (x, y, class). Thus, representation of each
object becomes understandable.
For Sentences, they are captured from the Input box. We
need to be able to extract meaningful sentences from the
data and discard meaningless sentences like “This is
robot” as they are of no use to the system as they won’t
contribute to goals of the system. This leaves us with a
few types of sentences. The above process could have
been done with techniques for sentence classification by
Machine Learning Approach. However, this approach is
Data-Intensive and requires high Computation cost.
Hence, we chose to go with Rule-based Classification.
Algorithm 2 is the Rule Based Classification Algorithm.
This is because, we identified that the types of sentences
for a particular goal are not vast and mostly have
structures which can be parsed by grammars. Hence, a
grammar can be generated for types of sentences to be
encountered for the goal. Some Examples of sentence
structures and its corresponding sentences are given in
the Table -1.
Table-1: Examples of Sentence Structures found in use of
framework to extinguish Fire
Sr. No Sentence Structure Example of Sentence
1)
VERB DET NOUN WH-
determiner VERB ADP
DET NOUN ADP DET
NOUN …
“Extinguish the fire
which is to the left of the
chair”
2)
VERB ADP DET NOUN
WH-determiner VERB
ADP DET NOUN ADP
DET NOUN...
“Go to the fire which is to
the left of the stool”
3. SEQUENCE MODELING
We apply Hidden Markov Models (HMM) for each
temporal segment and the state in the HMMs are the
temporal segments of sentence and the observation of
each hidden state is sequence of waypoints. A transition
matrix and emission matrix are generated for each state.
The transition matrix and emission matrix are passed to
forward-backward algorithm to compute the probability
of HMM being in each state at each point in path. It also
estimates how true the entire sentence is of the path.
3.1 Creating Graphical Model
Once we have broken down the sentence into temporal
segments, we need to model them in mathematical form
for processing the data. Hence, we convert every segment
of sentence into a Probabilistic Graphical Model [4]. Fig-2
shows the graphical model. Then, a we create a floorplan
variable (O1–On) for each noun in each segment. Then we
apply the label distribution of noun to the variable’s set of
labels. Finally, we need to use the subject and object of
preposition in the sentence, and find each preposition
distributions over relative positions and velocities are
applied between its the subject and the object. The
subject noun act as target object and object noun acts as
referent object for the preposition.
Every Probabilistic Graphical Model starts with a root
state Pi followed by a preposition or list of prepositions as
children. Each preposition is followed by a noun which is
its object and preceded by a noun which is its subject for
e.g. “fire which is to the left of table” where subject of left
is fire and object is table. Such a preposition is having
adjectival use in the sentence. In case of preposition
phrases like “towards the bag”, there is no need to have a
subject noun for it. Just the object would suffice. Such
prepositions are called adverbial usage of preposition.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3035
Fig-2: Graphical models created for Sentence. Green box
denotes Prepositions and Blue box denotes nouns in a
sentence.
For each of the floorplan variable O1 to On, a class label is
assigned to each floorplan variable. The probability
distribution of noun I for which the floorplan variable that
was created gives the score(wij) of the label l assigned to
the variable. We create a dictionary of words which stores
the meanings of words in mathematical form. For nouns,
this information is represented in dictionary of words as a
discrete probability distribution over class labels.
Now, for the presentation of the prepositions and
prepositional phrases, we use location (denoted by μ) and
concentration (denoted by κ) parameters for two
independent Von Mises distributions [5]. The Von Mises
distribution for angular distribution α is given as:
𝑣(𝜇, 𝜅) =
𝑒 𝜅𝑐𝑜𝑠( 𝛼−𝜇)
2𝜋𝐼0(𝜅)
(1)
When the ith preposition in the lexicon is applied between
two variables, whose physical relationship is specified by
the position angle θ and velocity angle γ between them,
its score zi is given by
(2)
Conditional probability of graphical model will be
calculated using path variable 𝑝 (𝑝 𝑥, 𝑝 𝑦, 𝑝 𝑣 𝑥
, 𝑝 𝑣 𝑦
)of the
model on the K floor plan objects (O1...Ok) and m
assignments of labels to N floor plan variables. In
mapping mn, every element of mapping is the index of
floorplan object mapped to floorplan variable n. This
helps in deciding which objects in floor plan represent
which noun in sentence.
Let α be*𝑝, 𝑜 𝑚1
, . . . , 𝑜 𝑚 𝑛
+a set of path variable and
mappings of floor plan objects to variables and 𝑏 𝑐1
, 𝑏 𝑐2
be
indices of target and referent in a. Position of target is
found by coordinates (𝑎 𝑏𝑐1
𝑥
𝑎 𝑏𝑐1
𝑦
) and referent
by(𝑎 𝑏𝑐2
𝑥
𝑎 𝑏𝑐2
𝑦
). Similarly, velocity of target can be found
by(𝑎 𝑏𝑐2
𝑥
𝑎 𝑏𝑐2
𝑦
). Hence, position angle θ and velocity angle
γ is
𝜃 = 𝑡𝑎𝑛−1
(
𝑎 𝑏𝑐1
𝑦
− 𝑎 𝑏𝑐2
𝑦
𝑎 𝑏𝑐1
𝑥
− 𝑎 𝑏𝑐2
𝑥 )
𝛾 = 𝑡𝑎𝑛−1
(
𝑣 𝑦
𝑣𝑥
) − 𝑡𝑎𝑛−1
(
𝑎 𝑏𝑐2
𝑦
− 𝑎 𝑏𝑐1
𝑦
𝑎 𝑏𝑐2
𝑥
− 𝑎 𝑏𝑐1
𝑥 )
Now we calculate the segment conditional probability
given mapping m, floorplan f, and dictionary Λ using the
formula
𝜓(𝑝, 𝑚, 𝑓, 𝛬) = ∏ 𝑧 𝑑𝑐
𝐶
𝑐=1
(𝜃𝑐, 𝛾𝑐) ∏ 𝑤 𝑒 𝑛 𝑙 𝑚𝑛
𝑁
𝑛=1
(3)
3.2 Training HMMs
Now, for each path-sentence pair in training data, we
generate Hidden Markov Model (HMM). The states of each
such HMM consists of probabilistic graphical model of
each segment of sentence in proper alignment. Thus,
number of states in HMM is equal to number of temporal
segments. The HMMs infers the alignment between the
densely sampled points in each path and the sequence of
temporal segments in its corresponding sentence. The
Output model for temporal segment t is given by
𝑅𝑡(𝑝𝑡, 𝑓, 𝛬) = ∑ 𝜓 𝑡
𝑚
(𝑝𝑡|𝑚, 𝑓, 𝛬) (4)
Each HMM has its own transition and emission matrices
(a and b respectively). Transition matrix in this problem is
only allowed to self-loop or transition to next temporal
segment state and must start at first segment and ends at
last. HMM computes a score at every state to measure the
degree of truthfulness of segment. The Expectation
Maximization [6] Algorithm helps to find Log Likelihood
of HMM being in each state. Once we have the likelihood
and the transition model, we use it in Baum-Welch
(Forward-Backward) Algorithm which is given as
Algorithm 1.
Before learning the word meanings, all preposition and
noun distributions are randomly initialized. During
learning, the model is updated iteratively to increase the
overall HMM likelihood as a product over all training
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3036
samples. At each iteration, this concentrates the
Probability Mass Distribution of each HMM state’s
preposition distributions at angles seen at regions of the
path during which that state is of high probability. It also
concentrates the probability mass of the object label
distributions in containers associated with the mappings
corresponding to high HMM likelihoods. Thus, the
parameters of nouns and prepositions in the dictionary is
updated accordingly. Fig-3 illustrates how HMM is
created.
Fig-3 Creating HMM from a graphical model.
Algorithm 1: Baum-Welch (Forward-Backward)
Algorithm
Algorithm 2: Rule based Classification
4. COGNIZANCE AND PATHFINDING
We will now use a set of waypoints generated from
sequence modelling for generating path for robot
locomotion. This phase determines the sparse sequence
of waypoints which satisfy the sentence.
4.1 Gradient Ascent
Gradient ascent algorithm is used to maximize the
likelihood function. By maximizing, we try to find out the
local maximum for the function. The likelihood function
we try to maximize is Equation (4). We apply gradient
ascent on score function of each temporal segment it to
obtain small set of waypoints for each temporal segment.
Thus, for every sentence we get sparse set of waypoints
that satisfy the semantics of sentence.
4.2 Collision Awareness
The sparse set of waypoints which are generated by
gradient ascent does not consider the fact that by
following the waypoints, the robot can collide into an
obstacle or other floorplan object. This is very important
to be considered as obstacle avoidance should be a task
that has to be done by the robot to function in real-world
environments. Hence, we introduce barrier radius around
each object in floorplan. An alternative to this is Barrier
Penalty as suggested in [1]. A barrier penalty B(r) is
added between each pair of a waypoint and floorplan
object as well as between pairs of temporally adjacent
waypoints to prevent them from becoming too close.
𝐵(𝑟) = 𝑆𝑀𝑂𝑂𝑇𝐻𝑀𝐴𝑋 (1,1 +
2𝑟1 + 2𝑟2
𝑟
) (5)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3037
Where r is the distance either a waypoint and an object or
between two waypoints and where r1 and r2 are the radii
of the two things being kept apart, either the robot or an
object. This barrier is approximately 1 until the distance
between the two waypoints become small, at which point
it decreases rapidly, pushing them away from each other
by approximately the robot radius.
4.3 Path Finding
Path finding algorithms are the algorithms that find the
least cost route to target from given position in the
environment. Thus, optimized path is generated if start
and end points are given. Some of the commonly used
Path-Finding Algorithm is A* Algorithm. The sparse set of
waypoints is passed to path finding algorithm which
generate an optimized path between initial position to
target position.
Unity provides us with the tool to create a Navigation
Mesh on the floor or terrain of the environment. NavMesh
is a data structure which helps robotic agents for
pathfinding through complicated areas, turns etc. It
defines what part of environment is traversable and non-
traversable. Using NavMesh we can set the target and the
waypoints obtained from gradient ascent as the points
through which robot agent should definitely pass through.
The NavMesh Implementation is shown in Fig-4. Every
Floorplan object is considered an obstacle with a fixed
radius around it not a part of NavMesh hence creating
barrier for collisions.
Fig 2.2 NavMesh in Unity Environment (Blue) and barrier
radii for objects (Black)
Thus, Path is created from sparse set of waypoints and
robot is able to understand the sentence and move
accordingly in the environment.
5. FUTURE WORK
The framework we have discussed certainly is not
complete in itself. The framework deals only with nouns
and prepositions in the sentences but fails to recognize
the role of verb in the sentence. This framework can be
made generalizable framework for any robotic task by
natural language if we are able to model and learn
meanings of verbs, adverbs etc. in context of robot. If
robot can understand what the verb in the sentence
means then any task can be understood and done by same
robot. Similarly, in Sequence Modelling, we could have
used Temporal Convolutional Networks (TCN) [7] as it is
better than Recurrent Neural Network with LSTM and can
replace HMM for sequence modelling task. Given a
sentence as an input, this would generate corresponding
waypoints for each segment. Adding speech recognition
engine would make this system more user-friendly and
connected. However, these tasks are not currently in the
scope of the paper.
6. CONCLUSION
We have understood how the framework would enable
the robot to do tasks based on natural language
understanding. Data collection is done for collecting raw
data followed by preprocessing on the data. The robot will
try to identify the words in sentences and find out nouns
and prepositions, following which it is grounded by use of
mathematical probability distributions and Hidden
Markov Model for sequence. Now optimization by
gradient ascent for the likelihood function is done for
each temporal segment to arrive at sparse set of
waypoints. These sparse set of waypoints are then given
for calculating path through all these waypoints without
colliding with obstacles and other objects. Hence the
robot first learns the dictionary of word meanings and
then use it to understand the natural language sentence
and move accordingly.
7. REFERENCES
[1] Paul Barrett, Daniel & Alan Bronikowski, Scott &
Yu, Haonan & Siskind, Jeffrey. (2017). Driving Under the
Influence (of Language). IEEE Transactions on Neural
Networks and Learning Systems. PP. 1-16.
10.1109/TNNLS.2017.2693278.
[2] Unity Game Engine: https://unity3d.com/unity
[3] “Industrial-Strength Natural Language
Processing”: https://spacy.io/
[4] Koller, Daphne and Nir Friedman. “Probabilistic
Graphical Models - Principles and Techniques.” (2009).
[5] M. Abramowitz and I. A. Stegun, Handbook of
Mathematical Functions. New York, NY, USA: Dover, 1972
[6] Dempster,A.P. ; Liard,N.M.; Rubin,D.B.(1977)
Maximum Likelihood from Incomplete Data via the EM
Algorithm.
[7] Bai, Shaojie & Zico Kolter, J & Koltun, Vladlen.
(2018). An Empirical Evaluation of Generic Convolutional
and Recurrent Networks for Sequence Modelling.

More Related Content

PDF
Tool Use Learning for a Real Robot
PPTX
Machine learning
PDF
IRJET- Automatic Text Summarization using Text Rank
PPTX
Machine learning introduction
PDF
Data Science - Part XVII - Deep Learning & Image Processing
PPT
11 syllabus
PDF
Machine Learning
PPTX
Introduction to Machine Learning
Tool Use Learning for a Real Robot
Machine learning
IRJET- Automatic Text Summarization using Text Rank
Machine learning introduction
Data Science - Part XVII - Deep Learning & Image Processing
11 syllabus
Machine Learning
Introduction to Machine Learning

What's hot (10)

PDF
Report finger print
PPTX
Introduction to machine learning
PPTX
1 cs xii_python_functions_introduction _types of func
PPTX
Introduction to Machine Learning
PDF
Huffman Text Compression Technique
PPTX
Machine Learning: A Fast Review
PPTX
Introduction to Machine Learning
PDF
International Journal of Computational Engineering Research(IJCER)
PPTX
Introduction to Machine Learning
PDF
Python interview questions and answers
Report finger print
Introduction to machine learning
1 cs xii_python_functions_introduction _types of func
Introduction to Machine Learning
Huffman Text Compression Technique
Machine Learning: A Fast Review
Introduction to Machine Learning
International Journal of Computational Engineering Research(IJCER)
Introduction to Machine Learning
Python interview questions and answers
Ad

Similar to Language-Based Actions for Self-Driving Robot (20)

PDF
Scene Description From Images To Sentences
PDF
International Journal of Advance Robotics & Expert Systems (JARES)
DOC
Advance data structure
PDF
Java Programming Paradigms Chapter 1
PDF
Java Progamming Paradigms, OOPS Concept, Introduction to Java, Structure of J...
PPT
Introduction to programming languages part 2
PDF
IRJET - Deep Learning based Chatbot
PDF
A novel approach on a robot for the blind people which can trained and operat...
DOCX
Ooad notes
PDF
Learning of robots by using & sharing the cloud computing techniques
PPTX
2 oop
PDF
IRJET- QUEZARD : Question Wizard using Machine Learning and Artificial Intell...
PDF
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
PDF
CONCEPTUAL SIMILARITY MEASUREMENT ALGORITHM FOR DOMAIN SPECIFIC ONTOLOGY
PDF
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
PDF
IRJET- Factoid Question and Answering System
PPTX
[IROS2017] Online Spatial Concept and Lexical Acquisition with Simultaneous L...
PDF
IRJET- Semantics based Document Clustering
PPTX
Networking chapter jkl; dfghyubLec 1.pptx
PDF
Partial Object Detection in Inclined Weather Conditions
Scene Description From Images To Sentences
International Journal of Advance Robotics & Expert Systems (JARES)
Advance data structure
Java Programming Paradigms Chapter 1
Java Progamming Paradigms, OOPS Concept, Introduction to Java, Structure of J...
Introduction to programming languages part 2
IRJET - Deep Learning based Chatbot
A novel approach on a robot for the blind people which can trained and operat...
Ooad notes
Learning of robots by using & sharing the cloud computing techniques
2 oop
IRJET- QUEZARD : Question Wizard using Machine Learning and Artificial Intell...
FLOWER VOICE: VIRTUAL ASSISTANT FOR OPEN DATA
CONCEPTUAL SIMILARITY MEASUREMENT ALGORITHM FOR DOMAIN SPECIFIC ONTOLOGY
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
IRJET- Factoid Question and Answering System
[IROS2017] Online Spatial Concept and Lexical Acquisition with Simultaneous L...
IRJET- Semantics based Document Clustering
Networking chapter jkl; dfghyubLec 1.pptx
Partial Object Detection in Inclined Weather Conditions
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
Geodesy 1.pptx...............................................
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Lecture Notes Electrical Wiring System Components
PPTX
Construction Project Organization Group 2.pptx
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPT
Project quality management in manufacturing
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Geodesy 1.pptx...............................................
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Lecture Notes Electrical Wiring System Components
Construction Project Organization Group 2.pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Project quality management in manufacturing
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CH1 Production IntroductoryConcepts.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Embodied AI: Ushering in the Next Era of Intelligent Systems
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...

Language-Based Actions for Self-Driving Robot

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3033 Language-Based Actions for Self-Driving Robot Dhruv Solanki1, Mohit Tiwari2, Aditya Vartak3, Saurabh Wanivadekar4, Prof. Naina Kaushik5 1,2,3,4Student, Dept. Of Computer Engineering, Rajiv Gandhi Institute of Technology, Mumbai, Maharashtra, India 5Professor, Dept. Of Computer Engineering, Rajiv Gandhi Institute of Technology, Mumbai, Maharashtra, India Key Words: Artificial Intelligence, Natural Language Human-Robot interaction, Robot Simulation, Sequence Modeling, Path finding. 1.INTRODUCTION An autonomous robot which act and behave with higher degree of independence. In today’s changing world it is necessary for an autonomous robot to work on the Natural Language commands of Humans. Autonomous robots are desirable in fields such as space travel, household work like cleaning, waste water treatment and delivering goods and services. A fully autonomous robot can gain information about environment, work for extended period without human intervention, move either all or part of itself throughout its operating environment without human assistance, avoid situations that are harmful to people, property, or itself unless those are part of its design specification, an autonomous robot may also learn or gain new knowledge like adjusting for new methods of accomplishing its tasks or adapting to changing surroundings. Barrett et al [1] proposed a solution for robot navigation using Natural Language and our work is inspired from them. We are going to simulate the framework using Unity Game Engine [2]. Data flow diagram of the figure is given in Fig-1. In this framework the meaning of a sentence that describe path driven by a robot through an environment containing number of objects. An example of such sentence is “Go to the fire which is behind the box and in front of the stool then go towards the table”. Such sentences are sequence of description in terms of objects in the environment. Nouns in the descriptions indicate the class of objects involved, such as box, stool, table and fire. Prepositions, such as “in front of” will describe the changing position of robot in time(e.g. in front of the stool),as well as to describe the relative positions of objects in the environment (e.g. the fire which is behind the box).We use Spacy[3] to process and apply Natural Language processing over the sentences to extract nouns and preposition phrases and their target and referent nouns. Sequence Modeling uses large sets of hidden Markov models (HMMs), one for each path sentence pair in training. During training, robot learns meanings of words and phrases. These learned meanings are used in Cognizance phase where the robot drives according to the input sentences. The Implementation details and Results are explained in the following sections. Fig-1: Data Flow diagram 2. CREATING VIRTUAL ENVIRONMENT AND DATA ENGINEERING 2.1 Need for Virtual Environment Before any system is to be implemented in hardware, it is always a best practice to simulate the system in possible environments to gauge the readiness of deployment in hardware. It is also a safe way of testing the performance and is economical for the organizations on possible failed projects. This is widely followed particularly in the field of Robotics. Hence, we decided to create a virtual environment and robot simulator initially. We chose Unity Game Engine for this task because it is flexible enough for creating virtual environments, collecting data and also supports navigation tools like NavMesh for A* path finding. 2.2 Environment Creation We created the environment in Unity consisting of a floor or terrain and then placing prefabs of different objects like cone, stool, table etc. on the floor. The Prefab Asset is a template from which you can create new Prefab objects in the Unity Scene. Then we randomized the initializing positions. We then created a robot in the environment ----------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - We present a framework to automatically learn to drive a mobile robot under natural-language command. This framework contains Sequence Modeling (to learn the meaning of sentence and based on sentence try to locate the nouns in environment and identify the associated prepositions) and Cognizance (to generate path and move on that path to accomplish target navigational goal along with an activity in the scenario. This activity can be extinguishing fire, moving objects in the environment etc.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3034 with a view from its view port. Thus, we created an environment with proper objects placed randomly on the floor and a robot which can move around the environment using controls on the host. 2.3 Data Collection We also set up scripts to make sure raw data i.e. Floorplan, Path and Sentences are to be collected for every run of the environment. The coordinates of objects along with label is collected as floorplan data. The sentences entered in input box is collected as sentences data. We would also move the robot in Unity environment according to entered sentence to capture the path of the robot and thus collect the path data too. We collect total of 3 sentences per path driven by robot. All the three data files of floorplan, sentence and path is passed to server over a network for data preprocessing. The server would have all the framework required for the robot navigation. 2.4 Data Preprocessing After we obtain raw data from the simulated environment on Unity Engine, we need to make this data suitable to be used and processed for training of the robot. Hence, we do the following steps for the 3 different raw data – sentences, floorplan and path. For Path Data, the collection of path coordinates needs to be preprocessed to remove repeated coordinates. Coordinates may get repeated in consecutive occurrences if robot hasn’t moved in the frame of time. This allows us to reduce redundancy of data. For Floorplan, since the floorplan is the distribution of objects in the environment, each floorplan object is captured as a tuple of its location in x-y plane and its class i.e. (x, y, class). Thus, representation of each object becomes understandable. For Sentences, they are captured from the Input box. We need to be able to extract meaningful sentences from the data and discard meaningless sentences like “This is robot” as they are of no use to the system as they won’t contribute to goals of the system. This leaves us with a few types of sentences. The above process could have been done with techniques for sentence classification by Machine Learning Approach. However, this approach is Data-Intensive and requires high Computation cost. Hence, we chose to go with Rule-based Classification. Algorithm 2 is the Rule Based Classification Algorithm. This is because, we identified that the types of sentences for a particular goal are not vast and mostly have structures which can be parsed by grammars. Hence, a grammar can be generated for types of sentences to be encountered for the goal. Some Examples of sentence structures and its corresponding sentences are given in the Table -1. Table-1: Examples of Sentence Structures found in use of framework to extinguish Fire Sr. No Sentence Structure Example of Sentence 1) VERB DET NOUN WH- determiner VERB ADP DET NOUN ADP DET NOUN … “Extinguish the fire which is to the left of the chair” 2) VERB ADP DET NOUN WH-determiner VERB ADP DET NOUN ADP DET NOUN... “Go to the fire which is to the left of the stool” 3. SEQUENCE MODELING We apply Hidden Markov Models (HMM) for each temporal segment and the state in the HMMs are the temporal segments of sentence and the observation of each hidden state is sequence of waypoints. A transition matrix and emission matrix are generated for each state. The transition matrix and emission matrix are passed to forward-backward algorithm to compute the probability of HMM being in each state at each point in path. It also estimates how true the entire sentence is of the path. 3.1 Creating Graphical Model Once we have broken down the sentence into temporal segments, we need to model them in mathematical form for processing the data. Hence, we convert every segment of sentence into a Probabilistic Graphical Model [4]. Fig-2 shows the graphical model. Then, a we create a floorplan variable (O1–On) for each noun in each segment. Then we apply the label distribution of noun to the variable’s set of labels. Finally, we need to use the subject and object of preposition in the sentence, and find each preposition distributions over relative positions and velocities are applied between its the subject and the object. The subject noun act as target object and object noun acts as referent object for the preposition. Every Probabilistic Graphical Model starts with a root state Pi followed by a preposition or list of prepositions as children. Each preposition is followed by a noun which is its object and preceded by a noun which is its subject for e.g. “fire which is to the left of table” where subject of left is fire and object is table. Such a preposition is having adjectival use in the sentence. In case of preposition phrases like “towards the bag”, there is no need to have a subject noun for it. Just the object would suffice. Such prepositions are called adverbial usage of preposition.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3035 Fig-2: Graphical models created for Sentence. Green box denotes Prepositions and Blue box denotes nouns in a sentence. For each of the floorplan variable O1 to On, a class label is assigned to each floorplan variable. The probability distribution of noun I for which the floorplan variable that was created gives the score(wij) of the label l assigned to the variable. We create a dictionary of words which stores the meanings of words in mathematical form. For nouns, this information is represented in dictionary of words as a discrete probability distribution over class labels. Now, for the presentation of the prepositions and prepositional phrases, we use location (denoted by μ) and concentration (denoted by κ) parameters for two independent Von Mises distributions [5]. The Von Mises distribution for angular distribution α is given as: 𝑣(𝜇, 𝜅) = 𝑒 𝜅𝑐𝑜𝑠( 𝛼−𝜇) 2𝜋𝐼0(𝜅) (1) When the ith preposition in the lexicon is applied between two variables, whose physical relationship is specified by the position angle θ and velocity angle γ between them, its score zi is given by (2) Conditional probability of graphical model will be calculated using path variable 𝑝 (𝑝 𝑥, 𝑝 𝑦, 𝑝 𝑣 𝑥 , 𝑝 𝑣 𝑦 )of the model on the K floor plan objects (O1...Ok) and m assignments of labels to N floor plan variables. In mapping mn, every element of mapping is the index of floorplan object mapped to floorplan variable n. This helps in deciding which objects in floor plan represent which noun in sentence. Let α be*𝑝, 𝑜 𝑚1 , . . . , 𝑜 𝑚 𝑛 +a set of path variable and mappings of floor plan objects to variables and 𝑏 𝑐1 , 𝑏 𝑐2 be indices of target and referent in a. Position of target is found by coordinates (𝑎 𝑏𝑐1 𝑥 𝑎 𝑏𝑐1 𝑦 ) and referent by(𝑎 𝑏𝑐2 𝑥 𝑎 𝑏𝑐2 𝑦 ). Similarly, velocity of target can be found by(𝑎 𝑏𝑐2 𝑥 𝑎 𝑏𝑐2 𝑦 ). Hence, position angle θ and velocity angle γ is 𝜃 = 𝑡𝑎𝑛−1 ( 𝑎 𝑏𝑐1 𝑦 − 𝑎 𝑏𝑐2 𝑦 𝑎 𝑏𝑐1 𝑥 − 𝑎 𝑏𝑐2 𝑥 ) 𝛾 = 𝑡𝑎𝑛−1 ( 𝑣 𝑦 𝑣𝑥 ) − 𝑡𝑎𝑛−1 ( 𝑎 𝑏𝑐2 𝑦 − 𝑎 𝑏𝑐1 𝑦 𝑎 𝑏𝑐2 𝑥 − 𝑎 𝑏𝑐1 𝑥 ) Now we calculate the segment conditional probability given mapping m, floorplan f, and dictionary Λ using the formula 𝜓(𝑝, 𝑚, 𝑓, 𝛬) = ∏ 𝑧 𝑑𝑐 𝐶 𝑐=1 (𝜃𝑐, 𝛾𝑐) ∏ 𝑤 𝑒 𝑛 𝑙 𝑚𝑛 𝑁 𝑛=1 (3) 3.2 Training HMMs Now, for each path-sentence pair in training data, we generate Hidden Markov Model (HMM). The states of each such HMM consists of probabilistic graphical model of each segment of sentence in proper alignment. Thus, number of states in HMM is equal to number of temporal segments. The HMMs infers the alignment between the densely sampled points in each path and the sequence of temporal segments in its corresponding sentence. The Output model for temporal segment t is given by 𝑅𝑡(𝑝𝑡, 𝑓, 𝛬) = ∑ 𝜓 𝑡 𝑚 (𝑝𝑡|𝑚, 𝑓, 𝛬) (4) Each HMM has its own transition and emission matrices (a and b respectively). Transition matrix in this problem is only allowed to self-loop or transition to next temporal segment state and must start at first segment and ends at last. HMM computes a score at every state to measure the degree of truthfulness of segment. The Expectation Maximization [6] Algorithm helps to find Log Likelihood of HMM being in each state. Once we have the likelihood and the transition model, we use it in Baum-Welch (Forward-Backward) Algorithm which is given as Algorithm 1. Before learning the word meanings, all preposition and noun distributions are randomly initialized. During learning, the model is updated iteratively to increase the overall HMM likelihood as a product over all training
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3036 samples. At each iteration, this concentrates the Probability Mass Distribution of each HMM state’s preposition distributions at angles seen at regions of the path during which that state is of high probability. It also concentrates the probability mass of the object label distributions in containers associated with the mappings corresponding to high HMM likelihoods. Thus, the parameters of nouns and prepositions in the dictionary is updated accordingly. Fig-3 illustrates how HMM is created. Fig-3 Creating HMM from a graphical model. Algorithm 1: Baum-Welch (Forward-Backward) Algorithm Algorithm 2: Rule based Classification 4. COGNIZANCE AND PATHFINDING We will now use a set of waypoints generated from sequence modelling for generating path for robot locomotion. This phase determines the sparse sequence of waypoints which satisfy the sentence. 4.1 Gradient Ascent Gradient ascent algorithm is used to maximize the likelihood function. By maximizing, we try to find out the local maximum for the function. The likelihood function we try to maximize is Equation (4). We apply gradient ascent on score function of each temporal segment it to obtain small set of waypoints for each temporal segment. Thus, for every sentence we get sparse set of waypoints that satisfy the semantics of sentence. 4.2 Collision Awareness The sparse set of waypoints which are generated by gradient ascent does not consider the fact that by following the waypoints, the robot can collide into an obstacle or other floorplan object. This is very important to be considered as obstacle avoidance should be a task that has to be done by the robot to function in real-world environments. Hence, we introduce barrier radius around each object in floorplan. An alternative to this is Barrier Penalty as suggested in [1]. A barrier penalty B(r) is added between each pair of a waypoint and floorplan object as well as between pairs of temporally adjacent waypoints to prevent them from becoming too close. 𝐵(𝑟) = 𝑆𝑀𝑂𝑂𝑇𝐻𝑀𝐴𝑋 (1,1 + 2𝑟1 + 2𝑟2 𝑟 ) (5)
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3037 Where r is the distance either a waypoint and an object or between two waypoints and where r1 and r2 are the radii of the two things being kept apart, either the robot or an object. This barrier is approximately 1 until the distance between the two waypoints become small, at which point it decreases rapidly, pushing them away from each other by approximately the robot radius. 4.3 Path Finding Path finding algorithms are the algorithms that find the least cost route to target from given position in the environment. Thus, optimized path is generated if start and end points are given. Some of the commonly used Path-Finding Algorithm is A* Algorithm. The sparse set of waypoints is passed to path finding algorithm which generate an optimized path between initial position to target position. Unity provides us with the tool to create a Navigation Mesh on the floor or terrain of the environment. NavMesh is a data structure which helps robotic agents for pathfinding through complicated areas, turns etc. It defines what part of environment is traversable and non- traversable. Using NavMesh we can set the target and the waypoints obtained from gradient ascent as the points through which robot agent should definitely pass through. The NavMesh Implementation is shown in Fig-4. Every Floorplan object is considered an obstacle with a fixed radius around it not a part of NavMesh hence creating barrier for collisions. Fig 2.2 NavMesh in Unity Environment (Blue) and barrier radii for objects (Black) Thus, Path is created from sparse set of waypoints and robot is able to understand the sentence and move accordingly in the environment. 5. FUTURE WORK The framework we have discussed certainly is not complete in itself. The framework deals only with nouns and prepositions in the sentences but fails to recognize the role of verb in the sentence. This framework can be made generalizable framework for any robotic task by natural language if we are able to model and learn meanings of verbs, adverbs etc. in context of robot. If robot can understand what the verb in the sentence means then any task can be understood and done by same robot. Similarly, in Sequence Modelling, we could have used Temporal Convolutional Networks (TCN) [7] as it is better than Recurrent Neural Network with LSTM and can replace HMM for sequence modelling task. Given a sentence as an input, this would generate corresponding waypoints for each segment. Adding speech recognition engine would make this system more user-friendly and connected. However, these tasks are not currently in the scope of the paper. 6. CONCLUSION We have understood how the framework would enable the robot to do tasks based on natural language understanding. Data collection is done for collecting raw data followed by preprocessing on the data. The robot will try to identify the words in sentences and find out nouns and prepositions, following which it is grounded by use of mathematical probability distributions and Hidden Markov Model for sequence. Now optimization by gradient ascent for the likelihood function is done for each temporal segment to arrive at sparse set of waypoints. These sparse set of waypoints are then given for calculating path through all these waypoints without colliding with obstacles and other objects. Hence the robot first learns the dictionary of word meanings and then use it to understand the natural language sentence and move accordingly. 7. REFERENCES [1] Paul Barrett, Daniel & Alan Bronikowski, Scott & Yu, Haonan & Siskind, Jeffrey. (2017). Driving Under the Influence (of Language). IEEE Transactions on Neural Networks and Learning Systems. PP. 1-16. 10.1109/TNNLS.2017.2693278. [2] Unity Game Engine: https://unity3d.com/unity [3] “Industrial-Strength Natural Language Processing”: https://spacy.io/ [4] Koller, Daphne and Nir Friedman. “Probabilistic Graphical Models - Principles and Techniques.” (2009). [5] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY, USA: Dover, 1972 [6] Dempster,A.P. ; Liard,N.M.; Rubin,D.B.(1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. [7] Bai, Shaojie & Zico Kolter, J & Koltun, Vladlen. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modelling.