International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 601
Blind Navigation System Using Artificial Intelligence
Ashwani Kumar1, Ankush Chourasia2
1,2 Dept. of Electronics and Communication Engineering, IMS Engineering College, INDIA
---------------------------------------------------------------------***--------------------------------------------------------------------
Abstract - To give blind people a hearable environment,
this project focuses on assistive devices for people with
visual impairment. It converts visual data, through image
and video processing, into an alternate rendering modality
that is appropriate for a blind user. The alternate modality
can be auditory, haptic, or a combination of both. Artificial
intelligence is therefore used for the modality conversion,
from the visual modality to another.
Key Words: Hearable Environment, Visual Data, Image
Processing, Video Processing, Artificial Intelligence
1. INTRODUCTION
Imagine the life of a blind person. It is full of risk:
they cannot walk alone through a busy street or through a
park, and they often need assistance from others. They are
also curious about the beauty of the world and eager to
explore it and to be aware of what is happening in front of
them, even though they can find their own belongings without
anyone's help. The number of blind people is predicted to
rise from 36 million to 115 million by 2050 if treatment is
not improved through better funding. A growing ageing
population is behind the rising numbers. Some of the highest
rates of blindness and vision impairment are in South Asia
and sub-Saharan Africa. The percentage of the world's
population with visual impairments is actually falling,
according to the study. But because the global population is
growing and more people are living well into old age,
researchers predict that the number of people with sight
problems will soar in the coming decades. Analysis of data
from 188 countries suggests there are more than 200 million
people with moderate to severe vision impairment. That figure
is expected to rise to more than 550 million by 2050.
This project has three main parts: a Raspberry Pi 3
(running Android Things), a camera, and artificial
intelligence. When the person presses the button on the
device, the camera module takes a picture, the image is
analyzed using TensorFlow (an open-source library for
numerical computation) to detect what the picture shows, and
the device then describes the picture to the person by voice
through a speaker or headphones 1).
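The button-press flow described above can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: the names `describe_scene`, `capture`, `classify`, `speak`, and the `min_confidence` cut-off are hypothetical, and the camera, classifier, and speaker are passed in as plain functions so the sketch runs without any hardware.

```python
from typing import Callable, List, Tuple

# Hypothetical stage signatures; none of these names come from the paper.
Capture = Callable[[], bytes]                           # returns raw image bytes
Classify = Callable[[bytes], List[Tuple[str, float]]]   # (label, confidence) pairs
Speak = Callable[[str], None]                           # sends text to the speaker


def describe_scene(capture: Capture, classify: Classify, speak: Speak,
                   min_confidence: float = 0.5) -> str:
    """Handle one button press: capture, classify, then announce the best label."""
    image = capture()
    predictions = classify(image)
    if not predictions:
        message = "I could not recognise anything."
    else:
        # Pick the label with the highest confidence score.
        label, confidence = max(predictions, key=lambda p: p[1])
        if confidence >= min_confidence:
            message = f"I see a {label}."
        else:
            message = f"This might be a {label}, but I am not sure."
    speak(message)
    return message
```

On a real device, `capture` would wrap the camera module, `classify` the trained model, and `speak` a text-to-speech engine; keeping them as injected functions makes each stage easy to test on its own.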
1.1 Raspberry Pi
The Raspberry Pi was developed in the United Kingdom by the
Raspberry Pi Foundation. It is a series of small single-board
computers, originally developed to promote the teaching of
basic computer science in schools and in developing
countries.
The Raspberry Pi can be used for various purposes and can be
customized to the user's requirements. We use the Raspberry
Pi for image and video processing.
The Raspberry Pi 3 uses a Broadcom BCM2837 SoC with a
1.2 GHz 64-bit quad-core ARM Cortex-A53 processor and
512 KB of shared L2 cache.
The Raspberry Pi 3 Model B has 1 GB of RAM. It is equipped
with 2.4 GHz 802.11n WiFi (150 Mbit/s) and Bluetooth 4.1
(24 Mbit/s), based on the Broadcom BCM43438 FullMAC chip.
Monitor mode has no official support, but it has been
implemented through unofficial firmware patching. The Pi 3
also has a 10/100 Ethernet port.
The Raspberry Pi may be operated with any generic USB
computer keyboard and mouse. It may also be used with USB
storage, USB-to-MIDI converters, and virtually any other
device or component with USB capabilities. Other peripherals
can be attached to the various pins and connectors on the
surface of the Raspberry Pi.
The Raspberry Pi is connected to the system via an RJ45
cable. The board provides different ports and connectors for
different functions. The Raspberry Pi Foundation provides the
Raspbian operating system for the Raspberry Pi. Python and
Scratch are the main programming languages used, and many
other languages are also supported.
1.2 Raspberry Pi Camera Module
In April 2016, the original Camera Module was replaced by
the Raspberry Pi Camera Module v2. The v2 Camera Module has
a Sony IMX219 8-megapixel sensor (compared to the
5-megapixel OmniVision OV5647 sensor of the original
camera). The camera module can take video and still
photographs. Libraries bundled with the camera can be used
to create effects. It supports 1080p30, 720p60, and VGA90
video modes, as well as still capture. It attaches via a
15 cm ribbon cable to the CSI port on the Raspberry Pi.
The camera works with all models of Raspberry Pi 1, 2, and
3. It can be accessed through the MMAL and V4L APIs, and
there are numerous third-party libraries built for it,
including the Picamera Python library. The camera module is
very popular in home security applications and in wildlife
camera traps.
1.3 Artificial Intelligence
Artificial intelligence is the field of computer science
concerned with creating intelligent machines that work like
humans and respond quickly. Knowledge engineering is a core
part of AI research. Machines can act and react like humans
only when they have abundant information about the world. To
implement knowledge engineering, an artificial intelligence
must have access to objects, categories, properties, and
relations. Giving machines common sense, reasoning, and
problem-solving power is a difficult and tedious task.
Machine learning is another core part of AI. Learning
without any kind of supervision requires an ability to
identify patterns in streams of inputs, whereas learning
with adequate supervision involves classification and
numerical regression. Classification determines the category
an object belongs to, while regression takes a set of
numerical input and output examples and discovers functions
that generate suitable outputs from the respective inputs.
The mathematical analysis of machine learning algorithms and
their performance is a well-defined branch of theoretical
computer science, often referred to as computational learning
theory.
Machine perception deals with the capability to use sensory
inputs to deduce different aspects of the world, while
computer vision is the ability to analyze visual inputs, with
sub-problems such as facial, object, and gesture
recognition.
Artificial neural networks (ANNs), or connectionist systems,
are computing systems inspired by biological neural
networks. An ANN is based on a collection of connected units
or nodes called artificial neurons. Each connection
(analogous to a synapse) between artificial neurons can
transmit a signal from one neuron to another. The artificial
neuron that receives the signal can process it and then
signal the artificial neurons connected to it.
In common ANN implementations, the signal at a connection
between artificial neurons is a real number, and the output
of each artificial neuron is calculated by a non-linear
function of the weighted sum of its inputs. Artificial
neurons and connections typically have a weight that adjusts
as learning proceeds. The weight increases or decreases the
strength of the signal at a connection. Artificial neurons
may have a threshold, such that the signal is sent only if
the aggregate signal crosses that threshold. Artificial
neurons are generally organized in layers. Different layers
may perform different kinds of transformations on their
inputs. Signals travel from the first (input) layer to the
last (output) layer, possibly after traversing the layers
multiple times.
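As a minimal sketch of a single artificial neuron, the snippet below computes a weighted sum of the inputs and passes it through a non-linear function. The sigmoid non-linearity, the `bias` term, and the function names are choices made for this illustration; the text above does not specify them.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic non-linearity; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Weighted sum of the inputs passed through a non-linear function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def fires(inputs: Sequence[float], weights: Sequence[float],
          threshold: float) -> bool:
    """Threshold variant: the signal is sent only if the aggregate crosses the threshold."""
    return sum(x * w for x, w in zip(inputs, weights)) > threshold
```

Increasing a weight strengthens the contribution of that connection to the sum, which is what learning adjusts.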
2. The model used for Image Classification:
Convolutional Neural Network (CNN)
A convolutional neural network (CNN) is a class of deep,
feed-forward artificial neural networks that has been
successfully applied to analyzing visual imagery.
CNNs use a variation of multilayer perceptrons designed to
require minimal preprocessing. They are also known as
space-invariant artificial neural networks (SIANN) because of
their shared-weight architecture and translation-invariance
characteristics. Deep convolutional neural networks can
achieve reasonable performance on hard visual recognition
tasks, matching or exceeding human performance in some
domains. The network that we build is very small and can run
on a CPU as well as on a GPU.
A CNN is composed of a stack of convolutional modules that
perform feature extraction. Each module consists of a
convolutional layer followed by a pooling layer. The last
convolutional module is followed by one or more dense layers
that perform classification. The final dense layer in a CNN
contains a single node for each target class in the model,
with a softmax activation function that generates a value
between 0 and 1 for each node (the sum of all these softmax
values is equal to 1). We can interpret the softmax values
for a given image as relative measurements of how likely it
is that the image falls into each target class.
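The softmax property mentioned above (each value lies between 0 and 1 and all values sum to 1) can be checked with a few lines of plain Python. This is a sketch for illustration; subtracting the maximum before exponentiating is a common numerical-stability step and is an addition of this example, not something stated in the text.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map raw scores to values between 0 and 1 that sum to 1."""
    peak = max(logits)                            # shift by the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three made-up raw scores; the largest score receives the largest probability.
probs = softmax([2.0, 1.0, 0.1])
```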
Convolutional layers apply a specified number of convolution
filters to the image. For each sub-region, the layer performs
a set of mathematical operations to produce a single value in
the output feature map. Convolutional layers then typically
apply a ReLU activation function to the output to introduce
nonlinearities into the model.
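To make the sub-region operation concrete, here is a small pure-Python sketch of one filter sliding over a single-channel image. It assumes stride 1, no padding, and the cross-correlation form commonly used in deep learning libraries; the function names are hypothetical and this is not the TensorFlow implementation.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over every sub-region and apply ReLU to each output value."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # One output value per sub-region: elementwise product, then sum.
            acc = sum(image[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(relu(acc))
        feature_map.append(row)
    return feature_map
```

A convolutional layer with 32 filters would repeat this with 32 different learned kernels, producing 32 feature maps.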
The rectifier is an activation function defined as the
positive part of its argument:
f(z) = z^+ = max(0, z), where z is the input to a neuron.
A smooth approximation to the rectifier is the analytic
function f(z) = log(1 + exp(z)), which is called the softplus
function. The derivative of softplus is
f'(z) = exp(z) / (1 + exp(z)) = 1 / (1 + exp(-z)),
i.e. the logistic function.
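The relationship between softplus and the logistic function can be verified numerically. The sketch below compares a finite-difference estimate of the softplus derivative with the logistic function; the helper `numerical_derivative` and the step size are assumptions of this example.

```python
import math
from typing import Callable


def relu(z: float) -> float:
    """Rectifier: the positive part of z."""
    return max(0.0, z)


def softplus(z: float) -> float:
    """Smooth approximation to the rectifier: log(1 + exp(z))."""
    return math.log1p(math.exp(z))


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def numerical_derivative(f: Callable[[float], float], z: float,
                         h: float = 1e-6) -> float:
    """Central finite difference, used only to check the analytic derivative."""
    return (f(z + h) - f(z - h)) / (2.0 * h)
```

For moderate values of z, `numerical_derivative(softplus, z)` and `logistic(z)` agree to within the finite-difference error.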
Convolutional Layer: Applies 32 5x5 filters (extracting
5x5-pixel subregions), with ReLU activation function.
We use the CIFAR-10 classification task, which classifies
32x32-pixel RGB images. CIFAR-10 was selected because it is
complex enough to exercise much of TensorFlow's ability to
scale to large models. At the same time, the model is small
enough to train fast, which is ideal for trying out new ideas
and experimenting with new techniques.
Model Architecture
The CIFAR-10 model is a multi-layer architecture consisting
of alternating convolutions and nonlinearities. These layers
are followed by fully connected layers leading into a softmax
classifier. This model achieves a peak performance of about
86% accuracy within a few hours of training time on a GPU. It
consists of 1,068,298 learnable parameters and requires about
19.5M multiply-add operations to compute inference on a
single image. With the CIFAR-10 model, the images are
processed as follows:
1. They are cropped to 32 x 32 pixels, centrally for
evaluation or randomly for training.
2. They are approximately whitened to make the model
insensitive to dynamic range.
It is good practice to verify that the inputs are built
correctly.
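The two preprocessing steps listed above can be sketched in plain Python as follows. This is an illustrative approximation only: it shows the central crop (not the random one), treats the image as nested lists of numbers, and implements "approximately whitened" as per-image standardization to zero mean and unit variance, which is an assumption about what the step does. The function names are hypothetical.

```python
import math
from typing import List


def central_crop(image: List[List[float]], size: int) -> List[List[float]]:
    """Keep the central size x size window of a 2-D image."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def standardize(pixels: List[float]) -> List[float]:
    """Scale a flattened image to zero mean and (approximately) unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Floor the deviation so a uniform image does not cause division by zero.
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_std for p in pixels]
```

Checking that the standardized pixels have a mean close to 0 is one simple way to verify that the inputs are built correctly.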
Reading images from disk and distorting them can use a
non-trivial amount of processing time. To prevent these
operations from slowing down training, we run them inside 16
separate threads which continuously fill a TensorFlow queue.
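The idea of worker threads filling a queue while training consumes from it can be sketched with the Python standard library. This is not the TensorFlow queue mechanism itself: `load_and_distort`, `QUEUE_CAPACITY`, and `prefetch` are hypothetical stand-ins, and only the thread count of 16 comes from the text.

```python
import queue
import threading
from typing import List

NUM_WORKERS = 16          # thread count mentioned in the text
QUEUE_CAPACITY = 64       # hypothetical buffer size


def load_and_distort(index: int) -> str:
    """Stand-in for reading an image from disk and distorting it."""
    return f"example-{index}"


def worker(indices: List[int], out: "queue.Queue[str]") -> None:
    for i in indices:
        out.put(load_and_distort(i))   # blocks while the queue is full


def prefetch(total: int) -> List[str]:
    """Fill a bounded queue from NUM_WORKERS threads and drain it in the caller."""
    out: "queue.Queue[str]" = queue.Queue(maxsize=QUEUE_CAPACITY)
    threads = [
        threading.Thread(target=worker,
                         args=(list(range(w, total, NUM_WORKERS)), out),
                         daemon=True)
        for w in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    # A training loop would consume from the queue; here we simply drain it.
    return [out.get() for _ in range(total)]
```

Because the queue is bounded, the workers pause when they get ahead of the consumer, so preprocessing overlaps with training without unbounded memory use.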
[Figure: diagram of this model]
Pooling layers downsample the image data extracted by the
convolutional layers to reduce the dimensionality of the
feature map and so decrease processing time. We used the max
pooling algorithm, which extracts sub-regions of the feature
map (e.g., 2x2-pixel tiles), keeps their maximum value, and
discards all other values.
The most common form of pooling is max pooling, where we take
a filter of size f*f and apply the maximum operation over
each f*f-sized part of the image.
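A minimal max pooling sketch over a single-channel feature map is shown below. The function name and the default values (a 2x2 filter with stride 2) are choices made for this illustration.

```python
from typing import List


def max_pool(feature_map: List[List[float]], f: int = 2,
             stride: int = 2) -> List[List[float]]:
    """Keep the maximum of each f x f tile, moving the tile by `stride`."""
    out_h = (len(feature_map) - f) // stride + 1
    out_w = (len(feature_map[0]) - f) // stride + 1
    return [
        [max(feature_map[r * stride + i][c * stride + j]
             for i in range(f) for j in range(f))
         for c in range(out_w)]
        for r in range(out_h)
    ]


# A made-up 4x4 feature map; each 2x2 tile collapses to its largest value.
example = [[1, 3, 2, 4],
           [5, 6, 7, 8],
           [9, 2, 1, 0],
           [3, 4, 5, 6]]
pooled = max_pool(example)
```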
If you take the average in place of the maximum, it is
called average pooling, but it is not very popular. If your
input is of size w1*h1*d1 and the filter is of size f*f with
stride S, then the output size w2*h2*d2 will be:
w2 = (w1 - f)/S + 1
h2 = (h1 - f)/S + 1
d2 = d1
Most commonly, pooling is done with a filter of size 2*2 and
a stride of 2. As the formula above shows, this halves the
width and the height of the input, while the depth is
unchanged.
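The output-size formula can be turned into a small helper for quick checks. The function name and the decision to reject sizes that do not divide evenly are assumptions of this sketch.

```python
from typing import Tuple


def pooled_size(w1: int, h1: int, d1: int, f: int, stride: int) -> Tuple[int, int, int]:
    """Apply w2 = (w1 - f)/S + 1, h2 = (h1 - f)/S + 1, d2 = d1."""
    if (w1 - f) % stride or (h1 - f) % stride:
        raise ValueError("filter and stride do not tile the input evenly")
    return (w1 - f) // stride + 1, (h1 - f) // stride + 1, d1
```

For a 32*32*3 input with a 2*2 filter and stride 2, the formula gives (32 - 2)/2 + 1 = 16, so the output is 16*16*3.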
Dense (fully connected) layers perform classification on the
features extracted by the convolutional layers and
downsampled by the pooling layers. In a dense layer, every
node in the layer is connected to every node in the preceding
layer.
Dense Layer 1: 1,024 neurons, with a dropout regularization
rate of 0.4 (a probability of 0.4 that any given element will
be dropped during training).
Dense Layer 2 (Logits Layer): 10 neurons, one for each target
class (class indices 0–9).
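Dropout with a rate of 0.4 can be illustrated as below. This sketch uses the "inverted dropout" convention, in which surviving elements are rescaled by 1/(1 - rate) so that the expected sum is unchanged; that rescaling is an assumption of this example and is not stated in the text. The function name and the optional `rng` argument are hypothetical.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], rate: float = 0.4, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each element with probability `rate` during training only."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)   # rescale survivors (inverted dropout)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]
```

At evaluation time (`training=False`) the activations pass through unchanged, so dropout only affects training.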
Logits Layer: The final layer of our neural network is the
logits layer, which returns the raw values for our
predictions. The logit model is a regression model in which
the dependent variable (DV) is categorical. In the binary
case, the output can take only two values, "0" and "1", which
represent outcomes such as pass/fail or win/loss. Cases where
the dependent variable has more than two outcome categories
may be analyzed with multinomial logistic regression or, if
the multiple categories are ordered, with ordinal logistic
regression.
The final output tensor of our CNN, logits, has shape
[batch_size, 10].
Generate Predictions: The logits layer of our model returns
our predictions as raw values in a [batch_size,
10]-dimensional tensor. We convert these raw values into two
different formats that our model function can return:
The predicted class for each example: a class index from 0–9.
The probabilities for each possible target class for each
example: the probability that the example belongs to class 0,
class 1, class 2, etc.
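The two output formats can be derived from a batch of logits as sketched below: the predicted class is the index of the largest raw value, and the probabilities come from a softmax over each row. The dictionary keys `classes` and `probabilities` and the function names are illustrative choices, and the logits in the usage example are made up.

```python
import math
from typing import Dict, List, Sequence, Union


def softmax(row: Sequence[float]) -> List[float]:
    peak = max(row)                       # shift by the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits_batch: Sequence[Sequence[float]]
            ) -> List[Dict[str, Union[int, List[float]]]]:
    """Return the predicted class index and class probabilities per example."""
    results = []
    for row in logits_batch:
        best = max(range(len(row)), key=lambda i: row[i])   # argmax over raw values
        results.append({"classes": best, "probabilities": softmax(row)})
    return results


# A made-up batch of two examples with 10 logits each.
batch = [[0.1, 2.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0] * 9 + [4.0]]
predictions = predict(batch)
```

Because softmax preserves ordering, the class with the largest logit is also the class with the largest probability.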
Calculate Loss: For both training and evaluation, we need to
define a loss function that measures how closely the model's
predictions match the target classes. We use cross entropy as
the loss, and to calculate it we encode the target classes
with the one-hot technique. A one-hot code is a group of bits
in which the only legal combinations of values are those with
a single high (1) bit and all the others low (0). This loss is
the cost that is minimized to reach the optimum values of the
weights.
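One-hot encoding and the cross-entropy loss for a single example can be written out as follows. This is a plain-Python sketch with hypothetical function names; it computes the log-probabilities with a log-sum-exp step for numerical stability, which is a choice of this example.

```python
import math
from typing import List, Sequence


def one_hot(label: int, num_classes: int = 10) -> List[float]:
    """A single high (1) entry at `label`, all other entries low (0)."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum_exp for v in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    target = one_hot(label, len(logits))
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))
```

The loss is small when the model puts most of its probability on the true class and grows as that probability shrinks.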
Configure the Training Op: We configure our model to optimize
this loss value during training. We use a learning rate of
0.001 and stochastic gradient descent as the optimization
algorithm.
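The gradient descent update with a learning rate of 0.001 is simply "weight minus learning rate times gradient". The sketch below applies it to a toy one-parameter loss, (w - 3)^2, chosen only for illustration; in real stochastic gradient descent the gradient would be estimated from a mini-batch of training examples rather than computed exactly.

```python
from typing import List

LEARNING_RATE = 0.001   # value given in the text


def sgd_step(weights: List[float], gradients: List[float],
             learning_rate: float = LEARNING_RATE) -> List[float]:
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


def toy_gradient(w: float) -> float:
    """Gradient of the toy loss (w - 3)^2, used only for this demonstration."""
    return 2.0 * (w - 3.0)


weights = [0.0]
for _ in range(5000):
    weights = sgd_step(weights, [toy_gradient(weights[0])])
```

With such a small learning rate each step is tiny, so many iterations are needed before the weight approaches the minimum at 3.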
This is a small network and is not a state-of-the-art image
classifier, but it is very good for learning, especially when
you are just getting started. After training, we get more
than 90% accuracy on the validation set. Because we save the
model during training, we can use the saved model to run on
our own images.
3. CONCLUSIONS
The goal of this research is to provide better image
processing using artificial intelligence. Using a CNN image
classifier, we predict the correct answer with an accuracy of
more than 90%, which is a good result on the CIFAR-10 dataset
for a network of this size. We also used the trained model on
real-time images and obtained the correct labels. We
integrated TensorFlow with our trained model in Android
Studio and deployed it on the Raspberry Pi with the help of
the Android Things operating system.
REFERENCES
1) Dean, Jeff; Monga, Rajat; et al. (9 November 2015).
"TensorFlow: Large-scale machine learning on
heterogeneous systems" (PDF). TensorFlow.org.
Google Research. Retrieved 10 November 2015.
2) Getting Started with Raspberry Pi; Matt Richardson
and Shawn Wallace; 176 pages; 2013; ISBN
978-1449344214.
3) Raspberry Pi User Guide; Eben Upton and Gareth
Halfacree; 312 pages; 2014; ISBN 9781118921661.
4) Xilinx. "HDL Synthesis for FPGAs Design Guide".
section 3.13: "Encoding State Machines". Appendix
A: "Accelerate FPGA Macros with One-Hot
Approach". 1995.
5) Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.;
Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.;
Sainath, T.; Kingsbury, B. (2012). "Deep Neural
Networks for Acoustic Modeling in Speech
Recognition --- The shared views of four research
groups". IEEE Signal Processing Magazine. 29 (6):
82-97. doi:10.1109/msp.2012.2205597
6) R. J. Williams and D. Zipser. Gradient-based learning
algorithms for recurrent networks and their
computational complexity. In Back-propagation:
Theory, Architectures, and Applications. Hillsdale,
NJ: Erlbaum, 1994
7) Hilbe, Joseph M. (2009), Logistic Regression Models,
CRC Press, p. 3, ISBN 9781420075779.
8) Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural
Architecture Search with Reinforcement
Learning". arXiv:1611.01578
9) Hope, Tom; Resheff, Yehezkel S.; Lieder, Itay
(2017-08-09). Learning TensorFlow: A Guide to Building
Deep Learning Systems. O'Reilly Media, Inc.
pp. 64–. ISBN 9781491978481.
10) Sutton, R. S., and Barto A. G. Reinforcement
Learning: An Introduction. The MIT Press,
Cambridge, MA, 1998
11) Zeiler, Matthew D.; Fergus, Rob (2013-01-15).
"Stochastic Pooling for Regularization of Deep
Convolutional Neural Networks". arXiv:1301.3557
12) Lawrence, Steve; C. Lee Giles; Ah Chung Tsoi;
Andrew D. Back (1997). "Face Recognition: A
Convolutional Neural Network Approach". Neural
Networks, IEEE Transactions on. 8 (1): 98–
113. CiteSeerX 10.1.1.92.5813 
13) Krizhevsky, Alex. "ImageNet Classification with
Deep Convolutional Neural Networks". Retrieved 17
November 2013.
14) Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs,
Thomas; Lipson, Hod (2015-06-22). "Understanding
Neural Networks Through Deep
Visualization". arXiv:1506.06579
15) Graupe, Daniel (2013). Principles of Artificial
Neural Networks. World Scientific. pp. 1–. ISBN
978-981-4522-74-8.
16) Dominik Scherer, Andreas C. Müller, and Sven
Behnke: "Evaluation of Pooling Operations in
Convolutional Architectures for Object
Recognition," In 20th International Conference
Artificial Neural Networks (ICANN), pp. 92-101,
2010. https://doi.org/10.1007/978-3-642-15825-4_10
17) Graupe, Daniel (7 July 2016). Deep Learning Neural
Networks: Design and Case Studies. World Scientific
Publishing Co Inc. pp. 57–110. ISBN
978-981-314-647-1
18) Clevert, Djork-Arné; Unterthiner, Thomas;
Hochreiter, Sepp (2015). "Fast and Accurate Deep
Network Learning by Exponential Linear Units
(ELUs)". arXiv:1511.07289 
19) Yoshua Bengio (2009). Learning Deep Architectures
for AI. Now Publishers Inc. pp. 1–3. ISBN
978-1-60198-294-0.
20) Christopher Bishop (1995). Neural Networks for
Pattern Recognition, Oxford University
Press. ISBN 0-19-853864-2
BIOGRAPHIES
Ashwani Kumar, born in 1996,
India. He is currently pursuing a
B.Tech in ECE at IMS
Engineering College. His interest is
in Data Science and Machine
Learning.
Ankush Chourasia, born in 1996,
India. He is currently pursuing a
B.Tech in ECE at IMS
Engineering College. His interest is
in developing Android applications.

More Related Content

DOCX
Image Recognition Expert System based on deep learning
PDF
CV _Manoj
PDF
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
PPTX
Ai ml dl_bct and mariners-1
DOCX
deep learning
PPTX
Deep learning
PDF
Future Trends in Artificial Intelligence
PDF
Deep Learning Hardware: Past, Present, & Future
Image Recognition Expert System based on deep learning
CV _Manoj
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
Ai ml dl_bct and mariners-1
deep learning
Deep learning
Future Trends in Artificial Intelligence
Deep Learning Hardware: Past, Present, & Future

What's hot (20)

PDF
101 Webinar - Artificial Intelligence, Deep Learning and Geospatial
PDF
IRJET- Review on Raspberry Pi based Assistive Communication System for Blind,...
PDF
Smart Assistant for Blind Humans using Rashberry PI
PPTX
Artificial Intelligence, Machine Learning and Deep Learning with CNN
PDF
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
PDF
IRJET-Raspberry Pi Based Reader for Blind People
PDF
Intro deep learning
PPTX
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
PPT
brain-computerinterface-SUBHAM KAR
PDF
IRJET- Voice Assisted Text Reading and Google Home Smart Socket Control Syste...
DOC
PDF
IRJET- A Cloud based Virtual Brain Connectivity with EEG Sensor using Interne...
PDF
IRJET- Text Reading for Visually Impaired Person using Raspberry Pi
PDF
Deep learning - what is it and why now?
PDF
Integrative detection of Human, Object movement and Fire Sensing Using LoRaWA...
PDF
Machine Learning approach for Assisting Visually Impaired
PDF
Password Based Hand Gesture Controlled Robot
DOCX
Ai applications study
PDF
Recognizing of Text and Product Label from Hand Held Entity Intended for Visi...
PDF
Contents of Internet of Things(IoT) By Thakur Pawan & Pathania Susheela
101 Webinar - Artificial Intelligence, Deep Learning and Geospatial
IRJET- Review on Raspberry Pi based Assistive Communication System for Blind,...
Smart Assistant for Blind Humans using Rashberry PI
Artificial Intelligence, Machine Learning and Deep Learning with CNN
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
IRJET-Raspberry Pi Based Reader for Blind People
Intro deep learning
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
brain-computerinterface-SUBHAM KAR
IRJET- Voice Assisted Text Reading and Google Home Smart Socket Control Syste...
IRJET- A Cloud based Virtual Brain Connectivity with EEG Sensor using Interne...
IRJET- Text Reading for Visually Impaired Person using Raspberry Pi
Deep learning - what is it and why now?
Integrative detection of Human, Object movement and Fire Sensing Using LoRaWA...
Machine Learning approach for Assisting Visually Impaired
Password Based Hand Gesture Controlled Robot
Ai applications study
Recognizing of Text and Product Label from Hand Held Entity Intended for Visi...
Contents of Internet of Things(IoT) By Thakur Pawan & Pathania Susheela
Ad

Similar to IRJET- Blind Navigation System using Artificial Intelligence (20)

PDF
IRJET- Survey on Face-Recognition and Emotion Detection
PDF
ASSISTANCE SYSTEM FOR DRIVERS USING IOT
PDF
A Smart Assistance for Visually Impaired
PDF
Blue Brain Project
PDF
IRJET - Hand Gesture Recognition and Voice Conversion System using IoT
PDF
Smart Home for Senior Citizens
PDF
IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...
PDF
IRJET- Smart Traffic Control System using Yolo
PDF
IRJET- Virtual Eye for Blind- A Multi Functionality Interactive Aid using Pi
PDF
IRJET- IOT based Intrusion Detection and Tracking System
PDF
Ijaems apr-2016-17 Raspberry PI Based Artificial Vision Assisting System for ...
PDF
IRJET- Wearable AI Device for Blind
PDF
IRJET- Smart Mirror using Voice Interface
PDF
Smart Navigation Assistance System for Blind People
PDF
IRJET- Review on Portable Camera based Assistive Text and Label Reading f...
PDF
IRJET- Object Detection and Recognition for Blind Assistance
PDF
Social Distancing Detector Management System
PDF
IRJET - Gesture Controlled Home Automation using CNN
PDF
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
PDF
IMPLEMENTATION OF IDS (INTRUDER DETECTION SYSTEM)
IRJET- Survey on Face-Recognition and Emotion Detection
ASSISTANCE SYSTEM FOR DRIVERS USING IOT
A Smart Assistance for Visually Impaired
Blue Brain Project
IRJET - Hand Gesture Recognition and Voice Conversion System using IoT
Smart Home for Senior Citizens
IRJET- Autonomous Underwater Vehicle: Electronics and Software Implementation...
IRJET- Smart Traffic Control System using Yolo
IRJET- Virtual Eye for Blind- A Multi Functionality Interactive Aid using Pi
IRJET- IOT based Intrusion Detection and Tracking System
Ijaems apr-2016-17 Raspberry PI Based Artificial Vision Assisting System for ...
IRJET- Wearable AI Device for Blind
IRJET- Smart Mirror using Voice Interface
Smart Navigation Assistance System for Blind People
IRJET- Review on Portable Camera based Assistive Text and Label Reading f...
IRJET- Object Detection and Recognition for Blind Assistance
Social Distancing Detector Management System
IRJET - Gesture Controlled Home Automation using CNN
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
IMPLEMENTATION OF IDS (INTRUDER DETECTION SYSTEM)
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
additive manufacturing of ss316l using mig welding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPT
Project quality management in manufacturing
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPT
Mechanical Engineering MATERIALS Selection
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
web development for engineering and engineering
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Sustainable Sites - Green Building Construction
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
additive manufacturing of ss316l using mig welding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Project quality management in manufacturing
Embodied AI: Ushering in the Next Era of Intelligent Systems
Automation-in-Manufacturing-Chapter-Introduction.pdf
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Mechanical Engineering MATERIALS Selection
R24 SURVEYING LAB MANUAL for civil enggi
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Model Code of Practice - Construction Work - 21102022 .pdf
OOP with Java - Java Introduction (Basics)
web development for engineering and engineering
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
UNIT 4 Total Quality Management .pptx
Sustainable Sites - Green Building Construction

IRJET- Blind Navigation System using Artificial Intelligence

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 601 Blind Navigation System Using Artificial Intelligence Ashwani Kumar1, Ankush Chourasia2 1,2 Dept. of Electronics and Communication Engineering, IMS Engineering College, INDIA ---------------------------------------------------------------------***-------------------------------------------------------------------- Abstract - In order to provide the blind people hearable environment, this project focuses on the field of assistive devices for visual impairment people. It converts the visual data by image and video processing into an alternate rendering modality that will be appropriate for a blind user. The alternate modalities can be auditory, haptic, or a combination of both. Therefore, the use of artificial intelligence for modality conversion, from the visual modality to another. Key Words: Hearable Environment, Visual Data, Image Processing, Video Processing, Artificial Intelligence 1. INTRODUCTION Imagine how is the life of a blind person, their life is full of risk, they can't even walk alone through a busy street or through a park. they always need some assistance from others. They are also curious about the beauty of the world, there is excitement to explore the world, and to be aware of what is happening in front of them. Even though they can find their own things without anyone's need. The predicted cases will rise from 36 million to 115 million by 2050 for blind peoples if treatment is not improved by betterfunding. A growing ageing population is behind the rising numbers. Some of the highest rates of blindnessandvisionimpairment are in South Asia and sub-Saharan Africa. The percentage of the world's population with visual impairments is actually falling, according to the study. 
But because the global population is growing and more people are living well into old age, researchers predict the number of people with sight problems will soar in the coming decades. Analysis of data from 188 countries suggests there are more than 200 million people with moderate to severe vision impairment. That figure is expected to rise to more than 550 million by 2050.

This project contains three main parts: a Raspberry Pi 3 (powered by Android Things), a camera, and artificial intelligence. When the person presses the button on the device, the camera module takes a picture, the image is analyzed using TensorFlow (an open-source library for numerical computation) to detect what the picture shows, and the device then describes the picture to the person by voice through a speaker or headphones (1).

1.1 Raspberry Pi

The Raspberry Pi was developed in the United Kingdom by the Raspberry Pi Foundation. It is a series of small single-board computers, developed to promote the teaching of basic computer science in schools and in developing countries. The Raspberry Pi can be used for various purposes and can be customized to the user's requirements. We are using the Raspberry Pi for image and video processing.

The Raspberry Pi 3 uses a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 processor and 512 KB of shared L2 cache. The Raspberry Pi 3 Model B has 1 GB of RAM. The Raspberry Pi 3 (wireless) is equipped with 2.4 GHz WiFi 802.11n (150 Mbit/s) and Bluetooth 4.1 (24 Mbit/s), based on the Broadcom BCM43438 FullMAC chip. The chip has no official support for monitor mode, although this has been implemented through unofficial firmware patching. The Pi 3 also has a 10/100 Ethernet port.

The Raspberry Pi may be operated with any generic USB computer keyboard and mouse. It may also be used with USB storage, USB to MIDI converters, and virtually any other device or component with USB capabilities. Other peripherals can be attached to the various pins and connectors on the surface of the Raspberry Pi.
The Raspberry Pi is connected to the system via an RJ45 cable. The board provides different ports and slots for different functions. The Raspberry Pi Foundation provides the Raspbian operating system for the Raspberry Pi. Python and Scratch are the main programming languages used, and many other languages are also supported.

1.2 Raspberry Pi Camera Module

In April 2016, the original Camera Module was replaced by the Raspberry Pi Camera Module v2. The Camera Module has a Sony IMX219 8-megapixel sensor (compared to the 5-megapixel OmniVision OV5647 sensor of the original camera). The camera module can take video and still photographs. Libraries bundled with the camera can be used to create effects. It supports 1080p30, 720p60, and VGA90
  • 2. video modes, as well as still capture. It attaches via a 15 cm ribbon cable to the CSI port on the Raspberry Pi. The camera works with all models of Raspberry Pi 1, 2, and 3. It can be accessed through the MMAL and V4L APIs, and there are numerous third-party libraries built for it, including the Picamera Python library. The camera module is very popular in home security applications and in wildlife camera traps.

1.3 Artificial Intelligence

Artificial intelligence is the field of computer science concerned with creating intelligent machines that work like humans and respond quickly. Knowledge engineering is a core part of AI research. Machines can act and react like humans only when they have abundant information about the world. To implement knowledge engineering, an artificial intelligence must have access to objects, categories, properties, and relations. Instilling common sense, reasoning, and problem-solving power in machines is a difficult and tedious task.

Machine learning is another core part of AI. Learning without any kind of supervision requires an ability to identify patterns in streams of inputs, whereas learning with adequate supervision involves classification and numerical regression. Classification determines the category an object belongs to, and regression deals with obtaining a set of numerical input or output examples, thereby discovering functions enabling the generation of suitable outputs from respective inputs. The mathematical analysis of machine learning algorithms and their performance is a well-defined branch of theoretical computer science, often referred to as computational learning theory.
Machine perception deals with the capability to use sensory inputs to deduce the different aspects of the world, while computer vision is the ability to analyze visual inputs, with sub-problems such as facial, object, and gesture recognition.

Artificial neural networks (ANNs), or connectionist systems, are computing systems inspired by biological neural networks. An ANN is based on a collection of connected units or nodes called artificial neurons. Each connection (analogous to a synapse) between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal the artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is calculated by a non-linear function of the sum of its inputs. Artificial neurons and connections typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold, so that the signal is sent only if the aggregate signal crosses that threshold. Artificial neurons are generally organized in layers. Different layers have different functions and perform different kinds of transformations on their inputs. Signals travel from the first (input) layer to the last (output) layer, possibly after traversing the layers multiple times.

2. The model used for Image Classification: Convolutional Neural Network (CNN)

A convolutional neural network is a class of deep, feed-forward artificial neural networks that has been applied successfully to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as space invariant artificial neural networks (SIANN), due to their shared-weight architecture and translation invariance characteristics.
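To make the description of an artificial neuron above concrete, here is a minimal Python sketch of one neuron and one layer. It is an illustration only, not the authors' implementation: the function names `neuron_output` and `layer_output` are hypothetical, and the logistic function is assumed as the non-linearity because the text does not name one at this point.

```python
import math


def neuron_output(inputs, weights, bias=0.0, threshold=None):
    """Return the output of a single artificial neuron.

    The aggregate signal is the weighted sum of the inputs plus a bias.
    If a threshold is given, the neuron sends a signal only when the
    aggregate crosses it; otherwise it outputs 0.0.
    The logistic function is an assumed choice of non-linearity.
    """
    aggregate = sum(w * x for w, x in zip(weights, inputs)) + bias
    if threshold is not None and aggregate <= threshold:
        return 0.0
    return 1.0 / (1.0 + math.exp(-aggregate))


def layer_output(inputs, weight_rows, biases):
    """Apply one layer: every neuron sees the same inputs, with its own weights."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    signal = [1.0, 2.0]
    print(neuron_output(signal, [0.5, -0.25]))
    print(layer_output(signal, [[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]))
```

Learning then amounts to adjusting the entries of `weights` and `bias` so that the outputs move closer to the desired targets.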
A deep convolutional neural network can achieve reasonable performance on hard visual recognition tasks, matching or exceeding human performance in some domains. The network that we build is a very small network that can run on a CPU as well as on a GPU. A CNN is composed of a stack of convolutional modules that perform feature extraction. Each module has a convolutional layer followed by a pooling layer. The last
  • 3. convolutional module is followed by one or more dense layers that perform classification. The final dense layer in a CNN contains a single node for each target class in the model, with a softmax activation function that generates a value between 0 and 1 for each node (the sum of all these softmax values is equal to 1). We can interpret the softmax values for a given image as relative measurements of how likely it is that the image falls into each target class.

Convolutional layers apply a specified number of convolution filters to the image. For each sub-region, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.

The rectifier is an activation function defined as the positive part of its argument: $f(z) = z^{+} = \max(0, z)$, where $z$ is the input to a neuron. A smooth approximation to the rectifier is the analytic function $f(z) = \log(1 + e^{z})$, which is called the softplus function. The derivative of softplus is $f'(z) = e^{z} / (1 + e^{z}) = 1 / (1 + e^{-z})$, i.e. the logistic function.

Convolutional Layer: applies 32 5x5 filters (extracting 5x5-pixel subregions), with a ReLU activation function.

We are using CIFAR-10 classification to classify RGB 32x32-pixel images. CIFAR-10 was selected because it is complex enough to exercise much of TensorFlow's ability to scale to large models. At the same time, the model is small enough to train fast, which is ideal for trying out new ideas and experimenting with new techniques.

Model Architecture

The CIFAR-10 model is a multi-layer architecture consisting of alternating convolutions and nonlinearities.
These layers are followed by fully connected layers leading into a softmax classifier. This model achieves a peak performance of about 86% accuracy within a few hours of training time on a GPU. It consists of 1,068,298 learnable parameters and requires about 19.5M multiply-add operations to compute inference on a single image.

With the CIFAR-10 model, the images are processed as follows:
1. They are cropped to 32 x 32 pixels, centrally for evaluation or randomly for training.
2. They are approximately whitened to make the model insensitive to dynamic range.
It is good practice to verify that inputs are built correctly.

Reading images from disk and distorting them can use a non-trivial amount of processing time. To prevent these operations from slowing down training, we run them inside 16 separate threads which continuously fill a TensorFlow queue.

Here is a diagram of this model: (figure not reproduced in this transcript)

Pooling layers downsample the image data extracted by the convolutional layers to reduce the dimensionality of the feature map and so decrease processing time. We used the max pooling algorithm, which extracts sub-regions of the feature map (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other values. Max pooling is the most common form of pooling: we take a filter of size f x f and apply the maximum operation over each f x f region of the image. Taking the average instead of the maximum gives average pooling, which is less popular. If the input is of size $w_1 \times h_1 \times d_1$ and the filter is of size $f \times f$ with stride $S$, then the output size $w_2 \times h_2 \times d_2$ will be:
  • 4. $w_2 = (w_1 - f)/S + 1$, $h_2 = (h_1 - f)/S + 1$, $d_2 = d_1$.

Most commonly, pooling is done with a filter of size 2x2 and a stride of 2. Substituting $f = 2$ and $S = 2$ into the formula above gives $w_2 = w_1/2$ and $h_2 = h_1/2$, so pooling halves the width and height of the input while leaving the depth unchanged.

Dense (fully connected) layers perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node in the layer is connected to every node in the preceding layer.

Dense Layer 1: 1,024 neurons, with a dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training).

Dense Layer 2 (Logits Layer): 10 neurons, one for each target class (indexed 0–9).

Logits Layer: the final layer of our neural network is the logits layer, which returns the raw values for our predictions. The logit model is a regression model where the dependent variable (DV) is categorical. In the binary case, the output can take only two values, "0" and "1", which represent outcomes such as pass/fail or win/loss. Cases where the dependent variable has more than two outcome categories may be analyzed with multinomial logistic regression or, if the multiple categories are ordered, with ordinal logistic regression. The final output tensor of the CNN, logits, has shape [batch size, 10].

Generate Predictions: the logits layer of our model returns our predictions as raw values in a [batch size, 10]-dimensional tensor. Let's convert these raw values into two different formats that our model function can return: the predicted class for each example (a class index from 0–9), and the probabilities for each possible target class for each example (the probability that the example belongs to class 0, class 1, class 2, etc.).
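The conversion from raw logits to the two output formats described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' TensorFlow code: the function names `softmax` and `predict` and the dictionary keys are hypothetical, and each row of logits is assumed to be a list of 10 floats.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1.

    Subtracting the maximum first keeps exp() from overflowing and
    does not change the result.
    """
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(batch_logits):
    """For each example, return the predicted class index and all class probabilities."""
    results = []
    for logits in batch_logits:
        probabilities = softmax(logits)
        # The predicted class is the index of the largest probability.
        predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
        results.append({"class": predicted, "probabilities": probabilities})
    return results


if __name__ == "__main__":
    batch = [[0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
    for result in predict(batch):
        print(result["class"], [round(p, 3) for p in result["probabilities"]])
```

Because softmax preserves the ordering of the logits, the predicted class could equally be read from the raw logits; the probabilities are what the device would need if it wanted to report how confident the classifier is.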
Calculate Loss: for both training and evaluation, we need to define a loss function that measures how closely the model's predictions match the target classes. To calculate cross entropy, we use the one-hot encoding technique. A one-hot code is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0). The cross entropy is the cost that is minimized to reach the optimum values of the weights.

Configure the Training Op: we configure our model to optimize this loss value during training. We use a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm.

This is a small network and is not a state-of-the-art image classifier, but it is very good for learning, especially when you are just getting started. With our training set, we get more than 90% accuracy on the validation set. As we save the model during training, we can use it to run on our own images.

3. CONCLUSIONS

The goal of this research is to provide better image processing using artificial intelligence. Using a CNN image classifier, we predict the correct answer with an accuracy rate of more than 90%. By doing so we achieved a state-of-the-art result on the CIFAR-10 dataset. We also used the trained model with real-time images and obtained the correct labels. We integrated TensorFlow in Android Studio with our trained model, and it is deployed on the Raspberry Pi with the help of the Android Things operating system.

REFERENCES

1) Dean, Jeff; Monga, Rajat; et al. (9 November 2015). "TensorFlow: Large-scale machine learning on heterogeneous systems" (PDF). TensorFlow.org. Google Research. Retrieved 10 November 2015.
2) Getting Started with Raspberry Pi; Matt Richardson and Shawn Wallace; 176 pages; 2013; ISBN 978-1449344214.
  • 5. 3) Raspberry Pi User Guide; Eben Upton and Gareth Halfacree; 312 pages; 2014; ISBN 9781118921661.
4) Xilinx. "HDL Synthesis for FPGAs Design Guide". Section 3.13: "Encoding State Machines". Appendix A: "Accelerate FPGA Macros with One-Hot Approach". 1995.
5) Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B. (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition --- The shared views of four research groups". IEEE Signal Processing Magazine. 29 (6): 82-97. doi:10.1109/msp.2012.2205597
6) R. J. Williams and D. Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Back-propagation: Theory, Architectures, and Applications. Hillsdale, NJ: Erlbaum, 1994.
7) Hilbe, Joseph M. (2009), Logistic Regression Models, CRC Press, p. 3, ISBN 9781420075779.
8) Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural Architecture Search with Reinforcement Learning". arXiv:1611.01578
9) Hope, Tom; Resheff, Yehezkel S.; Lieder, Itay (2017-08-09). Learning TensorFlow: A Guide to Building Deep Learning Systems. O'Reilly Media, Inc. pp. 64–. ISBN 9781491978481.
10) Sutton, R. S., and Barto, A. G. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 1998.
11) Zeiler, Matthew D.; Fergus, Rob (2013-01-15). "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". arXiv:1301.3557
12) Lawrence, Steve; C. Lee Giles; Ah Chung Tsoi; Andrew D. Back (1997). "Face Recognition: A Convolutional Neural Network Approach". IEEE Transactions on Neural Networks. 8 (1): 98–113. CiteSeerX 10.1.1.92.5813
13) Krizhevsky, Alex. "ImageNet Classification with Deep Convolutional Neural Networks". Retrieved 17 November 2013.
14) Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs, Thomas; Lipson, Hod (2015-06-22). "Understanding Neural Networks Through Deep Visualization". arXiv:1506.06579
15) Graupe, Daniel (2013). Principles of Artificial Neural Networks. World Scientific. pp. 1–. ISBN 978-981-4522-74-8.
16) Dominik Scherer, Andreas C. Müller, and Sven Behnke: "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition," In 20th International Conference on Artificial Neural Networks (ICANN), pp. 92-101, 2010. https://doi.org/10.1007/978-3-642-15825-4_10
17) Graupe, Daniel (7 July 2016). Deep Learning Neural Networks: Design and Case Studies. World Scientific Publishing Co Inc. pp. 57–110. ISBN 978-981-314-647-1
18) Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp (2015). "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)". arXiv:1511.07289
19) Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc. pp. 1–3. ISBN 978-1-60198-294-0.
20) Christopher Bishop (1995). Neural Networks for Pattern Recognition, Oxford University Press. ISBN 0-19-853864-2

BIOGRAPHIES

Ashwani Kumar, born in 1996, India. He is currently pursuing a B.Tech in ECE at IMS Engineering College. His interest is in Data Science and Machine Learning.

Ankush Chourasia, born in 1996, India. He is currently pursuing a B.Tech in ECE at IMS Engineering College. His interest is in developing Android applications.