International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 4, August 2021, pp. 3434~3442
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i4.pp3434-3442
Journal homepage: http://ijece.iaescore.com
An assistive model of obstacle detection based on deep learning:
YOLOv3 for visually impaired people
Nachirat Rachburee, Wattana Punlumjeak
Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi,
Pathum Thani, Thailand
Article history: Received Jul 31, 2020; Revised Dec 22, 2020; Accepted Jan 19, 2021

ABSTRACT
The World Health Organization (WHO) reported in 2019 that at least 2.2 billion people live with visual impairment or blindness. A central difficulty in the daily life of visually impaired people is moving safely through indoor and outdoor environments, which exposes them to harm. In this paper, we propose an assistive application model for visually impaired people, based on deep learning: YOLOv3 with a Darknet-53 base network, running on a smartphone. The Pascal VOC2007 and Pascal VOC2012 datasets were used for training, and the Pascal VOC2007 test set was used for validation. The assistive model was installed on a smartphone together with the eSpeak synthesizer, which generates audio output for the user. The experimental results showed high speed as well as high detection accuracy. The proposed application will be an effective way to help visually impaired people interact with the surrounding environment in their daily life.
Keywords: Assistive model; Deep learning; Obstacle detection; Visually impaired; YOLOv3
This is an open access article under the CC BY-SA license.
Corresponding Author:
Wattana Punlumjeak
Department of Computer Engineering
Rajamangala University of Technology Thanyaburi
39 Moo 1, Klong 6, Khlong Luang, Pathum Thani 12110, Thailand
Email: wattana.p@en.rmutt.ac.th
1. INTRODUCTION
At least 2.2 billion people in the world today live with visual impairment or blindness, as reported by the World Health Organization (WHO) on 8 October 2019. The International Classification of Diseases 11 (2018) classifies vision impairment by presenting visual acuity into two groups: distance and near vision impairment. Presenting distance visual acuity between 6/12 and 6/60 is defined as vision impairment, whereas presenting distance visual acuity worse than 3/60 is defined as blindness [1]. One difficulty in the daily life of visually impaired or blind people is navigating indoor and outdoor environments they cannot see. Although guide dogs and white canes are still the most popular tools for detecting obstacles during navigation, they cannot tell visually impaired people what the obstacles are. The large number of people who are visually impaired or blind has motivated researchers to find technologies and solutions to assist them in their daily life.
Object detection, image processing, and machine learning are popular topics and rapidly growing fields. Object detection is a computer technology that deals with detecting instances of semantic objects of a certain class, such as humans, cars, dogs, or traffic signs, in digital images and videos. Machine learning is a subfield of artificial intelligence (AI) concerned with the scientific study of algorithms and statistical models that give systems the ability to learn and improve from experience without being explicitly programmed. Machine learning can be categorized into supervised,
semi-supervised, and unsupervised learning. Classification is one of the supervised learning tasks in machine learning. In object detection, classification assigns the detected object to one of the classes the model has learned. Deep learning is part of a broader family of machine learning methods based on artificial neural networks that use multiple layers to progressively extract higher-level features from the raw input. In image processing, lower layers of a neural network may identify edges, while higher layers may identify concepts relevant to the objects in the image, such as humans, dogs, cats, or cars.
In this research, an efficient machine learning algorithm is applied to obstacle detection. The PASCAL VOC2007 and PASCAL VOC2012 datasets are used to train the model, and a prototype of the system is developed on a smartphone to find the best assistive model for the visually impaired. The paper is organized as follows: after this introduction, the literature review and related work are presented in section 2, followed by the research method and proposed experiment in section 3 and the results and discussion in section 4. Finally, section 5 provides concluding remarks and future work.
2. LITERATURE REVIEW AND RELATED WORK
In the past, one of the main tasks of machine learning was to classify things: to build classifiers that could decide whether the object in an image was a person, an animal (e.g., a dog or cat), or some other object. In that era, most researchers focused on finding and creating effective classifiers, from simple linear classifiers that combine features in a linear combination to the support vector machine (SVM), which uses a kernel function to transform features into a mathematical kernel space. When research on classification became saturated, researchers moved on to a more difficult and challenging problem: detecting and classifying the objects of interest in an image. The paper [2], known as a pioneer of object recognition, used a convolutional neural network (CNN) with gradient-based learning for handwritten character recognition. A breakthrough in computer vision was the face detection system of Viola and Jones [3]; its main idea was to build a cascade classifier combined with the AdaBoost learning algorithm instead of using a single classifier. At that time, the Viola and Jones paper was considered the state of the art for object detection, while CNN classifiers received little attention because processors were not fast enough. In the famous annual computer vision competition, the ImageNet large-scale visual recognition challenge (ILSVRC) 2012, Alex Krizhevsky and his team presented a deep convolutional neural network architecture called AlexNet [4], which showed the best performance in the competition. CNNs then became more and more popular, and many CNN models and architectures improved on the original AlexNet structure. Subsequent research adapted and fine-tuned these architectures, e.g., VGG [5], GoogLeNet (codenamed Inception) [6], Microsoft ResNet [7], and more.
Region-based CNNs (R-CNN) were proposed to solve the object detection problem [8]. The R-CNN pipeline is split into two steps: a region proposal step and a classification step. The region proposal step uses selective search to propose regions of interest (ROI), generating about 2000 candidate regions, from which features are extracted by a CNN (AlexNet); each region is then classified with linear SVMs in the classification step. The same author improved on R-CNN with Fast R-CNN [9], which addresses the speed problem by applying ROI pooling [10] to features extracted by a ConvNet and using a fully connected layer instead of SVMs for classification. In 2016, Faster R-CNN was presented [11]. Faster R-CNN improves on Fast R-CNN by replacing the region proposals with a region proposal network (RPN) after the last convolutional layer. Faster R-CNN has two outputs for each candidate object: a bounding-box offset and a class label for the ROI. Mask R-CNN [12] extends Faster R-CNN by adding a mask branch that predicts a segmentation mask on each ROI, in parallel with the existing branch for classification and bounding-box regression.
All of the object detectors above are state-of-the-art detectors based on a two-stage framework: the first stage generates region proposals to localize the objects in the image, and the second stage classifies them. Despite the success of two-stage detectors, one-stage detectors are also widely applied. The single shot multibox detector (SSD) was presented in 2016 [13], based on the standard VGG-16 network architecture; SSD produces bounding boxes at different aspect ratios together with a score for each category of object present.
Another famous one-stage detector is you only look once (YOLO) [14]. YOLO is a unified architecture that is straightforward, simple, and extremely fast; its network architecture, called Darknet, was inspired by GoogLeNet. The network has 24 convolutional layers working as feature extractors and 2 fully connected layers for the predictions, and the framework is pretrained on the ImageNet-1000 dataset. The YOLO architecture [14] is shown in Figure 1. YOLO uses a regression-based algorithm in which detection, localization, and classification of the objects in the input image take place in a single pass. The YOLO detection process starts by resizing the input image to 448 x 448 and dividing it into an S x S grid of cells, which is then fed into a single convolutional network. Each
grid cell predicts B bounding boxes and their confidence scores. The output for each bounding box consists of 5 prediction values: the offsets x, y, w, and h, where x and y are the coordinates of the object in the input image and w and h are its width and height, plus a confidence score, or class probability, given in terms of the intersection over union (IOU), which reflects how likely it is that an object exists in the bounding box. Bounding boxes whose confidence scores exceed a threshold are selected and used to locate the objects within the image.
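To make the confidence test above concrete, the following minimal sketch (our own illustration, not code from [14]) computes the intersection over union of two boxes in the (x, y, w, h) format described above and filters predictions by a confidence threshold; the threshold value of 0.5 is an assumption.

```python
# Minimal sketch of IOU computation and confidence thresholding,
# assuming boxes are given as (x_center, y_center, width, height).

def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def filter_predictions(predictions, threshold=0.5):
    """Keep (box, confidence, class_id) tuples above the confidence threshold."""
    return [p for p in predictions if p[1] >= threshold]
```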
Figure 1. The YOLO architecture [14]
YOLO, or YOLOv1, has limitations: it cannot find small objects in the image if they appear as a cluster or group, and it struggles to generalize to objects whose dimensions differ from those of the training images. In December 2016, the second version of YOLO, named YOLOv2 or YOLO9000, a real-time framework able to detect more than 9000 object categories, was published [15]. Its main new idea was the introduction of anchor boxes, designed for a given dataset by k-means clustering, which are responsible for predicting the bounding boxes. The YOLOv2 architecture uses Darknet-19, with 19 convolutional layers and 5 max-pooling layers, followed by a softmax layer for classifying the objects. YOLOv3, an incremental improvement, was published in April 2018 [16]. YOLOv3 improves on the previous version with higher object classification accuracy. To predict the objectness score for each bounding box, YOLOv3 uses logistic regression, and it uses independent logistic classifiers for each box, instead of a softmax, to predict the classes of the bounding boxes that may contain an object. YOLOv3 uses the Darknet-53 network [16], which has 53 convolutional layers, as its feature extractor, as shown in Figure 2.
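The anchor-box selection mentioned above can be sketched as follows. This is our own minimal illustration of the k-means idea in [15], clustering the widths and heights of the training boxes with the distance 1 - IOU; it is not the authors' code, and k = 9 is simply the number of anchors YOLOv3 uses across its three scales.

```python
# Sketch of YOLOv2/YOLOv3-style anchor selection: k-means over the
# (width, height) of training boxes with distance d = 1 - IOU.
# Our own illustration of the idea in [15]; not the authors' code.
import random

def wh_iou(wh, anchor):
    """IOU of two boxes that share the same center, given as (w, h)."""
    inter = min(wh[0], anchor[0]) * min(wh[1], anchor[1])
    union = wh[0] * wh[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes_wh, k=9, steps=100, seed=0):
    random.seed(seed)
    anchors = random.sample(boxes_wh, k)  # initial centroids
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for wh in boxes_wh:
            # Assign each box to the anchor with the smallest 1 - IOU,
            # i.e., the largest IOU.
            best = max(range(k), key=lambda i: wh_iou(wh, anchors[i]))
            clusters[best].append(wh)
        for i, cluster in enumerate(clusters):
            if cluster:  # move each centroid to the mean width/height
                anchors[i] = (sum(w for w, _ in cluster) / len(cluster),
                              sum(h for _, h in cluster) / len(cluster))
    return sorted(anchors, key=lambda a: a[0] * a[1])
```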
One of the challenging problems in the field of object detection and machine learning is assisting people who are visually impaired, and many researchers have proposed work aimed at helping them in daily life. A patient monitoring framework in a telemedicine system was presented for different scenarios [17]. A greedy algorithm was developed to design a cascade for a real-time text detector for the visually impaired [18]. Arakeri et al. proposed a Raspberry Pi with a NoIR camera that captures readable material around the visually impaired user and uses speech synthesis to generate sound in a regional language [19]. Fink and Humayun [20], and other researchers [21, 22], presented inventions for the visually impaired. A digital camera mounted at the person's eye level or on the head takes snapshots of a scene on demand and provides them to an image processing algorithm. Edge detection techniques are used to identify the objects in the image, and artificial neural networks that have been trained classify the known objects. The invention can determine an object's size and its distance from other objects and have a computer-based voice synthesizer speak a descriptive sentence to the blind user. Tapu et al. [23] introduced real-time obstacle detection and classification to assist visually impaired people in indoor and outdoor environments with a smartphone held in a chest-mounted harness. The authors detect objects by extracting an image grid and using the multiscale Lucas-Kanade algorithm to track interest points, then estimate the motion classes with an agglomerative clustering technique and refine the clusters with the K-NN algorithm. The moving-object classification step incorporates the HOG descriptor into a bag of visual words (BoVW) retrieval framework. The experiments in different environments achieved high accuracy rates and proved efficient for a blind person. An AI assistant delivered through
an Android mobile application for the visually impaired was proposed by [24]. The researchers focused on image recognition, currency recognition, text recognition, a chatbot, and a voice assistant, using voice commands to interact with the environment. Their application was developed on the Google Cloud Platform using its cloud API libraries. Convolutional neural networks for object detection systems for visually impaired or blind people have been continually improved. Convolutional neural networks and Haar cascade classifiers were compared by Shah et al. [25] to determine a suitable algorithm for assisting the visually impaired or blind in a real-time scenario. The dataset used to train the CNN was COCO 2017, and the experiment showed that the CNN detects multiple objects more accurately than the Haar cascade in real-time applications. Bianco et al. [26] presented a category-based image quality assessment method named DeepBIQ, which extracts features from a CNN fine-tuned for the image quality task. Another group of researchers [27] used the single shot multibox detector (SSD) in their system to identify objects after a webcam captured a real-time scene. A Raspberry Pi 3 was used as the prototype, and an audio-based detector delivered the detection information as sound to a connected headphone; the model worked well even offline compared with Fast R-CNN. An experiment on object detection and localization in a street environment was proposed by [28]. A pre-trained Faster R-CNN model based on the COCO dataset was fine-tuned by transfer learning on a self-made dataset collected from the internet, covering various kinds of objects found in urban streets. They concluded that Faster R-CNN on the self-made dataset improved average accuracy and that the fine-tuned network was effective.
Figure 2. Darknet-53 architecture [16]
For single-stage object detection networks, there is much interesting research and many applications. Obstacle detection with a light field camera was proposed by [29]: YOLO deep learning was used to classify the objects in indoor images into categories, and the researchers showed that obstacles were classified accurately, with high accuracy in their size and position. Human action recognition with YOLO object detection on video frames was presented by [30] using the LIRIS dataset; the approach performed effectively in terms of action labels, confidence scores, and localized actions. A YOLO and multi-task cascaded convolutional network (MTCNN) structure was proposed by Rahman et al. [31] to implement an assistive model for visually impaired people on a Raspberry Pi for object detection and facial recognition. The personalized dataset used in this model consists of face images in 3 positions, left, right, and center, with the image name acting as the data label. The model achieved 6-7 FPS with a 63-80% accuracy rate for object detection, while the accuracy rate of facial recognition reached 80-100%. To improve the accuracy of face detection, Garg et al. [32] trained the YOLO framework on the face detection dataset and benchmark (FDDB) dataset, comparing execution time and performance on two different machines. After fine-tuning all
parameters and choosing suitable values for the proposed model, its accuracy was compared with the Haar cascade and R-CNN algorithms. To improve the YOLOv2 structure for pedestrian detection, Lan et al. [33] proposed a YOLO-R structure that adds three pedestrian-feature layers in front of the deep YOLOv2 network and changes the passthrough layers to increase the capability of the network. The proposed model showed high pedestrian detection accuracy, with reduced false and miss rates, compared with the YOLOv2 network on the INRIA dataset. A real-time face detection model based on the YOLO algorithm was proposed by Wang and Jiachun [34] on the WIDER FACE dataset. Twenty images of various sizes were selected from three datasets, CelebFaces, FDDB, and WIDER FACE, for the testing phase, and the authors showed the high detection speed, reduced error rate, and strong robustness of YOLOv3 compared with traditional algorithms. To handle real-time object detection on non-GPU computers, a group of researchers [35] proposed the YOLO-LITE model, whose best trial was run on the COCO dataset; they showed that YOLO-LITE was a faster, smaller, and more efficient object detection model than state-of-the-art models on a variety of devices. A pre-trained ssdlite_mobilenet_v2_coco_2018_05_09 model was used as a feature extractor for obstacle detection on sidewalks in an alert system for visually impaired people [36]. A Raspberry Pi and Pi camera served as the hardware prototype, eSpeak was used as the speech synthesizer to announce the direction of the object through headphones, and a vibration sensor was activated when an object was detected and recognized. An application for visually impaired people on the Android platform was proposed by [37]; the researchers claimed that their application would be a virtual third eye for the visually impaired. A suitable chest strap was designed to hold the phone in [38], whose experiment used Tiny-YOLO integrated with an ARKit configuration to detect objects with augmented reality in an iOS application. For training, Tiny YOLO with a Darknet base network was used on the Turi Create engine with INRIA annotations for the Graz-02 dataset, and the model was tested on a random set of 100 car and bike images with different backgrounds, shapes, and sizes; the authors concluded that the model could detect an object and overlay 3D graphics at its location effectively. In [39], the YOLOv3 algorithm was used to detect five classes of real-time objects, traffic participants and road signalization, for advanced driver assistance systems (ADAS). The proposed system was evaluated on an NVidia GeForce GTX 1060 GPU using weights from a COCO pre-trained model and was trained on the Berkeley DeepDrive dataset; the effectiveness of the proposed model was shown in a variety of driving conditions.
3. RESEARCH METHOD
Our proposed work is divided into two parts: training the detection model and then developing the application. The proposed method is shown in Figures 3(a) and 3(b). The experiment was run on Google Colaboratory (Colab) with a 12 GB-RAM GPU. First, we prepared the data by downloading PASCAL VOC2007 and PASCAL VOC2012 from the PASCAL Visual Object Classes homepage. The twenty object classes in the two datasets are:
- Person: person
- Animal: bird, cat, cow, dog, horse, sheep
- Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
- Indoor: bottle, chair, dining table, potted plant, sofa, television/monitor
PASCAL VOC2007 provides a train/validation set of 9,963 images containing 24,640 annotated objects, and PASCAL VOC2012 provides a train/validation set of 11,530 images containing 27,450 ROI-annotated objects and 6,929 segmentations; each dataset is split into 50% for training/validation and 50% for testing. In our experiment, we combined the two train/validation sets into a single training set and used the PASCAL VOC2007 test set as the testing set. Second, we trained YOLOv3 using Darknet on Colab with the dataset prepared as above and then validated it with the testing set (a sketch of the annotation conversion used in data preparation is given below). The YOLOv3 structure is shown in Figure 4.
Figure 3. Proposed method (a) Train the detection model, and (b) The prototype of detection application
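As a sketch of the data preparation step referenced above, the snippet below converts one PASCAL VOC XML annotation into the Darknet label format (one "class x_center y_center width height" line per object, normalized to [0, 1]). The class order matches the twenty VOC classes listed above; the file paths are hypothetical, and this is our illustration rather than the exact script used in the experiment.

```python
# Sketch: convert a PASCAL VOC XML annotation to a Darknet .txt label.
# Paths are hypothetical; the class order must match the .names file
# used for training.
import xml.etree.ElementTree as ET

VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
               "car", "cat", "chair", "cow", "diningtable", "dog",
               "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

def voc_to_darknet(xml_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = VOC_CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # Darknet format: class x_center y_center width height, all in [0, 1].
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            cls, (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
            (xmax - xmin) / w, (ymax - ymin) / h))
    return "\n".join(lines)

# Usage (hypothetical path):
# open("000001.txt", "w").write(voc_to_darknet("VOC2007/Annotations/000001.xml"))
```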
Figure 4. The YOLOv3 structure
4. RESULTS AND DISCUSSION
From our proposed methodology above, after training the YOLOv3 with the dataset prepared above,
a detection model had been as an output. So, we export the model from Google COLAB to our local drive.
Then, we developed a prototype of an application on a smartphone which installed the obstacle detection
model. We designed the user interface (UI) in a simple way and used eSpeak as a function to generate the
audio output. The eSpeak is open-source software that synthesizes text to speech in English and other
languages. The example of the indoor and outdoor images we captured in the real-time view or in the real
situation which mimics as an input to the obstacle detection application shown in Figure 5(a) and 5(b).
Figure 5. Examples of images captured in real situations
The captured image is forwarded to the obstacle detection model to classify the objects in it. The system then displays the class of each detected object and generates synthesized speech for it to notify the visually impaired or blind user. The output of the system is shown in Figures 6(a) and 6(b).
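A minimal sketch of this detect-and-speak loop is given below. It is our illustration, not the application's actual code: it uses OpenCV's Darknet loader on a desktop to run trained YOLOv3 weights and calls the eSpeak command line for the audio output, and the configuration, weight, and class-file names as well as the 0.5 confidence threshold are assumptions; the on-device application would use a mobile inference path instead.

```python
# Sketch of the detect-and-speak loop; file names and the threshold
# are assumptions, not the authors' exact setup.
import subprocess
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-voc.cfg", "yolov3-voc.weights")
classes = open("voc.names").read().splitlines()

def detect_and_speak(image_path, conf_threshold=0.5):
    img = cv2.imread(image_path)
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    spoken = set()
    for output in outputs:      # one output array per YOLO scale
        for det in output:      # det = [x, y, w, h, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if det[4] * scores[class_id] > conf_threshold:
                spoken.add(classes[class_id])
    for name in spoken:
        # eSpeak reads each detected class name aloud.
        subprocess.run(["espeak", name])

detect_and_speak("example.jpg")
```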
The prototype of the obstacle detection system on the smartphone screen is shown in Figure 7. The experimental results in real situations, based on YOLOv3, showed high speed as well as high detection accuracy in the real-time view. The proposed obstacle detection system on a smartphone will assist visually impaired people in perceiving the surrounding environment.
Figure 6. The output of the obstacle detection system
Figure 7. The output of the system prototype on the smartphone
5. CONCLUSION
In this paper, we have introduced a novel framework for a smartphone application for obstacle detection and classification based on deep learning: YOLOv3. Our proposed smartphone application works in real time, capturing an image and forwarding it to the obstacle detection system. The experimental results demonstrate the effectiveness of the system, which not only shows the detected obstacles and classifies them into named classes but can also generate the audio output in the user's own language. An obstacle detection and classification application for visually impaired people will benefit their safety and comfort, for a better quality of daily life. In future work, we will study the distance between visually impaired people and obstacles; we plan to study similar triangles, Euclidean distance, and other approaches, and to integrate them to improve the overall application.
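As an illustration of this planned direction only, the sketch below applies the similar-triangle (pinhole camera) relation distance = focal length x real height / pixel height to a detected bounding box; the focal length and the per-class object heights are hypothetical values that a real system would obtain by camera calibration.

```python
# Sketch of similar-triangle distance estimation (future work):
# distance = focal_length_px * real_height_m / bbox_height_px.
# The focal length and object heights below are hypothetical values.

KNOWN_HEIGHTS_M = {"person": 1.7, "chair": 0.9, "bottle": 0.25}
FOCAL_LENGTH_PX = 700.0  # would be obtained by camera calibration

def estimate_distance(class_name, bbox_height_px):
    """Approximate distance (meters) to a detected object."""
    real_height = KNOWN_HEIGHTS_M.get(class_name)
    if real_height is None or bbox_height_px <= 0:
        return None  # unknown object size: cannot estimate
    return FOCAL_LENGTH_PX * real_height / bbox_height_px

# Example: a person whose bounding box is 350 px tall is roughly
# 700 * 1.7 / 350 = 3.4 m away.
print(estimate_distance("person", 350))
```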
REFERENCES
[1] World Health Organization, "Blindness and vision impairment," [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment.
[2] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc.
of the IEEE, vol. 86, no. 11, 1998, pp. 2278-2324, doi: 10.1109/5.726791.
[3] Viola, P., and Jones, M. J., "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural
networks," Commun. ACM, vol. 60, no. 6, pp. 84-90, May 2017, doi: 10.1145/3065386.
[5] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv
preprint arXiv:1409.1556, 2014.
[6] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D et al., "Going deeper with convolutions, " in
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 1-9,
doi: 10.1109/CVPR.2015.7298594.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778,
doi: 10.1109/CVPR.2016.90.
[8] Girshick, R., Donahue, J., Darrell, T., and Malik, J., "Rich Feature Hierarchies for Accurate Object Detection and
Semantic Segmentation, " in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH,
USA, 2014, pp. 580–587, doi: 10.1109/CVPR.2014.81.
[9] Girshick, R., "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[10] He, K., Zhang, X., Ren, S., and Sun, J., "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual
Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904-1916, Sep. 2015,
doi: 10.1109/TPAMI.2015.2389824.
[11] Ren, S., He, K., Girshick, R., and Sun, J., "Faster R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017,
doi: 10.1109/TPAMI.2016.2577031.
[12] He, K., Gkioxari, G., Dollár, P., and Girshick, R., "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969.
[13] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., and Berg, A. C., "SSD: Single shot multibox detector," in European Conference on Computer Vision, Springer, Cham, 2016, pp. 21-37.
[14] Redmon, J., Divvala, S., Girshick, R., and Farhadi, A., "You Only Look Once: Unified, Real-Time Object
Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA,
2016, pp. 779-788, doi: 10.1109/CVPR.2016.91.
[15] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in 2017 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 6517-6525, doi: 10.1109/CVPR.2017.690.
[16] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
[17] Chakraborty, C., Gupta, B., and Ghosh, S. K., "A review on telemedicine-based WBAN framework for patient monitoring," Telemedicine and e-Health, vol. 19, no. 8, pp. 619-626, 2013.
[18] Xiangrong Chen and A. L. Yuille, "A Time-Efficient Cascade for Real-Time Object Detection: With applications
for the visually impaired," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR’05)-Workshops, San Diego, CA, USA, vol. 3, 2005, pp. 28-28, doi: 10.1109/CVPR.2005.399.
[19] Arakeri, M. P., Keerthana, N. S., Madhura, M., Sankar, A., Munnavar, T., "Assistive Technology for the Visually
Impaired Using Computer Vision," in 2018 International Conference on Advances in Computing, Communications
and Informatics (ICACCI), Bangalore, pp. 1725-1730, 2018, doi: 10.1109/ICACCI.2018.8554625.
[20] Fink, W., and Humayun, M., "Digital object recognition audio-assistant for the visually impaired," U.S. Patent
Application 11/030,678, Sep. 2005.
[21] Guevarra, E. C., Camama, M. I. R., and Cruzado, G. V., "Development of Guiding Cane with Voice Notification
for Visually Impaired individuals," International Journal of Electrical and Computer Engineering (IJECE), vol. 8,
no. 1, pp. 104-112, 2018.
[22] Jaejoon Kim, "Application on character recognition system on road sign for visually impaired: case study approach
and future," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 1, pp. 778-785,
2020.
[23] Tapu, R., Mocanu, B., Bursuc, A., and Zaharia, T., "A Smartphone-Based Obstacle Detection and Classification
System for Assisting Visually Impaired People," in 2013 IEEE International Conference on Computer Vision
Workshops, Sydney, Australia, 2013, pp. 444-451, doi: 10.1109/ICCVW.2013.65.
[24] Felix, S. M., Kumar, S., and Veeramuthu, A., "A Smart Personal AI Assistant for Visually Impaired People," in
2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, 2018,
pp. 1245-1250, doi: 10.1109/ICOEI.2018.8553750.
[25] Shah, S., Bandariya, J., Jain, G., Ghevariya, M., and Dastoor, S., "CNN based Auto-Assistance System as a Boon
for Directing Visually Impaired Person," in 2019 3rd International Conference on Trends in Electronics and
Informatics (ICOEI), Tirunelveli, India, 2019, pp. 235-240, doi: 10.1109/ICOEI.2019.8862699.
[26] Bianco, S., Celona, L., Napoletano, P., and Schettini, R., "On the use of deep learning for blind image quality
assessment," Signal, Image and Video Processing, vol. 12, no. 2, pp. 355-362, Feb. 2018, doi: 10.1007/s11760-017-
1166-8.
[27] Wong, Y. C., Lai, J. A., Ranjit, S. S. S., Syafeeza, A. R., and Hamid, N. A., "Convolutional Neural Network for
Object Detection System for Blind People," Journal of Telecommunication, Electronic and Computer Engineering
(JTEC), vol. 11, no. 2, p. 6, 2019.
[28] Cai, W., Li, J., Xie, Z., Zhao, T., and Kang, L. U., "Street Object Detection Based on Faster R-CNN," In 2018 37th
Chinese Control Conference (CCC), 2018, pp. 9500-9503.
[29] Zhang, R., Yang, Y., Wang, W., Zeng, L., Chen, J., and McGrath, S., "An Algorithm for Obstacle Detection based
on YOLO and Light Filed Camera," in 2018 12th International Conference on Sensing Technology (ICST),
Limerick, pp. 223-226, 2018, doi: 10.1109/ICSensT.2018.8603600.
[30] Shinde, S., Kothari, A., and Gupta, V., "YOLO based Human Action Recognition and Localization," Procedia
Computer Science, vol. 133, pp. 831-838, 2018, doi: 10.1016/j.procs.2018.07.112.
[31] Rahman, F., Ritun, I. J., Farhin, N., and Uddin, J., "An assistive model for visually impaired people using YOLO
and MTCNN," in Proceedings of the 3rd International Conference on Cryptography, Security and Privacy - ICCSP
’19, Kuala Lumpur, Malaysia, pp. 225-230, 2019, doi: 10.1145/3309074.3309114.
[32] Garg, D., Goel, P., Pandya, S., Ganatra, A., and Kotecha, K., "A Deep Learning Approach for Face Detection using
YOLO," in 2018 IEEE Punecon, Pune, India, pp. 1-4, 2018, doi: 10.1109/PUNECON.2018.8745376.
[33] Lan, W., Dang, J., Wang, Y., and Wang, S., "Pedestrian Detection Based on YOLO Network Model," in 2018 IEEE
International Conference on Mechatronics and Automation (ICMA), Changchun, 2018, pp. 1547–1551,
doi: 10.1109/ICMA.2018.8484698.
[34] W. Yang and Z. Jiachun, "Real-time face detection based on YOLO," in 2018 1st IEEE International Conference
on Knowledge Innovation and Invention (ICKII), Jeju, 2018, pp. 221-224, doi: 10.1109/ICKII.2018.8569109.
[35] Huang, R., Pedoeem, J., and Chen, C., "YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for
Non-GPU Computers," in 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018,
pp. 2503-2510, doi: 10.1109/BigData.2018.8621865.
[36] Pehlivan, S., Unay, M., and Akan, A., "Designing an Obstacle Detection and Alerting System for Visually Impaired
People on Sidewalks," in 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 2019, pp. 1-4,
doi: 10.1109/TIPTEKNO.2019.8895181.
[37] S. Tosun and E. Karaarslan, "Real-Time Object Detection Application for Visually Impaired People: Third Eye," in
2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 2018,
pp. 1-6, doi: 10.1109/IDAP.2018.8620773.
[38] S. Mahurkar, "Integrating YOLO Object Detection with Augmented Reality for iOS Apps," in 2018 9th IEEE
Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York City,
NY, USA, 2018, pp. 585-589, doi: 10.1109/UEMCON.2018.8796579.
[39] Ćorović, A., Ilić, V., Ðurić, S., Marijan, M., and Pavković, B., "The Real-Time Detection of Traffic Participants
Using YOLO Algorithm," in 2018 26th Telecommunications Forum (TELFOR), Belgrade, pp. 1-4, 2018,
doi: 10.1109/TELFOR.2018.8611986.
BIOGRAPHIES OF AUTHORS
Nachirat Rachburee is a lecturer at the Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Pathum Thani, Thailand. His research interests include data mining, big data analytics, deep learning, neural networks, and predictive analytics.
Wattana Punlumjeak is a lecturer at the Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Pathum Thani, Thailand. Her research interests include data mining, big data analytics, deep learning, neural networks, and predictive analytics.

More Related Content

PDF
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
PDF
IISc Internship Report
PDF
IRJET - Direct Me-Nevigation for Blind People
PDF
Adversarial Multi Scale Features Learning for Person Re Identification
PDF
Face Recognition and Increased Reality System for Mobile Devices
PDF
Gender Classification using SVM With Flask
PDF
IRJET- Automated Detection of Gender from Face Images
PDF
The deep learning technology on coco framework full report
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IISc Internship Report
IRJET - Direct Me-Nevigation for Blind People
Adversarial Multi Scale Features Learning for Person Re Identification
Face Recognition and Increased Reality System for Mobile Devices
Gender Classification using SVM With Flask
IRJET- Automated Detection of Gender from Face Images
The deep learning technology on coco framework full report

What's hot (20)

PDF
Ijetcas14 465
PDF
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
PDF
An Extensive Review on Generative Adversarial Networks GAN’s
PPTX
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
PDF
Deep convolutional neural network for hand sign language recognition using mo...
PDF
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
PDF
Deep Learning for X ray Image to Text Generation
PDF
MULTI-LEVEL FEATURE FUSION BASED TRANSFER LEARNING FOR PERSON RE-IDENTIFICATION
PDF
Iris Encryption using (2, 2) Visual cryptography & Average Orientation Circul...
PDF
Ijarcce 27
PDF
AI that/for matters
PPTX
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
PDF
Performance Comparison of Face Recognition Using DCT Against Face Recognition...
PDF
Face Recognition Techniques - An evaluation Study
PDF
IRJET- Detection of Writing, Spelling and Arithmetic Dyslexic Problems in...
PDF
Edge AI: Deep Learning techniques for Computer Vision applied to Embedded Sys...
PDF
Machine Learning:
PDF
Deep learning 1.0 and Beyond, Part 1
PDF
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
PDF
AHP validated literature review of forgery type dependent passive image forge...
Ijetcas14 465
REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES
An Extensive Review on Generative Adversarial Networks GAN’s
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
Deep convolutional neural network for hand sign language recognition using mo...
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
Deep Learning for X ray Image to Text Generation
MULTI-LEVEL FEATURE FUSION BASED TRANSFER LEARNING FOR PERSON RE-IDENTIFICATION
Iris Encryption using (2, 2) Visual cryptography & Average Orientation Circul...
Ijarcce 27
AI that/for matters
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
Performance Comparison of Face Recognition Using DCT Against Face Recognition...
Face Recognition Techniques - An evaluation Study
IRJET- Detection of Writing, Spelling and Arithmetic Dyslexic Problems in...
Edge AI: Deep Learning techniques for Computer Vision applied to Embedded Sys...
Machine Learning:
Deep learning 1.0 and Beyond, Part 1
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
AHP validated literature review of forgery type dependent passive image forge...
Ad

Similar to An assistive model of obstacle detection based on deep learning: YOLOv3 for visually impaired people (20)

PDF
Smart Navigation Assistance System for Blind People
PDF
ooObject detection and Recognization.pdf
PDF
IRJET- Object Detection and Recognition for Blind Assistance
PDF
Voice Enable Blind Assistance System -Real time Object Detection
PDF
Drishyam - Virtual Eye for Blind
PPTX
ppt - of a project will help you on your college projects
PDF
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
PDF
Assistance Application for Visually Impaired - VISION
PDF
A lightweight you only look once for real-time dangerous weapons detection
PDF
IRJET- Alternate Vision Assistance: For the Blind
PDF
OBJECT IDENTIFICATION
PDF
information-11-00583-v3.pdf
PPTX
Deep learning based object detection
PDF
Sanjaya: A Blind Assistance System
PPTX
ppt - Copy for projects will help you further
PDF
Real Time Object Detection System with YOLO and CNN Models: A Review
PPTX
6. PRESENTATION REAL TIME OBJECT DETECTION.pptx
PDF
You only look once model-based object identification in computer vision
PDF
Object Detection Using YOLO Models
PPTX
OBJECT DETECTION FOR VISUALLY IMPAIRED USING TENSORFLOW LITE.pptx
Smart Navigation Assistance System for Blind People
ooObject detection and Recognization.pdf
IRJET- Object Detection and Recognition for Blind Assistance
Voice Enable Blind Assistance System -Real time Object Detection
Drishyam - Virtual Eye for Blind
ppt - of a project will help you on your college projects
INDOOR AND OUTDOOR NAVIGATION ASSISTANCE SYSTEM FOR VISUALLY IMPAIRED PEOPLE ...
Assistance Application for Visually Impaired - VISION
A lightweight you only look once for real-time dangerous weapons detection
IRJET- Alternate Vision Assistance: For the Blind
OBJECT IDENTIFICATION
information-11-00583-v3.pdf
Deep learning based object detection
Sanjaya: A Blind Assistance System
ppt - Copy for projects will help you further
Real Time Object Detection System with YOLO and CNN Models: A Review
6. PRESENTATION REAL TIME OBJECT DETECTION.pptx
You only look once model-based object identification in computer vision
Object Detection Using YOLO Models
OBJECT DETECTION FOR VISUALLY IMPAIRED USING TENSORFLOW LITE.pptx
Ad

More from IJECEIAES (20)

PDF
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
PDF
Embedded machine learning-based road conditions and driving behavior monitoring
PDF
Advanced control scheme of doubly fed induction generator for wind turbine us...
PDF
Neural network optimizer of proportional-integral-differential controller par...
PDF
An improved modulation technique suitable for a three level flying capacitor ...
PDF
A review on features and methods of potential fishing zone
PDF
Electrical signal interference minimization using appropriate core material f...
PDF
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
PDF
Bibliometric analysis highlighting the role of women in addressing climate ch...
PDF
Voltage and frequency control of microgrid in presence of micro-turbine inter...
PDF
Enhancing battery system identification: nonlinear autoregressive modeling fo...
PDF
Smart grid deployment: from a bibliometric analysis to a survey
PDF
Use of analytical hierarchy process for selecting and prioritizing islanding ...
PDF
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
PDF
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
PDF
Adaptive synchronous sliding control for a robot manipulator based on neural ...
PDF
Remote field-programmable gate array laboratory for signal acquisition and de...
PDF
Detecting and resolving feature envy through automated machine learning and m...
PDF
Smart monitoring technique for solar cell systems using internet of things ba...
PDF
An efficient security framework for intrusion detection and prevention in int...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Embedded machine learning-based road conditions and driving behavior monitoring
Advanced control scheme of doubly fed induction generator for wind turbine us...
Neural network optimizer of proportional-integral-differential controller par...
An improved modulation technique suitable for a three level flying capacitor ...
A review on features and methods of potential fishing zone
Electrical signal interference minimization using appropriate core material f...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Bibliometric analysis highlighting the role of women in addressing climate ch...
Voltage and frequency control of microgrid in presence of micro-turbine inter...
Enhancing battery system identification: nonlinear autoregressive modeling fo...
Smart grid deployment: from a bibliometric analysis to a survey
Use of analytical hierarchy process for selecting and prioritizing islanding ...
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
Adaptive synchronous sliding control for a robot manipulator based on neural ...
Remote field-programmable gate array laboratory for signal acquisition and de...
Detecting and resolving feature envy through automated machine learning and m...
Smart monitoring technique for solar cell systems using internet of things ba...
An efficient security framework for intrusion detection and prevention in int...

Recently uploaded (20)

PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Sustainable Sites - Green Building Construction
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
CH1 Production IntroductoryConcepts.pptx
PPT
Project quality management in manufacturing
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
web development for engineering and engineering
PPTX
Fundamentals of safety and accident prevention -final (1).pptx
DOCX
573137875-Attendance-Management-System-original
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Well-logging-methods_new................
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Sustainable Sites - Green Building Construction
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
UNIT-1 - COAL BASED THERMAL POWER PLANTS
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
Safety Seminar civil to be ensured for safe working.
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
OOP with Java - Java Introduction (Basics)
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
bas. eng. economics group 4 presentation 1.pptx
Automation-in-Manufacturing-Chapter-Introduction.pdf
CH1 Production IntroductoryConcepts.pptx
Project quality management in manufacturing
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
web development for engineering and engineering
Fundamentals of safety and accident prevention -final (1).pptx
573137875-Attendance-Management-System-original
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Well-logging-methods_new................

An assistive model of obstacle detection based on deep learning: YOLOv3 for visually impaired people

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 11, No. 4, August 2021, pp. 3434~3442 ISSN: 2088-8708, DOI: 10.11591/ijece.v11i4.pp3434-3442  3434 Journal homepage: http://ijece.iaescore.com An assistive model of obstacle detection based on deep learning: YOLOv3 for visually impaired people Nachirat Rachburee, Wattana Punlumjeak Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Pathum Thani, Thailand Article Info ABSTRACT Article history: Received Jul 31, 2020 Revised Dec 22, 2020 Accepted Jan 19, 2021 The World Health Organization (WHO) reported in 2019 that at least 2.2 billion people were visual-impairment or blindness. The main problem of living for visually impaired people have been facing difficulties in moving even indoor or outdoor situations. Therefore, their lives are not safe and harmful. In this paper, we proposed an assistive application model based on deep learning: YOLOv3 with a Darknet-53 base network for visually impaired people on a smartphone. The Pascal VOC2007 and Pascal VOC2012 were used for the training set and used Pascal VOC2007 test set for validation. The assistive model was installed on a smartphone with an eSpeak synthesizer which generates the audio output to the user. The experimental result showed a high speed and also high detection accuracy. The proposed application with the help of technology will be an effective way to assist visually impaired people to interact with the surrounding environment in their daily life. Keywords: Assistive model Deep learning Obstacle detection Visually impaired YOLOv3 This is an open access article under the CC BY-SA license. Corresponding Author: Wattana Punlumjeak Department of Computer Engineering Rajamangala University of Technology Thanyaburi 39 Moo 1, Klong 6, Khlong Luang Pathum Thani 12110 Thailand Email: wattana.p@en.rmutt.ac.th 1. INTRODUCTION The visual-impairment or blindness people in the world today is at least 2.2 billion reported by the World Health Organization (WHO) on 8 October 2019. The definition of classification of diseases 11(2018) classifies vision impairment by visual acuity worse into two groups: distance, and near presenting vision impairment. The visual acuity worse presenting between 6/12 and 6/60 of distance is defined as vision impairment, whereas visual acuity worse presenting than 3/60 of distance is defined as blindness [1]. One difficulty in the daily life of the visually impaired or blind people is living an invisible life indoor or outdoor environment. Although, guide dogs or white cane still the most popular tool used for obstacle detectors to navigate but the visually impaired people cannot know what things or the name of the obstacles are. A large number of people who visually impaired or blinded have been realized for the researcher to find a technology or a solution to assist them in their daily life. Object detection, image processing and machine learning are some of the popular topics and have become rapidly growing fields. Object detection is a computer technology that deals with detecting instances of semantic objects of a certain class such as humans, cars, dogs, or traffic signs in digital images and videos. 
Machine learning is a subset of application of artificial intelligence (AI) that subject in the scientific study of algorithms and statistical models which provides systems the ability to automatically learn and improve itself from experience without being explicitly programmed. Machine learning can be categorized into supervised,
  • 2. Int J Elec & Comp Eng ISSN: 2088-8708  An assistive model of obstacle detection based on deep learning: YOLOv3 for… (Nachirat Rachburee) 3435 semi-supervised or unsupervised. Classification is one of the supervised learning algorithms in machine learning category. The classification used in object detection is to classify the object into a certain class that has learned. Deep learning is part of a broader family of machine learning methods based on artificial neural networks that use multiple layers to progressively extract higher-level features from the raw input. In image processing, lower layers of neural networks may identify edges, while higher layers may identify the concepts relevant to object in that image such as humans, dogs, cats, or cars. In this research, an efficient algorithm in machine learning is proposed. The PASCAL VOC2007 and the PASCAL VOC2012 data set is used to train the machine. The prototype of the system on the screen of the smartphone is developed to find the best assistive model for the visually impaired. The paper is organized as follows: after the introduction, literature review, and related works are presented in section 2, follow by the research method and proposed experiment in section 3, the result and discussion in section 4. Finally, in section 5 we provide conclusive remarks and our future work. 2. LITERATURE REVIEW AND RELATED WORK In the past, one of the main tasks of machine learning was to classify things by creating classifiers that could classify whether the object in the image was a person or an animal (e.g. dog, cat) or any other object. In this era, most researchers had focused on finding and creating effective classifiers, from simple linear classifiers that combine features from linear combination until support vector machine (SVM) classifier used the kernel function to transform these features into mathematical kernel space. When research on the classification of things became saturated, researchers began to move on to more difficult and challenging problems, which were "detecting and classifying" what was an interesting object in the image. The paper [2] that had known as the pioneer of object recognition which used a convolutional neural network (CNN) with gradient-based learning to handwritten character recognition. The breakthrough in computer vision, the face detector system by Viola and Jones [3]. The main idea of this research has created a cascade classifier and combined it with AdaBoosting learning algorithm instead of using a one classifier. At that time, Viola and Jones research paper was considered state-of-the-art for object detection. Due to the limitations of the processors that are not fast enough, therefore CNN classifier was not received much attention. In the famous annual computer vision competition, imagenet large-scale visual recognition challenge in 2012: ILSVRC, Alex and his team presented a deep convolutional neural network architecture called AlexNet [4]. AlexNet showed the best performance in the competition. So, CNN is becoming more and more popular, and many CNN models and architecture had been improved from the previous AlexNet structure. After that, more research that adapted and fine-tuning from the previous architecture had been proposed e.g. VGG [5], GoogLeNet which its codenamed was Inception [6], Microsoft ResNet [7] and more. The advent of region-based CNNs (R-CNN) which the authors purposed to solved object detection problems [8]. 
The R-CNN processes were split into two steps: region proposal step and the classification step. Region proposal step used selection search which proposed region of interest (ROI) and generated different 2000 regions then extract feature by CNN named AlexNet. Then, classified each region using linear SVMs in the classification step. The same author from R-CNN [9] improved the Fast R-CNN from the R-CNN in the problem of speed by used ROI pooling [10] through a ConvNet to extracted the feature and used a fully connected layer instead of SVM to classification or recognition. In 2016, Faster R-CNN was presented [11]. Faster R-CNN was improved from Fast R-CNN by replacing region proposals with region proposal network (RPN) after the last convolutional layer. Faster R-CNN had two outputs: a bounding-box offset for each candidate object and a class label of ROI. Mask R-CNN [12] extended Faster R-CNN by adding a mask branch to predict a segmentation mask on each ROI while the existing branch for classification and bounding box regression. All the above of object detectors are the state of the art which based on a two-stage framework: The first stage generates region proposal to localize the object in the image, the second stage classifies the object. Despite the success of two-stage detectors, one-stage detectors are also applied. Single shot multibox detector (SSD) was presented in 2016 [13] which based on standard network architecture: VGG-16. The SSD produced a bounding boxes in different aspect ratios and score for each category of presence object. Another famous one-stage detector is you only look once (YOLO) [14]. YOLO, a unified architecture which straightforward, simple and extremely fast in which the network architecture was inspired by GoogleNet, then is called Darknet. The network architecture has 24 convolutional layers working as feature extractors and 2 fully connected layers for the predictions in which the framework is trained on the ImageNet-1000 dataset. The YOLO architecture [14] showed in Figure 1. The YOLO used algorithms which based on regression where the process of detection, localization, and classification the object for the input image will take place in a single pass. The YOLO detection system process start by resizes the input image into 448 X 448 and then divided into S X S grid cell, then fed into the single convolutional network. Each
  • 3.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 11, No. 4, August 2021 : 3434 - 3442 3436 grid cell predicts B bounding box and confidence score which output of each bounding box consists of 5 prediction values: offset values (x, y, w, and h) where x and y are the coordinates of the object in the input image, w and h are the width and height of the object respectively for each of the bounding box, while the last prediction value is confidence score or class probabilities that given in terms of an IOU (intersection over union), which should have the object exist in the bounding box. The bounding box which has a high confidence score above the threshold value is selected and then used to locate the object within the image. Figure 1. The YOLO architecture [14] YOLO or named YOLOv1 has a limitation, that it could not find small objects in the image if they appeared as a cluster or group and difficult found in a generalization of objects if the image is different dimensions from the trained image. In December 2016, the second version of the YOLO has named YOLOv2 or YOLO9000 real-time framework for detection categories of the object more than 9000 categories have been published [15]. Mainly new thing was to introduce the anchor boxes which are designed for a given dataset by using k-means clustering to responsible to predict the bounding box. The architecture of YOLOv2 used the Darknet-19 architecture with 19 convolutional layers and 5 max-pooling layers and then a softmax layer for classification of the objects. YOLOv3: An incremental improvement has been published in April 2018 [16]. YOLOv3 has been improved from the previous version with a high accuracy of classifying the objects. To predict the score for the objects for each bounding box, YOLOv3 uses logistic regression and also used independent logistic classifiers for each box to predict the classes of the bounding box which may contain an object instead of softmax. YOLOv3 uses the Darknet-53 network [16] for feature extractor which has 53 convolutional layers which showed in Figure 2. One of the challenging problems in the field of object detection and machine learning is assisting people who visually impaired. Many researchers proposed their work which aims to help visually impaired in daily life. Patient monitoring framework in telemedicine system was presented in different scenarios [17]. A greedy algorithm was developed to design cascade for applied real-time text detector for the visually impaired [18]. Arakeri et al. proposed a raspberry pi with NoIR camera to captures the readable material around the visually impaired and used a speech synthesis to generate sound in regional language [19]. Fink and Humayun [20], and more researcher [21, 22] presented the invention for the visually impaired. A digital camera mounted on the person’s eye or head is used to take snapshots of an image on demand and provided to an image processing algorithm. Edge detection techniques are used to identify the object in the image and classified the known object by artificial neural networks that have been trained. The invention could determine the size, distance from another object and announced the computer-based voice synthesizer to describe the descriptive sentence for the blind. Tapu et al. [23] introduced a real-time obstacle detection and classification to assist visually impaired people in indoor and outdoor environments by handling a smartphone device with the help of a chest-mounted harness. 
One of the challenging problems in the field of object detection and machine learning is assisting people who are visually impaired, and many researchers have proposed work aimed at helping them in daily life. A patient monitoring framework in a telemedicine system was presented for different scenarios [17]. A greedy algorithm was developed to design a cascade for a real-time text detector applied for the visually impaired [18]. Arakeri et al. [19] proposed a Raspberry Pi with a NoIR camera that captures readable material around the visually impaired user and uses speech synthesis to generate sound in a regional language. Fink and Humayun [20], and other researchers [21, 22], presented inventions for the visually impaired in which a digital camera mounted on the person's eye or head takes snapshots of a scene on demand and provides them to an image processing algorithm; edge detection techniques identify objects in the image, and trained artificial neural networks classify the known objects. The invention could determine an object's size and distance from other objects and use a computer-based voice synthesizer to speak a descriptive sentence to the blind user. Tapu et al. [23] introduced real-time obstacle detection and classification to assist visually impaired people in indoor and outdoor environments using a smartphone held in a chest-mounted harness. The authors proposed an object detection step that extracts an image grid and uses the multiscale Lucas-Kanade algorithm to track interest points; the estimated motions are then grouped into clusters with an agglomerative clustering technique and refined with the K-NN algorithm. The moving-object classification step incorporates the HOG descriptor into a bag of visual words (BoVW) retrieval framework. The experiments in different environments achieved high accuracy rates, and the system proved useful to blind users.
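To illustrate the grid-point tracking step in this kind of pipeline (a sketch under our own assumptions about grid spacing and window size, not code from [23]), OpenCV's pyramidal Lucas-Kanade tracker can be applied to points sampled on a regular image grid:

```python
import numpy as np
import cv2

def track_grid_points(prev_gray, next_gray, grid_step=20):
    """Sample points on a regular grid and track them between two
    consecutive grayscale frames with pyramidal Lucas-Kanade."""
    h, w = prev_gray.shape
    ys, xs = np.mgrid[grid_step // 2:h:grid_step, grid_step // 2:w:grid_step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    pts = pts.reshape(-1, 1, 2)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(21, 21), maxLevel=3)  # maxLevel > 0 gives the multiscale pyramid
    ok = status.ravel() == 1
    # Motion vectors of successfully tracked points; in the cited pipeline
    # these would then be clustered into motion classes.
    return pts[ok].reshape(-1, 2), (next_pts[ok] - pts[ok]).reshape(-1, 2)
```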
An AI assistant delivered through an Android mobile application for the visually impaired was proposed in [24]. The researchers focused on image recognition, currency recognition, text recognition, a chatbot, and a voice assistant that interacts with the environment through voice commands. Their application was developed on the Google Cloud Platform using its cloud API libraries. Convolutional neural networks for object detection systems for visually impaired or blind people have been continually improved. Convolutional neural networks and Haar cascade classifiers were compared by Shah et al. [25] to find a suitable algorithm for assisting the visually impaired or blind in real-time scenarios; the CNN was trained on the COCO 2017 dataset, and the experiments showed that the CNN detects multiple objects with higher accuracy than the Haar cascade in real-time applications. Bianco et al. [26] presented a category-based image quality assessment method named DeepBIQ, which extracts features from a CNN fine-tuned for the image quality task. The researchers in [27] used the single shot multibox detector (SSD) in their system to identify objects after a webcam captured a real-time scene. A Raspberry Pi 3 was used as the prototype, and an audio-based detector delivered the detection information as sound through connected headphones; the model worked well even in offline conditions compared with Fast R-CNN. An experiment on object detection and localization in a street environment was proposed in [28]. A model based on Faster R-CNN pre-trained on the COCO dataset was fine-tuned by transfer learning on a self-made dataset gathered from the internet, covering different kinds of objects in urban streets. The authors concluded that Faster R-CNN on the self-made dataset improved average accuracy and that the fine-tuned network was effective.
Figure 2. Darknet-53 architecture [16]
Among single-stage object detection networks, there is much interesting research and many applications. Obstacle detection with a light field camera was proposed in [29]: YOLO was used to classify objects in indoor images into categories, and the researchers showed that obstacles were accurately classified, with high accuracy in size and position. Human action recognition with YOLO object detection on video frames was explored in [30] on the LIRIS dataset; the proposal performed effectively in terms of action labels, confidence scores, and action localization. A YOLO and multi-task cascaded convolutional network (MTCNN) structure was proposed by Rahman et al. [31] to implement an assistive model for visually impaired people on a Raspberry Pi for object detection and facial recognition. The personalized dataset used in this model consists of face images in three positions, left, right, and center, with the image name acting as the data label. The model achieved 6-7 FPS with a 63-80% accuracy rate for object detection, while facial recognition accuracy reached 80-100%. To improve the accuracy of face detection, Garg et al. [32] trained the YOLO framework on the face detection dataset and benchmark (FDDB) dataset and compared execution time and performance on two different machines.
After fine-tuning all parameters and choosing suitable values for the proposed model, its accuracy was compared with the Haar cascade and R-CNN algorithms. To improve the YOLOv2 structure for pedestrian detection, Lan et al. [33] proposed a YOLO-R structure that added three pedestrian-feature layers in front of the deep YOLOv2 network and also changed the passthrough layers to increase the network's capability. The proposed model showed high pedestrian detection accuracy with reduced false and miss rates compared with the YOLOv2 network on the INRIA dataset. A real-time face detection model was proposed by Yang and Jiachun [34] on the WIDER FACE dataset with the YOLO algorithm. Twenty images of various sizes were selected from three datasets, CelebFaces, FDDB, and WIDER FACE, for the testing phase of the proposed model. They showed high detection speed, a reduced error rate, and the strong robustness of YOLOv3 compared with traditional algorithms. To handle real-time object detection on non-GPU computers, the researchers in [35] proposed the YOLO-LITE model, whose best trial was run on the COCO dataset; they showed that YOLO-LITE is a faster, smaller, and more efficient detection model than the state of the art across a variety of devices. A pre-trained ssdlite_mobilenet_v2_coco_2018_05_09 model was used as a feature extractor for obstacle detection in a sidewalk alert system for visually impaired people [36]. A Raspberry Pi and Pi camera served as the hardware prototype, eSpeak was used as a speech synthesizer to announce the direction of the object through headphones, and a vibration sensor was activated when an object was detected and recognized. An application for visually impaired people on the Android platform was proposed in [37]; the researchers claimed that their application would act as a virtual third eye for the visually impaired. In [38], a suitable chest strap was designed to hold the phone, and Tiny-YOLO was integrated with an ARKit configuration to detect objects with augmented reality in iOS applications. Training used Tiny YOLO with a Darknet base network on the Turi Create engine and INRIA annotations for the Graz-02 dataset, while testing used a random set of 100 cars and bikes with different backgrounds, shapes, and sizes; the authors concluded that the model could detect an object and overlay 3D graphics at its location effectively. In [39], the YOLOv3 algorithm was used to detect five classes of traffic participants and road signalization in real time for advanced driver assistance systems (ADAS). The proposed system was evaluated on an NVIDIA GeForce GTX 1060 GPU using weights pre-trained on COCO and trained on the Berkeley DeepDrive dataset, and its effectiveness was shown under a variety of driving conditions.

3. RESEARCH METHOD
Our proposed work is divided into two parts: training the detection model and then developing the application, as shown in Figure 3(a) and (b). The experiments were run on Google Colaboratory (Colab) with a 12 GB-RAM GPU. First, we prepared the data by downloading the PASCAL VOC2007 and PASCAL VOC2012 datasets from the PASCAL Visual Object Classes homepage.
The two datasets cover the following twenty object classes:
 Person: person
 Animal: bird, cat, cow, dog, horse, sheep
 Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
 Indoor: bottle, chair, dining table, potted plant, sofa, television/monitor
PASCAL VOC2007 provides a train/validation set of 9,963 images containing 24,640 annotated objects, and PASCAL VOC2012 provides a train/validation set of 11,530 images containing 27,450 ROI-annotated objects and 6,929 segmentations; each dataset is split roughly 50% for training/validation and 50% for testing. In our experiment, we combined the two train/validation sets into a single training set and used the PASCAL VOC2007 test set as the testing set.
Figure 3. Proposed method: (a) train the detection model and (b) the prototype of the detection application
Second, we trained YOLOv3 using Darknet on Colab with the dataset prepared above and then validated it on the testing set; the YOLOv3 structure is shown in Figure 4. Before training, the VOC annotations must be converted into the label format Darknet expects, as sketched below.
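Darknet expects each image's annotations as a plain-text label file with one normalized box per line rather than VOC's XML. The following is a minimal sketch of that conversion, modeled on the commonly used voc_label.py script rather than taken from the paper, using the dataset's own class identifiers (e.g., aeroplane, tvmonitor) for the twenty classes listed above:

```python
import xml.etree.ElementTree as ET

# The twenty PASCAL VOC classes, under their VOC identifiers, in a fixed order.
CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
           "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
           "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def voc_to_yolo(xml_path, txt_path):
    """Convert one VOC XML annotation file to a YOLO label file:
    '<class_id> <x_center> <y_center> <width> <height>', all normalized."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue
        box = obj.find("bndbox")
        x1, y1 = float(box.find("xmin").text), float(box.find("ymin").text)
        x2, y2 = float(box.find("xmax").text), float(box.find("ymax").text)
        cx, cy = (x1 + x2) / 2 / w, (y1 + y2) / 2 / h
        bw, bh = (x2 - x1) / w, (y2 - y1) / h
        lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```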
Figure 4. The YOLOv3 structure

4. RESULTS AND DISCUSSION
Following the methodology above, training YOLOv3 on the prepared dataset produced a detection model as output, which we exported from Google Colab to our local drive. We then developed a prototype application on a smartphone in which the obstacle detection model was installed. We designed the user interface (UI) in a simple way and used eSpeak to generate the audio output; eSpeak is open-source software that synthesizes text to speech in English and other languages. Examples of the indoor and outdoor images captured in real situations, which serve as input to the obstacle detection application, are shown in Figure 5(a) and 5(b).
Figure 5. Examples of images captured in real situations
The captured image is forwarded to the obstacle detection model to classify the objects. The system then shows the class of each detected object and generates a voice synthesis of the detected object to notify the visually impaired or blind user. The output of the system is shown in Figure 6(a) and 6(b), and the prototype of the obstacle detection system on the smartphone screen is shown in Figure 7. The experimental results in real situations based on YOLOv3 showed high speed and high detection accuracy in the real-time view. The proposed smartphone obstacle detection model will assist visually impaired people in perceiving the surrounding environment; a sketch of the voice-output step follows.
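As a rough illustration of handing a detected class name to the synthesizer (assuming eSpeak is installed and exposed as the espeak command-line tool; the phrasing of the announcement is our own, not the application's actual output):

```python
import subprocess

def announce(class_name, voice="en"):
    """Speak the detected obstacle class through the eSpeak synthesizer.
    Assumes the 'espeak' command-line tool is installed on the device."""
    subprocess.run(["espeak", "-v", voice, f"{class_name} ahead"], check=True)

# Example: after the detector returns a class label for the captured frame.
announce("chair")
```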
Figure 6. The output of the obstacle detection system
Figure 7. The output of the system prototype on the smartphone

5. CONCLUSION
In this paper, we have introduced a novel framework for a smartphone application for obstacle detection and classification based on deep learning with YOLOv3. Our proposed application works in real time on a smartphone, capturing an image and forwarding it to the obstacle detection system. The experimental results demonstrate the effectiveness of the system, which not only displays the detected obstacle and its class name but also generates audio output in the user's own language. An application for obstacle detection and classification for visually impaired people will benefit their safety and comfort and improve the quality of their daily life. In future work, we will study estimating the distance between visually impaired people and obstacles; we plan to investigate similar triangles, Euclidean distance, and other approaches, and to integrate them to improve the overall application.

REFERENCES
[1] World Health Organization, "Blindness and vision impairment." [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998, doi: 10.1109/5.726791.
[3] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, May 2017, doi: 10.1145/3065386.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[6] C. Szegedy et al., "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 1-9, doi: 10.1109/CVPR.2015.7298594.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.
[9] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[10] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, Sep. 2015, doi: 10.1109/TPAMI.2015.2389824.
[11] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.
[12] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision, Springer, Cham, 2016, pp. 21-37.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 779-788, doi: 10.1109/CVPR.2016.91.
[15] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6517-6525, doi: 10.1109/CVPR.2017.690.
[16] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[17] C. Chakraborty, B. Gupta, and S. K. Ghosh, "A review on telemedicine-based WBAN framework for patient monitoring," Telemedicine and e-Health, vol. 19, no. 8, pp. 619-626, 2013.
[18] X. Chen and A. L. Yuille, "A time-efficient cascade for real-time object detection: With applications for the visually impaired," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops, San Diego, CA, USA, vol. 3, 2005, pp. 28-28, doi: 10.1109/CVPR.2005.399.
[19] M. P. Arakeri, N. S. Keerthana, M. Madhura, A. Sankar, and T. Munnavar, "Assistive technology for the visually impaired using computer vision," in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 2018, pp. 1725-1730, doi: 10.1109/ICACCI.2018.8554625.
[20] W. Fink and M. Humayun, "Digital object recognition audio-assistant for the visually impaired," U.S. Patent Application 11/030,678, Sep. 2005.
[21] E. C. Guevarra, M. I. R. Camama, and G. V. Cruzado, "Development of guiding cane with voice notification for visually impaired individuals," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 1, pp. 104-112, 2018.
[22] J. Kim, "Application on character recognition system on road sign for visually impaired: Case study approach and future," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 1, pp. 778-785, 2020.
[23] R. Tapu, B. Mocanu, A. Bursuc, and T. Zaharia, "A smartphone-based obstacle detection and classification system for assisting visually impaired people," in 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2013, pp. 444-451, doi: 10.1109/ICCVW.2013.65.
[24] S. M. Felix, S. Kumar, and A. Veeramuthu, "A smart personal AI assistant for visually impaired people," in 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2018, pp. 1245-1250, doi: 10.1109/ICOEI.2018.8553750.
[25] S. Shah, J. Bandariya, G. Jain, M. Ghevariya, and S. Dastoor, "CNN based auto-assistance system as a boon for directing visually impaired person," in 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019, pp. 235-240, doi: 10.1109/ICOEI.2019.8862699.
[26] S. Bianco, L. Celona, P. Napoletano, and R. Schettini, "On the use of deep learning for blind image quality assessment," Signal, Image and Video Processing, vol. 12, no. 2, pp. 355-362, Feb. 2018, doi: 10.1007/s11760-017-1166-8.
[27] Y. C. Wong, J. A. Lai, S. S. S. Ranjit, A. R. Syafeeza, and N. A. Hamid, "Convolutional neural network for object detection system for blind people," Journal of Telecommunication, Electronic and Computer Engineering (JTEC), vol. 11, no. 2, p. 6, 2019.
[28] W. Cai, J. Li, Z. Xie, T. Zhao, and L. U. Kang, "Street object detection based on Faster R-CNN," in 2018 37th Chinese Control Conference (CCC), 2018, pp. 9500-9503.
[29] R. Zhang, Y. Yang, W. Wang, L. Zeng, J. Chen, and S. McGrath, "An algorithm for obstacle detection based on YOLO and light field camera," in 2018 12th International Conference on Sensing Technology (ICST), Limerick, Ireland, 2018, pp. 223-226, doi: 10.1109/ICSensT.2018.8603600.
[30] S. Shinde, A. Kothari, and V. Gupta, "YOLO based human action recognition and localization," Procedia Computer Science, vol. 133, pp. 831-838, 2018, doi: 10.1016/j.procs.2018.07.112.
[31] F. Rahman, I. J. Ritun, N. Farhin, and J. Uddin, "An assistive model for visually impaired people using YOLO and MTCNN," in Proceedings of the 3rd International Conference on Cryptography, Security and Privacy (ICCSP '19), Kuala Lumpur, Malaysia, 2019, pp. 225-230, doi: 10.1145/3309074.3309114.
[32] D. Garg, P. Goel, S. Pandya, A. Ganatra, and K. Kotecha, "A deep learning approach for face detection using YOLO," in 2018 IEEE Punecon, Pune, India, 2018, pp. 1-4, doi: 10.1109/PUNECON.2018.8745376.
[33] W. Lan, J. Dang, Y. Wang, and S. Wang, "Pedestrian detection based on YOLO network model," in 2018 IEEE International Conference on Mechatronics and Automation (ICMA), Changchun, China, 2018, pp. 1547-1551, doi: 10.1109/ICMA.2018.8484698.
[34] W. Yang and Z. Jiachun, "Real-time face detection based on YOLO," in 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), Jeju, South Korea, 2018, pp. 221-224, doi: 10.1109/ICKII.2018.8569109.
[35] R. Huang, J. Pedoeem, and C. Chen, "YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers," in 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 2503-2510, doi: 10.1109/BigData.2018.8621865.
[36] S. Pehlivan, M. Unay, and A. Akan, "Designing an obstacle detection and alerting system for visually impaired people on sidewalks," in 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 2019, pp. 1-4, doi: 10.1109/TIPTEKNO.2019.8895181.
[37] S. Tosun and E. Karaarslan, "Real-time object detection application for visually impaired people: Third Eye," in 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 2018, pp. 1-6, doi: 10.1109/IDAP.2018.8620773.
[38] S. Mahurkar, "Integrating YOLO object detection with augmented reality for iOS apps," in 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York City, NY, USA, 2018, pp. 585-589, doi: 10.1109/UEMCON.2018.8796579.
[39] A. Ćorović, V. Ilić, S. Ðurić, M. Marijan, and B. Pavković, "The real-time detection of traffic participants using YOLO algorithm," in 2018 26th Telecommunications Forum (TELFOR), Belgrade, Serbia, 2018, pp. 1-4, doi: 10.1109/TELFOR.2018.8611986.

BIOGRAPHIES OF AUTHORS
Nachirat Rachburee is a lecturer at the Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Pathum Thani, Thailand. His research interests include data mining, big data analytics, deep learning, neural networks, and predictive analytics.
Wattana Punlumjeak is a lecturer at the Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Pathum Thani, Thailand. Her research interests include data mining, big data analytics, deep learning, neural networks, and predictive analytics.