Convolutional Neural Networks and
Ensembles for Visually Impaired Aid
Fabricio Breve
São Paulo State University - UNESP
fabricio.breve@unesp.br
Motivation
• Approximately 2.2 billion people suffer from
some form of visual impairment.
• Including at least 1 billion with moderate or severe
distance vision impairment [40].
• The prevalence of distance vision impairment is significantly higher in
low- and middle-income areas compared to high-income regions [34].
• This population faces numerous difficulties in
their daily routines, mostly linked to mobility
and navigation.
• With advancements in computer vision and
related technologies, numerous navigation
systems have been proposed.
• Issues: many of these systems:
• require costly, bulky, and/or custom equipment;
• are too computationally intensive to run on
portable devices;
• require a network connection to a more powerful
remote server.
• White canes and guide dogs are currently the
most commonly utilized tools to aid visually
impaired (VI) individuals [15].
[34] Steinmetz, J.D., Bourne, R.R., Briant, P.S., Flaxman, S.R., Taylor, H.R., Jonas, J.B., Abdoli, A.A., Abrha, W.A., Abualhasan, A., Abu-Gharbieh, E.G., et al.: Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. The Lancet Global Health 9(2), e144–e160 (2021).
[40] World Health Organization: Vision impairment and blindness (Oct 2022), https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment, accessed: 2023-01-30.
[15] Islam, M.M., Sheikh Sadi, M., Zamli, K.Z., Ahmed, M.M.: Developing walking assistants for visually impaired people: A review. IEEE Sensors Journal 19(8), 2814–2828 (2019). https://doi.org/10.1109/JSEN.2018.2890423.
Fabricio Breve ICCSA 2023 Athens, Greece, July 3 – 6, 2023
04/07/2023 2
Motivation
• Recent surveys show that:
• Smartphone-based computer vision tools for the VI often employ outdated
image and video processing techniques [3];
• Researchers have started to adopt deep learning approaches [24]:
• These techniques have grown with the advent of increased computational power in
machines;
• However, carrying high-powered computational devices for vision-based assistive
solutions is not practical for users.
[3] Budrionis, A., Plikynas, D., Daniušis, P., Indrulionis, A.: Smartphone-based computer vision travelling aids for blind and visually impaired individuals: A systematic review. Assistive Technology 34(2), 178–194 (2022). https://doi.org/10.1080/10400435.2020.1743381, PMID: 32207640.
[24] Mandia, S., Kumar, A., Verma, K., Deegwal, J.K.: Vision-based assistive systems for visually impaired people: A review. In: Tiwari, M., Ismail, Y., Verma, K., Garg, A.K. (eds.) Optical and Wireless Technologies. pp. 163–172. Springer Nature Singapore, Singapore (2023).
Objectives
• Project Goal: build a system to assist visually impaired people.
• Requirement: execute on a single smartphone, without extra
accessories or connection requirements.
• Method: the smartphone takes pictures of the path and provides
audio and/or vibration feedback regarding potential obstacles, before
they are in the reach of the white cane.
• Goal of this paper: perform the classification step, based on
Convolutional Neural Networks (CNNs).
• Find the best CNN architecture for this task;
• Find the optimal learning hyperparameters.
Contributions
• Prior study [2]: a framework that leverages CNNs, transfer learning, and semi-supervised
learning (SSL).
• The focus was to minimize computational costs and make it feasible for implementation on
smartphones without requiring additional hardware.
• This study: previous works are significantly expanded upon with the following key contributions:
1. Eight additional CNN models were added to the study, based on the cutting-edge
EfficientNet architecture [37], bringing the total number of networks evaluated to 25;
2. The K-Fold Cross Validation process was repeated five times, providing more robust results;
3. Image pre-processing functions were introduced to enhance image preparation for each
network type, resulting in improved accuracy in most cases;
4. Ensembles of CNNs were employed to boost the overall accuracy by leveraging the strengths
of multiple CNN architectures.
The Dataset
• 342 images;
• Two classes:
• 175 images of “clear-path”;
• 167 images of “non clear-path”.
• The Dataset covers:
• Indoor and outdoor situations;
• Different types of floor;
• Dry and wet floor;
• Different amounts of light;
• Daylight and artificial light;
• Different types of obstacles:
• Stairs, trees, holes, animals, traffic
cones, etc.
The full dataset is available at: https://github.com/fbreve/via-dataset
[Figure: sample images of the two classes, Clear Path and Non-Clear Path]
CNN Architectures
• 25 evaluated architectures
Model              Input Image Resolution  Output of Last Conv. Layer  Trainable Parameters
DenseNet121        224 × 224               7 × 7 × 1024                7,085,314
DenseNet169        224 × 224               7 × 7 × 1664                12,697,858
DenseNet201        224 × 224               7 × 7 × 1920                18,339,074
EfficientNetB0     224 × 224               7 × 7 × 1280                4,171,774
EfficientNetB1     240 × 240               8 × 8 × 1280                6,677,410
EfficientNetB2     260 × 260               9 × 9 × 1408                7,881,604
EfficientNetB3     300 × 300               10 × 10 × 1536              10,893,226
EfficientNetB4     380 × 380               12 × 12 × 1792              17,778,378
EfficientNetB5     456 × 456               15 × 15 × 2048              28,603,314
EfficientNetB6     528 × 528               17 × 17 × 2304              41,031,002
EfficientNetB7     600 × 600               19 × 19 × 2560              64,115,026
InceptionResNetV2  299 × 299               8 × 8 × 1536                54,473,186
InceptionV3        299 × 299               8 × 8 × 2048                22,030,882
MobileNet          224 × 224               7 × 7 × 1024                3,338,434
MobileNetV2        224 × 224               7 × 7 × 1280                2,388,098
NASNetMobile       224 × 224               7 × 7 × 1056                4,368,532
ResNet101          224 × 224               7 × 7 × 2048                42,815,362
ResNet101V2        224 × 224               7 × 7 × 2048                42,791,426
ResNet152          224 × 224               7 × 7 × 2048                58,482,050
ResNet152V2        224 × 224               7 × 7 × 2048                58,450,434
ResNet50           224 × 224               7 × 7 × 2048                23,797,122
ResNet50V2         224 × 224               7 × 7 × 2048                23,781,890
VGG16              224 × 224               7 × 7 × 512                 14,780,610
VGG19              224 × 224               7 × 7 × 512                 20,090,306
Xception           299 × 299               10 × 10 × 2048              21,069,482
Proposed CNN Networks
• Weights initialization:
• Convolutional layers: pre-trained
weights from the ImageNet
dataset [28].
• Dense layers: He uniform variance
scaling initializer [7].
Network structure:
• CNN: input (x, x, 3), output (w, y, z)
• Global Average Pooling 2D: input (w, y, z), output (z)
• Dense: input (z), output (128)
• Dense (softmax): input (128), output (2)
[28] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y.
[7] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. pp. 1026–1034 (2015).
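The classification head described on this slide (global average pooling followed by a 128-unit dense layer and a 2-unit softmax layer) adds only a small number of trainable parameters on top of each backbone. The sketch below counts them from the channel depth z of the last convolutional layer. The function names are illustrative and are not taken from the author's code.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def pooled_shape(w: int, y: int, z: int) -> tuple:
    """Global average pooling keeps one value per channel: (w, y, z) -> (z,)."""
    return (z,)


def head_params(z: int, hidden: int = 128, classes: int = 2) -> int:
    """Trainable parameters of the head for a backbone with z output channels.

    Pooling has no trainable parameters, so only the two dense layers count.
    """
    return dense_params(z, hidden) + dense_params(hidden, classes)


# z values are read from the "Output of Last Conv. Layer" column.
for name, z in [("MobileNet", 1024), ("VGG16", 512), ("EfficientNetB0", 1280)]:
    print(name, pooled_shape(7, 7, z), head_params(z))
```

For VGG16 (z = 512) this gives 65,922 parameters, a small fraction of the 14,780,610 trainable parameters listed for that model, so almost all of the capacity sits in the pre-trained convolutional layers.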
Tested Scenarios
• Scenarios A to D:
• Adaptive learning rate: 10⁻³ to 10⁻⁵.
• Adjustment factor: 0.5 when validation
accuracy did not increase in the last
two epochs.
• Scenarios E and F:
• Fixed learning rate:
• Dense layers: 10⁻³; Convolutional
layers: 10⁻⁵.
• All scenarios:
• Up to 50 epochs;
• Early stop: validation loss did not
decrease in the last 10 epochs.
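The adaptive learning rate of scenarios A to D and the early-stop rule can be sketched in plain Python as follows. This is a minimal illustration of the rules as stated on the slide, not the author's implementation: the class names are hypothetical, and the reading of "did not increase in the last two epochs" as a patience counter is an assumption. In Keras, behaviour of this kind is commonly obtained with the `ReduceLROnPlateau` and `EarlyStopping` callbacks; the slides do not say whether those were used.

```python
class PlateauSchedule:
    """Halve the learning rate when validation accuracy stalls (assumed reading)."""

    def __init__(self, lr=1e-3, min_lr=1e-5, factor=0.5, patience=2):
        self.lr, self.min_lr = lr, min_lr
        self.factor, self.patience = factor, patience
        self.best_acc = float("-inf")
        self.stalled = 0

    def step(self, val_acc: float) -> float:
        if val_acc > self.best_acc:
            self.best_acc, self.stalled = val_acc, 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                # Never go below the lower bound of 10^-5.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stalled = 0
        return self.lr


class EarlyStop:
    """Stop when validation loss has not decreased for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.stalled = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss:
            self.best_loss, self.stalled = val_loss, 0
        else:
            self.stalled += 1
        return self.stalled >= self.patience


def run(history, max_epochs=50):
    """history holds one (val_acc, val_loss) pair per epoch.

    Returns (epochs actually run, final learning rate).
    """
    sched, stop = PlateauSchedule(), EarlyStop()
    epochs = 0
    for val_acc, val_loss in history[:max_epochs]:
        epochs += 1
        sched.step(val_acc)
        if stop.should_stop(val_loss):
            break
    return epochs, sched.lr


# Hypothetical run in which nothing improves after the first epoch.
print(run([(0.9, 0.3)] * 61))
```

With a history that never improves after the first epoch, training stops after 11 epochs and the learning rate has been halved five times. With a history that improves every epoch, all 50 epochs run at the initial rate.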
Config.  Fine-Tuning  Different Learning Rates  Optimizer
A        No           No                        RMSprop
B        No           No                        Adam
C        Yes          No                        RMSprop
D        Yes          No                        Adam
E        Yes          Yes                       RMSprop
F        Yes          Yes                       Adam
Tested Scenarios
• Software: Python with
TensorFlow.
• Hardware: 3 desktop computers
equipped with NVIDIA GeForce
GPU boards:
• GTX 970;
• GTX 1080;
• RTX 2060 SUPER.
• The code is available on GitHub:
• https://github.com/fbreve/via-py
• K-Fold Cross Validation:
• 𝑘 = 10
• 5 repetitions
• Validation subset:
• 20% of the training instances.
• Batch size: 16
• Exceptions:
Method Conf. C Conf. D Conf. E Conf. F
EfficientNetB3 16 - 8
EfficientNetB4 4 8 4 8
EfficientNetB5 2 4 2 4
EfficientNetB6 2 2 2 2
EfficientNetB7 1 1 1 1
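The evaluation protocol on this slide (10-fold cross validation repeated 5 times, with 20% of each training portion held out for validation) produces 50 train/validation/test splits of the 342 images. The sketch below generates such splits with the standard library only. It is an illustration under stated assumptions: the splits are not stratified by class, the seed and function name are hypothetical, and the author's code may build the folds differently.

```python
import random


def repeated_kfold(n, k=10, repeats=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists for each of the k * repeats folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        # Round-robin assignment keeps fold sizes within one item of each other.
        folds = [order[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            rng.shuffle(rest)
            # Hold out a share of the training instances for validation.
            n_val = round(len(rest) * val_fraction)
            yield rest[n_val:], rest[:n_val], test


splits = list(repeated_kfold(342))
train, val, test = splits[0]
print(len(splits), len(train), len(val), len(test))
```

Each image appears in exactly one test fold per repetition, so every reported accuracy is an average over 50 test folds.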
Results: Single Networks
Method             Conf. A  Conf. B  Conf. C  Conf. D  Conf. E  Conf. F  Average
MobileNet          0.9158   0.9152   0.9299   0.9257   0.8729   0.9105   0.9117
Xception           0.8749   0.8755   0.9374   0.9252   0.8934   0.9029   0.9016
EfficientNetB0     0.8877   0.8876   0.9427   0.9274   0.8724   0.8819   0.9000
EfficientNetB3     0.8901   0.8889   0.9404   0.9291   0.8727   0.8761   0.8995
EfficientNetB2     0.8725   0.8660   0.9391   0.9426   0.8672   0.8679   0.8926
EfficientNetB4     0.8813   0.8807   0.9304   0.9456   0.8414   0.8632   0.8904
EfficientNetB1     0.8908   0.8855   0.9369   0.9341   0.8482   0.8463   0.8903
DenseNet201        0.8841   0.8847   0.8847   0.8807   0.8696   0.8809   0.8808
InceptionResNetV2  0.8650   0.8644   0.8954   0.9217   0.8632   0.8668   0.8794
InceptionV3        0.8691   0.8657   0.8611   0.8965   0.8936   0.8807   0.8778
DenseNet169        0.8789   0.8766   0.8779   0.8854   0.8679   0.8713   0.8763
DenseNet121        0.8730   0.8671   0.8867   0.8912   0.8715   0.8680   0.8763
ResNet50           0.8947   0.8901   0.8139   0.8392   0.8801   0.8761   0.8657
ResNet101          0.8925   0.8919   0.8130   0.7919   0.8731   0.8626   0.8542
EfficientNetB5     0.8953   0.8947   0.8760   0.9233   0.6398   0.8327   0.8436
MobileNetV2        0.8924   0.8912   0.8324   0.7954   0.8116   0.8042   0.8378
ResNet152          0.8755   0.8754   0.7418   0.7724   0.8819   0.8638   0.8351
ResNet50V2         0.8539   0.8581   0.7273   0.8263   0.8516   0.8587   0.8293
EfficientNetB6     0.8719   0.8690   0.8514   0.8738   0.7383   0.7516   0.8260
ResNet101V2        0.8807   0.8790   0.6184   0.7256   0.8778   0.8779   0.8099
ResNet152V2        0.9106   0.9077   0.5942   0.6663   0.8890   0.8885   0.8094
VGG19              0.8263   0.8118   0.7916   0.6707   0.8746   0.8680   0.8072
VGG16              0.8263   0.8175   0.7877   0.6316   0.8759   0.8543   0.7989
NASNetMobile       0.8560   0.8578   0.6671   0.6997   0.7092   0.7050   0.7491
EfficientNetB7     0.8731   0.8714   0.5248   0.4999   0.5242   0.5230   0.6361
Average            0.8773   0.8749   0.8241   0.8289   0.8344   0.8433   0.8472
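The best network for each configuration can be read off the table column by column. The sketch below does this for a subset of the rows, copied from the table above. The subset is chosen to include every column winner, so the result matches a scan of the full table. The helper names are illustrative.

```python
CONFIGS = ["A", "B", "C", "D", "E", "F"]

# Subset of rows from the results table: accuracy per configuration A to F.
RESULTS = {
    "MobileNet":      [0.9158, 0.9152, 0.9299, 0.9257, 0.8729, 0.9105],
    "Xception":       [0.8749, 0.8755, 0.9374, 0.9252, 0.8934, 0.9029],
    "EfficientNetB0": [0.8877, 0.8876, 0.9427, 0.9274, 0.8724, 0.8819],
    "EfficientNetB4": [0.8813, 0.8807, 0.9304, 0.9456, 0.8414, 0.8632],
    "InceptionV3":    [0.8691, 0.8657, 0.8611, 0.8965, 0.8936, 0.8807],
    "ResNet152V2":    [0.9106, 0.9077, 0.5942, 0.6663, 0.8890, 0.8885],
}


def best_per_config(results):
    """Map each configuration letter to its (best model, accuracy) pair."""
    best = {}
    for i, conf in enumerate(CONFIGS):
        name = max(results, key=lambda model: results[model][i])
        best[conf] = (name, results[name][i])
    return best


def row_average(scores):
    """Mean over the six configurations, as in the table's last column."""
    return round(sum(scores) / len(scores), 4)


for conf, (name, acc) in best_per_config(RESULTS).items():
    print(conf, name, acc)
print("MobileNet average:", row_average(RESULTS["MobileNet"]))
```

MobileNet wins configurations A, B, and F, EfficientNetB0 wins C, EfficientNetB4 wins D, and InceptionV3 wins E.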
Ensembles
• Average of ensemble members for each of the 50 folds (𝑘 = 10,
repeated 5 times).
• Output of the last dense layer, before softmax, taken as probabilities for each
class.
• The best model for each of the 6 configurations was used.
• Two ensemble experiments:
• Multiple instances of the same model.
• Randomized: seeds, initial weights, validation subset, etc.
• From 1 to 10 instances of the best models in each configuration.
• Best 2 to 6 configurations.
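The averaging step can be sketched as follows: each ensemble member produces one score per class for every image, the scores are averaged across members, and the class with the highest mean is taken as the prediction. The numbers below are made up for illustration, class 0 stands for "clear-path" and class 1 for "non clear-path" by assumption, and the function names are hypothetical.

```python
def ensemble_predict(member_outputs):
    """Average per-class outputs over members and return one label per image.

    member_outputs holds one list per member, and each of those holds one
    [score_class_0, score_class_1] pair per image.
    """
    n_members = len(member_outputs)
    labels = []
    for i in range(len(member_outputs[0])):
        n_classes = len(member_outputs[0][i])
        mean = [
            sum(member[i][c] for member in member_outputs) / n_members
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


def accuracy(pred, truth):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical outputs of three members on three images.
m1 = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
m2 = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
m3 = [[0.7, 0.3], [0.6, 0.4], [0.3, 0.7]]
truth = [0, 0, 1]

print("single member:", accuracy(ensemble_predict([m1]), truth))
print("ensemble:", accuracy(ensemble_predict([m1, m2, m3]), truth))
```

In this toy case the first member alone misclassifies two of the three images, while the average of the three members classifies all of them correctly. The same function covers both experiments, since the members may be instances of one model or of different models.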
Results: Ensembles of Single CNN Models
Instances  Conf. A  Conf. B  Conf. C  Conf. D  Conf. E  Conf. F  Average
1          0.9158   0.9152   0.9427   0.9456   0.8936   0.9105   0.9206
2          0.9187   0.9135   0.9451   0.9502   0.9065   0.9170   0.9252
3          0.9170   0.9176   0.9445   0.9532   0.9112   0.9193   0.9271
4          0.9164   0.9158   0.9485   0.9562   0.9176   0.9182   0.9288
5          0.9187   0.9182   0.9491   0.9585   0.9171   0.9217   0.9306
6          0.9176   0.9199   0.9549   0.9602   0.9182   0.9199   0.9318
7          0.9211   0.9182   0.9543   0.9579   0.9211   0.9164   0.9315
8          0.9164   0.9158   0.9509   0.9596   0.9194   0.9158   0.9297
9          0.9182   0.9176   0.9538   0.9596   0.9200   0.9193   0.9314
10         0.9188   0.9159   0.9526   0.9602   0.9194   0.9176   0.9308
Average    0.9179   0.9168   0.9496   0.9561   0.9144   0.9176   0.9287
Results: Ensembles of Multiple CNN Models
Instances of each conf.  Conf. D  Conf. DC  Conf. DCA  Conf. DCAB  Conf. DCABF  All Conf.  Average
1                        0.9456   0.9532    0.9567     0.9550      0.9503       0.9491     0.9517
2                        0.9502   0.9491    0.9544     0.9538      0.9486       0.9521     0.9514
3                        0.9532   0.9544    0.9597     0.9545      0.9509       0.9539     0.9544
4                        0.9562   0.9555    0.9614     0.9527      0.9515       0.9556     0.9555
5                        0.9585   0.9567    0.9620     0.9544      0.9544       0.9573     0.9572
6                        0.9602   0.9579    0.9655     0.9562      0.9538       0.9556     0.9582
7                        0.9579   0.9544    0.9643     0.9574      0.9544       0.9550     0.9572
8                        0.9596   0.9532    0.9626     0.9562      0.9526       0.9573     0.9569
9                        0.9596   0.9555    0.9637     0.9568      0.9568       0.9550     0.9579
10                       0.9602   0.9561    0.9626     0.9556      0.9579       0.9550     0.9579
Average                  0.9561   0.9546    0.9613     0.9553      0.9531       0.9546     0.9558
Conclusions
• Comparison of 25 different CNN architectures to identify obstacles in the
path of visually impaired individuals.
• K-Fold Cross Validation was utilized with 𝑘 = 10 and five repetitions to
provide robust results.
• The architectures have low computational costs during inference, executing in
milliseconds on current smartphones.
• They can be implemented without relying on external equipment or remote servers.
• Fine-tuning an EfficientNetB4 network achieved the highest accuracy of
0.9456.
• Improved to 0.9602 using an ensemble with six instances of the same network.
• Further increased to 0.9655 by adding six instances of fine-tuned EfficientNetB0 and
six instances of MobileNet with fixed weights in the convolutional layers.
Conclusions
• The numerous computer simulations conducted in this study yielded
promising results for some CNN architectures and investigated the
use of:
• different optimizers (Adam and RMSprop);
• different learning strategies (single learning rate versus different rates for
convolution and dense layers);
• fixed versus fine-tuned pre-trained weights.
• Future work:
• expanding the proposed dataset by acquiring more images;
• exploring other approaches and modifications to the current framework to
further enhance classification accuracy.
