International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 1, February 2024, pp. 192~198
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i1.pp192-198
Journal homepage: http://ijece.iaescore.com
Comparison of convolutional neural network models for user’s
facial recognition
Javier Orlando Pinzón-Arenas¹, Robinson Jiménez-Moreno¹, Javier Eduardo Martinez Baquero²
¹ Mechatronic Engineering, Faculty of Engineering, Universidad Militar Nueva Granada, Bogota, Colombia
² Engineering School, Faculty of Basic Sciences and Engineering, Universidad de los Llanos, Villavicencio, Colombia
Article Info
Article history: Received May 10, 2023; Revised Jul 12, 2023; Accepted Jul 17, 2023

ABSTRACT
This paper compares well-known convolutional neural network (CNN) models for facial recognition. For this, it uses a database created from two registered users and an additional category of unknown persons. Eight base models of convolutional architectures were compared through transfer learning, along with two additional proposed models, called shallow CNN and shallow directed acyclic graph with CNN (DAG-CNN), which are low-depth architectures (six convolution layers). In the tests with the database, the best results were obtained by the GoogLeNet and ResNet-101 models, which classified 100% of the images without confusing people outside the two users. However, in an additional real-time test, in which one of the users changed his style, the models that showed the greatest robustness were the Inception V3 and the ResNet-101, being able to maintain constant recognition. This demonstrated that deeper networks manage to learn more detailed features of the users' faces, whereas shallower ones learn more generalized features.
Keywords:
Architecture comparison
Biometric register
Convolutional neural network
Face recognition
Transfer learning
User identification
This is an open access article under the CC BY-SA license.
Corresponding Author:
Robinson Jiménez-Moreno
Mechatronics Engineering Program, Faculty of Engineering, Universidad Militar Nueva Granada
Carrera 11 # 101-80, Bogotá D.C., Colombia
Email: robinson.jimenez@unimilitar.edu.co
1. INTRODUCTION
Face detection is one of the research topics that remain active in the state of the art [1], gaining
high relevance in applied systems such as access control to secure areas [2], attendance management systems [3],
or the recognition of young people at risk of vulnerability [4]. The techniques used for face detection are quite
varied, for example, binary descriptors [5], mechanisms based on three-way decisions [6], and
alignment learning [7], all of them based mainly on image and video analysis [8]. One of the most effective
approaches to extracting features from images or videos is based on deep learning algorithms.
Deep learning techniques have witnessed notable advancements in face identification, with
convolutional neural networks (CNNs) emerging as prominent players [9]. CNNs are object-recognition-
focused networks that excel in pattern recognition tasks [10]. Their architecture has been continuously
improved to enhance performance [11], and they find applications across various knowledge domains,
particularly in image classification [12], [13]. Noteworthy CNN-based architectures include the CNN-LSTM,
which combines CNNs with long short-term memory networks for sequential data processing [14], region-based
networks (R-CNN) [15], and the fast R-CNN, which improves detection speed for region-based networks
[16]. These advancements in CNN-based architectures have greatly contributed to the progress of deep
learning-based face identification techniques.
Deep learning has facilitated the development of various applications in face recognition, including
real-time human action recognition, with CNN-based models demonstrating impressive performance
[17]–[20]. However, the current state of the art lacks a comparative evaluation of
different CNN architectures specifically for face detection. This work aims to address this gap by
comprehensively evaluating 10 CNN-based architectures using transfer learning [21].
By focusing on face detection, this research contributes to a better understanding of the effectiveness
and suitability of various CNN models in this specific domain. Applications enabled by this comparative
analysis include, among others, the development of access control systems based on user recognition. The article
first presents the methodology employed, based on the use of convolutional networks with transfer learning.
Next, the methods and materials are presented, describing the database and the architectures to be evaluated.
The models compared were AlexNet, VGG-16, VGG-19, GoogLeNet, Inception V3, ResNet-18, ResNet-50,
and ResNet-101, plus two additional proposed models called shallow CNN and shallow directed acyclic graph
with CNN (DAG-CNN). The results section then analyzes the activations of the networks with the best
performance, and finally the conclusions reached are presented.
2. METHOD
A database consisting of three categories is created to carry out the comparison. Two categories are
registered users to be recognized (Javier and Robinson). The other category represents a random group of
individuals to verify that others are not recognized (Others). For the construction of the database, photos of
users' faces are obtained in different positions, so that the network can recognize the person even if the
face is not fully frontal. For the “Others” category, the CelebA [22] database is used, thus obtaining
faces with different characteristics, even some similar to the original users to recognize. In total, 3,840 images
are used for training, of which 1,940 are in the “Others” category.
The reason for using almost twice as many images in that category as for the two users is to give the
network more possible characteristics of unregistered subjects, so that if a person has similar traits, the
network can still determine that the subject is neither of the registered users. On the other hand, 525 images
are used for the validation of the networks, distributed as 75 images for each of the users and 375 for the
“Others” category, in order to verify that, even among many different faces, the networks are capable of
distinguishing them from the registered users. Samples of the images for each category can be seen in Figure 1.
The size of the images varies according to the neural network to be used, since not all of them take the
standard size of 224×224 pixels.
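
To make the split concrete, the following is a minimal sketch of how such a database could be loaded, assuming a PyTorch workflow and a dataset/{train,val}/{Javier,Robinson,Others} folder layout; the framework, paths, and transforms are illustrative assumptions, not the authors' published code.

```python
# Minimal sketch of loading the three-category database described above.
# Assumes images are organized as dataset/{train,val}/{Javier,Robinson,Others}/;
# the paths, image size, and transforms are illustrative assumptions.
import torch
from torchvision import datasets, transforms

IMG_SIZE = 224  # varies per network, e.g., Inception V3 expects 299x299

tf = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("dataset/train", transform=tf)  # 3,840 images
val_set = datasets.ImageFolder("dataset/val", transform=tf)      # 525 images

train_loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=8)
```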
To compare the facial recognition capacity of different CNN models, eight of the most well-known ones are selected.
between different models of CNNs. These models are AlexNet [23], the two versions of the visual geometry
group (VGG) model (16 and 19) [24], GoogLeNet [25], Inception V3 [26], and the ResNet models (18, 50 and
101) [27]. It is also proposed to implement two additional basic models to verify if low-depth architectures can
maintain a level of recognition as good as the pre-trained models.
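
As one possible realization of this selection (the paper does not name a framework), the sketch below loads three of the eight pre-trained base models with torchvision and replaces the final classifier with a three-class head; the function name make_3class and the weight tags are assumptions.

```python
# Hedged sketch of loading three of the eight pre-trained base models with
# torchvision and swapping in a 3-class head (Javier, Robinson, Others).
import torch.nn as nn
from torchvision import models

def make_3class(name: str) -> nn.Module:
    if name == "alexnet":
        net = models.alexnet(weights="IMAGENET1K_V1")
        net.classifier[6] = nn.Linear(4096, 3)       # replace the 1000-class layer
    elif name == "googlenet":
        net = models.googlenet(weights="IMAGENET1K_V1")
        net.fc = nn.Linear(net.fc.in_features, 3)
    elif name == "resnet101":
        net = models.resnet101(weights="IMAGENET1K_V1")
        net.fc = nn.Linear(net.fc.in_features, 3)
    else:
        raise ValueError(f"unknown model: {name}")
    return net
```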
Figure 1. Examples of images used in the database (categories: Javier, Robinson, and Others)
The two proposed architectures are basic CNNs consisting of convolution blocks. The first one
is a sequential network named shallow CNN, since it is less deep than its counterparts (apart from AlexNet).
The second architecture comprises two branches, one with convolution filters of size 3×3, and another with
convolution filters of size 5×5, to learn different patterns of faces. The latter is called shallow DAG-CNN
because of its multiple paths; its depth remains similar to that of the previous one, although it has a total of
twelve convolution layers. A general diagram of the two architectures can be seen in Figure 2, where S refers
to the filter stride, P to the padding used, and the last value represents the number of filters used in that layer.
The weights of the two proposed networks were initialized using the He method [28].
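A hedged sketch of the shallow DAG-CNN along these lines is shown below, assuming PyTorch; the channel counts, strides, and padding are placeholders, since the actual values are those given in Figure 2.

```python
# Sketch of the proposed shallow DAG-CNN: two parallel branches of six
# convolution layers each (3x3 and 5x5 filters), merged before classification,
# with He (Kaiming) weight initialization [28]. Channel counts, strides, and
# padding are illustrative; Figure 2 gives the actual values.
import torch
import torch.nn as nn

def branch(k: int) -> nn.Sequential:
    # six convolution layers with kernel size k, doubling channels each time
    layers, c_in = [], 3
    for c_out in (8, 16, 32, 64, 128, 256):
        layers += [nn.Conv2d(c_in, c_out, k, stride=2, padding=k // 2),
                   nn.ReLU(inplace=True)]
        c_in = c_out
    return nn.Sequential(*layers)

class ShallowDAGCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.b3 = branch(3)                 # 3x3-filter branch
        self.b5 = branch(5)                 # 5x5-filter branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256 * 2, num_classes)
        for m in self.modules():            # He initialization
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    def forward(self, x):
        f = torch.cat([self.pool(self.b3(x)), self.pool(self.b5(x))], dim=1)
        return self.fc(f.flatten(1))
```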
The same training parameters were set for all the networks, including the two proposed architectures,
to avoid giving any network an advantage over another. For the pre-trained models, mixed transfer
learning is performed, i.e., the weights of the first convolution layers are frozen and used as feature extractors
while the rest of the layers are fine-tuned. The parameters are as follows: learning rate of 10⁻³ with a reduction
factor of 0.5 every four epochs; training is done for eight epochs, with a mini-batch size of 8 per iteration.
These parameters are selected because the models were mostly pre-trained with a learning rate of 0.1, and their
weights are expected not to vary greatly from the initial ones. Likewise, training is limited to a few epochs
to avoid overfitting. As for the classification section, its learning rate is multiplied by a factor of 10 since,
at this stage, it has had no initial learning, so its rate must be higher for it to learn faster.
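
A minimal sketch of this mixed transfer learning setup, under the same PyTorch assumption, could look as follows; the choice of which layers to freeze (here, through the second residual stage of a ResNet) is an assumption, as the paper does not state the exact split point.

```python
# Hedged sketch of the mixed transfer learning described above: freeze the
# early convolution layers, fine-tune the rest at 1e-3, train the new
# classifier head at 10x that rate, and halve the rates every four epochs.
import torch
import torch.nn as nn
from torchvision import models

net = models.resnet101(weights="IMAGENET1K_V1")
net.fc = nn.Linear(net.fc.in_features, 3)

for name, p in net.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        p.requires_grad = False              # frozen feature extractor

body = [p for n, p in net.named_parameters()
        if p.requires_grad and not n.startswith("fc")]

opt = torch.optim.SGD([
    {"params": body, "lr": 1e-3},
    {"params": net.fc.parameters(), "lr": 1e-2},  # classifier head at 10x
], momentum=0.9)

# stepped once per epoch over the eight training epochs
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=4, gamma=0.5)
```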
During the training, three of the networks had problems with their learning: the AlexNet and VGG
models, whose gradients tended to increase abruptly, preventing the networks from finishing the training.
For this reason, for these models, it was decided to carry out transfer learning with complete fine-tuning of
all their layers; since these are pre-trained deep networks, allowing their generalized features to adapt across
all layers stabilized the gradient during training.
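
Under the same assumptions as the previous snippet, this fallback amounts to leaving every layer trainable, for example:

```python
# Sketch of the fallback applied to AlexNet and the VGGs: transfer learning
# with complete fine-tuning, i.e., no frozen layers (assumptions as above).
import torch
import torch.nn as nn
from torchvision import models

net = models.vgg16(weights="IMAGENET1K_V1")
net.classifier[6] = nn.Linear(4096, 3)

for p in net.parameters():
    p.requires_grad = True                   # every layer is fine-tuned

opt = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
```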
Figure 2. Proposed shallow CNN and shallow DAG-CNN architectures
3. RESULTS AND DISCUSSION
For the first performance evaluation, all networks were compared during their training and tested with
the validation set, as shown in Figure 3, obtaining the behavior of the network accuracy in Figure 3(a).
The networks with the worst behavior were AlexNet and the VGGs: although their losses were reduced, as seen
in Figure 3(b), these nets fell into overfitting early, remaining below 75% accuracy. On the other hand, the rest
of the networks achieved an accuracy above 95%, with the shallow DAG-CNN and ResNet-18 as the two
slowest-learning networks. The fastest models were the GoogLeNet, the ResNet-50, and the ResNet-101,
achieving more than 98% accuracy in their first epoch.
The GoogLeNet and ResNet-101 models obtained the best recognition performance, managing to
classify all the images without errors. The two shallow-type networks also obtained results above 98%;
the DAG-CNN even achieved the same result as the Inception V3 without any previous learning and with
fewer layers. AlexNet and the VGGs maintained a low level of recognition because they could not distinguish
between the two users, i.e., the two registered users were recognized as a single user, despite correctly
classifying the “Others” category. Table 1 shows the accuracies obtained with each of the trained architectures.
Figure 3. Behavior on the validation set during training: (a) network accuracy and (b) network loss
Table 1. Comparison of user’s face recognition results
Network Accuracy [%] Network Accuracy [%]
Shallow CNN 98.67 GoogLeNet 100
Shallow DAG-CNN 99.05 Inception V3 99.05
AlexNet 71.43 ResNet-18 99.81
VGG-16 71.43 ResNet-50 99.81
VGG-19 14.29 ResNet-101 100
In order to extend the testing and comparison of the models, their operation was verified in real-time,
adding a higher level of difficulty by changing the style of one of the users, that is, the beard and hairstyle.
Most photos of the user “Javier” resemble the examples shown in Figure 1, where he has hair and no facial
hair. For the real-time test, the user appears with a full beard and a shaved head, to verify whether the networks
can still recognize him. Face detection is performed using the Viola-Jones algorithm [29], with which the
bounding box is cropped and then sent to the neural network.
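
A hedged sketch of this real-time pipeline, using OpenCV's Haar-cascade implementation of Viola-Jones and a stand-in classifier, could be the following; the label order and window handling are illustrative assumptions.

```python
# Hedged sketch of the real-time pipeline: Viola-Jones face detection with
# OpenCV's Haar cascade [29], cropping of the bounding box, and classification
# by the trained CNN. The label order and the stand-in model are assumptions.
import cv2
import torch
import torch.nn as nn
from torchvision import models, transforms

net = models.resnet101(weights="IMAGENET1K_V1")
net.fc = nn.Linear(net.fc.in_features, 3)    # stand-in for the fine-tuned model
net.eval()

labels = ["Javier", "Others", "Robinson"]    # ImageFolder sorts classes alphabetically
prep = transforms.Compose([transforms.ToPILImage(),
                           transforms.Resize((224, 224)),
                           transforms.ToTensor()])
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        face = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            pred = net(prep(face).unsqueeze(0)).argmax(1).item()
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, labels[pred], (x, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    cv2.imshow("recognition", frame)
    if cv2.waitKey(1) == 27:                 # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```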
Each neural network was tested with a video sequence of the user “Javier”. Figure 4 shows a frame taken
from the sequence, where the category in which each model classified the user's face is displayed. The AlexNet
and VGG models consistently classified the user into the wrong categories. On the other hand, the shallow and
ResNet-18 models, although capable of recognizing the user, showed constant variation between categories,
repeatedly classifying the user as unknown even when the face was frontal, as can be seen in the
figure. As for the deeper networks, they managed to maintain accurate recognition during most of the video,
with few category changes, without confusing him with the other user or classifying him as unknown.
Figure 4. Tests performed in real-time
With the models that performed better in the real-time tests, i.e., those with less variation in the user's
classification, another test was done in which the user makes slight rotations of the face, to check whether the
networks could maintain accurate recognition. In this test, only two networks maintained a correct classification,
the Inception V3 and the ResNet-101, as seen in Figure 5. Their success rates are 92% and 76%,
respectively, evidencing the better performance of the Inception architecture.
Figure 5. Face rotation tests
The capability of these two networks (the Inception V3 and the ResNet-101) comes from their
large number of convolution blocks, which lets them learn specific patterns of the user's face and keep
recognition accurate even when the user's style changes. While architectures with less depth
can recognize the user adequately, if a change is made that is not contemplated in the learning set, they will
not be able to recognize the user, because the characteristics they learned are more general. This is
shown in Figure 6, where the first-layer activations of the shallow DAG-CNN's 3×3 filter branch,
Figure 6(a), and of the ResNet-101, Figure 6(b), are obtained from an image of the validation set (top) and an
image of the test video (bottom).
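
For readers wishing to reproduce this kind of inspection, the following sketch registers a forward hook on the first convolution of a ResNet-101 and plots its feature maps; the stand-in weights and input tensor are assumptions.

```python
# Hedged sketch of how first-layer activations like those in Figure 6 can be
# inspected with a forward hook; the stand-in model and input are assumptions.
import matplotlib.pyplot as plt
import torch
from torchvision import models

net = models.resnet101(weights="IMAGENET1K_V1").eval()  # stand-in for the trained net
img = torch.rand(1, 3, 224, 224)                        # stand-in preprocessed face

acts = {}
net.conv1.register_forward_hook(
    lambda module, inp, out: acts.update(conv1=out.detach()))

with torch.no_grad():
    net(img)

fmaps = acts["conv1"][0]                     # (64, 112, 112) for ResNet-101
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for ax, fm in zip(axes.flat, fmaps):
    ax.imshow(fm, cmap="gray")
    ax.axis("off")
plt.show()
```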
Figure 6. Activations of the first layer in (a) the shallow DAG-CNN and (b) the ResNet-101 architectures
In the shallow architecture, the network focuses on general face patterns, such as the eyes, nostrils, and
specific face parts. However, these are repeated across most filters without variations covering other user
characteristics, as seen in the first image of activations. It can also distinguish shapes and edges, but only in
small sections of the face (such as the shape of the eye), without taking into account, for example, the general
shape of the person's skull. Another learned pattern is found mainly in the forehead and is repeated by
several filters, so that a not very specific feature of the user covers many activations.
As for the ResNet-101, although it also focuses its learning on these parts of the face, it does so in a more
detailed way, without generalizing or joining several sections at the same time in one filter, aiming instead to
discriminate certain patterns and achieving a better distribution of what each filter has learned. For instance, some
filters learned the shape of the head, others the location of the eyes, and others the nose and its nostrils.
The real-time test makes the features learned by each network even more evident: the shallow network mostly
keeps its activations on the whole face without discriminating specific characteristics of the user, whereas the
ResNet can even highlight the shape of the user's face and head and the location of the ears.
4. CONCLUSION
In this work, different CNN models for facial recognition were compared to verify the
performance of each one. These comparisons demonstrated that the best networks for this
application were GoogLeNet and ResNet-101, which managed to correctly recognize the two users without
error and to reject all subjects not belonging to the database. Nevertheless, the shallow networks
without pre-training, namely the shallow CNN and the shallow DAG-CNN, obtained high performance, even
matching the capacity of the Inception V3.
An additional test was then added in which one of the users changed his style. With this,
it was found that the two networks with the greatest capability to withstand drastic changes in certain user
characteristics are the Inception V3 and the ResNet-101, which have a greater capacity to learn detailed user
features due to their depth. They managed to maintain constant recognition of the subject, even when
performing face rotations. This robustness was demonstrated using the layer activations, comparing the
learning of one of these networks against that of a low-depth one, evidencing that the deeper networks learn
more detailed patterns, allowing them to discriminate characteristic features of the user.
ACKNOWLEDGMENTS
The authors are grateful to Universidad Militar Nueva Granada for funding this project and to
Universidad de los Llanos for its support.
REFERENCES
[1] W. Niu, Y. Zhao, Z. Yu, Y. Liu, and Y. Gong, “Research on a face recognition algorithm based on 3D face data and 2D face image
matching,” Journal of Visual Communication and Image Representation, vol. 91, Mar. 2023, doi: 10.1016/j.jvcir.2023.103757.
[2] R. Rameswari, S. N. Kumar, M. A. Aananth, and C. Deepak, “Automated access control system using face recognition,” Materials
Today: Proceedings, vol. 45, pp. 1251–1256, 2021, doi: 10.1016/j.matpr.2020.04.664.
[3] S. M. Bah and F. Ming, “An improved face recognition algorithm and its application in attendance management system,” Array,
vol. 5, Mar. 2020, doi: 10.1016/j.array.2019.100014.
[4] C. Y. J. Liu and C. Wilkinson, “Image conditions for machine-based face recognition of juvenile faces,” Science & Justice, vol. 60,
no. 1, pp. 43–52, Jan. 2020, doi: 10.1016/j.scijus.2019.10.001.
[5] C. Zhao, X. Li, and Y. Dong, “Learning blur invariant binary descriptor for face recognition,” Neurocomputing, vol. 404,
pp. 34–40, Sep. 2020, doi: 10.1016/j.neucom.2020.04.082.
[6] A. Shah, B. Ali, M. Habib, J. Frnda, I. Ullah, and M. S. Anwar, “An ensemble face recognition mechanism based on three-way
decisions,” Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 4, pp. 196–208, Apr. 2023, doi:
10.1016/j.jksuci.2023.03.016.
[7] F. Tang et al., “An end-to-end face recognition method with alignment learning,” Optik, vol. 205, Mar. 2020, doi:
10.1016/j.ijleo.2020.164238.
[8] D. Manju and V. Radha, “A novel approach for pose invariant face recognition in surveillance videos,” Procedia Computer Science,
vol. 167, pp. 890–899, 2020, doi: 10.1016/j.procs.2020.03.428.
[9] J. Yuan et al., “Gated CNN: Integrating multi-scale feature layers for object detection,” Pattern Recognition, vol. 105, Sep. 2020,
doi: 10.1016/j.patcog.2019.107131.
[10] I. Rafegas, M. Vanrell, L. A. Alexandre, and G. Arias, “Understanding trained CNNs by indexing neuron selectivity,” Pattern
Recognition Letters, vol. 136, pp. 318–325, Aug. 2020, doi: 10.1016/j.patrec.2019.10.013.
[11] A. M. S. Aradhya, A. Ashfahani, F. Angelina, M. Pratama, R. F. de Mello, and S. Sundaram, “Autonomous CNN (AutoCNN): A
data-driven approach to network architecture determination,” Information Sciences, vol. 607, pp. 638–653, Aug. 2022, doi:
10.1016/j.ins.2022.05.100.
[12] J. Qin, W. Pan, X. Xiang, Y. Tan, and G. Hou, “A biological image classification method based on improved CNN,” Ecological
Informatics, vol. 58, Jul. 2020, doi: 10.1016/j.ecoinf.2020.101093.
[13] Y. Li et al., “Robust detection for network intrusion of industrial IoT based on multi-CNN fusion,” Measurement, vol. 154, Mar.
2020, doi: 10.1016/j.measurement.2019.107450.
[14] Q. Fu, C. Wang, and X. Han, “A CNN-LSTM network with attention approach for learning universal sentence representation in
embedded system,” Microprocessors and Microsystems, vol. 74, Apr. 2020, doi: 10.1016/j.micpro.2020.103051.
[15] Y. Tian, G. Yang, Z. Wang, E. Li, and Z. Liang, “Instance segmentation of apple flowers using the improved mask R–CNN model,”
Biosystems Engineering, vol. 193, pp. 264–278, May 2020, doi: 10.1016/j.biosystemseng.2020.03.008.
[16] G. Rajeshkumar et al., “Smart office automation via faster R-CNN based face recognition and internet of things,” Measurement:
Sensors, vol. 27, Jun. 2023, doi: 10.1016/j.measen.2023.100719.
[17] A. Budiman, R. A. Yaputera, S. Achmad, and A. Kurniawan, “Student attendance with face recognition (LBPH or CNN): systematic
literature review,” Procedia Computer Science, vol. 216, pp. 31–38, 2023, doi: 10.1016/j.procs.2022.12.108.
[18] F. Zhao, J. Li, L. Zhang, Z. Li, and S.-G. Na, “Multi-view face recognition using deep neural networks,” Future Generation
Computer Systems, vol. 111, pp. 375–380, Oct. 2020, doi: 10.1016/j.future.2020.05.002.
[19] S. R. Mishra, T. K. Mishra, G. Sanyal, A. Sarkar, and S. C. Satapathy, “Real time human action recognition using triggered frame
extraction and a typical CNN heuristic,” Pattern Recognition Letters, vol. 135, pp. 329–336, Jul. 2020, doi:
10.1016/j.patrec.2020.04.031.
[20] K. B. Pranav and J. Manikandan, “Design and evaluation of a real-time face recognition system using convolutional neural
networks,” Procedia Computer Science, vol. 171, pp. 1651–1659, 2020, doi: 10.1016/j.procs.2020.04.177.
[21] J. Lin, L. Zhao, Q. Wang, R. Ward, and Z. J. Wang, “DT-LET: deep transfer learning by exploring where to transfer,”
Neurocomputing, vol. 390, pp. 99–107, May 2020, doi: 10.1016/j.neucom.2020.01.042.
[22] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” arXiv preprint arXiv:1411.7766, Nov. 2014.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications
of the ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[24] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556,
Sep. 2014.
[25] C. Szegedy et al., “Going deeper with convolutions,” in Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, Jun. 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
[26] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[28] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: surpassing human-level performance on ImageNet classification,”
in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 1026–1034, doi: 10.1109/ICCV.2015.123.
[29] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154,
May 2004, doi: 10.1023/B:VISI.0000013087.49260.fb.
BIOGRAPHIES OF AUTHORS
Javier Orlando Pinzón Arenas was born in Socorro-Santander, Colombia, in
1990. He received his degree in mechatronics engineering (cum laude) and specialization in
Engineering Project Management at Universidad Militar Nueva Granada-UMNG in 2013 and
2016, respectively. He has experience in the areas of automation, electronic control, and
machine learning. He is currently working as a research assistant at UMNG, with an emphasis on robotics
and machine learning. He can be
contacted at est.javier.pinzon@unimilitar.edu.co.
Robinson Jiménez-Moreno is an electronic engineer who graduated from
Universidad Distrital Francisco José de Caldas in 2002. He received an M.Sc. in engineering
from Universidad Nacional de Colombia in 2012 and a Ph.D. in engineering from Universidad
Distrital Francisco José de Caldas in 2018. He is currently working as an assistant professor at
Universidad Militar Nueva Granada and his research focuses on the use of convolutional neural
networks for object recognition and image processing for robotic applications such as human-
machine interaction. He can be contacted at robinson.jimenez@unimilitar.edu.co. His profiles
can be found at ResearchGate (https://www.researchgate.net/profile/Robinson-Moreno-2) and
RedDOLAC (https://reddolac.org/profile/RobinsonJimenezMoreno).
Javier Eduardo Martinez Baquero is an electronic engineer who graduated
from Universidad de los Llanos in 2002, a postgraduate in electronic instrumentation from
Universidad Santo Tomas in 2004, a postgraduate in instrumentation and industrial
control at Universidad de los Llanos in 2020, and an M.Sc. in educative technology and
innovative media for education at Universidad Autonoma de Bucaramanga in 2013. He is
currently working as an associate professor at Universidad de los Llanos and his
research focuses on instrumentation, automation, control, and renewable energies.
He can be contacted at jmartinez@unillanos.edu.co. His profiles can be found at
ResearchGate (https://www.researchgate.net/profile/Javier-Martinez-Baquero) and RedDOLAC
(https://reddolac.org/profile/JavierEduardoMartinezBaquero).