IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 3, June 2025, pp. 1863~1869
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i3.pp1863-1869
Journal homepage: http://ijai.iaescore.com
Detection of partially occluded area in face image using U-Net model
Jyothsna Cherapanamjeri1, B. Narendra Kumar Rao2
1Department of Computer Science and Engineering, JNTUA College of Engineering, Jawaharlal Nehru Technological University, Ananthapur, India
2Department of Artificial Intelligence and Machine Learning, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, India
Article Info

Article history:
Received Oct 21, 2024
Revised Feb 15, 2025
Accepted Mar 15, 2025

Keywords:
Artificial intelligence
Computer vision
Deep learning
Image segmentation
Occlusion
U-Net model

ABSTRACT

Occluded face recognition is an important task in computer vision. To perform occluded face recognition efficiently, we first need to identify the occluded region of the face, which is itself a challenging problem. One case of face occlusion is the wearing of objects such as masks, sunglasses, and scarves; another is the face being hidden by other objects such as books or other faces. In our research, we identify the occluded area, namely the coronavirus disease 2019 (COVID-19) mask region of the face, and generate a segmentation map. Deep learning-based techniques have demonstrated promising outcomes in semantic segmentation. We employ a deep learning-based U-Net model to generate a binary segmentation map of the masked region of a human face; it achieves reliable performance while reducing network complexity. We train our model on the MaskedFace-CelebA dataset and obtain an accuracy of 97.7%. Experimental results demonstrate that, in comparison with the most advanced semantic segmentation models, our approach achieves a promising trade-off between segmentation accuracy and computational efficiency.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Jyothsna Cherapanamjeri
Department of Computer Science and Engineering, JNTUA College of Engineering
Jawaharlal Nehru Technological University
Ananthapur, India
Email: jyothsna_513@yahoo.com
1. INTRODUCTION
When it comes to detecting occluded areas in face images, deep learning-based techniques have shown excellent results. In our research, the occluded area is the COVID-19 mask region in facial images. Nowadays masks are worn for different purposes: the COVID-19 pandemic made face masks widespread to prevent the spread of disease, masks are worn to evade identification in crimes, and they protect the wearer's health from pollution. In such cases it is very difficult to identify faces while masks are worn. Face recognition technology is commonly used to identify people based on their facial features, and studies have shown that it performs extremely well. However, many real-time applications based on face recognition, such as face authentication-based payment systems and face access control, fail to effectively recognize masked faces.
The main objective of our research is to identify the occluded area in face images, which is the first step in occluded face recognition. Occluded face recognition is a challenging task in computer vision, a subarea of artificial intelligence used to interpret and understand the visual world. We train our model using a deep learning-based computer vision
technique, namely the U-Net semantic segmentation model. Semantic segmentation aims to interpret images by dividing them into semantically meaningful objects. It is used extensively in a variety of applications such as medical image analysis, agriculture, and remote sensing. The main aim of this research is to detect the occluded area in a face image using the U-Net segmentation model. Segmentation is the process of categorizing an image at the pixel level in order to identify the precise area of occlusion. Occlusion occurs in an image when one object hides a portion of another. Some occluded face images are shown in Figure 1.
Figure 1. Occluded face images
Figure 1 shows face images with occluding objects such as a mask, sunglasses, a beard, and a scarf. When such occluded face images occur in face recognition, it is very difficult to identify the faces; the occlusion must be removed before the faces can be recognized. For that purpose, we first need to identify the occluded area in the face and then remove the occlusion using deep learning-based models. Our research focuses on detecting the occluded area, the COVID-19 mask region, in face images using the segmentation model called U-Net.
The semantic segmentation task is challenging owing to the need to detect objects in images, and several studies extract meaningful information from images using segmentation. According to Pan et al. [1], the EG-TransUNet technique can significantly enhance the segmentation quality of biomedical images by concurrently incorporating a progressive enhancement module (PEM), a feature fusion module based on semantically guided attention, and a channel spatial attention (CSA) module into U-Net; on two well-known colonoscopy datasets (CVC-ClinicDB and Kvasir-SEG) it attains 95.44% and 95.26% mDice, respectively. Alshawi et al. [2] present an improved U-Net that achieves higher performance with less capacity by using a sparsely connected block with multiscale filters; to evaluate the approach, a sinkhole dataset was manually gathered from various sources and automatically annotated with the suggested autoencoder, which shortened annotation time while creating accurate masks, and the model's performance and dependability were further assessed on a benchmark nuclei dataset, reaching 94%. In the research by Dong et al. [3], to tackle the scale-variance issue and improve segmentation outcomes, a superpixel segmentation pooling (SSP) layer is integrated into the enhanced lightweight end-to-end semantic segmentation (ELES2) architecture to accomplish efficient end-to-end semantic segmentation of high-resolution remote sensing (HRRS) images; ELES2 retains high computational efficiency while achieving promising segmentation accuracy, and with just 12.62 M parameters and 13.09 floating-point operations (FLOPs) it obtains mIoU of 80.16% and 73.20% on the ISPRS Potsdam and Vaihingen datasets, respectively. Gao et al. [4] propose a novel Swin-Unet network to enhance multi-scale lesion segmentation precision in COVID-19 CT scans. Chen et al. [5] improve the segmentation quality of biomedical images with TransAttUnet, a new transformer-based attention-guided U-Net that simultaneously integrates multi-level guided attention and multi-scale skip connections into U-Net. Zuo et al. [6] suggest an edge information fusion model to accurately segment crop seedlings in their natural habitat and accomplish autonomous assessment of seedling location and phenotype. Gite et al. [7] use the U-Net architecture for lung segmentation in X-ray images. Half-UNet, U-Net, TransUNet, and the dual Swin transformer DS-TransUNet [8]–[11] are used for medical image processing. Shelhamer et al. [12] apply the fully convolutional network (FCN) to semantic segmentation and scene parsing, with experiments on PASCAL visual object classes (VOC), the NYU-Depth v2 dataset (NYUDv2), and scale-invariant feature transform (SIFT) Flow, even though these tasks have traditionally distinguished between regions and objects. Li et al. [13] suggest a fast instance segmentation method based on metric learning to parallelize target detection and semantic segmentation for log end face detection. Surveys of different image segmentation techniques are given in [14]–[19]. Object detection in aerial images, face detection and segmentation, and detection of grape clusters have been addressed with mask region-based convolutional neural networks (Mask R-CNN) [20]–[23]. The DeepLabv3+ architecture has been extended with attention mechanisms for segmenting ocular images [24]. Different evaluation metrics used for semantic segmentation techniques are discussed in [25].
The inference from the literature review is that semantic segmentation is used in many different real-time applications such as medical image segmentation, agriculture, object detection in aerial images, and face detection. Inspired by this existing work, we apply the semantic segmentation technique to occluded-area detection in face images, where the occluded area is the masked region. The main significance of this research is to guide the removal of occlusion in face images and thereby enable real-time face recognition; without identifying the occlusion, it is very difficult to remove it.
The majority of existing approaches have focused on enlarging the network capacity to improve the model's performance. This has significant disadvantages: i) it adds more layers, ii) it increases the likelihood of overfitting in the neural network, and iii) it requires more training samples. Our primary contributions are summarized as follows:
‒ A U-Net model that effectively identifies the occluded area in face images in terms of a binary segmentation mask. The main advantage of this model is that it identifies the occluded area from limited data sources.
‒ A comprehensive review of the performance of segmentation models.
‒ Extensive experimental findings on datasets of occluded face images that demonstrate the superiority and generalizability of the suggested U-Net for automatic occluded-area segmentation.
‒ A comparative study of three state-of-the-art (SOTA) segmentation models, namely FCN, DeepLabv3+, and the pyramid scene parsing network (PSPNet).
2. METHOD
In this section we describe our network architecture for detecting the occluded area in face images, in this case the masked area. The mask covers the nose, mouth, and chin. Detecting the occluded area in a face image is a challenging task in computer vision and an important prerequisite for occlusion removal. The goal of our research is to generate a binary segmentation map of the masked object in the input face image.
2.1. Understanding the U-Net basic architecture
U-Net is a deep learning-based computer vision framework with a U-shaped architecture whose main purpose is to segment images accurately. The U-Net design has a contracting path and an expanding path. The contracting path consists of encoder layers that capture contextual information and decrease the spatial resolution of the input. The expanding path includes decoder layers that decode the encoded data and use information from the contracting path via skip connections to generate the binary segmentation map. The primary goal of the contracting path is to find the appropriate features in the input image; it follows a convolutional neural network, in which each convolution operation is followed by a rectified linear unit (ReLU). Every block comprises two consecutive 3×3 convolutional layers, each followed by a ReLU activation. After the convolutions, a 2×2 max pooling operation with stride 2 is applied, and the next convolutional layer doubles the number of filters after each max pooling step. A layer that begins with 64 feature channels, for instance, will have 128 channels after the subsequent pooling and convolution. The expansive path, on the other hand, takes the extracted features and generates the segmentation mask; each block is followed by an upsampling layer. The decoder layers in the expansive path upsample the feature maps, and by using the skip connections from the contracting path they can detect features more precisely, which helps recover spatial information that would otherwise be lost.
Figure 2 illustrates how the U-Net network converts a grayscale input image of size 572×572×1 into a binary segmented output map of size 388×388×2. Since no padding is used, the output is smaller than the input; the input size can only be preserved by using padding. Along the contracting path the width and height of the input decrease rapidly as the number of channels grows; with more channels the network can gather higher-level information. A final convolution at the bottleneck produces a feature map of size 30×30×1024. The expansive path then takes the feature map from the bottleneck and resizes it toward the initial input size; this is achieved with upsampling layers, which decrease the number of channels in the feature map while increasing its spatial resolution. The decoder layers use the skip connections from the contracting path to locate and refine the image's features. In the end, each pixel in the output corresponds to a label in the input image that is connected to a particular object or class; because the output is a binary segmentation map, each pixel shows either the background or the foreground. The basic steps of the proposed methodology are as follows (a minimal code sketch is given after the list):
‒ The input image is delivered to the contracting path, which seeks to capture relevant details about the input image while reducing its spatial dimensions.
‒ After complex characteristics and patterns are extracted, the feature map proceeds to the expanding path.
‒ The expanding path upsamples the input feature map and combines learned features using both convolution and up-convolution to produce a segmentation map.
‒ Skip connections concatenate the corresponding feature map from the contracting path, doubling the feature channels.
‒ After upsampling in the expanding path, each pixel of the feature map is classified separately to produce the segmentation map.
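As a concrete illustration, the following is a minimal TensorFlow/Keras sketch of such a U-Net, assuming a 256×256 input with "same" padding (as in our experiments in section 3) rather than the unpadded 572×572 configuration of Figure 2; it is a sketch of the general architecture, not the exact network trained in this paper.

from tensorflow.keras import layers, Model

def encoder_block(x, filters):
    # Two 3x3 convolutions, each followed by ReLU, then 2x2 max pooling with stride 2.
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    p = layers.MaxPooling2D(2)(f)
    return f, p  # f feeds the skip connection, p goes one level deeper

def decoder_block(x, skip, filters):
    # Upsample, concatenate the skip feature map, then two 3x3 convolutions.
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(input_shape)
    s1, p1 = encoder_block(inputs, 64)   # channel count doubles after each pooling step
    s2, p2 = encoder_block(p1, 128)
    s3, p3 = encoder_block(p2, 256)
    s4, p4 = encoder_block(p3, 512)
    b = layers.Conv2D(1024, 3, padding="same", activation="relu")(p4)  # bottleneck
    b = layers.Conv2D(1024, 3, padding="same", activation="relu")(b)
    d = decoder_block(b, s4, 512)
    d = decoder_block(d, s3, 256)
    d = decoder_block(d, s2, 128)
    d = decoder_block(d, s1, 64)
    # 1x1 convolution with sigmoid gives the binary (mask vs. background) map.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(d)
    return Model(inputs, outputs)

Each encoder block returns both its feature map (for the skip connection) and the pooled output, so that the decoder can concatenate feature maps of matching resolution.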
Figure 2. Proposed system architecture
3. RESULTS AND DISCUSSION
In this section, we present the experimental setup and outcomes of our method. We begin by describing the dataset used for training and evaluation. We then describe the training process, together with the particular settings and configurations used. We define appropriate evaluation metrics and apply them to our network to evaluate its performance, and we carry out several tests to compare our suggested model against SOTA techniques.
3.1. Dataset
The MaskedFace-CelebA dataset is publicly available. It is constructed from the CelebA dataset using the MaskTheFace tool and is used in computer vision, especially for face analysis tasks. We assess the previously mentioned models using this masked-face dataset. The dataset contains 21,844 masked face images and corresponding target (ground truth) images of the occluded area; images are 256×256 pixels. They are randomly divided into 17,476 (80%) images for training, 2,184 (10%) images for validation, and 2,184 (10%) images for testing.
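For reproducibility, the following is a minimal sketch of this 80/10/10 split, assuming hypothetical images/ and masks/ directories with one mask file per masked-face image:

import random
from pathlib import Path

# Hypothetical layout: images/xxx.png paired with masks/xxx.png.
image_paths = sorted(Path("images").glob("*.png"))
random.seed(42)               # fixed seed so the random split is reproducible
random.shuffle(image_paths)

n = len(image_paths)          # 21,844 for MaskedFace-CelebA
n_train = int(0.8 * n)        # 17,476 training images
n_val = int(0.1 * n)          # 2,184 validation images
train_paths = image_paths[:n_train]
val_paths = image_paths[n_train:n_train + n_val]
test_paths = image_paths[n_train + n_val:]   # remaining 2,184 test images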
3.2. Evaluation metric
In our experiment, we use accuracy as the main metric to determine how closely the predicted mask and the ground truth match. This metric is computed from four values, i.e., true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The formula for accuracy is shown in (1).
$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (1)
TP is the number of occluded-area (masked-area) pixels that are correctly identified. FP is the number of background pixels that are incorrectly classified as occluded area. FN is the number of occluded-area pixels that are incorrectly classified as background. TN is the number of background pixels that are correctly classified as background.
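As a sanity check, the following is a minimal sketch of (1) computed per pixel, assuming predicted and truth are 0/1 NumPy arrays of the same shape:

import numpy as np

def pixel_accuracy(predicted: np.ndarray, truth: np.ndarray) -> float:
    # Fraction of pixels whose predicted label matches the ground truth, as in (1).
    tp = np.sum((predicted == 1) & (truth == 1))  # occluded pixels found
    tn = np.sum((predicted == 0) & (truth == 0))  # background kept as background
    fp = np.sum((predicted == 1) & (truth == 0))  # background called occluded
    fn = np.sum((predicted == 0) & (truth == 1))  # occluded pixels missed
    return (tp + tn) / (tp + tn + fp + fn)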
3.3. Experimental settings
The network is trained using the input images and their corresponding segmentation maps. All experiments are implemented with TensorFlow, a Python framework. An 8-core PC with 32 GB of memory, one V100 GPU, and a 3.8 GHz CPU is used to train the network. Adam is employed as the
optimizer to adjust the parameters of the occluded face image segmentation network. This work employed the following hyperparameters: a batch size of 16, an image resolution of 256, 100 epochs, and a starting learning rate η of 0.05. The number of epochs, 100, is the number of training rounds in the experiments; the images used for training and testing were converted to 256×256 resolution; and the batch size is the number of samples selected for one training step.
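Put together, these settings correspond to a compile-and-fit sketch like the following, assuming the build_unet model from section 2.1 and tf.data pipelines train_ds and val_ds yielding (image, mask) pairs; the binary cross-entropy loss is an assumption, as the loss function is not stated explicitly:

import tensorflow as tf

model = build_unet((256, 256, 3))   # U-Net sketch from section 2.1
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),  # starting eta = 0.05
    loss="binary_crossentropy",     # assumed loss for the binary segmentation map
    metrics=["accuracy"],           # pixel accuracy, the metric of (1)
)
history = model.fit(
    train_ds.batch(16),             # batch size of 16
    validation_data=val_ds.batch(16),
    epochs=100,                     # 100 training rounds
)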
3.4. Experimental results
To evaluate the efficacy of the proposed U-Net, we first conduct experiments on the MaskedFace-CelebA dataset for the task of masked-area segmentation. The results of our proposed model are shown in Figure 3. Figure 3(a) is the masked face image that is input to our proposed model; Figure 3(b) is the ground truth mask, i.e., what our model must predict for the given masked face image; and Figure 3(c) is the predicted occluded region. The white region denotes the masked (occluded) area and the black region denotes the non-occluded area. Note that if the predicted mask is entirely black, there is no occluded area in the given masked face image.
Figure 3. Experimental results of the U-Net model on the MaskedFace-CelebA dataset: (a) the original occluded face image, (b) ground truth, and (c) predicted occluded area
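A Figure 3(c)-style prediction can be obtained with a short sketch like the following, assuming the trained model from section 3.3; the 0.5 threshold on the sigmoid output is an assumed cutoff:

import numpy as np
import tensorflow as tf

def predict_mask(model: tf.keras.Model, image: np.ndarray) -> np.ndarray:
    # image: an HxWx3 array scaled to [0, 1]; add a batch dimension for predict().
    prob = model.predict(image[np.newaxis, ...])[0, ..., 0]
    # Threshold per-pixel probabilities: white (1) = occluded, black (0) = background.
    return (prob > 0.5).astype(np.uint8)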
3.5. Comparative study
We compare the obtained experimental results with those of other semantic segmentation approaches in order to further assess the effectiveness of our suggested method. Table 1 shows the comparative analysis of various existing semantic segmentation models and indicates the superiority of our proposed model for detecting the occluded region in human face images. Our model achieves 97.7% accuracy against the other three methods, as shown in Table 1. From the comparative study illustrated in Figure 4, it is observed that the proposed system provides an appreciable accuracy of 97.7% and outperforms the existing image segmentation algorithms.
Table 1. Comparative analysis of various image segmentation techniques
S. no   Method                Accuracy (%)
1       FCN [12]              90.3
2       DeepLabv3+ [24]       93.4
3       PSPNet [25]           94.2
4       The proposed model    97.7
Figure 4. Graphical representation of the comparative study between the proposed system and existing approaches with reference to accuracy
3.6. Ablation study
Role of skip connections: we investigate the effectiveness of the skip connections. A U-Net without skip connections produces a blurry, imprecise mask, and disabling the skip connections is an instructive experiment. Skip connections bypass the bottleneck and give the decoder access to the encoder's intermediate activations, which contain fine-grained, high-resolution details; this is what allows the model to recover fine-grained details in the prediction, and it is a key ingredient in turning an unstable, inaccurate convolution-based vision model into a SOTA one.
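The ablation amounts to a one-line change to the decoder_block sketch of section 2.1: dropping the Concatenate call removes the skip connection, so the decoder sees only what survives the bottleneck.

from tensorflow.keras import layers

def decoder_block_no_skip(x, filters):
    # Ablated decoder: same upsampling and convolutions as decoder_block,
    # but the encoder's skip feature map is never concatenated, so the
    # fine-grained spatial detail from the contracting path is unavailable.
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x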
4. CONCLUSION AND FUTURE WORK
This paper proposed a U-Net model for detecting the occluded area in face images as a binary segmentation map. The proposed model is an improved version of the fully convolutional network. The experimental results show that our model efficiently segments the occluded area, in our case the masked area, in human face images. Experiments on the publicly available MaskedFace-CelebA dataset show 97.7% segmentation accuracy, and the model produces results of high perceptual quality compared with other SOTA image segmentation methods. The main advantages of our model are that it has fewer parameters, works with a limited dataset, and is faster. This work is useful for masked face recognition in real time. Promising areas for further study and improvement in face mask identification include improving our model's performance in difficult situations such as heavy occlusion or complex mask patterns, and enhancing the dataset with different mask sizes and shapes. We also believe that merging this work with facial recognition research has a lot of potential: by removing masks as a preprocessing step, we can increase the precision and dependability of facial recognition systems.
FUNDING INFORMATION
Authors state no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author
contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author C M So Va Fo I R D O E Vi Su P Fu
Jyothsna Cherapanamjeri ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
B. Narendra Kumar Rao ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
C : Conceptualization
M : Methodology
So : Software
Va : Validation
Fo : Formal analysis
I : Investigation
R : Resources
D : Data Curation
O : Writing - Original Draft
E : Writing - Review & Editing
Vi : Visualization
Su : Supervision
P : Project administration
Fu : Funding acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study are openly available at
https://drive.google.com/drive/folders/1EJbxfgTVHDBNvfe7KzESwJoWc8e4J2HJ?usp=share_link.
REFERENCES
[1] S. Pan, X. Liu, N. Xie, and Y. Chong, “EG-TransUNet: a transformer-based U-Net with enhanced and guided models for
biomedical image segmentation,” BMC Bioinformatics, vol. 24, no. 1, Mar. 2023, doi: 10.1186/s12859-023-05196-1.
[2] R. Alshawi, M. T. Hoque, and M. C. Flanagin, “A depth-wise separable U-Net architecture with multiscale filters to detect
sinkholes,” Remote Sensing, vol. 15, no. 5, Feb. 2023, doi: 10.3390/rs15051384.
[3] H. Dong, B. Yu, W. Wu, and C. He, “Enhanced lightweight end-to-end semantic segmentation for high-resolution remote sensing
images,” IEEE Access, vol. 10, pp. 70947–70954, 2022, doi: 10.1109/ACCESS.2022.3182370.
[4] Z.-J. Gao, Y. He, and Y. Li, “A novel lightweight Swin-Unet network for semantic segmentation of COVID-19 lesion in CT
images,” IEEE Access, vol. 11, pp. 950–962, 2023, doi: 10.1109/ACCESS.2022.3232721.
Int J Artif Intell ISSN: 2252-8938 
Detection of partially occluded area in face image using U-Net model (Jyothsna Cherapanamjeri)
1869
[5] B. Chen, Y. Liu, Z. Zhang, G. Lu, and A. W. K. Kong, “TransAttUnet: Multi-Level Attention-Guided U-Net with transformer for
medical image segmentation,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 1, pp. 55–68,
Feb. 2024, doi: 10.1109/TETCI.2023.3309626.
[6] X. Zuo, H. Lin, D. Wang, and Z. Cui, “A method of crop seedling plant segmentation on edge information fusion model,” IEEE
Access, vol. 10, pp. 95281–95293, 2022, doi: 10.1109/ACCESS.2022.3187825.
[7] S. Gite, A. Mishra, and K. Kotecha, “Enhanced lung image segmentation using deep learning,” Neural Computing and
Applications, vol. 35, no. 31, pp. 22839–22853, Nov. 2023, doi: 10.1007/s00521-021-06719-8.
[8] H. Lu, Y. She, J. Tie, and S. Xu, “Half-UNet: a simplified U-Net architecture for medical image segmentation,” Frontiers in
Neuroinformatics, vol. 16, Jun. 2022, doi: 10.3389/fninf.2022.911679.
[9] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” Medical Image
Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241, 2015, doi: 10.1007/978-3-319-24574-4_28.
[10] J. Chen et al., “TransUNet: transformers make strong encoders for medical image segmentation,” arXiv preprint arXiv:2102.04306, 2021.
[11] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, “DS-TransUNet: dual swin transformer U-Net for medical image
segmentation,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–15, 2022, doi: 10.1109/TIM.2022.3178991.
[12] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, Apr. 2017, doi: 10.1109/TPAMI.2016.2572683.
[13] H. Li, J. Liu, and D. Wang, “A fast instance segmentation technique for log end faces based on metric learning,” Forests, vol. 14,
no. 4, Apr. 2023, doi: 10.3390/f14040795.
[14] S. Sahu, H. Sarma, and D. J. Bora, “Image segmentation and its different techniques: an in-depth analysis,” in 2018 International
Conference on Research in Intelligent and Computing in Engineering (RICE), Aug. 2018, pp. 1–7, doi: 10.1109/RICE.2018.8509038.
[15] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning:
a survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523-3542, 2022, doi:
10.1109/TPAMI.2021.3059968.
[16] H. Ramadan, C. Lachqar, and H. Tairi, “A survey of recent interactive image segmentation methods,” Computational Visual
Media, vol. 6, no. 4, pp. 355–384, Dec. 2020, doi: 10.1007/s41095-020-0177-5.
[17] U. Sehar and M. L. Naseem, “How deep learning is empowering semantic segmentation,” Multimedia Tools and Applications,
vol. 81, no. 21, pp. 30519–30544, Sep. 2022, doi: 10.1007/s11042-022-12821-3.
[18] I. Ahmed, M. Ahmad, F. A. Khan, and M. Asif, “Comparison of deep-learning-based segmentation models: using top view person
images,” IEEE Access, vol. 8, pp. 136361–136373, 2020, doi: 10.1109/ACCESS.2020.3011406.
[19] H. Zhang, H. Sun, W. Ao, and G. Dimirovski, “A survey on instance segmentation: recent advances and challenges,”
International Journal of Innovative Computing, Information and Control, vol. 17, no. 3, pp. 1041–1053, 2021, doi:
10.24507/ijicic.17.03.1041.
[20] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision
(ICCV), Oct. 2017, pp. 2980–2988, doi: 10.1109/ICCV.2017.322.
[21] Musyarofah, V. Schmidt, and M. Kada, “Object detection of aerial image using mask-region convolutional neural network (mask
R-CNN),” IOP Conference Series: Earth and Environmental Science, Jun. 2020, doi: 10.1088/1755-1315/500/1/012090.
[22] K. Lin et al., “Face detection and segmentation based on improved mask R-CNN,” Discrete Dynamics in Nature and Society,
vol. 2020, no. 1, pp. 1–11, May 2020, doi: 10.1155/2020/9242917.
[23] L. Shen et al., “Fusing attention mechanism with Mask R-CNN for instance segmentation of grape cluster in the field,” Frontiers
in Plant Science, vol. 13, Jul. 2022, doi: 10.3389/fpls.2022.934450.
[24] C.-Y. Hsu, R. Hu, Y. Xiang, X. Long, and Z. Li, “Improving the Deeplabv3+ model with attention mechanisms applied to eye
detection and segmentation,” Mathematics, vol. 10, no. 15, Jul. 2022, doi: 10.3390/math10152597.
[25] C. Zhang, J. Zhao, and Y. Feng, “Research on semantic segmentation based on improved PSPNet,” in 2023 International
Conference on Intelligent Perception and Computer Vision (CIPCV), May 2023, pp. 1–6, doi: 10.1109/CIPCV58883.2023.00012.
BIOGRAPHIES OF AUTHORS
Jyothsna Cherapanamjeri is pursuing her Ph.D. at JNTUA, Anantapur. She received her master's degree in 2009 from SV University and her bachelor's degree in 2005 from JNTUH, Hyderabad, Andhra Pradesh, India. Her areas of interest are artificial intelligence, machine learning, computer vision, deep learning, data science, and IoT. She has more than 15 years of teaching experience and qualified GATE in 2007 and APRCET in 2019. She follows the latest developments in technology on Google Scholar and ResearchGate and has 5 publications in the field of artificial intelligence. She can be contacted at email: jyothsnamtech@gmail.com or jyothsna_513@yahoo.com.
B. Narendra Kumar Rao obtained his Bachelor's degree in Computer Science and Engineering from the University of Madras, and his M.Tech. and Ph.D. in computer science from JNTU, Hyderabad. He has more than 22 years of experience in computer science and engineering, including four years of industrial experience and sixteen years of teaching experience. His research interests include software engineering, deep learning, and embedded systems. He is currently working as Professor and Head, Department of Computer Science and Engineering, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College). He has 35 publications to his credit in reputed journals and conferences. He can be contacted at email: narendrakumarraob@gmail.com.

More Related Content

PDF
Convolutional neural network based encoder-decoder for efficient real-time ob...
PDF
Comparative analysis of machine learning models for fake news detection in so...
PDF
Enhancing plagiarism detection using data pre-processing and machine learning...
PDF
Improvisation in detection of pomegranate leaf disease using transfer learni...
PDF
Customer segmentation using association rule mining on retail transaction data
PDF
Averaged bars for cryptocurrency price forecasting across different horizons
PDF
Optimizing real-time data preprocessing in IoT-based fog computing using mach...
PDF
Comparison of deep learning models: CNN and VGG-16 in identifying pornographi...
Convolutional neural network based encoder-decoder for efficient real-time ob...
Comparative analysis of machine learning models for fake news detection in so...
Enhancing plagiarism detection using data pre-processing and machine learning...
Improvisation in detection of pomegranate leaf disease using transfer learni...
Customer segmentation using association rule mining on retail transaction data
Averaged bars for cryptocurrency price forecasting across different horizons
Optimizing real-time data preprocessing in IoT-based fog computing using mach...
Comparison of deep learning models: CNN and VGG-16 in identifying pornographi...

More from IAESIJAI (20)

PDF
Assured time series forecasting using inertial measurement unit, neural netwo...
PDF
Flame analysis and combustion estimation using large language and vision assi...
PDF
Heterogeneous semantic graph embedding assisted edge sensitive learning for c...
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
Novel artificial intelligence-based ensemble learning for optimized software ...
PDF
GradeZen: automated grading ecosystem using deep learning for educational ass...
PDF
Leveraging artificial intelligence through long short-term memory approach fo...
PDF
Application of the adaptive neuro-fuzzy inference system for prediction of th...
PDF
Novel preemptive intelligent artificial intelligence-model for detecting inco...
PDF
Techniques of Quran reciters recognition: a review
PDF
ApDeC: A rule generator for Alzheimer's disease prediction
PDF
Exploring patient-patient interactions graphs by network analysis
PDF
Review on class imbalance techniques to strengthen model prediction
PDF
Artificial intelligence multilingual image-to-speech for accessibility and te...
PDF
Comprehensive survey of automated plant leaf disease identification technique...
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
A review of recent deep learning applications in wood surface defect identifi...
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PDF
Graph-based methods for transaction databases: a comparative study
PDF
Developing a website for English-speaking practice to English as a foreign la...
Assured time series forecasting using inertial measurement unit, neural netwo...
Flame analysis and combustion estimation using large language and vision assi...
Heterogeneous semantic graph embedding assisted edge sensitive learning for c...
The influence of sentiment analysis in enhancing early warning system model f...
Novel artificial intelligence-based ensemble learning for optimized software ...
GradeZen: automated grading ecosystem using deep learning for educational ass...
Leveraging artificial intelligence through long short-term memory approach fo...
Application of the adaptive neuro-fuzzy inference system for prediction of th...
Novel preemptive intelligent artificial intelligence-model for detecting inco...
Techniques of Quran reciters recognition: a review
ApDeC: A rule generator for Alzheimer's disease prediction
Exploring patient-patient interactions graphs by network analysis
Review on class imbalance techniques to strengthen model prediction
Artificial intelligence multilingual image-to-speech for accessibility and te...
Comprehensive survey of automated plant leaf disease identification technique...
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
A review of recent deep learning applications in wood surface defect identifi...
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Graph-based methods for transaction databases: a comparative study
Developing a website for English-speaking practice to English as a foreign la...
Ad

Recently uploaded (20)

PPTX
Modernising the Digital Integration Hub
PDF
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
CloudStack 4.21: First Look Webinar slides
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
DP Operators-handbook-extract for the Mautical Institute
PDF
Five Habits of High-Impact Board Members
PPTX
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
PDF
Getting Started with Data Integration: FME Form 101
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
August Patch Tuesday
PDF
STKI Israel Market Study 2025 version august
PDF
Hybrid model detection and classification of lung cancer
PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PPTX
Benefits of Physical activity for teenagers.pptx
PDF
Architecture types and enterprise applications.pdf
PDF
Zenith AI: Advanced Artificial Intelligence
PPTX
Chapter 5: Probability Theory and Statistics
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Unlock new opportunities with location data.pdf
Modernising the Digital Integration Hub
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
Group 1 Presentation -Planning and Decision Making .pptx
CloudStack 4.21: First Look Webinar slides
A novel scalable deep ensemble learning framework for big data classification...
DP Operators-handbook-extract for the Mautical Institute
Five Habits of High-Impact Board Members
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
Getting Started with Data Integration: FME Form 101
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
August Patch Tuesday
STKI Israel Market Study 2025 version august
Hybrid model detection and classification of lung cancer
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
Benefits of Physical activity for teenagers.pptx
Architecture types and enterprise applications.pdf
Zenith AI: Advanced Artificial Intelligence
Chapter 5: Probability Theory and Statistics
A comparative study of natural language inference in Swahili using monolingua...
Unlock new opportunities with location data.pdf
Ad

Detection of partially occluded area in face image using U-Net model

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 14, No. 3, June 2025, pp. 1863~1869 ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i3.pp1863-1869  1863 Journal homepage: http://ijai.iaescore.com Detection of partially occluded area in face image using U-Net model Jyothsna Cherapanamjeri1 , B. Narendra Kumar Rao2 1 Department of Computer Science and Engineering, JNTUA College of Engineering, Jawaharlal Nehru Technological University, Ananthapur, India 2 Department of Artificial Intelligence and Machine Learning, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, India Article Info ABSTRACT Article history: Received Oct 21, 2024 Revised Feb 15, 2025 Accepted Mar 15, 2025 Occluded face recognition is important task in computer vision. To complete the occluded face recognition efficiently, first we need to identify the occluded region in face. Identifying the occluded region in face is a challenging task in computer vision. One case of face occlusion is nothing but wearing masks, sunglasses, and scarves. Another case of face occlusion is face is hiding the other objects like books, things, or other faces. In our research, identifying the occluded area which is corona virus disease of 2019 (COVID-19) masked area in face and generate segmentation map. In semantic segmentation, deep learning-based techniques have demonstrated promising outcomes. We have employed one of the deep learning-based U-Net models to generate a binary segmentation map on masked region of a human face. It achieves reliable performance and reducing network complexity. We train our model on MaskedFace-CelebA dataset and accuracy is 97.7%. Results from experiments demonstrate that, in comparison to the most advanced semantic segmentation models, our approach achieves a promising compromise between segmentation accuracy and computing efficiency. Keywords: Artificial intelligence Computer vision Deep learning Image segmentation Occlusion U-Net model This is an open access article under the CC BY-SA license. Corresponding Author: Jyothsna Cherapanamjeri Department of Computer Science and Engineering, JNTUA College of Engineering Jawaharlal Nehru Technological University Ananthapur, India Email: jyothsna_513@yahoo.com 1. INTRODUCTION When it comes to detecting occluded areas in face images, deep learning-based techniques have shown excellent results. The COVID-19 masked area in facial images is the occluded area in our research. Nowadays masks are used for different purposes such as COVID-19 pandemic widespread use of face masks to prevent spread of disease, escape from crimes, save our health from pollution. These are the cases to wear face masks. In such cases very difficult to identify faces while wearing masks. Face recognition technology is commonly used to identify people based on their facial features, and studies have shown that it performs extremely well. Many real time applications based on face recognition system such as face authentication- based payment systems, face access control failed to effectively recognize the masked faces. In our research, main objective is to identify occluded area in face images which is first step in occluded face recognition. Occluded face recognition is challenging task in computer vision. This research topic comes under computer vision which is the subarea of artificial intelligence. Computer vision is used to interpret and understand visual world. 
To train our model using deep learning-based computer vision
  • 2.  ISSN: 2252-8938 Int J Artif Intell, Vol. 14, No. 3, June 2025: 1863-1869 1864 technique called segmentation technique which is U-Net semantic segmentation model. By dividing the images into semantically significant objects, semantic segmentation aims to interpret the images. It can be used extensively in a variety of applications such as medical image analysis, agriculture, and remote sensing images. The main aim of this research is to detect occluded area in face image using U-Net segmentation model. Segmentation is the process of categorizing an image at the pixel level in order to identify the precise area of occlusion. When one object hides a portion of another, this is known as occlusion in an image. Some of the occluded face images as shown in Figure 1. Figure 1. Occluded face images Figure 1 shows the face images contains face with objects such as mask, sunglass, beard, and scarf. When occurred these occluded face images in face recognition, very difficult to identify faces. In such cases need to remove occlusion and identify faces in face recognition technology. For that purpose, first we need to identify occluded area in faces and then remove occlusion in face images using deep learning-based models. In our research focus on occluded area which is COVID-19 masked area detection in face images using segmentation model called U-Net. Semantic segmentation task challenging owning to detect objects in images. Several studies extract the meaningful information from the images using segmentation task. According to Pan et al. [1], the EG-TransUNet technique can significantly enhance the segmentation quality of biological pictures by concurrently implementing the perdekamp emotional method (PEM), the feature fusion module based on semantic guided attention, and the channel spatial attention (CSA_1) module into U-Net. On two well-known colonoscopy datasets (CVC-ClinicDB and Vasir-SEG) by attaining 95.44 and 95.26% on mDice, respectively. A better U-Net that achieves higher performance with less capacity by using a sparsely linked block with multiscale filters [2]. A sinkhole dataset was manually gathered from various sources, considered, then automatically marked with the suggested autoencoder, which shortened the time used for annotation while creating accurate masks, in order to evaluate our approach. The performance and dependability of the model were then further assessed using a benchmark nuclei dataset 94% of the time. Research by Dong et al. [3] to tackle the scale variance issue and improve the segmentation outcomes, an superpixel segmentation pooling (SSP) layer is integrated into the enhanced lightweight end-to-end semantic segmentation (ELES2) architecture to accomplish end-to-end efficient semantic segmentation of high-resolution remote sensing (HRRS) images. ELES2 can retain great computing efficiency while achieving promising segmentation accuracy. With just 12.62 M1 parameters and 13.09 floating-point operations per second (FLOPs), ion ELES2 obtains mIoU of 80.16 and 73.20% on the IPSRS Potsdam and Vaihingen_1 dataset, respectively. According to Gao et al. [4], a unique Swin-Unet network to enhance multi-scale lesion segmentation precision in COVID-19 CT scans. Research by Chen et al. [5] to improve the segmentation quality of biological pictures, a new transformer-based attention-directed U-Net called TransAttUnet simultaneously integrates multi-level guided attention and multi-scale skip connections into U-Net. Zuo et al. 
[6] suggests in order to accurately segment crop seedlings in their natural habitat and accomplish autonomous assessment of seedling location and phenotype. Gite et al. [7] use U-Net architecture used in lung segmentation using x-ray. Half-UNet, TransUNet, DS-TransUNet dual swin transformer [8]–[11] used for medical image processing. Research by Shelhamer et al. [12] fully convolutional network (FCN) on semantic segmentation and scene parsing, exploring PASCAL visual object classes (VOC)1, NYU-Depth v2 dataset (NYUDv2), and scale- invariant feature transform (SIFT) flow. Despite the fact that these activities have traditionally made a distinction between areas and objects. Research by Li et al. [13] to parallelize the semantic segmentation of target detection, a fast instance segmentation method based on metric learning is suggested for both log end face detection and semantic segmentation. Literate survey on different image segmentation techniques [14]–[19]. Object detection of aerial images, face detection and segmentation, detection of grape clusters using mask region-based convolutional neural networks (R-CNN) [20]–[23]. The Deeplabv3+ model's architecture with attention mechanisms for segmenting ocular images [24]. Different evaluations metrics used for semantic segmentation techniques [25].
  • 3. Int J Artif Intell ISSN: 2252-8938  Detection of partially occluded area in face image using U-Net model (Jyothsna Cherapanamjeri) 1865 The inference from literature review is that semantic segmentation is used in many different real- time applications such as medical image segmentation, agriculture, object detection in aerial images, and face detection. By inspiring existing work, we apply this semantic technique to occlusion area detection in face images which is masked area in face image. The main significant of this research work is to guide to remove the occlusion in face images and recognizes the faces in real-time. Without identifying occlusion in face images very difficult to remove occlusion. The majority of existing approaches have focused on enhancing the network capacity to enhance the model’s functionality. This will result in significant disadvantages: i) it makes there more layers, ii) it increases the likelihood of overfitting in the neural network, and iii) more training samples are needed. Summary of our primary contributions as follows: ‒ A U-Net Model that effectively identify the occluded area in the face images in-terms of the binary segmentation mask. The main advantage of this model is to identify occluded area in limited data sources. ‒ Providing a comprehensive review of the performance of segmentation models. ‒ The superiority and generalizability of the suggested U-Net for automatic occluded area segmentation are demonstrated by extensive experimental findings on datasets of occluded face images. ‒ Conducting a comparative study of three state-of-the-art (SOTA) segmentation models, namely FCN, DeepLabv3+, and Pyramid scene parsing network (PSPNET). 2. METHOD In this section, in order to detect the occluded area in face images-in this case, the masked area in faces-we describe our novel network architecture. Mask covered by nose, mouth, chin areas. Detecting occluded area in face image is challenging task in computer vision. This is very important tasks for occlusion removal. The goal of our research is to generate a binary segmentation map of the masked object in the face image input. 2.1. Understanding the U-Net basic architecture U-Net is a deep learning-based computer vision framework. U-Net architecture is look like a U-shaped architecture. The main purpose of U-Net architecture is used to segments images accurately. There are contracting and extending paths in the U-Net design. The contracting path consists of encoder layers. This encoder layers receives contextual information and decrease the spatial resolution of the input. The expanding path includes decoder layers that decoded the already encoded data and use the information from the contracting path via skip connections to generate the binary segmentation map. Finding the appropriate features in the input image is the primary goal of the contracting path. The contracting path is same as the convolutional neural network. The operation of convolutional neural network is to convolutional operation followed by rectified linear unit (ReLU). Every block comprises of two consecutive 3×3 convolutional layers, succeeded by an activation function of ReLU. Following the convolution operation, max pooling 2×2 operations with stride 2 is used. The next convolutional layer doubles the number of filters employed after each max pooling step. A layer that begins with 64 feature channels, for instance, will have 128 channels following the subsequent pooling and convolution processes. 
On the other hand, expansive path which takes the extracted input features and generate segmentation mask. Each block is followed by up sampling layer. The decoder layers in the expansive path upsampling the feature maps. The decoder layers can detect the features more precisely by using the skip connections from the contracting path to the expansive path, which helps to preserve lost spatial information. Figure 2 illustrates how the U-Net network converts a gray scale input image of size 572×572×1 into a binary segmented output map of size 388×388×2. Since no padding has been used, the output size is smaller than the input size, which can be observed. We have to keep the input size the same if we can use the padding. The input image rapidly decreases width and height as the number of channels grows along the contracting path. If more channels exist, the network may gather higher-level data. A last convolution operation at the bottleneck produces a feature map with 30×30×1024 pixels. After removing the feature map from the bottleneck, the expansive path resizes it to fit the initial input size upsampling layers, which decrease the number of channels in the feature map while improving its spatial resolution, are used to achieve this. The decoder layers use the skip connections from the contraction path to locate and refine the image's features. In the end, each pixel in the output image corresponds to a label in the input image that is connected to a particular object or class. Each pixel in this output map shows either the background or the foreground because it is a binary segmentation map. Basic steps in proposed methodology: ‒ The input image is delivered to the contracting path, which seeks to capture relevant details about the input image while reducing the spatial dimensions of the image. ‒ After extracting complex characteristics and patterns, the feature map proceeds to the expanding path.
  • 4.  ISSN: 2252-8938 Int J Artif Intell, Vol. 14, No. 3, June 2025: 1863-1869 1866 ‒ The expanding path upsamples the input feature map and combines learned features using both convolution and up-convolution processes to produce a segmentation map. ‒ Skip connections are used to double the feature channels and concatenate the relevant feature map from the contracting path. ‒ Each pixel in the feature map should be classified separately in a segmentation map that is produced after upsampling it in the expanding path. Figure 2. Proposed system architecture 3. RESULTS AND DISCUSSION In this section, we present the experimental setup and outcomes of our method. We begin by describing the dataset used for training and evaluation purposes. The training process is then described, together with the particulars and configurations that were used. We develop appropriate evaluation metrics and apply them to our network to evaluate its performance. Carried out a few tests to evaluate our suggested model against SOTA techniques. 3.1. Dataset The MaskedFace-CelebA dataset is available to the public. This dataset is constructed from CelebA dataset using the MaskTheFace tool. This dataset is used in the field of computer vision, especially for face analysis tasks. We assess the models mentioned previously using the masked-face datasets. This dataset contains 21,844 masked face images and corresponding target or ground truth images on occluded area. Images of size are 256×256 pixels. These images are randomly divided into 17,476 (80%) images for training, 2,184 (10%) images for validation, and 2,184 (10%) images for testing from the MaskedFace- CelebA dataset. 3.2. Evaluation metric In our experiment, we use accuracy as the main parameter to determine how closely the predicted mask and ground truth match. This metric is associated with four values i.e., true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN). The formula for accuracy is shown in (1). 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 𝑇𝑃+𝑇𝑁 𝑇𝑃+𝑇𝑁+𝐹𝑁+𝐹𝑃 (1) The number of occluded area pixels, or masked area pixels that have been correctly identified, is represented by TP. The number of background pixels that are incorrectly classified as background is known as FP. The number of occluded area pixels that are incorrectly classified as background is known as FP. The number of background pixels that are correctly classified as background is known as TN. 3.3. Experimental settings The network is trained using the input images and the segmentation maps that correspond to them. All the experiments are implemented with Tensorflow which is Python framework. An 8-core 32G+1 V100 GPU PC and a 3.8 GHz CPU are used in the experiment to train the network. Adam is employed as an
  • 5. Int J Artif Intell ISSN: 2252-8938  Detection of partially occluded area in face image using U-Net model (Jyothsna Cherapanamjeri) 1867 optimizer to adjust the occluded face image segmentation network's parameters. This work employed the following super parameters: batch size of 16, image resolution of 256, epochs of 100, and starting learning rate η of 0.05. In this manuscript, 100 was utilized for epochs, which are the number of experimental training rounds. The images used for experimental training and testing were converted into 256×256 resolution. Batch_Size is the number of samples selected for one training. 3.4. Experimental results To evaluate the efficacy of the proposed U-Net we first conduct the experiments on the MaskedFace-CelebA dataset for the task of masked area segmentation. The results of our proposed model as shown in Figure 3. The image as shown in Figure 3(a) is the masked face image which is input to our proposed model. The image as shown in Figure 3(b) is the mask which is the ground truth label. This is what our model must predicted for the given masked face image. The image as shown in Figure 3(c) predicted occluded region. The white region denotes the masked area which is occlude area and the black region denotes the no occluded area. Notice that if the mask is entirely black this means there are no occluded area deposits in the given masked face image. (a) (b) (c) Figure 3. Experimental result for U-Net model on MaskedFace-CelebA dataset of (a) the original occludes face image, (b) ground truth, and (c) predicted occluded area 3.5. Comparative study We have contrasted the obtained experimental results with those of other semantic segmentation approaches in order to assess the effectiveness of our suggested method further. The Table 1 shows the comparative analysis of various existing semantic segmentation models. This comparative analysis says that superiority of our proposed model for detection of occluded region in human face images. We noticed that our model performs 97.7% accuracy with other three methods as shown in Table 1. From the comparative study as shown in Figure 4, it is observed that proposed system provides appreciable accuracy of 97.7%. It outperforms compared to existing image segmentation algorithms. Table 1. Comparative analysis of various image segmentation techniques S. no Method Accuracy (%) 1 FCN [12] 90.3 DeepLabv3+ [24] 93.4 3 PSPNet [25] 94.2 4 The proposed model 97.7 Figure 4. Graphical representation of comparative study between proposed system and existing system approaches with reference to accuracy 86 88 90 92 94 96 98 100 Fully Convolutional Network (FCN) Deep Lab V3+ Pyramid Scene Parsing Network (PSPNet) The Proposed Model Accuracy Method
  • 6.  ISSN: 2252-8938 Int J Artif Intell, Vol. 14, No. 3, June 2025: 1863-1869 1868 3.6. Ablation study Role of using skip connections: we investigate the effectiveness of using skip connections. Assume that a U-Net without skip connections would produce a blurry mess. If you would be interested, it could be interesting to try disabling the skip connection and see what happens. By gradually adding more features to a convolutions-based vision model, they transform it from an unstable and inaccurate model into a SOTA one. By avoiding the bottleneck and giving the model access to the encoder's intermediate activations-which include these fine-grained, high-resolution details-skip connections help solve this issue. That is what is meant by recover fine grained details in the prediction. 4. CONCLUSION AND FUTURE WORK This paper proposed a novel method for binary segmentation map on occluded area detection in face images called U-Net model. This proposed model is improved version of fully convolutional network. The experimental results show that our model efficiently segmenting occluded area in human face image, in our case occluded area is masked area. Despite the promising results on dataset. Experiments on publicly available dataset which is MaskedFace-CelebA dataset shows 97.7% segmentation accuracy. It produces high perpetual quality results compared to other SOTA image segmentation methods. The main advantage of our model has fewer parameters, limited dataset and faster in speed. This work is useful for masked face recognition in real-time. There are promising areas for further study and advancements in face mask identification, such as improving our model's performance in difficult situations like high occlusion or complex mask patterns. Enhancing the dataset for different sizes of masks and shapes. We also think that merging with facial recognition research has a lot of potential. We can increase the precision and dependability of facial recognition systems by eliminating masks as a preprocessing step. FUNDING INFORMATION Authors state no funding involved. AUTHOR CONTRIBUTIONS STATEMENT This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration. Name of Author C M So Va Fo I R D O E Vi Su P Fu Jyothsna Cherapanamjeri ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ B. Narendra Kumar Rao ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ C : Conceptualization M : Methodology So : Software Va : Validation Fo : Formal analysis I : Investigation R : Resources D : Data Curation O : Writing - Original Draft E : Writing - Review & Editing Vi : Visualization Su : Supervision P : Project administration Fu : Funding acquisition CONFLICT OF INTEREST STATEMENT Authors state no conflict of interest. DATA AVAILABILITY The data that support the findings of this study are openly available at https://drive.google.com/drive/folders/1EJbxfgTVHDBNvfe7KzESwJoWc8e4J2HJ?usp=share_link. REFERENCES [1] S. Pan, X. Liu, N. Xie, and Y. Chong, “EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation,” BMC Bioinformatics, vol. 24, no. 1, Mar. 2023, doi: 10.1186/s12859-023-05196-1. [2] R. Alshawi, M. T. Hoque, and M. C. Flanagin, “A depth-wise separable U-Net architecture with multiscale filters to detect sinkholes,” Remote Sensing, vol. 15, no. 5, Feb. 2023, doi: 10.3390/rs15051384. [3] H. Dong, B. Yu, W. Wu, and C. 
4. CONCLUSION AND FUTURE WORK
This paper proposed a novel method, based on the U-Net model, for producing a binary segmentation map of the occluded area detected in face images. The proposed model is an improved version of the fully convolutional network. The experimental results show that our model efficiently segments the occluded area in human face images; in our case, the occluded area is the masked area. Experiments on the publicly available MaskedFace-CelebA dataset show 97.7% segmentation accuracy, and the model produces results of higher perceptual quality than other SOTA image segmentation methods. The main advantages of our model are that it has fewer parameters, works with a limited dataset, and is faster. This work is useful for masked face recognition in real time. There are promising areas for further study and advancement in face mask identification, such as improving our model's performance in difficult situations like heavy occlusion or complex mask patterns, and enhancing the dataset with different mask sizes and shapes. We also see great potential in merging this work with facial recognition research: by removing masks as a preprocessing step, we can increase the precision and dependability of facial recognition systems.

FUNDING INFORMATION
Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author: C M So Va Fo I R D O E Vi Su P Fu
Jyothsna Cherapanamjeri: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
B. Narendra Kumar Rao: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

C: Conceptualization, M: Methodology, So: Software, Va: Validation, Fo: Formal analysis, I: Investigation, R: Resources, D: Data Curation, O: Writing - Original Draft, E: Writing - Review & Editing, Vi: Visualization, Su: Supervision, P: Project administration, Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.

DATA AVAILABILITY
The data that support the findings of this study are openly available at https://drive.google.com/drive/folders/1EJbxfgTVHDBNvfe7KzESwJoWc8e4J2HJ?usp=share_link.

REFERENCES
[1] S. Pan, X. Liu, N. Xie, and Y. Chong, “EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation,” BMC Bioinformatics, vol. 24, no. 1, Mar. 2023, doi: 10.1186/s12859-023-05196-1.
[2] R. Alshawi, M. T. Hoque, and M. C. Flanagin, “A depth-wise separable U-Net architecture with multiscale filters to detect sinkholes,” Remote Sensing, vol. 15, no. 5, Feb. 2023, doi: 10.3390/rs15051384.
[3] H. Dong, B. Yu, W. Wu, and C. He, “Enhanced lightweight end-to-end semantic segmentation for high-resolution remote sensing images,” IEEE Access, vol. 10, pp. 70947–70954, 2022, doi: 10.1109/ACCESS.2022.3182370.
[4] Z.-J. Gao, Y. He, and Y. Li, “A novel lightweight Swin-Unet network for semantic segmentation of COVID-19 lesion in CT images,” IEEE Access, vol. 11, pp. 950–962, 2023, doi: 10.1109/ACCESS.2022.3232721.
[5] B. Chen, Y. Liu, Z. Zhang, G. Lu, and A. W. K. Kong, “TransAttUnet: multi-level attention-guided U-Net with transformer for medical image segmentation,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 1, pp. 55–68, Feb. 2024, doi: 10.1109/TETCI.2023.3309626.
[6] X. Zuo, H. Lin, D. Wang, and Z. Cui, “A method of crop seedling plant segmentation on edge information fusion model,” IEEE Access, vol. 10, pp. 95281–95293, 2022, doi: 10.1109/ACCESS.2022.3187825.
[7] S. Gite, A. Mishra, and K. Kotecha, “Enhanced lung image segmentation using deep learning,” Neural Computing and Applications, vol. 35, no. 31, pp. 22839–22853, Nov. 2023, doi: 10.1007/s00521-021-06719-8.
[8] H. Lu, Y. She, J. Tie, and S. Xu, “Half-UNet: a simplified U-Net architecture for medical image segmentation,” Frontiers in Neuroinformatics, vol. 16, Jun. 2022, doi: 10.3389/fninf.2022.911679.
[9] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241, 2015, doi: 10.1007/978-3-319-24574-4_28.
[10] J. Chen et al., “TransUNet: transformers make strong encoders for medical image segmentation,” arXiv-Computer Science, 2021.
[11] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, “DS-TransUNet: dual swin transformer U-Net for medical image segmentation,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–15, 2022, doi: 10.1109/TIM.2022.3178991.
[12] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, Apr. 2017, doi: 10.1109/TPAMI.2016.2572683.
[13] H. Li, J. Liu, and D. Wang, “A fast instance segmentation technique for log end faces based on metric learning,” Forests, vol. 14, no. 4, Apr. 2023, doi: 10.3390/f14040795.
[14] S. Sahu, H. Sarma, and D. J. Bora, “Image segmentation and its different techniques: an in-depth analysis,” in 2018 International Conference on Research in Intelligent and Computing in Engineering (RICE), Aug. 2018, pp. 1–7, doi: 10.1109/RICE.2018.8509038.
[15] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: a survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523–3542, 2022, doi: 10.1109/TPAMI.2021.3059968.
[16] H. Ramadan, C. Lachqar, and H. Tairi, “A survey of recent interactive image segmentation methods,” Computational Visual Media, vol. 6, no. 4, pp. 355–384, Dec. 2020, doi: 10.1007/s41095-020-0177-5.
[17] U. Sehar and M. L. Naseem, “How deep learning is empowering semantic segmentation,” Multimedia Tools and Applications, vol. 81, no. 21, pp. 30519–30544, Sep. 2022, doi: 10.1007/s11042-022-12821-3.
[18] I. Ahmed, M. Ahmad, F. A. Khan, and M. Asif, “Comparison of deep-learning-based segmentation models: using top view person images,” IEEE Access, vol. 8, pp. 136361–136373, 2020, doi: 10.1109/ACCESS.2020.3011406.
[19] H. Zhang, H. Sun, W. Ao, and G. Dimirovski, “A survey on instance segmentation: recent advances and challenges,” International Journal of Innovative Computing, Information and Control, vol. 17, no. 3, pp. 1041–1053, 2021, doi: 10.24507/ijicic.17.03.1041.
[20] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2980–2988, doi: 10.1109/ICCV.2017.322.
[21] Musyarofah, V. Schmidt, and M. Kada, “Object detection of aerial image using mask-region convolutional neural network (mask R-CNN),” IOP Conference Series: Earth and Environmental Science, Jun. 2020, doi: 10.1088/1755-1315/500/1/012090.
[22] K. Lin et al., “Face detection and segmentation based on improved mask R-CNN,” Discrete Dynamics in Nature and Society, vol. 2020, no. 1, pp. 1–11, May 2020, doi: 10.1155/2020/9242917.
[23] L. Shen et al., “Fusing attention mechanism with Mask R-CNN for instance segmentation of grape cluster in the field,” Frontiers in Plant Science, vol. 13, Jul. 2022, doi: 10.3389/fpls.2022.934450.
[24] C.-Y. Hsu, R. Hu, Y. Xiang, X. Long, and Z. Li, “Improving the Deeplabv3+ model with attention mechanisms applied to eye detection and segmentation,” Mathematics, vol. 10, no. 15, Jul. 2022, doi: 10.3390/math10152597.
[25] C. Zhang, J. Zhao, and Y. Feng, “Research on semantic segmentation based on improved PSPNet,” in 2023 International Conference on Intelligent Perception and Computer Vision (CIPCV), May 2023, pp. 1–6, doi: 10.1109/CIPCV58883.2023.00012.

BIOGRAPHIES OF AUTHORS
Jyothsna Cherapanamjeri is pursuing her Ph.D. at JNTUA, Anantapur. She received her master's degree in 2009 from SV University and her bachelor's degree in 2005 from JNTUH, Hyderabad, Andhra Pradesh, India. Her areas of interest are artificial intelligence, machine learning, computer vision, deep learning, data science, and IoT. She has more than 15 years of teaching experience, qualified GATE in 2007 and APRCET in 2019, and follows the latest developments in technology through Google Scholar and ResearchGate. She has 5 publications in the field of artificial intelligence. She can be contacted at email: jyothsnamtech@gmail.com or jyothsna_513@yahoo.com.

B. Narendra Kumar Rao obtained his bachelor's degree in Computer Science and Engineering from the University of Madras, and his M.Tech. and Ph.D. in computer science from JNTU, Hyderabad. He has more than 22 years of experience in the area of computer science and engineering, including four years of industrial experience and sixteen years of teaching experience. His research interests include software engineering, deep learning, and embedded systems. He is currently working as Professor and Head, Department of Computer Science and Engineering, at Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College). He has 35 publications to his credit to date in reputed journals and conferences. He can be contacted at email: narendrakumarraob@gmail.com.