IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 3, June 2025, pp. 1960~1967
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i3.pp1960-1967
Journal homepage: http://ijai.iaescore.com
Convolutional neural network based encoder-decoder for
efficient real-time object detection
Mothiram Rajasekaran1, Chitra Sabapathy Ranganathan2, Nagarajan Mohankumar3,
Rajeshkumar Sampathrajan4, Thayalagaran Merlin Inbamalar5, Nageshvaran Nandhini6,
Shanmugam Sujatha7
1Senior Solution Consultant, Pine Candle Way, Saint Augustine, United States
2Associate Vice President, Mphasis Corporation, Chandler, United States
3Symbiosis Institute of Technology, Symbiosis International (Deemed University), Nagpur Campus, Pune, India
4Principal Cloud Architect, McKinsey & Company, Chandler, United States
5Department of Electronics and Instrumentation Engineering, Saveetha Engineering College, Chennai, India
6Department of Information Technology, P. S. V College of Engineering and Technology, Krishnagiri, India
7Department of Biomedical Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India
Article Info
Article history:
Received Feb 27, 2024
Revised Nov 26, 2024
Accepted Jan 27, 2025
Keywords:
Convolutional neural networks
Deep learning
Encoder-decoder
Mean average precision
MS-COCO dataset
Object detection
ABSTRACT
Convolutional neural networks (CNN) are applied to a variety of computer
vision problems, such as object recognition, image classification, and semantic
segmentation. Object detection, one of the most important and difficult
problems in computer vision, has attracted a lot of attention lately. Object
detection involves confirming the presence of an object in an image or video
and then locating it accurately for recognition. However, detection
performance can still be poor under certain circumstances, such as when an
object is occluded, distorted, or small. This work proposes an efficient deep
learning model that combines a CNN with an encoder-decoder for object
detection. The proposed model was evaluated on the Microsoft Common Objects
in Context (MS-COCO) dataset and achieved a mean average precision (mAP) of
about 54.1% and an accuracy of 99%. The experimental results showed that the
proposed approach achieves higher detection efficiency than existing
techniques while requiring few computational resources.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Nagarajan Mohankumar
Symbiosis Institute of Technology, Symbiosis International (Deemed University), Nagpur Campus
Pune, India
Email: nmkprofessor@gmail.com
1. INTRODUCTION
More than 90% of human understanding is visual, and various kinds of imaging equipment are frequently
used in fields that are directly related to human activity and living [1]. Owing to the ongoing growth of
machine learning algorithms, the processing of photos and other information has been successfully adopted
in various industries. Object detection, a primary research challenge in computer vision, has drawn
increasing attention from academics. Object detection typically comprises two stages: first, looking for the
object in the image; second, employing bounding boxes to locate the object. Convolutional neural networks
(CNN) have become highly effective at object detection in recent years [2]–[5]. Region based convolutional
neural network (R-CNN) [6], YOLO [7], the spatial pyramid pooling network (SPP) [8], and Fast R-CNN [9] are
object detection techniques used in this field of study. Due to limited computational hardware and data
availability, traditional object detection algorithms have significant drawbacks [10]. Conversely, with the
development of artificial intelligence (AI) and processing power in recent years, the entire process can now
be automated with little to no human involvement. The primary distinction is that traditional object
detection techniques rely on human experience and expert judgement to extract features, whereas AI uses a
sophisticated neural network that can be trained to automatically identify powerful and discriminative
features.
In particular, encoder-decoder models based on fully convolutional networks (FCNs) have
significantly enhanced performance in tasks such as semantic segmentation [11], [12], edge detection [13],
object detection [14], and crowd counting [15]. Essentially, most popular object detection techniques
operate within the encoder-decoder framework. For the detection task, some researchers created structures
based on the encoder-decoder paradigm and attained state-of-the-art performance [16]. On benchmark
datasets, CNN-based encoder-decoder models are particularly crucial for continuously improving
detection performance [17]. Convolution is performed by the encoder, whereas deconvolution, un-pooling, and
up-sampling are performed by the decoder to predict pixel-wise class labels. The key feature is the
up-sampling decoder that corresponds to the low-resolution encoder feature maps. This architecture uses the
encoder's pooling indices to up-sample feature maps for pixel-wise classification while also significantly
reducing the number of trainable parameters. The remainder of this paper is structured as follows: section 2
describes the proposed deep learning model, section 3 presents the results and discussion, and section 4
concludes the paper.
The goal of object detection, which is typically performed on photos or videos, is to find object
boundaries and to indicate each object's extent and location. The next step is to classify the object's
category and to provide the classification probability. This task is more difficult than simple image
classification because the positions of many objects must be determined from the image or video. CNNs have
been used successfully for the detection and classification of objects [18]. Current models classify either
a full input window for each scene or bounding boxes around several objects. FCNs have brought a
breakthrough in semantic segmentation and provide a potent method for boosting the effectiveness of CNNs by
accepting inputs of any size [19]. The encoder-decoder concept was presented in [20] for unsupervised
feature learning; since then, neural networks built on encoder-decoders have emerged as a promising
alternative for other tasks. An intriguing pedestrian collision alert system for advanced driver assistance
systems was suggested in [21]. However, it is only capable of detecting pedestrians and issuing warnings
about them. Facial feature localization [22] extracted information from input strings that could only be
one-dimensional using the Viterbi decoding technique. Support vector machine (SVM)-based predictive
modeling [23] used a similar concept to extend SVM outputs to two-dimensional maps.
The pixel-wise contextual attention network (PiCANet) was proposed in [24] as an attention-generating
module that learns to attend selectively to significant locations for every pixel by employing a
bidirectional long short-term memory (Bi-LSTM) module over the feature maps. For C. elegans tissues, a
coarse multi-class segmentation CNN with an FCN architecture has been used for inference. To predict
pixel-level labels and to refine the label map using a conditional random field (CRF), the network produces
denser score maps using the FCN architecture. One of the current major trends in CNN architecture design is
the incorporation of an encoder and a decoder to improve performance. Apart from these object detection
models, several detection algorithms are implemented on hardware platforms to improve detection performance.
The pyramid scene parsing network (PSPNet) is another effective, recently released CNN architecture.
It is intended for pixel-level prediction tasks. Its global pyramid pooling structure combines global and
local cues to build pixel-level features for effective segmentation. Because the PSPNet architecture is
highly complex, training and testing require a sizable amount of processing power and graphics processing
unit (GPU) capacity. The concept of panoptic segmentation (PS) was recently introduced in a study on
pixel-wise segmentation. To complete a broad segmentation task, PS combines instance segmentation and
semantic segmentation. It performs well compared with previous visual geometry group (VGG) based networks,
although its size is the design's main flaw.
The Prophet algorithm, K-means clustering, and seasonal autoregressive integrated moving-average
methods play a role in enhancing cloud infrastructures. K-means clustering groups servers into clusters with
similar utilization patterns, which improves resource allocation efficiency [25]. An internet of things
(IoT)-driven image recognition system uses CNNs to detect and quantify microplastics [26]. The data
collected by sensors is forwarded to a centralized monitoring system that decides whether or not an alarm is
activated when conditions diverge from their ideal state [27]. K-nearest neighbor (KNN) and SVM algorithms
form a precise classification model that improves prediction accuracy [28]. SVMs combined with recurrent
neural networks are powerful classifiers that make it feasible to classify patients’ risks and predict how
they will react to therapy [29]. Cloud computing makes the seizure prediction system more accessible and
scalable [30], and feature selection has been examined as a way of improving accuracy [31]. Hybrid machine
learning techniques, such as SVM combined with a CNN, have been used to predict Alzheimer’s disease [32].
2. PROPOSED DEEP LEARNING MODEL
The proposed architecture is a pixel-wise model built on two decoupled FCNs, one for encoding and
one for decoding, as shown in Figure 1. The encoder is built from the first 16 convolutional layers of the
VGG-19 network, followed by batch normalisation (BN) layers, activation functions, pooling layers, and
dropout units. The decoder network is composed of layers for up-sampling, deconvolution, activation, batch
normalisation, and dropout, followed by a multi-class classifier. Every decoder stage is matched to a
pooling unit of the encoder in the overall encoder-decoder interface. Consequently, the decoder CNN has 16
deconvolution layers. To produce output class probabilities for each pixel individually, the decoder passes
its output to a softmax classifier.
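The layer counts quoted above can be checked against the standard VGG-19 convolutional layout. The following is a minimal sketch, not the authors' implementation: the channel widths are those of the published VGG-19 configuration, while the mirrored decoder, the helper names, and the 224-pixel input size are assumptions made only for illustration.

```python
# Standard VGG-19 convolutional layout: integers are the output channels of a
# 3x3 convolution, "M" marks a 2x2 max-pooling layer.
VGG19_CONV_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                     512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def encoder_stages(layout):
    """Group the layout into stages, each one closed by a pooling layer."""
    stages, current = [], []
    for item in layout:
        if item == "M":
            stages.append(current)
            current = []
        else:
            current.append(item)
    return stages


def mirrored_decoder_stages(stages):
    """Hypothetical decoder: one stage per encoder pooling unit, in reverse order."""
    return [list(reversed(stage)) for stage in reversed(stages)]


def output_resolution(size, num_pools):
    """Spatial size after `num_pools` two-fold subsampling steps."""
    for _ in range(num_pools):
        size //= 2
    return size


stages = encoder_stages(VGG19_CONV_LAYOUT)
decoder = mirrored_decoder_stages(stages)
print(sum(len(s) for s in stages), "encoder convolutions in", len(stages), "stages")
print(sum(len(s) for s in decoder), "decoder deconvolutions")
print("224-pixel input shrinks to", output_resolution(224, len(stages)))
```

Under these assumptions the encoder has 16 convolutions in five pooling stages, and a mirrored decoder has the same number of deconvolution layers, which agrees with the counts stated in the text.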
Figure 1. Proposed architecture for real-time object detection
The key benefits of the suggested decoupled architecture are its simple training under various
environmental conditions and its ease of customization. For pixel-wise classification, the encoder creates
low-resolution feature maps, which the decoder up-samples and convolves with trainable filters to
yield dense feature maps [33]. The fundamental component of the suggested method is the decoding
procedure, which offers several practical advantages: it improves boundary delineation and reduces the
number of trainable parameters, which in turn makes the network easier to train. The encoder and the
decoder are trained at the same time.
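To make the pixel-wise classification step concrete, the sketch below applies a softmax across the class dimension at every pixel and then picks the most probable class. It is an illustrative example in plain Python rather than the Caffe layer used in this work; the function names and the `scores[class][row][column]` layout are assumptions.

```python
import math


def pixelwise_softmax(scores):
    """Turn class scores into probabilities, normalised over classes per pixel.

    `scores[c][y][x]` is the decoder output for class c at pixel (y, x).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for y in range(height):
        for x in range(width):
            column = [scores[c][y][x] for c in range(num_classes)]
            peak = max(column)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in column]
            total = sum(exps)
            for c in range(num_classes):
                probs[c][y][x] = exps[c] / total
    return probs


def label_map(probs):
    """Pick the index of the most probable class at every pixel."""
    num_classes = len(probs)
    height, width = len(probs[0]), len(probs[0][0])
    return [[max(range(num_classes), key=lambda c: probs[c][y][x])
             for x in range(width)] for y in range(height)]


# Two classes on a 1x2 image: class 0 wins at the first pixel, class 1 at the second.
example_scores = [[[2.0, 0.0]], [[0.0, 3.0]]]
example_probs = pixelwise_softmax(example_scores)
print(label_map(example_probs))
```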
Training begins with an input image, which is propagated through the network to the top layers.
The encoder convolves its input with a set of filter banks to produce feature maps, which are then batch
normalised. Afterwards, activations are computed by rectified linear units (ReLUs). Max-pooling is then
performed with a window size of 2x2 and a stride of 2, which results in a two-fold subsampling of the
feature map. Multiple pooling layers increase translation invariance, which benefits classification, but
they also reduce the spatial resolution of the feature maps.
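As a worked check on the subsampling factor, the usual output-size formula for pooling without padding can be applied. The symbols H (input height or width), k (window size), and s (stride) are introduced here for illustration, and the 224-pixel input is an assumed example value.

```latex
H_{\text{out}} = \left\lfloor \frac{H - k}{s} \right\rfloor + 1
\qquad\Longrightarrow\qquad
k = 2,\; s = 2:\quad H_{\text{out}} = \left\lfloor \frac{H - 2}{2} \right\rfloor + 1 = \frac{H}{2}
\quad (H \text{ even}),
\qquad \text{e.g. } H = 224 \;\Rightarrow\; H_{\text{out}} = 112 .
```

A two-fold subsampling with a 2x2 window therefore corresponds to non-overlapping windows, that is, a stride of 2; a stride of 1 would shrink each side by only one pixel.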
Therefore, the boundary information in the encoder feature maps needs to be recorded and stored
before sub-sampling. However, it is not practical to save the entire encoder feature maps because of memory
limitations. The best option is to store only the max-pooling indices. For each 2x2 pooling window, two
bits are enough to record the position of the maximum feature value in each encoder feature map, so this
remains an effective solution even when there are many feature maps. With this approach, the encoder can
store data much more efficiently and the fully connected layers can be dropped.
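The index-based scheme described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming non-overlapping 2x2 windows and a single feature map stored as a list of rows; the function names are hypothetical and do not come from the authors' Caffe implementation.

```python
def max_pool_2x2_with_indices(fmap):
    """2x2 max-pooling over non-overlapping windows.

    Returns the pooled map and, per window, the position of the maximum
    encoded as 0..3 (row-major inside the window), which fits in two bits.
    """
    height, width = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for y in range(0, height - 1, 2):
        pooled_row, index_row = [], []
        for x in range(0, width - 1, 2):
            window = [fmap[y][x], fmap[y][x + 1],
                      fmap[y + 1][x], fmap[y + 1][x + 1]]
            best = max(range(4), key=window.__getitem__)
            pooled_row.append(window[best])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices):
    """Decoder-side up-sampling: put each value back at its recorded position."""
    height, width = 2 * len(pooled), 2 * len(pooled[0])
    out = [[0.0] * width for _ in range(height)]
    for py, (pooled_row, index_row) in enumerate(zip(pooled, indices)):
        for px, (value, idx) in enumerate(zip(pooled_row, index_row)):
            out[2 * py + idx // 2][2 * px + idx % 2] = value
    return out


feature_map = [[1, 2, 0, 1],
               [3, 4, 1, 5],
               [0, 0, 7, 1],
               [9, 0, 2, 3]]
pooled, indices = max_pool_2x2_with_indices(feature_map)
sparse = max_unpool_2x2(pooled, indices)
```

The unpooled map is sparse: only the recorded maxima are restored, and the subsequent trainable decoder filters are what densify it. Storing one 2-bit index per window is much smaller than keeping the four floating-point values of the window.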
3. RESULTS AND DISCUSSION
The MS-COCO dataset, which consists of 91 object categories with 2.5 million labelled instances in
328k images, is used to train the proposed object detection algorithm. The suggested network was trained on
a single 12 GB NVIDIA Tesla K40c GPU. Training continues until the accuracy and the loss no longer change
significantly and the loss has converged. The whole network is built and trained using the Caffe library
developed at Berkeley. Caffe provides flexibility in creating network layers and in training the network to
meet the suggested specifications. Training is therefore stopped once the network has converged and no
considerable reduction in training loss is seen. The results are then evaluated, examined, and compared
with the specified benchmark results. The dataset is divided into training and testing sets: 90% is
allotted for training and 10% is allotted for testing.
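A 90/10 division of this kind can be sketched as follows. The shuffling, the fixed seed, and the `split_dataset` helper are assumptions for illustration; the paper does not state how the images were assigned to the two sets.

```python
import random


def split_dataset(image_ids, train_fraction=0.9, seed=0):
    """Shuffle the ids reproducibly and cut them into train and test lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical ids standing in for image file names.
train_ids, test_ids = split_dataset(range(1000))
print(len(train_ids), len(test_ids))
```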
Many weights are 0 because trained models frequently use the ReLU activation function. In this
work, it was found that after creating the sparsity model, the gradient vanished during training with ReLU6.
This is a result of the mask excluding 50% of the weights from the gradient update. As indicated in Table 1,
the model was evaluated on the public MS-COCO dataset, and the results were compared with earlier
techniques. In this work, 5k and 118k images are used for testing and training the model, respectively. To
confirm that the suggested method works, the outcomes of each trial were examined. Average precision (AP)
is typically determined for each class, and its mean over all classes is known as the mean average
precision (mAP). Additionally, AP50 and AP75 count a detection as correct when its region overlaps the
ground truth by at least 50% and 75%, respectively. Figure 2 shows the multi-object detection results
obtained by the model trained on the MS-COCO dataset.
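The overlap criterion behind AP50 and AP75 is commonly measured as intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(predicted, ground_truth, threshold):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


prediction = (0, 0, 10, 10)
truth = (0, 0, 10, 6)
print(is_true_positive(prediction, truth, 0.50))  # counted under the AP50 criterion
print(is_true_positive(prediction, truth, 0.75))  # rejected under the AP75 criterion
```

In this example the boxes share 60 of 100 units of area, so the detection passes the 50% threshold but not the 75% one, which is why AP75 is the stricter metric.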
Figure 2. Screenshot for multi-object detection of complex scenes using proposed model trained on
MS-COCO dataset
For complex scenes, the proposed CNN-based encoder-decoder model achieved better detection
performance. The detection results include various objects such as a horse, a potted plant, and a person,
as shown in Figure 3. For this detection, the floating point operation count (FLOPs) is about 128.46 and the
model size is 134.22 MB. Figure 3 illustrates the detection of multiple objects from sample complex images
in the MS-COCO dataset using the proposed model.
Figure 3. Results for object detection of complex scenes using proposed model trained on MS-COCO dataset
The proposed model achieved an mAP of 54.1% at 327 FPS, as shown in Table 1. This investigation
confirmed the model's real-time performance. On the MS-COCO dataset, the AP50 is 77.2% and the AP75 is
69.3%. Table 2 presents the comparative results of the proposed model and existing approaches. Figure 4
shows the performance analysis of the single-shot detector (SSD), YOLOv3, EfficientDet, YOLOv4 tiny,
RetinaNet, and the proposed CNN-based encoder-decoder model for object detection. Compared with all the
other models, the proposed model provides the highest mAP, AP50, and AP75, and its frame rate of 327 FPS is
second only to YOLOv4 tiny (330 FPS).
Table 1. Results for proposed object detection model

| Dataset | FPS | mAP (%) | AP50 (%) | AP75 (%) |
|---|---|---|---|---|
| MS-COCO | 327 | 54.1 | 77.2 | 69.3 |
Table 2. Comparison of proposed CNN-based object detection model with existing algorithms

| Model | Architecture | AP75 (%) | AP50 (%) | mAP (%) | FPS |
|---|---|---|---|---|---|
| SSD | VGG | 30.3 | 48.5 | 28.8 | 36 |
| YOLOv3 | Darknet-53 | 34.3 | 58 | 33 | 66 |
| EfficientDet | EfficientNet | 35.8 | 52.2 | 33.8 | 16 |
| YOLOv4 tiny | CSPNet-15 | 20 | 40 | 22 | 330 |
| RetinaNet | ResNet101 | 36.8 | 53.1 | 34.4 | 11 |
| Proposed | VGG-19 | 69.3 | 77.2 | 54.1 | 327 |
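To relate the frame rates in Table 2 to per-image processing time, the short sketch below converts FPS to an approximate latency and finds the strongest baseline by mAP. The numbers are copied from Table 2; the conversion assumes images are processed one at a time, and the helper names are illustrative.

```python
# Rows copied from Table 2: (model, AP75 %, AP50 %, mAP %, FPS).
TABLE_2 = [
    ("SSD", 30.3, 48.5, 28.8, 36),
    ("YOLOv3", 34.3, 58.0, 33.0, 66),
    ("EfficientDet", 35.8, 52.2, 33.8, 16),
    ("YOLOv4 tiny", 20.0, 40.0, 22.0, 330),
    ("RetinaNet", 36.8, 53.1, 34.4, 11),
    ("Proposed", 69.3, 77.2, 54.1, 327),
]


def latency_ms(fps):
    """Approximate time per frame in milliseconds, assuming sequential frames."""
    return 1000.0 / fps


def best_baseline_by_map(rows, proposed="Proposed"):
    """Return the row with the highest mAP among the existing models."""
    return max((row for row in rows if row[0] != proposed), key=lambda row: row[3])


for name, _ap75, _ap50, mean_ap, fps in TABLE_2:
    print(f"{name:13s} mAP={mean_ap:5.1f}%  latency={latency_ms(fps):6.2f} ms")

baseline = best_baseline_by_map(TABLE_2)
gain = TABLE_2[-1][3] - baseline[3]
print(f"mAP gain over {baseline[0]}: {gain:.1f} points")
```

At 327 FPS the proposed model spends roughly 3 ms per frame, and its mAP is about 19.7 points above RetinaNet, the best baseline in the table.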
Figure 4. Performance analysis of existing approaches with proposed detection model
The outcomes of every experiment are examined to confirm the efficiency of the proposed
method. For assessment, AP is used, which corresponds to the area under the precision-recall curve. Usually,
AP is computed for each class, and its average over all classes is the mAP. In addition, AP50 counts a
detection as correct when at least 50% of its region overlaps the ground truth, and AP75 requires an overlap
of at least 75%. This study confirmed that the model operates in real-time applications with good
recognition accuracy.
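The relationship between the precision-recall curve, AP, and mAP can be illustrated as follows. This is a simplified sketch using all-point interpolation; the official MS-COCO evaluation samples the curve at fixed recall levels and averages over several overlap thresholds, so its numbers will differ. All function names and the toy inputs are assumptions.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    `scores` are detection confidences, `is_true_positive` flags whether each
    detection matched a ground-truth object, and `num_ground_truth` is the
    number of ground-truth objects of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = false_pos = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            true_pos += 1
        else:
            false_pos += 1
        recalls.append(true_pos / num_ground_truth)
        precisions.append(true_pos / (true_pos + false_pos))
    # Make precision non-increasing as recall grows (interpolation step).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    area, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example: three detections of one class, two ground-truth objects.
ap_horse = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
ap_person = average_precision([0.95, 0.6], [True, True], 2)
print(mean_average_precision({"horse": ap_horse, "person": ap_person}))
```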
4. CONCLUSION
Recent efforts on object detection using CNN-based encoder-decoder models have addressed salient
object detection (SOD) as a classification task at the pixel level. Experimental findings on the
open-source MS-COCO 2017 dataset demonstrated that the proposed method is capable of good detection
accuracy and quick execution. The objective of this work going forward is to significantly enhance multiple
object detection for high-quality images without sacrificing prediction speed. The proposed method also
employs pooling indices, which require fewer parameters and speed up inference. With an mAP of 54.1% and
327 FPS, the suggested network model is well suited for multiple object detection. To sum up, the model's
ease of training and the proposed method's low computational resource requirements are its key features. As
a result, the suggested approach is practical for many real-time applications and offers a more economical
alternative. Overall, the suggested method results in a system for cutting-edge autonomous driving systems
that is more affordable and more effective.
FUNDING INFORMATION
Funding information is not available.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author
contributions, reduce authorship disputes, and facilitate collaboration.
[Figure 4 plot: Performance Comparison; x-axis: Models (SSD, YOLOv3, EfficientDet, YOLOv4 tiny, RetinaNet, Proposed); y-axis: Percentage (%), 0 to 100; series: AP75 (%), AP50 (%), mAP (%)]
Name of Author C M So Va Fo I R D O E Vi Su P Fu
Mothiram Rajasekaran        
Chitra Sabapathy Ranganathan          
Nagarajan Mohankumar          
Rajeshkumar Sampathrajan        
Thayalagaran Merlin Inbamalar        
Nageshvaran Nandhini       
Shanmugam Sujatha       
C : Conceptualization
M : Methodology
So : Software
Va : Validation
Fo : Formal analysis
I : Investigation
R : Resources
D : Data Curation
O : Writing - Original Draft
E : Writing - Review & Editing
Vi : Visualization
Su : Supervision
P : Project administration
Fu : Funding acquisition
CONFLICT OF INTEREST STATEMENT
The authors have no conflict of interest relevant to this paper.
DATA AVAILABILITY
The data that support the findings of this study are available on request from the corresponding
author, [NM]. The data, which contain information that could compromise the privacy of research
participants, are not publicly available due to certain restrictions.
REFERENCES
[1] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: a survey,” Proceedings of the IEEE, vol. 111, no. 3,
pp. 257–276, 2023, doi: 10.1109/JPROC.2023.3238524.
[2] Z. Li et al., “Deep learning-based object detection techniques for remote sensing images: a survey,” Remote Sensing, vol. 14,
no. 10, 2022, doi: 10.3390/rs14102385.
[3] J. Jegan, M. R. Suguna, M. Shobana, H. Azath, S. Murugan, and M. Rajmohan, “IoT-enabled black box for driver behavior
analysis using cloud computing,” in 2024 International Conference on Advances in Data Engineering and Intelligent Computing
Systems (ADICS), 2024, pp. 1–6, doi: 10.1109/ADICS58448.2024.10533471.
[4] K. Muhammad, J. Ahmad, Z. Lv, P. Bellavista, P. Yang, and S. W. Baik, “Efficient deep CNN-based fire detection and
localization in video surveillance applications,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 7,
pp. 1419–1434, 2019, doi: 10.1109/TSMC.2018.2830099.
[5] J.-M. Guo, J.-S. Yang, S. Seshathiri, and H.-W. Wu, “A light-weight CNN for object detection with sparse model and knowledge
distillation,” Electronics, vol. 11, no. 4, Feb. 2022, doi: 10.3390/electronics11040575.
[6] S. Srinivasan, R. Raja, C. Jehan, S. Murugan, C. Srinivasan, and M. Muthulekshmi, “IoT-enabled facial recognition for smart
hospitality for contactless guest services and identity verification,” in 2024 11th International Conference on Reliability, Infocom
Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi:
10.1109/ICRITO61523.2024.10522363.
[7] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv-Computer Science, pp. 1–6, 2018, doi:
10.48550/arXiv.1804.02767.
[8] Y. H. Wu, Y. Liu, X. Zhan, and M. M. Cheng, “P2T: pyramid pooling transformer for scene understanding,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12760–12771, 2023, doi: 10.1109/TPAMI.2022.3202765.
[9] H. Jiang and E. Learned-Miller, “Face detection with the faster R-CNN,” in 2017 12th IEEE International Conference on
Automatic Face & Gesture Recognition (FG 2017), May 2017, pp. 650–657, doi: 10.1109/FG.2017.82.
[10] Z. Wang, J. Zhu, S. Fu, S. Mao, and Y. Ye, “RFPNet: Reorganizing feature pyramid networks for medical image segmentation,”
Computers in Biology and Medicine, vol. 163, 2023, doi: 10.1016/j.compbiomed.2023.107108.
[11] A. Tragakis, C. Kaul, R. Murray-Smith, and D. Husmeier, “The fully convolutional transformer for medical image segmentation,”
in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3649–3658, doi:
10.1109/WACV56688.2023.00365.
[12] J. Ramasamy, E. Srividhya, V. Vaidehi, S. Vimaladevi, N. Mohankumar, and S. Murugan, “Cloud-enabled isolation forest for
anomaly detection in UAV-based power line inspection,” in 2024 2nd International Conference on Networking and
Communications (ICNWC), 2024, pp. 1–6, doi: 10.1109/ICNWC60771.2024.10537407.
[13] D. Bai, X. Zheng, T. Liu, K. Li, and J. Yang, “Finger disability recognition based on holistically-nested edge detection,” in
Intelligent Robotics and Applications, 2022, pp. 146–154, doi: 10.1007/978-3-031-13844-7_15.
[14] M. R. Sudha et al., “Predictive modeling for healthcare worker well-being with cloud computing and machine learning for stress
management,” International Journal of Electrical and Computer Engineering, vol. 15, no. 1, pp. 1218–1228, 2025, doi:
10.11591/ijece.v15i1.pp1218-1228.
[15] Y. Xie, Y. Lu, and S. Wang, “RSANet: deep recurrent scale-aware network for crowd counting,” Proceedings - International
Conference on Image Processing, ICIP, pp. 1531–1535, 2020, doi: 10.1109/ICIP40778.2020.9191086.
[16] I. Filali, M. S. Allili, and N. Benblidia, “Multi-scale salient object detection using graph ranking and global–local saliency
refinement,” Signal Processing: Image Communication, vol. 47, pp. 380–401, 2016, doi: 10.1016/j.image.2016.07.007.
[17] Z. Wu, G. Allibert, F. Meriaudeau, C. Ma, and C. Demonceaux, “HiDAnet: RGB-D salient object detection via hierarchical depth
awareness,” IEEE Transactions on Image Processing, vol. 32, pp. 2160–2173, 2023, doi: 10.1109/TIP.2023.3263111.
[18] P. Maheswari, S. Gowriswari, S. Balasubramani, A. R. Babu, N. K. Jijith, and S. Murugan, “Intelligent headlights for adapting beam
patterns with raspberry pi and convolutional neural networks,” in 2024 2nd International Conference on Device Intelligence,
Computing and Communication Technologies (DICCT), 2024, pp. 182–187, doi: 10.1109/DICCT61038.2024.10533159.
[19] J. Hai, Y. Hao, F. Zou, F. Lin, and S. Han, “Advanced RetinexNet: A fully convolutional network for low-light image
enhancement,” Signal Processing: Image Communication, vol. 112, 2023, doi: 10.1016/j.image.2022.116916.
[20] D. Stavens and S. Thrun, “Unsupervised learning of invariant features using video,” Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, pp. 1649–1656, 2010, doi: 10.1109/CVPR.2010.5539773.
[21] C. C. Sekhar, K. Vijayalakshmi, A. S. Rao, V. Vedanarayanan, M. B. Sahaai, and S. Murugan, “Cloud-based water tank
management and control system,” in 2023 2nd International Conference on Smart Technologies for Smart Nation, SmartTechCon
2023, 2023, pp. 641–646, doi: 10.1109/SmartTechCon57526.2023.10391730.
[22] S. M. Hanif, L. Prevost, R. Belaroussi, and M. Milgram, “Real-time facial feature localization by combining space displacement
neural networks,” Pattern Recognition Letters, vol. 29, no. 8, pp. 1094–1104, 2008, doi: 10.1016/j.patrec.2007.09.016.
[23] B. J. Ganesh, P. Vijayan, V. Vaidehi, S. Murugan, R. Meenakshi, and M. Rajmohan, “SVM-based predictive modeling of
drowsiness in hospital staff for occupational safety solution via IoT infrastructure,” in 2024 2nd International Conference on
Computer, Communication and Control (IC4), 2024, pp. 1–5, doi: 10.1109/IC457434.2024.10486429.
[24] N. Liu, J. Han, and M. H. Yang, “PiCANet: pixel-wise contextual attention learning for accurate saliency detection,” IEEE
Transactions on Image Processing, vol. 29, pp. 6438–6451, 2020, doi: 10.1109/TIP.2020.2988568.
[25] A. R. Rathinam, B. S. Vathani, A. Komathi, J. Lenin, B. Bharathi, and S. Murugan, “Advances and predictions in predictive
auto-scaling and maintenance algorithms for cloud computing,” 2nd International Conference on Automation, Computing and
Renewable Systems, ICACRS 2023 - Proceedings, pp. 395–400, 2023, doi: 10.1109/ICACRS58579.2023.10404186.
[26] M. D. A. Hasan, K. Balasubadra, G. Vadivel, N. Arunfred, M. V. Ishwarya, and S. Murugan, “IoT-driven image recognition for
microplastic analysis in water systems using convolutional neural networks,” in 2024 2nd International Conference on Computer,
Communication and Control (IC4), 2024, pp. 1–6, doi: 10.1109/IC457434.2024.10486490.
[27] S. Selvarasu, K. Bashkaran, K. Radhika, S. Valarmathy, and S. Murugan, “IoT-enabled medication safety: real-time temperature
and storage monitoring for enhanced medication quality in hospitals,” 2nd International Conference on Automation, Computing
and Renewable Systems, ICACRS 2023 - Proceedings, pp. 256–261, 2023, doi: 10.1109/ICACRS58579.2023.10405212.
[28] K. Padmanaban, A. M. S. Kumar, H. Azath, A. K. Velmurugan, and M. Subbiah, “Hybrid data mining technique based breast
cancer prediction,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110216.
[29] N. Mohankumar et al., “Advancing chronic pain relief cloud-based remote management with machine learning in healthcare,”
Indonesian Journal of Electrical Engineering and Computer Science, vol. 37, no. 2, pp. 1042–1052, 2025, doi:
10.11591/ijeecs.v37.i2.pp1042-1052.
[30] M. Vadivel, V. B. Marin, S. Balasubramani, S. Hemalatha, S. Murugan, and S. Velmurugan, “Cloud-based passenger experience
management in bus fare ticketing systems using random forest algorithm,” in 2024 11th International Conference on Reliability,
Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi:
10.1109/ICRITO61523.2024.10522226.
[31] M. P. Aarthi, C. M. Reddy, A. Anbarasi, N. Mohankumar, M. V. Ishwarya, and S. Murugan, “Cloud-based road safety for real-
time vehicle rash driving alerts with random forest algorithm,” in 2024 3rd International Conference for Innovation in
Technology (INOCON), 2024, pp. 1–6, doi: 10.1109/INOCON60754.2024.10511316.
[32] M. S. Kumar, H. Azath, A. K. Velmurugan, K. Padmanaban, and M. Subbiah, “Prediction of Alzheimer’s disease using hybrid
machine learning technique,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110283.
[33] E. P. Kannan and T. V. Chithra, “Lagrange interpolation for natural colour image demosaicing,” International Journal of
Advances in Signal and Image Sciences, vol. 7, no. 2, pp. 21–30, 2021, doi: 10.29284/ijasis.7.2.2021.21-30.
BIOGRAPHIES OF AUTHORS
Mothiram Rajasekaran is an accomplished IT leader with over 13 years of
experience in big data and cloud solutions. He is recognized for his expertise in Apache
Spark, Hadoop, Hive, Impala, and other cutting-edge technologies. His proficiency in AI
and machine learning enables him to deliver data-driven insights and innovative solutions. He
excels in collaborating directly with clients, leading the design and implementation of data
and application migrations to private and public clouds. His leadership accelerates the
adoption of emerging features and ensures the delivery of advanced data solutions for
organizations. He can be contacted at email: mothiramrajasekaransekar@gmail.com.
Chitra Sabapathy Ranganathan is a client partner working in account management, IT
transformation strategy, digital engineering solutions and advisory, agile delivery
adoption, sales, and CoE and CoP setup. A results-oriented, accomplished business technology
leader with 23+ years of experience in software engineering and design, he has a proven track
record of conceptualizing, architecting, and delivering reliable and scalable systems in a
variety of areas comprising multiple technologies, including cloud, big data, AI, ML, advanced
analytics, blockchain, mainframe, and business intelligence. He has executed complex
engagements across multiple verticals, managed sales, IT delivery, and operations, and
established vision, strategy, and journey maps that align with business priorities. He is an
enterprise leader in digital engineering solutions and advisory, agile delivery adoption and
management, pre-sales, CoE and CoP setup, IT transformation strategy, and enterprise quality
and digital assurance. He can be contacted at email: chitrasabapathyranganathan@gmail.com.
Dr. Nagarajan Mohankumar was born in India in 1978. He received his B.E.
degree from Bharathiyar University, Tamilnadu, India in 2000 and his M.E. and Ph.D. degrees
from Jadavpur University, Kolkata in 2004 and 2010. He joined the Nano Device Simulation
Laboratory in 2007 and worked as a Senior Research Fellow under the CSIR direct scheme until
September 2009. Later he joined SKP Engineering College as a Professor to develop research
activities in the field of VLSI and nanotechnology. He is currently working as a
Research Professor at Symbiosis Institute of Technology, Nagpur Campus, Symbiosis
International (Deemed University), Pune, India. He is a senior member of IEEE. He has about
85 international journal publications in reputed journals and about 50 international conference
proceedings. He received the career award for young teachers (CAYT) from AICTE, New
Delhi for 2012-2014. His research interests include modeling and simulation of HEMTs,
optimization of devices for RF applications, characterization of advanced HEMT
architectures, terahertz electronics, high frequency imaging, sensors, and
communication. He can be contacted at email: nmkprofessor@gmail.com.
Rajeshkumar Sampathrajan is a Principal Cloud Architect at McKinsey, where
he leads a team of engineers and architects in designing and building highly scalable,
resilient, and distributed systems using the latest cloud native technology in Google Cloud
Platform (GCP). He has over 17 years of experience in the IT industry, spanning various
domains such as banking, retail, healthcare, and consulting. He holds multiple GCP certifications,
as well as credentials in Azure, Snowflake, HashiCorp, Teradata, Cloudera, and ITIL. He is
an expert in cloud computing, big data, machine learning, and security. He has successfully
delivered solutions for complex and large-scale data analytics, data engineering, and data
science projects, leveraging GCP BigQuery, Vertex AI, Dataiku, and other tools. He is
passionate about helping clients transform their businesses with data-driven insights and
innovative solutions. He can be contacted at email: rajesampathrajan@gmail.com.
Dr. Thayalagaran Merlin Inbamalar serves as an Associate Professor in the
Department of Electronics and Instrumentation Engineering at Saveetha Engineering College,
Chennai, India. She earned her B.E. in electronics and instrumentation engineering in 2006
from Karunya Institute of Technology, Coimbatore, affiliated with Anna University, and her
M.E. in applied electronics in 2008 from St. Joseph’s College of Engineering, Chennai, also
affiliated with Anna University. She completed her Ph.D. at Anna University in 2024. With
16 years of teaching experience, she has contributed to numerous national and international
journals, as well as patents. Her areas of expertise include image processing, instrumentation,
and control. She can be contacted at email: merlininbamalar@gmail.com.
Nageshvaran Nandhini completed her B.Tech. (Information Technology) at
Kongu Engineering College, Erode and M.E. (Computer Science and Engineering) at Perumal
Manimekalai College of Engineering, Hosur. She started her teaching career in
2015 and has more than 7 years of teaching experience. She has organized seminars and
workshops for the benefit of students and guided several student projects. She has published
4 papers in international journals and presented 5 papers in various national and international
conferences. She has authored 3 books and book chapters. She is a member of the Computer
Society of India. She is currently serving as an Assistant Professor in the Department of
Information Technology, P.S.V. College of Engineering and Technology, Krishnagiri,
Tamilnadu, India. She can be contacted at email: nandhini065@gmail.com.
Shanmugam Sujatha is an adjunct professor at Saveetha School of Engineering,
Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu, India. She
has published research articles in many international and national conferences and journals.
Her research areas include network security and machine learning. She can be contacted at
email: sujathasmvr@gmail.com.

Convolutional neural network based encoder-decoder for efficient real-time object detection

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 14, No. 3, June 2025, pp. 1960~1967 ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i3.pp1960-1967  1960 Journal homepage: http://ijai.iaescore.com Convolutional neural network based encoder-decoder for efficient real-time object detection Mothiram Rajasekaran1 , Chitra Sabapathy Ranganathan2 , Nagarajan Mohankumar3 , Rajeshkumar Sampathrajan4 , Thayalagaran Merlin Inbamalar5 , Nageshvaran Nandhini6 , Shanmugam Sujatha7 1 Senior Solution Consultant, Pine Candle Way, Saint Augustine, United States 2 Associate Vice President, Mphasis Corporation, Chandler, United States 3 Symbiosis Institute of Technology, Symbiosis International (Deemed University), Nagpur Campus, Pune, India 4 Principal Cloud Architect, McKinsey & Company, Chandler, United States 5 Department of Electronics and Instrumentation Engineering, Saveetha Engineering College, Chennai, India 6 Department of Information Technology, P. S. V College of Engineering and Technology, Krishnagiri, India 7 Department of Biomedical Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India Article Info ABSTRACT Article history: Received Feb 27, 2024 Revised Nov 26, 2024 Accepted Jan 27, 2025 Convolutional neural networks (CNN) are applied to a variety of computer vision problems, such as object recognition, image classification, semantic segmentation, and many others. One of the most important and difficult issues in computer vision, object detection, has attracted a lot of attention lately. Object detection validating the occurrence of the object in the picture or video and then properly locating it for recognition. However, under certain circumstances, such as when an item has issues like occlusion, distortion, or small size, there may still be subpar detection performance. 
This work aims to propose an efficient deep learning model with CNN and encoder decoder for efficient object detection. The proposed model is experimented on Microsoft Common Objects in Context (MS-COCO) dataset and achieved mean average precision (mAP) of about 54.1% and accuracy of 99%. The investigational outcomes amply showed that the suggested mechanism could achieve a high detection efficiency compared with the existing techniques and needed little computational resources. Keywords: Convolutional neural networks Deep learning Encoder-decoder Mean average precision MS-COCO dataset Object detection This is an open access article under the CC BY-SA license. Corresponding Author: Nagarajan Mohankumar Symbiosis Institute of Technology, Symbiosis International (Deemed University), Nagpur Campus Pune, India Email: nmkprofessor@gmail.com 1. INTRODUCTION More than 90% of human understanding is visual, and various imaging equipment are frequently used in fields that are directly related to human activity and living [1]. The processing of photos and other information has been successfully adopted in various industries to the ongoing growth of machine learning algorithms. The primary research challenge in computer vision, object detection, has drawn increasing attention from academics. The object discovery typically contains two stages: first, looking for the item in the image; second, employing bounding boxes to find the object. Convolutional neural networks (CNN) has become highly effective at object detection in recent years [2]–[5], region based convolutional neural network (R-CNN) [6], YOLO [7], the spatial pyramid pooling network (SPP) [8], and Fast R-CNN [9] object detection techniques that are used in this field of study. Due to computational hardware and data availability, traditional object detection algorithms have significant drawbacks [10]. Conversely, with the development of
  • 2. Int J Artif Intell ISSN: 2252-8938  Convolutional neural network based encoder-decoder for efficient real-time … (Mothiram Rajasekaran) 1961 artificial intelligence (AI) and processing power in recent years, the entire process can now be automated with little to no human involvement. The primary distinction is that traditional object detection techniques rely on human experience standards and expert judgement to extract features, whereas AI uses a sophisticated neural network that can be trained to routinely identify powerful and judicial features. In particular, encoder-decoder models based on fully convolutional networks (FCNs) have significantly enhanced performance, such as semantic segmentation [11], [12], edge recognition [13] object exposure [14], and crowd counting [15]. Essentially, the trend of popular object identification techniques are operate within the encoder-decoder framework. For the detection task, some researchers created structures based on the encoder-decoder paradigm and attained cutting-edge performance [16]. With regard to benchmark datasets, CNN-based encoder-decoder models are particularly crucial for continuously improving detection performance [17]. Convolution is done by the encoder, whereas deconvolution, un-pooling, and up-sampling are done by the decoder to forecast pixel-wise class labels. The up-sampling decode that corresponds the low-resolution encoder feature maps, is the important feature. This architecture employs the encoder's pooling indicators to up-sample to map pixel-wise categorization while also significantly reducing the number of trainable parameters. This paper is structured as follows. The goal of object detection, which is typically done with photos or videos, is to find borders as well as to show the object's range and location. The next step is to classify the object's category and to provide the categorization likelihood. 
This task is more difficult than simple picture classification because the positions of many items must be determined from the image or video. CNNs have been used for the detection and classification of objects with success [18]. Current models include ways to categorise either a full input window for each scene for a bounding box of several objects. Semantic segmentation has had a breakthrough thanks to FCN. It has provided a potent method for boosting the effectiveness of CNNs by providing inputs of any size [19]. The encoder-decoder-based concept that presented by [20]. It suggested for feature learning that is unsupervised; then, neural networks backed by encoder-decoders have emerged as a potential replacement for further aids. An intriguing pedestrian collision alert system for advanced driver assistance systems was suggested in [21]. However, it is only capable of detecting and warning pedestrians. Facial feature localization [22] extracted information from input strings that could only be one dimension using the Viterbi decoding technique. Support vector machine (SVM)-based predictive modeling [23] utilised the similar concept to expand SVM outcomes using two-dimensional maps. As an attention generating module that learns to specifically attend to significant locations for every pixel by employing bidirectional long short-term memory (Bi-LSTM) module within the feature maps, paediatric intensive care audit network (PiCANet) was proposed in [24]. For C-elegans tissues with FCN inference, coarse multi-class segmentation CNN with FCN architecture. In order to forecast pixel-level labels and to improve the label map using conditional random field (CRF), network achieves denser score maps using FCN architecture. One of the current major trends in CNN architecture design is the incorporation of encoder and decoder to improve performance. 
Apart from these object detection models; several detection algorithms are implemented on hardware platforms to improve the detection performance. Pyramid scene analysis network (PSPNet) is yet another effective CNN architecture that was just released. It is intended for prediction jobs at the pixel level. The global pyramid pooling structure that combined global and local hints that produce the results builds the pixel-level features for effective segmentation. Due to the PSPNet architecture's extreme complexity, training and testing processes need for a sizable amount of processing power and graphics processing units (GPU) capabilities. The concept of panoptic segmentation (PS) was recently introduced in a study about pixel-wise segmentation. To complete a broad segmentation task, PS combines segmenting instances and segmentation based on semantics. Comparatively speaking, it performs well when compared to previous visual geometry group (VGG) based networks, although size is the design's main flaw. The prophet algorithm, K-means clustering, and seasonal autoregressive integrated moving-average methods act a task in enhancing the cloud infrastructures. Also, it grouping servers into clusters with similar utilization patterns. K-means clustering enhances the resource allocation efficiency [25]. Internet of things (IoT)-driven image recognition system utilizing CNNs to notice and quantify microplastics [26]. The data collected by sensors is forward to a centralized monitoring system that decides whether or not an alarm activated in the event if the situation diverge from their ideal state [27]. K-nearest neighbor (KNN) and SVM algorithm forms a precise arrangement model to utilize the important data expectation exactness [28]. SVM with recurrent neural networks are powerful classification that makes it feasible to classify patients’ risks and predict how they will react to therapy [29]. 
Cloud computing grants the seizure prediction system to improve accessible and scalable [30] and it examines the feature selection developed in for improving accuracy [31]. Hybrid machine learning techniques like SVM with CNN algorithm to anticipates Alzheimer’s sickness [32].
  • 3.  ISSN: 2252-8938 Int J Artif Intell, Vol. 14, No. 3, June 2025: 1960-1967 1962 2. PROPOSED DEEP LEARNING MODEL The proposed architecture is a pixel-wise model that built on two decoupled FCNs for encoding as well as decoding as explained in Figure 1. The previously described encoder is built using the first 16 convolutional layers of VGG-19 network, then a batch normalised (BN) layer, function of activation, pooling layer, as well as dropout units. The decoder network is composed of layers for upsampling, deconvolution, activation, batch normalisation, dropout, and a multi-class classification. Every decoder is matched to a pooling unit of an encoder in the system's overall encoder-decoder interface. Consequently, the decoder CNN has 16 de-convolution layers. To probabilities of output class meant to each individual pixel individually, the decoder sends its computations to softmax classifier. Figure 1. Proposed architecture for real-time object detection The key benefits of our suggested decoupled architecture are its simple training with various environmental factors and ease of customization. For pixel-wise classification, the encoder creates low-resolution feature maps, which the decoder up-samples through convolutioning the trainable filters to yield intense feature maps [33]. The fundamental component of the suggested method is the decoding procedure, which provides several useful advantages in terms of improving boundary delineation and reduction. Also much improved is the ability to provide training by lowering the amount of trainable attributes. It offers a simple training, which trains both the encoder as well as decoder at the same time. With an input image, the network begins training and acts during the network to the top layers. Adopting convolution with a prearranged set of filter banks to fabricate feature maps, the batch normalisation process is fulfilled by the encoder. 
Afterwards, activations are accomplished by rectified linear units (ReLUs). The max-pooling function is then fulfilled with a window size of 2x2 and a tread of 1. This outcomes in a two-fold subsampling of the last image. Multiple pooling layers able to increase translation invariance for effective categorization jobs, but the feature maps' spatial resolution is unnecessarily reduced. Therefore, prior to the sub-sampling function, the boundary information needs to be recorded and stored in the encoder feature maps. However, it is not practical to save the entire encoder feature maps because to memory limitations. The best option is keep the max-pooling indicators in storage. For each 2x2 pooling window, two bits are used to memorise the positions of each max-pooling feature-map. Having a lot of feature maps on hand is a really effective solution. With this approach, the encoder can store data much more efficiently and fully connected layers can be dropped. 3. RESULTS AND DISCUSSION The MS-COCO dataset that consist of 91 item with 2.5 million labelled examples in 328k images, is used to train the proposed object detection algorithm. On a single 12 GB NVIDIA Tesla K40c GPU, the suggested network was trained. The network is trained until the accuracy as well as loss do not significantly grow or decrease and the loss has converged. The whole network is established and trained utilizing the Caffe Berkeley Vision Library. Caffe provides a flexibility while it relates creating network layers as well as training the network to meet the suggested specifications. Thus, once converges, it is trained, and no considerable reduce in training loss is seen. The entire results are then evaluated, examined, and subsequently
  • 4. Int J Artif Intell ISSN: 2252-8938  Convolutional neural network based encoder-decoder for efficient real-time … (Mothiram Rajasekaran) 1963 contrasted with the specified benchmark results.The dataset is divide by training and testing. Here, 90% is allotted for training and 10% is allotted for testing. Many weights are 0 because training models frequently use the ReLU activation function. In this work, it was found that after creating the sparsity model, the gradient vanished during training with ReLU6. This is as a result of the mask excluding 50% of the weights from the gradient update. As indicated in Table 1, the public dataset MS-COCO evaluated and contrasted with the earlier techniques. In this work, there are 5 k and 118 k photos are utilised for testing and training the model, respectively. To ensure that the suggested method works, the outcomes of each trial were examined. For all classes, average precision (AP) is typically determined, and its middling is known as the mean average precision (mAP). Additionally, for AP75 candidate images, regions with above 75% accuracy are counted, and the AP50 designates the 50% area properly. Figure 2 shows the multi-object detection results received via training model on MS-COCO dataset. Figure 2. Screenshot formulti-object detection of complex scenes using proposed model trained on MS-COCO dataset For complex scenes, the proposed CNN based encoder decoder model achieved better detection performance. The detection results include various objects such as horse, potted plant and person as shown in Figure 3. For this detection, floating point operations per second (FLOPs) is about 128.46 with model size is 134.22 MB. Figure 3 illustrates the detection of multiple objects on MS-COCO dataset using proposed model. There are various objects are detected from sample complex images in MS-COCO dataset. Figure 3. 
Results for object detection of complex scenes using proposed model trained on MS-COCO dataset The proposed model achieved mAP of 54.1% at 327 FPS as shown in Table 1. With the help of this investigation, the model's performance in real-time was guaranteed. MS-COCO dataset contains the FPS value is 327, the percentage of mAP value is 54.1%, AP50 value is 77.2% and AP75 value is 69.3%. Table 2 demonstrates the comparative results of proposed model with existing approaches. Figure 4 explains the execution analysis of single-shot detector (SSD), YOLOv3, EfficientDet, YOLOv4 tiny, RetinaNet, and proposed CNN-based encoder decoder model for object detection. Compare to all other models, the proposed model has provided better results.
  • 5.  ISSN: 2252-8938 Int J Artif Intell, Vol. 14, No. 3, June 2025: 1960-1967 1964 Table 1. Results for proposed object detection model Dataset FPS mAP (%) AP50 (%) AP75 (%) MS-COCO 327 54.1 77.2 69.3 Table 2. Comparison of proposed CNN-based object detection model with existing algorithms Model Architecture AP75 (%) AP50 (%) mAP (%) FPS SSD VGG 30.3 48.5 28.8 36 YOLOv3 Darknet-53 34.3 58 33 66 EfficientDet EfficientNet 35.8 52.2 33.8 16 YOLOv4 tiny CSPNet-15 20 40 22 330 RetinaNet ResNet101 36.8 53.1 34.4 11 Proposed VGG-19 69.3 77.2 54.1 327 Figure 4. Performance analysis of existing approaches with proposed detection model The outcomes of every experimentation are examined to confirm the efficiency of the proposed method. For assessment, AP is utilized, that concerns to the region under the precision-recall curve. Usually, AP is computed for all classes, and its average is determined as the mAP. In addition, the AP50 denotes to the 50% region correctly detected in comparison to the ground truth, and for AP75 candidate images over 75% parts are considered. This study assured the operation of the model for real-time applications with a good recognition accurateness. 4. CONCLUSION We have noticed that recent efforts on object detection using CNN-based encoder-decoder models have addressed salient object detection (SOD) as a classification task at the pixel level. The proposed method was demonstrated through experimental findings on the open-source MS-COCO 2017 dataset to be capable of good detection accuracy and quick execution. The objective of this work going forward is to significantly enhance multiple object detection for high quality images without sacrificing prediction speed. It employs the unique technique of pooling indices as well, which uses fewer processing parameters and speeds up inference.With a mAP of 54.1 and 327 FPS, the suggested network model is highly suited for multiple object identification. 
To sum up, the model's ease of training and the proposed method's low computational resource requirements are its key features. As a result, the suggested approach is practical for many real-time applications and offers a more economical alternative. Overall, the suggested method results in a system for cutting-edge auto driving systems that is more affordable and more effective. FUNDING INFORMATION Funding information is not available. AUTHOR CONTRIBUTIONS STATEMENT This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration. 0 20 40 60 80 100 SSD YOLOv3 EfficientDet YOLOv4 tiny RetinaNet Proposed Percentage (%) Models Performance Comparison AP75 (%) AP50 (%) mAP (%)
Name of Author: C M So Va Fo I R D O E Vi Su P Fu
Mothiram Rajasekaran ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Chitra Sabapathy Ranganathan ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Nagarajan Mohankumar ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Rajeshkumar Sampathrajan ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Thayalagaran Merlin Inbamalar ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Nageshvaran Nandhini ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Shanmugam Sujatha ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

C : Conceptualization
M : Methodology
So : Software
Va : Validation
Fo : Formal analysis
I : Investigation
R : Resources
D : Data Curation
O : Writing - Original Draft
E : Writing - Review & Editing
Vi : Visualization
Su : Supervision
P : Project administration
Fu : Funding acquisition

CONFLICT OF INTEREST STATEMENT
The authors have no conflict of interest relevant to this paper.

DATA AVAILABILITY
The data that support the findings of this study are available on request from the corresponding author, [NM]. The data, which contain information that could compromise the privacy of research participants, are not publicly available due to certain restrictions.

REFERENCES
[1] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: a survey,” Proceedings of the IEEE, vol. 111, no. 3, pp. 257–276, 2023, doi: 10.1109/JPROC.2023.3238524.
[2] Z. Li et al., “Deep learning-based object detection techniques for remote sensing images: a survey,” Remote Sensing, vol. 14, no. 10, 2022, doi: 10.3390/rs14102385.
[3] J. Jegan, M. R. Suguna, M. Shobana, H. Azath, S. Murugan, and M. Rajmohan, “IoT-enabled black box for driver behavior analysis using cloud computing,” in 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), 2024, pp. 1–6, doi: 10.1109/ADICS58448.2024.10533471.
[4] K. Muhammad, J. Ahmad, Z. Lv, P. Bellavista, P. Yang, and S. W. Baik, “Efficient deep CNN-based fire detection and localization in video surveillance applications,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 7, pp. 1419–1434, 2019, doi: 10.1109/TSMC.2018.2830099.
[5] J.-M. Guo, J.-S. Yang, S. Seshathiri, and H.-W. Wu, “A light-weight CNN for object detection with sparse model and knowledge distillation,” Electronics, vol. 11, no. 4, Feb. 2022, doi: 10.3390/electronics11040575.
[6] S. Srinivasan, R. Raja, C. Jehan, S. Murugan, C. Srinivasan, and M. Muthulekshmi, “IoT-enabled facial recognition for smart hospitality for contactless guest services and identity verification,” in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi: 10.1109/ICRITO61523.2024.10522363.
[7] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv-Computer Science, pp. 1–6, 2018, doi: 10.48550/arXiv.1804.02767.
[8] Y. H. Wu, Y. Liu, X. Zhan, and M. M. Cheng, “P2T: pyramid pooling transformer for scene understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12760–12771, 2023, doi: 10.1109/TPAMI.2022.3202765.
[9] H. Jiang and E. Learned-Miller, “Face detection with the faster R-CNN,” in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), May 2017, pp. 650–657, doi: 10.1109/FG.2017.82.
[10] Z. Wang, J. Zhu, S. Fu, S. Mao, and Y. Ye, “RFPNet: Reorganizing feature pyramid networks for medical image segmentation,” Computers in Biology and Medicine, vol. 163, 2023, doi: 10.1016/j.compbiomed.2023.107108.
[11] A. Tragakis, C. Kaul, R. Murray-Smith, and D. Husmeier, “The fully convolutional transformer for medical image segmentation,” in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3649–3658, doi: 10.1109/WACV56688.2023.00365.
[12] J. Ramasamy, E. Srividhya, V. Vaidehi, S. Vimaladevi, N. Mohankumar, and S. Murugan, “Cloud-enabled isolation forest for anomaly detection in UAV-based power line inspection,” in 2024 2nd International Conference on Networking and Communications (ICNWC), 2024, pp. 1–6, doi: 10.1109/ICNWC60771.2024.10537407.
[13] D. Bai, X. Zheng, T. Liu, K. Li, and J. Yang, “Finger disability recognition based on holistically-nested edge detection,” in Intelligent Robotics and Applications, 2022, pp. 146–154, doi: 10.1007/978-3-031-13844-7_15.
[14] M. R. Sudha et al., “Predictive modeling for healthcare worker well-being with cloud computing and machine learning for stress management,” International Journal of Electrical and Computer Engineering, vol. 15, no. 1, pp. 1218–1228, 2025, doi: 10.11591/ijece.v15i1.pp1218-1228.
[15] Y. Xie, Y. Lu, and S. Wang, “RSANet: deep recurrent scale-aware network for crowd counting,” Proceedings - International Conference on Image Processing, ICIP, pp. 1531–1535, 2020, doi: 10.1109/ICIP40778.2020.9191086.
[16] I. Filali, M. S. Allili, and N. Benblidia, “Multi-scale salient object detection using graph ranking and global–local saliency refinement,” Signal Processing: Image Communication, vol. 47, pp. 380–401, 2016, doi: 10.1016/j.image.2016.07.007.
[17] Z. Wu, G. Allibert, F. Meriaudeau, C. Ma, and C. Demonceaux, “HiDAnet: RGB-D salient object detection via hierarchical depth awareness,” IEEE Transactions on Image Processing, vol. 32, pp. 2160–2173, 2023, doi: 10.1109/TIP.2023.3263111.
[18] P. Maheswari, S. Gowriswari, S. Balasubramani, A. R. Babu, N. K. Jijith, and S. Murugan, “Intelligent headlights for adapting beam patterns with raspberry pi and convolutional neural networks,” in 2024 2nd International Conference on Device Intelligence, Computing and Communication Technologies (DICCT), 2024, pp. 182–187, doi: 10.1109/DICCT61038.2024.10533159.
[19] J. Hai, Y. Hao, F. Zou, F. Lin, and S. Han, “Advanced RetinexNet: A fully convolutional network for low-light image enhancement,” Signal Processing: Image Communication, vol. 112, 2023, doi: 10.1016/j.image.2022.116916.
[20] D. Stavens and S. Thrun, “Unsupervised learning of invariant features using video,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1649–1656, 2010, doi: 10.1109/CVPR.2010.5539773.
[21] C. C. Sekhar, K. Vijayalakshmi, A. S. Rao, V. Vedanarayanan, M. B. Sahaai, and S. Murugan, “Cloud-based water tank management and control system,” in 2023 2nd International Conference on Smart Technologies for Smart Nation, SmartTechCon 2023, 2023, pp. 641–646, doi: 10.1109/SmartTechCon57526.2023.10391730.
[22] S. M. Hanif, L. Prevost, R. Belaroussi, and M. Milgram, “Real-time facial feature localization by combining space displacement neural networks,” Pattern Recognition Letters, vol. 29, no. 8, pp. 1094–1104, 2008, doi: 10.1016/j.patrec.2007.09.016.
[23] B. J. Ganesh, P. Vijayan, V. Vaidehi, S. Murugan, R. Meenakshi, and M. Rajmohan, “SVM-based predictive modeling of drowsiness in hospital staff for occupational safety solution via IoT infrastructure,” in 2024 2nd International Conference on Computer, Communication and Control (IC4), 2024, pp. 1–5, doi: 10.1109/IC457434.2024.10486429.
[24] N. Liu, J. Han, and M. H. Yang, “PiCANet: pixel-wise contextual attention learning for accurate saliency detection,” IEEE Transactions on Image Processing, vol. 29, pp. 6438–6451, 2020, doi: 10.1109/TIP.2020.2988568.
[25] A. R. Rathinam, B. S. Vathani, A. Komathi, J. Lenin, B. Bharathi, and S. Murugan, “Advances and predictions in predictive auto-scaling and maintenance algorithms for cloud computing,” 2nd International Conference on Automation, Computing and Renewable Systems, ICACRS 2023 - Proceedings, pp. 395–400, 2023, doi: 10.1109/ICACRS58579.2023.10404186.
[26] M. D. A. Hasan, K. Balasubadra, G. Vadivel, N. Arunfred, M. V. Ishwarya, and S. Murugan, “IoT-driven image recognition for microplastic analysis in water systems using convolutional neural networks,” in 2024 2nd International Conference on Computer, Communication and Control (IC4), 2024, pp. 1–6, doi: 10.1109/IC457434.2024.10486490.
[27] S. Selvarasu, K. Bashkaran, K. Radhika, S. Valarmathy, and S. Murugan, “IoT-enabled medication safety: real-time temperature and storage monitoring for enhanced medication quality in hospitals,” 2nd International Conference on Automation, Computing and Renewable Systems, ICACRS 2023 - Proceedings, pp. 256–261, 2023, doi: 10.1109/ICACRS58579.2023.10405212.
[28] K. Padmanaban, A. M. S. Kumar, H. Azath, A. K. Velmurugan, and M. Subbiah, “Hybrid data mining technique based breast cancer prediction,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110216.
[29] N. Mohankumar et al., “Advancing chronic pain relief cloud-based remote management with machine learning in healthcare,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 37, no. 2, pp. 1042–1052, 2025, doi: 10.11591/ijeecs.v37.i2.pp1042-1052.
[30] M. Vadivel, V. B. Marin, S. Balasubramani, S. Hemalatha, S. Murugan, and S. Velmurugan, “Cloud-based passenger experience management in bus fare ticketing systems using random forest algorithm,” in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024, pp. 1–6, doi: 10.1109/ICRITO61523.2024.10522226.
[31] M. P. Aarthi, C. M. Reddy, A. Anbarasi, N. Mohankumar, M. V. Ishwarya, and S. Murugan, “Cloud-based road safety for real-time vehicle rash driving alerts with random forest algorithm,” in 2024 3rd International Conference for Innovation in Technology (INOCON), 2024, pp. 1–6, doi: 10.1109/INOCON60754.2024.10511316.
[32] M. S. Kumar, H. Azath, A. K. Velmurugan, K. Padmanaban, and M. Subbiah, “Prediction of Alzheimer’s disease using hybrid machine learning technique,” AIP Conference Proceedings, vol. 2523, 2023, doi: 10.1063/5.0110283.
[33] E. P. Kannan and T. V. Chithra, “Lagrange interpolation for natural colour image demosaicing,” International Journal of Advances in Signal and Image Sciences, vol. 7, no. 2, pp. 21–30, 2021, doi: 10.29284/ijasis.7.2.2021.21-30.

BIOGRAPHIES OF AUTHORS

Mothiram Rajasekaran is an accomplished IT leader with over 13 years of experience in big data and cloud solutions. He is recognized for his expertise in Apache Spark, Hadoop, Hive, Impala, and other cutting-edge technologies. His proficiency in AI and machine learning enables him to deliver data-driven insights and innovative solutions. He excels in collaborating directly with clients, leading the design and implementation of data and application migrations to private and public clouds. His leadership accelerates the adoption of emerging features and ensures the delivery of advanced data solutions for organizations. He can be contacted at email: mothiramrajasekaransekar@gmail.com.

Chitra Sabapathy Ranganathan is a client partner working in account management, IT transformation strategy, digital engineering solutions and advisory, agile delivery adoption, sales, and CoE and CoP setup. He is a results-oriented, accomplished business technology leader with 23+ years of experience in software engineering and design. He has a proven track record of conceptualizing, architecting, and delivering reliable and scalable systems in a variety of areas spanning multiple technologies, including cloud, big data, AI, ML, advanced analytics, blockchain, mainframe, and business intelligence. He has executed complex engagements across multiple verticals, managed sales, IT delivery, and operations, and established vision, strategy, and journey maps that align with business priorities. He is an enterprise leader in digital engineering solutions and advisory, agile delivery adoption and management, pre-sales, CoE and CoP setup, IT transformation strategy, and enterprise quality and digital assurance. He can be contacted at email: chitrasabapathyranganathan@gmail.com.
Dr. Nagarajan Mohankumar was born in India in 1978. He received his B.E. degree from Bharathiyar University, Tamilnadu, India in 2000 and his M.E. and Ph.D. degrees from Jadavpur University, Kolkata in 2004 and 2010, respectively. He joined the Nano Device Simulation Laboratory in 2007 and worked as a Senior Research Fellow under CSIR direct Scheme till September 2009. Later he joined SKP Engineering College as a Professor to develop research activities in the field of VLSI and nanotechnology. He is currently working as a Research Professor at Symbiosis Institute of Technology, Nagpur Campus, Symbiosis (International) Deemed University, Pune, India. He is a senior member of IEEE. He has about 85 international journal publications in reputed journals and about 50 international conference proceedings. He received the Career Award for Young Teachers (CAYT) from AICTE, New Delhi, for 2012-2014. His research interests include modeling and simulation study of HEMTs, optimization of devices for RF applications and characterization of advanced HEMT architecture, terahertz electronics, high frequency imaging, sensors and communication. He can be contacted at email: nmkprofessor@gmail.com.

Rajeshkumar Sampathrajan is a Principal Cloud Architect at McKinsey, where he leads a team of engineers and architects in designing and building highly scalable, resilient, and distributed systems using the latest cloud native technology in Google Cloud Platform (GCP). He has over 17 years of experience in the IT industry, spanning various domains such as banking, retail, healthcare, and consulting. With multiple GCP certifications, as well as credentials in Azure, Snowflake, HashiCorp, Teradata, Cloudera, and ITIL, he is an expert in cloud computing, big data, machine learning, and security. He has successfully delivered solutions for complex and large-scale data analytics, data engineering, and data science projects, leveraging GCP BigQuery, Vertex AI, Dataiku, and other tools. He is passionate about helping clients transform their businesses with data-driven insights and innovative solutions. He can be contacted at email: rajesampathrajan@gmail.com.

Dr. Thayalagaran Merlin Inbamalar serves as an Associate Professor in the Department of Electronics and Instrumentation Engineering at Saveetha Engineering College, Chennai, India. She earned her B.E. in electronics and instrumentation engineering in 2006 from Karunya Institute of Technology, Coimbatore, affiliated with Anna University, and her M.E. in applied electronics in 2008 from St. Joseph’s College of Engineering, Chennai, also affiliated with Anna University. She completed her Ph.D. at Anna University in 2024. With 16 years of teaching experience, she has contributed to numerous national and international journals, as well as patents. Her areas of expertise include image processing, instrumentation, and control. She can be contacted at email: merlininbamalar@gmail.com.

Nageshvaran Nandhini completed her B.Tech. (information technology) at Kongu Engineering College, Erode, and her M.E. (computer science and engineering) at Perumal Manimekalai College of Engineering, Hosur. She started her teaching career in 2015 and has more than 7 years of teaching experience. She has organized seminars and workshops for the benefit of students and guided several student projects. She has published 4 papers in international journals and presented 5 papers in various national and international conferences. She has authored 3 books and book chapters. She is a member of the Computer Society of India. She is currently serving as an Assistant Professor in the Department of Information Technology, P.S.V. College of Engineering and Technology, Krishnagiri, Tamilnadu, India. She can be contacted at email: nandhini065@gmail.com.

Shanmugam Sujatha is an adjunct professor at Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu, India. She has published research articles in many international and national conferences and journals. Her research areas include network security and machine learning. She can be contacted at email: sujathasmvr@gmail.com.