IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 2, June 2024, pp. 1838~1844
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i2.pp1838-1844
Journal homepage: http://ijai.iaescore.com
A lightweight you only look once for real-time dangerous
weapons detection
Aicha Khalfaoui1, Abdelmajid Badri2, Ilham El Mourabit3
Laboratory of Electronics, Energy, Automatic & Information Processing, Faculty of Sciences and Techniques Mohammedia, University
Hassan II Casablanca, Morocco
Article Info
Article history:
Received May 23, 2023
Revised Nov 17, 2023
Accepted Dec 14, 2023
ABSTRACT
Deep neural networks are currently employed to detect weapons, and although these techniques provide a high level of accuracy, they still suffer from large weight parameters and slow inference speed. In real-world applications like weapon detection, these methods are frequently unsuitable for deployment on embedded devices because of their large number of parameters and poor efficiency. The most recent object detection technique in the YOLOv5 (You Only Look Once version 5) family is commonly used for detecting weapons, but it faces difficulties such as high computational cost and an unfavorable detection rate. To address these shortcomings, an enhanced lightweight YOLOv5s approach is proposed, which combines YOLOv5 with GhostNet modules. To evaluate the efficacy of the suggested technique, a set of experiments was performed on the Sohas weapon dataset, which is commonly used as a reference dataset in the field. Compared to the original YOLOv5, the results indicate a slight increase in the proposed model's mean average precision (mAP). Furthermore, the model's complexity drops by 2.7 giga floating-point operations (GFLOPs), its weight file shrinks by 2.7 MB, and the number of parameters decreases by 1.42 million.
Keywords:
Deep learning
Embedded devices
GhostNet
Weapons detection
YOLOv5
This is an open access article under the CC BY-SA license.
Corresponding Author:
Aicha Khalfaoui
Laboratory of Electronics, Energy, Automatic & Information Processing
Faculty of Sciences and Techniques Mohammedia, University Hassan II Casablanca
Mohammedia, Morocco.
Email: aicha96khalfaoui@gmail.com
1. INTRODUCTION
Abnormal object detection is a crucial component of video surveillance systems in smart cities. It
involves using advanced algorithms and artificial intelligence to identify and flag any objects or events deemed
abnormal or potentially harmful to the surrounding environment or people. This technology is used to enhance
public safety and security in various urban areas, such as streets, parking lots, and public transportation systems.
The primary goal of abnormal object detection is to quickly and accurately detect any potential threats and alert
the authorities or relevant stakeholders, allowing for a swift response to prevent or mitigate any harm.
Detecting dangerous weapons from surveillance video can be challenging. Carrobles et al. [1] show
that security personnel monitoring CCTV lose focus after 20 minutes. González et al. [2] report that in continuous video monitoring lasting 12 minutes, a security guard can overlook more than 45% of activities. Furthermore, after 22 minutes of continuous surveillance, more than 95%
of events could be missed. To overcome this, deep learning can be employed to automate surveillance video
feed monitoring, allowing for the detection of critical events.
There are two primary categories of research related to identifying objects in images or videos, as
outlined in reference [3]. One group of methods aims to detect guns and knives using traditional algorithms that
do not involve deep learning, while another group aims to improve object detection accuracy through the
utilization of deep learning methods. Classical algorithms primarily utilize color-based segmentation, corner
detection, and appearance features for object recognition. However, the effectiveness of these algorithms is
impacted by the quality of frames or images they rely on, which can be considered a limitation, making it
challenging to interpret frames with occlusion or noise. Additionally, when the color segments of the foreground
and background are similar, it can be difficult to interpret the results when utilizing color-based segmentation, as
explained in [4]. Deep learning algorithms primarily rely on neural networks. A major benefit of employing a
neural network architecture is its ability to learn feature extraction automatically during training. Additionally,
training on a larger dataset enables the neural network model to recognize occluded frames.
In [5], [6], the authors presented systems for scanning luggage with X-rays that utilized various object recognition methods, such as sliding window, You Only Look Once (YOLO), and Faster Region-based Convolutional Neural Network (Faster R-CNN), to classify and detect objects. The systems were
designed to classify objects into several categories including knives and knife parts, guns and gun parts. However,
the accuracy of object detection was limited under different lighting conditions and in the case of high occlusion.
Another technique for identifying anomalies utilizes timed-image-based CNNs to detect actions [7].
In their study, Verma and Dhillon [8] described how they used the internet movie firearm database (IMFDB) to train an R-CNN algorithm with a classification head built on the visual geometry group (VGG16) structure to identify handguns. Bhatti et al. [9] created a custom database for YOLOv4 training and evaluated
its performance against cutting-edge methods. Their research focused on identifying specific objects such as
pistols, revolvers, wallets, metal detectors, and cell phones in videos. The study compared the YOLO model
with R-CNN, single shot multibox detector (SSD), and another version of YOLO. While some classification
models in static mode displayed encouraging results, they were less accurate and slower in real-time situations
when running on devices with limited resources. Although the models achieved high F1 scores on the initial
dataset, they were not suitable for scenarios that included background objects. Kanehisa and Neto [10]
investigate the implementation of the YOLO algorithm for the purpose of developing a system for detecting
firearms, demonstrating its effectiveness in this specific task.
While these studies have demonstrated satisfactory outcomes, their ability to deal with challenging
scenarios and resource-constrained environments is somewhat restricted. To address the issues highlighted
earlier, we conducted a series of experiments and developed a specialized CNN architecture.
The YOLO detection network has gained popularity since its proposal [11]–[14], mainly due to its fast and
accurate performance. Recently, the fifth version of this network has been introduced. Our study utilizes
YOLOv5 and introduces a new network that significantly reduces GFLOPs and network weight while
maintaining high accuracy. We also provide an explanation for the feasibility of this approach. Our experiments
demonstrate that our method has faster convergence and a higher mAP compared to the original YOLOv5.
This article is structured into four key parts. The introductory section serves as the first part, followed
by a section dedicated to detailing the proposed approach. The third part is centered on demonstrating the
experimental results and their significance. Lastly, the fourth section offers a summary of the primary
discoveries of the research and suggests future research directions.
2. METHOD
2.1. The YOLOv5 network
Ultralytics proposed YOLOv5 [15], which is an enhanced version of YOLOv4 and serves as a
detection model that incorporates the strengths of prior editions and other networks like CSPNet and PANet
[16], [17]. As a result, it achieves a satisfactory balance between precision and speed. The structure of YOLOv5
is designed to be more efficient than its previous versions, despite its smaller size. Figure 1 shows a graphic
illustration of its fundamental structure.
The YOLOv5 algorithm integrates multiple scale prediction and merges the feature pyramid network
(FPN) and path aggregation network (PANet). The FPN pathway transfers deep semantic features to shallower layers, which enhances semantic expression at various scales. Conversely, the PANet pathway transmits the shallow layers' localization information to the deep layers, thereby improving localization ability at different scales. By combining these two networks, YOLOv5 boosts both semantic representation
and localization skills on multiple scales. Its architecture primarily consists of C3, Conv, and spatial pyramid
pooling features (SPPF) modules. Moreover, the network width and depth can be regulated through YOLOv5's
utilization of depth and width multiples.
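To make this scaling concrete, the sketch below shows how a depth multiple and a width multiple are typically applied when a model configuration is parsed. The helper names are ours, but the rounding scheme mirrors the one used in the YOLOv5 codebase: the depth multiple scales how many times a block is repeated, and the width multiple scales channel counts to a multiple of eight.

```python
import math

def scale_depth(n_repeats: int, depth_multiple: float) -> int:
    # Scale the number of repeated blocks in a stage (at least one block remains).
    return max(round(n_repeats * depth_multiple), 1) if n_repeats > 1 else n_repeats

def scale_width(channels: int, width_multiple: float, divisor: int = 8) -> int:
    # Scale channel counts, rounding up to a multiple of `divisor`
    # to keep layers friendly to hardware vectorization.
    return int(math.ceil(channels * width_multiple / divisor) * divisor)

# YOLOv5s uses depth_multiple=0.33 and width_multiple=0.50; a C3 stage
# with 9 repeats and 256 channels becomes 3 repeats and 128 channels:
print(scale_depth(9, 0.33), scale_width(256, 0.50))  # -> 3 128
```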
In broad terms, YOLOv5 has enhanced its performance by incorporating the following four elements:
i) Input improvements such as adaptive image scaling, adaptive anchor box computation, and mosaic data
augmentation; ii) Improved backbone architecture utilizing cross stage partial network (CSPNet) and Focus
module; iii) an enhanced neck architecture with FPN and PANet networks; and iv) replacement of the intersection over union (IoU) loss with the complete intersection over union (CIoU) loss. These improvements have contributed to YOLOv5's overall performance enhancements.
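For reference, the CIoU loss extends IoU with a center-distance penalty and an aspect-ratio consistency term. In the standard CIoU notation, with b and b^gt the predicted and ground-truth box centers, ρ the Euclidean distance, c the diagonal length of the smallest box enclosing both boxes, and (w, h) the box width and height:

$$
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(\mathbf{b}, \mathbf{b}^{gt})}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
$$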
Figure 1. The principal component of YOLOv5s [18]
2.2. GhostNet module
Neural networks usually require a large number of parameters, particularly in the initial fully connected layers, to fit datasets well. The emergence of convolutional neural networks has
enabled the application of filters to minimize the number of parameters required. In creating a network designed
to complete detection tasks, numerous feature maps containing hundreds of channels are typically required,
resulting in a large model. Model compression involves reducing a neural network's total number of parameters,
making it easier to deploy on embedded devices.
Researchers have developed several techniques to decrease the size of models. In ShuffleNet [19], channel shuffle is used to improve the flow of information between channel groups. Xception [20], in contrast, uses a split convolution operation and a more effective feature fusion approach to optimize the parameters of the model. MobileNets [21] leverage depth-wise separable convolutions to achieve superior performance with fewer parameters. Finally, SqueezeNet [22] substitutes 1x1 convolution kernels for 3x3 kernels and decreases the number of input channels; after compression, the original 480 MB model is reduced to 4.8 MB. Han et al. [23] proposed GhostNet to overcome the issue of excessive parameters leading to high resource consumption and to make deploying neural networks on embedded devices easier. GhostNet is designed to produce a large number of feature maps using cheaper operations [23], [24].
Figure 2 shows that the Conv and C3 modules have been replaced by GhostConv and C3Ghost, respectively. Both replacements keep the same number of input channels (C1) and output channels (C2) as the modules they replace. GhostConv is made up of two Conv modules, with the number of hidden channels set to half of the module's output channels.
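The snippet below is a minimal PyTorch sketch of this design: a "primary" convolution produces half of the output channels, and a cheap depth-wise convolution generates the remaining "ghost" feature maps. It is an illustrative reimplementation of the GhostConv idea described above, not the exact module from the YOLOv5 code.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Standard convolution + batch norm + SiLU activation."""
    def __init__(self, c_in, c_out, k=1, s=1, groups=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GhostConv(nn.Module):
    """Half of the outputs come from a regular convolution; the other half
    are 'ghost' maps made by a cheap depth-wise 5x5 convolution."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2  # hidden channels: half of the module's output channels
        self.primary = ConvBNAct(c1, c_, k, s)
        self.cheap = ConvBNAct(c_, c_, 5, 1, groups=c_)  # depth-wise

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# e.g. replacing a 3x3 Conv(64 -> 128) with GhostConv(64, 128, k=3):
x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128, k=3)(x).shape)  # torch.Size([1, 128, 80, 80])
```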
3. RESULTS AND DISCUSSION
3.1. Dataset description
In this work, the Sohas dataset [25] is used. The dataset contains 4,014 images featuring weapons, categorized by the type of handheld weapon object, namely pistols and knives. The model was trained on 70% of the images, while 20% were allocated for validation and the remaining 10% were used to test the
model. The two categories were approximately balanced. As the Sohas dataset lacks the blurry image samples typically found in surveillance video frames, data augmentation including horizontal flips, rotations, and blur was performed to create a more diverse and extensive dataset. This helps the model generalize better to images obtained from surveillance videos.
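A sketch of such an augmentation pipeline is given below, assuming the Albumentations library and YOLO-format bounding boxes; the rotation and blur limits shown are illustrative choices rather than the exact values used.

```python
import albumentations as A

# Bounding boxes are transformed consistently with the image; labels in
# `class_labels` travel with their boxes (YOLO format: x_center, y_center, w, h).
augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),    # small rotations, illustrative limit
        A.Blur(blur_limit=5, p=0.3),  # simulate motion/compression blur of CCTV frames
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# usage: out = augment(image=img, bboxes=boxes, class_labels=labels)
# augmented image/boxes/labels: out["image"], out["bboxes"], out["class_labels"]
```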
3.2. Training protocol
The experiment was carried out on a Linux operating system, utilizing an NVIDIA Tesla T4 graphics card and a software environment consisting of CUDA 11.1, PyTorch 1.12.1, and Python 3.7. As the optimizer,
we employed the stochastic gradient descent (SGD) algorithm, while keeping the rest of the original YOLOv5
hyperparameters intact. To ensure thorough training, the total number of epochs for the experiment was set to
100, allowing the model to converge and achieve optimal results.
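For reproducibility, a training run along these lines can be launched through the YOLOv5 repository's train.py entry point. The sketch below is illustrative: the dataset YAML and the Ghost model configuration path are placeholders for our local files, and it assumes a YOLOv5 release that exposes the train.run() helper and the optimizer option.

```python
# Run from a clone of https://github.com/ultralytics/yolov5.
# "sohas.yaml" and the Ghost model config are placeholder names for local files.
import train  # yolov5/train.py

train.run(
    data="sohas.yaml",                     # dataset paths, splits, and class names
    cfg="models/yolov5s-ghost-head.yaml",  # hypothetical Ghost-modified model config
    weights="",                            # empty string: train from scratch
    epochs=100,
    optimizer="SGD",                       # remaining hyperparameters left at YOLOv5 defaults
)
```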
3.3. Results
There are five primary versions of YOLOv5: YOLOv5x, YOLOv5l, YOLOv5m, YOLOv5s, and YOLOv5n. The differences between these versions relate to their model size, depth, and number of parameters. The results of our tests, which involved incorporating GhostNet at different locations and comparing model performance in terms of weights, GFLOPs, and number of parameters, are presented in Table 1. The results show that YOLOv5n and YOLOv5s have smaller model sizes than the other official versions. This makes them more memory-efficient, which is a benefit in environments with limited resources, such as embedded systems or mobile devices.
Figure 2. Primary architecture of YOLOv5 incorporating the GhostNet module [18]
Table 1. Model parameters, weights, and GFLOPs
Models Params (millions) Weights (MB) GFLOPs
YOLOv5n 1.76 3.9 4.1
YOLOv5n-GHOST-ALL 0.94 2.3 2.3
YOLOv5n-GHOST-BACKBONE 1.29 3 3
YOLOv5n-GHOST-HEAD 1.42 3.2 3.6
YOLOv5s 7.02 14.4 15.9
YOLOv5s-GHOST-ALL 3.7 7.8 8.2
YOLOv5s-GHOST-BACKBONE 5.1 10.6 10.5
YOLOv5s-GHOST-HEAD 5.6 11.7 13.2
YOLOv5m 20.89 40.3 48.1
YOLOv5m-GHOST-ALL 8.55 16.8 18.4
YOLOv5m-GHOST- BACKBONE 15.5 30.1 38
YOLOv5m-GHOST-HEAD 13.89 27 28.5
YOLOv5l 46.17 88.6 108
YOLOv5l-GHOST-ALL 15.62 30.5 33.3
YOLOv5l-GHOST-BACKBONE 32.7 63.1 82
YOLOv5l-GHOST-HEAD 29.02 56 59.3
YOLOv5x 86.25 165 204.3
YOLOv5x-GHOST-ALL 25.09 48.7 53.3
YOLOv5x-GHOST-BACKBONE 59.2 113 151.3
YOLOv5x-GHOST-HEAD 52.1 100 106.4
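Parameter counts and GFLOPs of the kind reported in Table 1 can be reproduced by summing parameter tensors and profiling one forward pass. The sketch below assumes the thop profiler and a 640x640 input, the resolution YOLOv5 profiles against; the numbers in the comment are approximate.

```python
import torch
from thop import profile  # pip install thop

def complexity(model, img_size=640):
    """Return (params in millions, GFLOPs) for one forward pass."""
    model.eval()
    dummy = torch.zeros(1, 3, img_size, img_size)
    macs, _ = profile(model, inputs=(dummy,), verbose=False)
    params = sum(p.numel() for p in model.parameters())
    return params / 1e6, 2 * macs / 1e9  # thop counts MACs; x2 gives FLOPs

# e.g. model = torch.hub.load("ultralytics/yolov5", "yolov5s")
# print(complexity(model))  # roughly (7.0, 16) for YOLOv5s at 640x640
```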
Table 2 compares recent detection models in terms of mAP and detection time. YOLOv5s with GhostNet integrated achieves the best results in terms of mAP, while YOLOv5n with GhostNet integrated
yields the smallest detection time. YOLOv5s-GHOST-HEAD shows a suitable trade-off between precision and speed, which makes it an effective choice for embedded and real-time applications.
The training loss curve shows that including GhostNet in the head part of the model leads to a faster rate of convergence, which is likely due to the model's compact design and reduced number of parameters. Figure 3 shows that the model's loss is high at the beginning, as it makes inaccurate predictions early in training. Through iterative optimization, the loss decreases as the model improves its predictions. As the loss curve converges towards zero, it indicates that the model is effectively capturing data patterns. Training and validation loss decrease together, which suggests the model is learning meaningful representations, generalizing well to unseen data, and not overfitting. Table 2 also reveals a higher mAP with GhostNet, indicating that GhostNet is an efficient addition to the model. Figure 4 presents some example detections on the test set.
Table 2. Evaluation metrics of each model
Models mAP@0.5 Detection time (ms)
YOLOv3 0.98 14.6
Tiny-YOLOv3 0.959 7.69
YOLOv5n 0.976 5.9
YOLOv5n-GHOST-ALL 0.981 6.8
YOLOv5n-GHOST-BACKBONE 0.98 6.5
YOLOv5n-GHOST-HEAD 0.984 6.4
YOLOv5s 0.991 11.5
YOLOv5s-GHOST-ALL 0.993 9.9
YOLOv5s-GHOST-BACKBONE 0.992 10.5
YOLOv5s-GHOST-HEAD 0.994 10.9
Figure 3. Loss curves of YOLOv5s-GHOST-HEAD
3.4. Discussion
This study evaluates the effectiveness of GhostNet at several locations within the YOLOv5 model and shows that GhostNet can achieve a similar or better accuracy rate with fewer parameters and less computation, making it suitable for embedded devices. YOLOv5s-GHOST-HEAD performs most efficiently compared to the other models, achieving better mAP with fewer weights and GFLOPs. The smaller model size of YOLOv5s also means that it can perform inference faster than the larger models. This shows that GhostNet can effectively prune deep neural networks without compromising weapons-detection performance. The pruning effect depends on the network depth. GhostNet is a module that can easily be applied to other classical models to reduce computation, but the trade-off between accuracy and model size should still be considered for different memory budgets.
Figure 4. Sample detection results for the test set
4. CONCLUSION
In this research paper, GhostNet is presented as a way to decrease the amount of computation required by deep neural networks and to create more efficient neural architectures that are suitable for embedded devices. The GhostNet module is shown to be easily integrated into YOLOv5 models while maintaining similar performance, as demonstrated by comparing metrics such as mAP, detection time, and loss curves. By integrating GhostNet only in the head part, the proposed method improves mAP and reduces the loss. GhostNet could potentially replace ordinary convolution in the future, as it achieves similar effects through cheaper operations. In future work, we aim to train the model on other, larger datasets and to further improve the detection time for mobile applications that require a lightweight, accurate, and fast detection model.
REFERENCES
[1] M. M. F.-Carrobles, O. Deniz, and F. Maroto, “Gun and knife detection based on faster R-CNN for video surveillance,” in Iberian
Conference on Pattern Recognition and Image Analysis, Cham: Springer, 2019, pp. 441–452. doi: 10.1007/978-3-030-31321-0_38.
[2] J. L. S. González, C. Zaccaro, J. A. Á.-García, L. M. S. Morillo, and F. S. Caparrini, “Real-time gun detection in CCTV: An open
problem,” Neural Networks, vol. 132, pp. 297–308, Dec. 2020. doi: 10.1016/j.neunet.2020.09.013.
[3] K. U. Sharma and N. V. Thakur, “A review and an approach for object detection in images,” International Journal of Computational
Vision and Robotics, vol. 7, no. 1–2, pp. 196–237, 2017. doi: 10.1504/IJCVR.2017.081234.
[4] R. K. Tiwari and G. K. Verma, “A computer vision based framework for visual gun detection using harris interest point detector,”
in Procedia Computer Science, 2015, vol. 54, pp. 703–712. doi: 10.1016/j.procs.2015.06.083.
[5] M. E. Kundegorski, S. Akcay, M. Devereux, A. Mouton, and T. P. Breckon, “On using feature descriptors as visual words for object
detection within X-ray baggage security screening,” 2016. doi: 10.1049/ic.2016.0080.
[6] J. Zhang, W. Xing, M. Xing, and G. Sun, “Terahertz image detection with the improved faster region-based convolutional neural
network,” Sensors, vol. 18, no. 7, p. 2327, Jul. 2018. doi: 10.3390/s18072327.
[7] A. M. Atto, A. Benoit, and P. Lambert, “Timed-image based deep learning for action recognition in video sequences,” Pattern
Recognition, vol. 104, p. 107353, Aug. 2020. doi: 10.1016/j.patcog.2020.107353.
[8] G. K. Verma and A. Dhillon, “A handheld gun detection using faster R-CNN deep learning,” in Proceedings of the 7th International
Conference on Computer and Communication Technology, Nov. 2017, pp. 84–88. doi: 10.1145/3154979.3154988.
[9] M. T. Bhatti, M. G. Khan, M. Aslam, and M. J. Fiaz, “Weapon detection in real-time CCTV videos using deep learning,” IEEE
Access, vol. 9, pp. 34366–34382, 2021. doi: 10.1109/ACCESS.2021.3059170.
[10] R. Kanehisa and A. Neto, “Firearm detection using convolutional neural networks,” in Proceedings of the 11th International
Conference on Agents and Artificial Intelligence, 2019, pp. 707–714. doi: 10.5220/0007397707070714.
[11] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 779–788. doi: 10.1109/CVPR.2016.91.
[12] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings - 30th IEEE Conference on Computer Vision and
Pattern Recognition, CVPR 2017, 2017, pp. 6517–6525. doi: 10.1109/CVPR.2017.690.
[13] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv, 2018.
[14] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv, 2020.
[15] “ultralytics/yolov5,” Ultralytics. [Online]. Available: https://github.com/ultralytics/yolov5
[16] C. Y. Wang, H. Y. Mark Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh, “CSPNet: A new backbone that can enhance
learning capability of CNN,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2020,
pp. 1571–1580. doi: 10.1109/CVPRW50498.2020.00203.
[17] K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng, “PANet: Few-shot image semantic segmentation with prototype alignment,” in
Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9196–9205. doi: 10.1109/ICCV.2019.00929.
[18] Y. Zhang, W. Cai, S. Fan, R. Song, and J. Jin, “Object detection based on YOLOv5 and GhostNet for Orchard Pests,” Information,
vol. 13, no. 11, p. 548, Nov. 2022. doi: 10.3390/info13110548.
[19] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices,”
in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 6848–6856. doi:
10.1109/CVPR.2018.00716.
[20] F. Chollet, “Xception: deep learning with depthwise separable convolutions,” in 2017 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), Jul. 2017, pp. 1800–1807. doi: 10.1109/CVPR.2017.195.
[21] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv, 2017.
[22] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x
fewer parameters and <0.5MB model size,” arXiv, 2016.
[23] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “GhostNet: More features from cheap operations,” in Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, pp. 1577–1586. doi:
10.1109/CVPR42600.2020.00165.
[24] K. Han et al., “GhostNets on heterogeneous devices via cheap operations,” International Journal of Computer Vision, vol. 130, no.
4, pp. 1050–1069, Apr. 2022. doi: 10.1007/s11263-022-01575-y.
[25] F. Pérez-Hernández, S. Tabik, A. Lamas, R. Olmos, H. Fujita, and F. Herrera, “Object detection binary classifiers methodology
based on deep learning to identify small objects handled similarly: Application in video surveillance,” Knowledge-Based Systems,
vol. 194, p. 105590, Apr. 2020. doi: 10.1016/j.knosys.2020.105590.
BIOGRAPHIES OF AUTHORS
Aicha Khalfaoui was born in Errachidia, Morocco on February 28, 1996. She
received her M.Sc. degree in industrial computing and instrumentation engineering from the
Faculty of Sciences and Techniques of Errachidia. She is currently a Ph.D. student in the
Laboratory of Electronics, Energy, Automatic, and Information Processing (EEA&TI) at Hassan II University, Mohammedia-Casablanca, Morocco. Her research interests
focus on improving embedded real-time vision systems using Deep Learning techniques. She
can be contacted at email: aicha96khalfaoui@gmail.com.
Abdelmajid Badri received a doctorate in Electronics and Image Processing in 1992 from the University of Poitiers, France. In 1996, he obtained the Habilitation à Diriger des Recherches (HDR, the French accreditation to supervise research) at the University of Poitiers, France, in image processing. He was a University Professor (PES-C) at University Hassan II Mohammedia-Casablanca, Morocco (FSTM). In 2018, he became director of the Superior School of Technology of Casablanca, Morocco (EST). He is a member of the EEA&TI laboratory (Electronics, Energy, Automatic and Information Processing), which he has managed since 1996. He has supervised several doctoral theses and is a co-author of several national and international publications. He can be contacted at email: Abdelmajid_badri@yahoo.fr.
Ilham El Mourabit is an Assistant Professor and researcher, holder of a doctoral degree in Electronics and Telecommunication Systems from Hassan II University. She received her M.Sc. degree in Electronic and Automatic Systems Engineering (Telecommunication and Information Technologies specialty) from the Faculty of Sciences and Technology of Mohammedia, Morocco. She currently works as an assistant professor at the FSTM and is a member of the EEA&TI Laboratory (Electronics, Energy, Automatic and Information Processing) at Hassan II University Casablanca. Her main research areas are geolocation technologies in wireless networks, image processing, computer vision, digital signal processing, and vehicular communications. She can be contacted at email: elmourabit.ilham@gmail.com.