VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 6 (2023)
ISSN(Online): 2581-7280 Article No. 20
PP 01-08
www.viva-technology.org/New/IJRI 1
COP: TARGET RECKS USING YOLOv8
Abhishek Mandavkar¹, Dishant Save², Yash Patil³, Prof. Janhavi Sangoi⁴
¹ (Department of Computer Engineering, Mumbai University, Maharashtra, India)
² (Department of Computer Engineering, Mumbai University, Maharashtra, India)
³ (Department of Computer Engineering, Mumbai University, Maharashtra, India)
⁴ (Professor, Department of Computer Engineering, Mumbai University, Maharashtra, India)
Abstract: With the increasing need for effective wildlife monitoring and conservation efforts, computer vision
technologies have emerged as powerful tools for automating animal detection in diverse environments. This paper
introduces an innovative framework for the detection of Indian exclusive animals—species found exclusively in
India—employing the YOLOv8 (You Only Look Once) object detection model. The proposed system is
reinforced by a meticulously annotated dataset created through the Computer Vision Annotation Tool (CVAT),
focusing specifically on the distinctive fauna inhabiting the Indian subcontinent. The YOLOv8 model, renowned
for its speed and accuracy, is employed to detect animals in images and video frames. The YOLOv8 model is
tailored to detect and classify indigenous animal species, ensuring its adaptability to the unique ecological
contexts of India. By harnessing the real-time capabilities of YOLOv8, the system enables efficient and timely
monitoring of exclusive wildlife populations, addressing the urgent need for accurate and scalable solutions in
conservation efforts. The CVAT annotated dataset encapsulates a diverse array of Indian endemic species,
encompassing various habitats and environmental conditions. The manual annotation process ensures precision
in delineating bounding boxes around animals, contributing significantly to the enhancement of the model's
detection accuracy for region-specific fauna. Addressing challenges such as diverse animal poses, complex
backgrounds, and varying lighting conditions, our framework demonstrates its adaptability to the specific
conditions prevalent in India. This work contributes to the growing body of research in wildlife conservation and
monitoring, providing a scalable and accurate solution for automated animal detection. The proposed framework
stands as a valuable tool for researchers, conservationists, and wildlife managers dedicated to safeguarding the
unique biodiversity of India and its integral role in global ecological balance.
Keywords- Android, Annotations, CVAT, Detection, Endemic species, YOLOv8.
I. INTRODUCTION
From self-driving cars and surveillance systems to augmented reality and healthcare, the ability to precisely
identify and locate objects within an image or video stream has revolutionized various industries. Object detection,
the process of identifying and locating objects within images or video frames, is a pivotal technology with
applications ranging from autonomous vehicles to surveillance systems and beyond. Among the myriad object
detection methods available, YOLO (You Only Look Once) stands out as a groundbreaking approach known for
its speed and accuracy. In this comprehensive guide, we delve into the exciting world of custom object detection
using YOLO. YOLO, first introduced in 2015 and improved in subsequent versions, offers real-time object
detection capabilities. It is particularly renowned for its ability to simultaneously locate and classify objects within
an image with remarkable speed. However, while the standard YOLO model is impressive, customizing it to
detect specific objects tailored to your unique requirements is where its true potential shines.
Establishing custom object detection with YOLO opens up a multitude of possibilities. Whether it's
recognizing specialized industrial components, unique species in ecological research, or custom products in the
retail sector, this project demonstrates the versatility and potential of YOLO to adapt and excel in diverse contexts.
This endeavor has delved into the intricacies of data preparation, model training, and fine-tuning to empower
YOLO to identify custom objects. The journey of establishing custom object detection using YOLO is one marked
by technical challenges, innovative solutions, and a commitment to pushing the boundaries of computer vision. In
this report, we will share the methodologies, insights, and outcomes of our journey in establishing custom object
detection using YOLO. Our success underscores the transformative power of this technology, providing solutions
for various industries and serving as a testament to the ever-evolving landscape of artificial intelligence and
computer vision. This project has encompassed every facet of custom object detection, from the meticulous
curation and preparation of datasets to the fine-tuning of YOLO's neural networks.
The outcomes of this endeavour not only provide practical solutions but also serve as a testament to the
boundless possibilities of modern artificial intelligence. This project will not only inspire further exploration but
also empower others to embark on their own journeys in the ever-evolving field of computer vision.
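To make the data-preparation step mentioned above concrete, the sketch below shows how a pixel-coordinate bounding box, such as one exported from an annotation tool, can be converted into the normalized centre-based box format that YOLO-family models train on. This is an illustrative sketch, not code from the paper; the function name and example numbers are our own.

```python
def to_yolo_bbox(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box to YOLO's normalized (cx, cy, w, h) format."""
    cx = (x_min + x_max) / 2.0 / img_w   # box centre x, as a fraction of image width
    cy = (y_min + y_max) / 2.0 / img_h   # box centre y, as a fraction of image height
    w = (x_max - x_min) / img_w          # box width, normalized
    h = (y_max - y_min) / img_h          # box height, normalized
    return cx, cy, w, h

# A 100x50-pixel box with top-left corner at (100, 200) in a 640x480 image:
print(to_yolo_bbox(100, 200, 200, 250, 640, 480))
# -> (0.234375, 0.46875, 0.15625, ~0.1042)
```

Each label line in a YOLO-format dataset then pairs a class index with these four normalized values, which is the form a custom dataset takes before fine-tuning.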
II. REVIEW OF LITERATURE SURVEY
Nithin Kumar, Nagarathna, and Francesco Flammini [1]: Entomology, a subfield of zoology, covers insect-related research. A more thorough investigation is needed to identify insects at the species level because of the vast number of harmful insect populations. Entomology research is pivotal because it opens new avenues and benefits for chemistry, medicine, engineering, and pharmaceuticals inspired by insects and nature. Insects attack and destroy a third of the world's crops, resulting in the loss of numerous products and losses for businesses. Quick and accurate identification of insects is therefore essential to avoid financial losses and to advance the study of entomology. Scientists are also inspired by insects when developing robotics, sensors, mechanical structures, aerodynamics, and intelligent systems. These factors make scholarly research on insect detection pivotal for documenting biodiversity. Identifying the order to which an insect belongs is vital when determining insects, since knowing the order is necessary to distinguish the type. Insect order records dating back to 2002 were identified.
Sergejs Kodors, Edgars Rubauskis, Marks Sondors, Ilmars Apeinans, Gunars Lacis, and Imants Zarembo [2]: Pears are the third most economically important fruit crop globally, reaching 25.7 million tons in 2021. Although the pear is not the most important fruit crop in Latvia, it forms a very important niche product with high added value, and the pear-growing area is about 200. Timely and accurate forecasting of fruit yield is also of great economic significance for optimally planning post-harvest activities, storage facilities, and sales. The development of such yield forecasting systems has been under way for a long time for various fruit plant species. However, the disadvantage of all these forecasting systems is the need for high-quality, large-scale data, because the accuracy of the developed model, and the agreement between the predicted and the actually harvested yield, depend on it.
Tausif Diwan, G. Anirudh, and Jitendra V. Tembhurne [3] provide a comprehensive exploration of the You Only Look Once (YOLO) object detection framework. The paper covers various aspects related to YOLO, including challenges in object detection, architectural advancements, datasets commonly used for evaluation, and practical applications. It addresses the inherent challenges in object detection tasks, such as handling objects of varying sizes, dealing with class imbalance, and addressing the need for large annotated datasets, and explores the evolution and improvements made in YOLO's architecture and its successors. The surveyed datasets are essential for benchmarking and validating the effectiveness of detection algorithms, and the paper reviews real-world applications where YOLO and related models are employed for object detection. Overall, the paper offers valuable insights into the challenges, advancements, datasets, and real-world applications associated with YOLO-based object detection, making it a valuable resource for researchers and practitioners in the field of computer vision.
Sita M. Yadav and Sandeep M. Chaware [4] explore video object detection techniques, including both traditional computer vision methods and deep learning approaches. The paper discusses the various methods and algorithms used for detecting objects within videos, comparing the effectiveness of traditional computer vision techniques, such as feature extraction and tracking, with modern deep learning methods, such as convolutional neural networks (CNNs) and their variants, for video object detection tasks. Overall, the paper provides insights into the evolving landscape of video object detection, highlighting the advantages and limitations of both traditional and deep learning-based approaches, and discussing the applications and challenges associated with video object detection.
Zhuo Bian and Liangliang Wang [5]: Frame difference is a quick, difference-based segmentation approach for object detection; unfortunately, it becomes trapped in over-segmentation when the pixels of interest overlap each other over time. This paper presents a rather fast visual object detection approach capable of approximating the position of a moving object under heavy background noise or large overlap caused by visual similarity. Object detection is a basic but critical topic, closely allied with image segmentation, object tracking, and recognition in computer vision, all of which are in considerable potential demand in the fields of video surveillance, human-computer interaction, virtual reality, robotics, intelligent transportation systems, and others. It is no wonder that a large number of attempts have been made to establish which features can be used to distinguish the region of interest from non-interest parts, which plays the crucial role, despite central challenges from scene complexity, scale variation, occlusion, illumination changes, and others.
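The forward-backward frame-difference idea discussed in [5] can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the synthetic frames, threshold, and function name are our own. A pixel is flagged as moving only if it differs from both the previous and the next frame, which suppresses noise that appears in only a single frame pair.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, next_frame, thresh=20):
    """Combine forward and backward absolute differences to localize motion.

    A pixel counts as moving only if it changed relative to BOTH the previous
    and the next frame (logical AND of the two thresholded differences).
    """
    fwd = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    bwd = np.abs(next_frame.astype(np.int16) - curr_frame.astype(np.int16))
    return (fwd > thresh) & (bwd > thresh)

# Synthetic 8x8 grayscale frames with a single bright pixel moving one column
# to the right per frame: (3, 2) -> (3, 3) -> (3, 4).
prev_f = np.zeros((8, 8), dtype=np.uint8)
curr_f = np.zeros((8, 8), dtype=np.uint8)
next_f = np.zeros((8, 8), dtype=np.uint8)
prev_f[3, 2] = 255
curr_f[3, 3] = 255
next_f[3, 4] = 255

mask = frame_difference_mask(prev_f, curr_f, next_f)
# The only pixel flagged is the object's current position, (3, 3).
```

The sketch also hints at the over-segmentation problem the paper mentions: when the moving region is larger than its per-frame displacement, the forward and backward differences overlap poorly and the combined mask can miss interior pixels.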
Mohammed Boukabous and Mostafa Azizi [6] present an innovative approach to crime prediction
through the integration of object detection and deep learning techniques. The research is conducted at the
Mathematics, Signal and Image Processing, and Computing Research Laboratory (MATSI) at the Superior School
of Technology (ESTO), Mohammed First University in Oujda, Morocco. In this study, the authors leverage the
power of computer vision and deep learning to develop a system capable of predicting crimes based on image and
video data. This research holds the potential to significantly contribute to the field of law enforcement and public
safety by automating the process of crime prediction through advanced technology. The paper's findings and
methodologies could prove valuable for both researchers and practitioners working on enhancing crime prevention
strategies.
Thomas Tsao and Doris Y. Tsao [7]: Through a process of perceptual association that is still not well understood, the primate visual system transforms visual input consisting of a stream of retinal images into a percept of stable, discrete objects. This process has traditionally been broken down into two separate problems: the segmentation problem, which addresses how visual pixels can be grouped into distinct objects within a single image, and the tracking problem, which addresses how objects can be linked across images despite changing appearance. Both problems are highly challenging. In this paper, the authors explore the computational origin of the ability to segment and invariantly track objects, and show that this problem can in principle be solved without learning, supervised or unsupervised. Complementing image-based approaches to segmentation and tracking, a shape-based approach considers vision as an inverse graphics problem. The paper shows that the problem of inferring 3D surfaces from images is in fact fully constrained if the input is a sequence of images of a scene in which either the observer or the objects are moving.
Mingqi Gao, Feng Zheng, James J. Q. Yu, Caifeng Shan, Guiguang Ding, and Jungong Han [8]: As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting the objects of interest throughout a given video sequence. Recently, with the advancement of deep learning techniques, deep neural networks have shown outstanding performance improvements in numerous computer vision applications, with video object segmentation being one of the most supported and intensively researched. The paper summarises the datasets for training and testing a video object segmentation algorithm, as well as the common challenges and evaluation criteria. Next, prior works are grouped and reviewed based on how they extract and use spatial and temporal features, and their architectures, contributions, and the differences among them are elaborated. The article is expected to serve as a tutorial and source of reference for learners who want to quickly grasp the current progress in this research area, and for practitioners interested in applying video object segmentation methods to their problems.
Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, and Mubarak Shah [9] present an innovative approach
to self-supervised video object segmentation. The research addresses the growing need for automated video object
segmentation, a fundamental task in computer vision with applications ranging from video editing to autonomous
navigation. In this paper, the authors propose a novel method that leverages self-supervised learning techniques
to achieve accurate and robust video object segmentation. This innovative approach allows the model to learn
from unlabeled video data, reducing the need for extensive manual annotations, which is a critical advantage in
practical applications. The proposed method is extensively evaluated on various benchmark datasets,
demonstrating impressive results in terms of video object segmentation accuracy and generalization across diverse
scenarios. The authors also highlight the potential real-world applications of their approach, emphasizing its
relevance in fields such as video editing, robotics, and surveillance.
Yadang Chen, Duolin Wang, Zhiguo Chen, Zhi-Xin Yang, and Enhua Wu [10]: Video object segmentation, which aims to draw a detailed object mask on video frames, is widely applicable to various fields such as autonomous driving, video editing, and video synthesis. Propagation-based methods exploit the target's temporal consistency and rely on the mask from previous frames; for example, MaskTrack combines the segmentation mask of the previous frame with the current frame to form the mask of the current frame. However, these methods suffer from occlusion problems and error drift. Matching-based methods use the first frame of a given video as a reference frame and detect the segmented object independently in each frame. These methods are more robust and reduce the impact of occlusion, but they do not take full advantage of spatiotemporal information. Consequently, some hybrid algorithms improve on the performance and accuracy of the former two classes.
Yue Wu (HKUST), Rongrong Gao (HKUST), Jaesik Park (POSTECH), and Qifeng Chen (HKUST) [11]: Can an artificial intelligence system predict a photorealistic video conditioned on past visual observations? With an accurate video prediction model, an intelligent agent can plan its motion according to the predicted video. Future video generation techniques can also be used to synthesize a long video by repeatedly extending the future of the video. Most existing methods tackle the video prediction task by generating future video frames one by one in an unsupervised fashion. These approaches synthesize future frames at the pixel level without explicit modelling of the motions or semantics of the scene. Therefore, it is difficult for the model to grasp the concept of object boundaries and to produce different motions for different objects.
Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan, and Nasser Kehtarnavaz [12]: Video object detection involves detecting objects using video data, as opposed to conventional object detection using static images. Two applications that have played a major part in the growth of video object detection are autonomous driving and video surveillance. In 2015, video object detection became a new task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2015). In general, object detection approaches can be grouped into two major categories: one-stage detectors and two-stage detectors. One-stage detectors are often more computationally efficient than two-stage detectors; however, two-stage detectors are shown to produce higher accuracy than one-stage detectors. Still, applying object detection to each image frame does not take into consideration the following attributes of video data: since there exist both spatial and temporal correlations between image frames, there are feature extraction redundancies between adjacent frames, and detecting objects from poor-quality frames leads to low accuracy. Video object detection approaches attempt to address these challenges; some make use of spatio-temporal information to improve accuracy, such as by fusing features at different levels.
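The detection accuracy that reviews such as [12] compare across methods is conventionally measured by how well predicted boxes overlap ground-truth boxes, via Intersection-over-Union (IoU). The short sketch below illustrates the standard computation; the function name and example boxes are our own, not from any cited paper.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Coordinates of the intersection rectangle (empty if boxes are disjoint).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 pixels horizontally: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # -> 0.333...
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold (0.5 is a common choice), which is how per-class precision and recall, and hence mAP, are computed.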
III. ANALYSIS TABLE
Table 1: Analysis Table. Each entry gives the paper's title, a summary, its advantages, and its open challenges.
[1] YOLO-Based Light-Weight Deep Learning Models for Insect Detection System with Field Adaption
Summary: Insects exhibit the most incredible diversity, abundance, spread, and adaptability in biology. Insect recognition is the foundation of insect study and pest management; still, most current insect recognition research depends on a small number of insect taxonomic experts. Thanks to the rapid advancement of computer technology, computers can be used to classify insects directly instead of relying on professionals.
Advantages: The deep feature extraction function performs well on image, audio, and text data. Data is easy to update through backpropagation. Different architectures are suitable for different problems, and the hidden layers reduce the algorithm's dependence on feature engineering.
Open challenges: One of the main challenges of deep learning is the need for large amounts of data and computational resources. Neural networks learn from data by adjusting their parameters to minimize a loss function, which measures how well they fit the data.
[2] Rapid Prototyping of Pear Detection Neural Network with YOLO Architecture in Photographs
Summary: Fruit yield estimation and forecasting are essential processes for data-driven decision-making in agribusiness, optimising fruit-growing and selling operations. Yield forecasting is based on the use of historical data collected through periodic yield estimation.
Advantages: One of the main advantages of YOLOv7 is its speed: it can process images at a rate of 155 frames per second, much faster than other state-of-the-art object detection algorithms. Even the original baseline YOLO model was capable of processing at a maximum rate of 45 frames per second.
Open challenges: YOLO may struggle with small or overlapping objects, as it can only predict a fixed number of boxes per cell. It may also miss objects that do not fit well into the grid, or produce false positives for background regions.
[3] Object detection using YOLO: challenges, architectural successors, datasets and applications
Summary: The paper addresses key challenges in object detection, such as handling objects of different sizes, dealing with class imbalance, and the need for extensive annotated datasets. It explores the evolution of YOLO's architecture and how subsequent versions have enhanced object detection capabilities.
Advantages: The paper provides a comprehensive overview of object detection using the YOLO (You Only Look Once) framework, covering the challenges faced in object detection tasks, the evolution of the YOLO architecture, available datasets, and real-world applications.
Open challenges: Open challenges in this context include object detection difficulties such as handling diverse object sizes and class imbalance, as well as the need for large annotated datasets.
[4] Video Object Detection through Traditional and Deep Learning Methods
Summary: The paper explores video object detection techniques, covering traditional computer vision methods and deep learning approaches. It provides insights into the evolving landscape of video object detection, highlighting the pros and cons of both approaches and discussing practical applications and challenges in this critical field of computer vision.
Advantages: It delivers a comprehensive overview of video object detection techniques, encompassing both traditional computer vision methods and modern deep learning approaches. This breadth of coverage enables readers to develop a holistic understanding of the field and make informed decisions about methodology selection.
Open challenges: Video object detection presents several challenges: achieving real-time performance, maintaining object tracking consistency amid occlusions and motion, handling scale and viewpoint variations, and detecting multiple objects simultaneously.
[5] Detecting Moving Object via Projection of Forward Backward Frame Difference
Summary: The paper introduces the importance of moving object detection in computer vision and video analysis, and explains the specific approach of projecting the forward-backward frame difference and its significance.
Advantages: It provides an overview of the significant research papers, projects, and case studies that have contributed to the development and understanding of this technique, citing and summarizing these influential sources.
Open challenges: The challenges and limitations associated with the forward-backward frame difference method include handling complex scenes and dynamic lighting conditions.
[6] Image and video-based crime prediction using object detection and deep learning
Summary: By harnessing computer vision and state-of-the-art object detection methods, the authors aim to automatically predict crimes from image and video data. Researchers and practitioners in the field can benefit from the findings and methodologies presented in the paper.
Advantages: First, it represents an innovative and advanced approach to crime prediction, harnessing the capabilities of computer vision and deep learning algorithms. Second, the research conducted at MATSI in Morocco demonstrates the potential to automate the crime prediction process using image and video data, which can significantly improve the efficiency and accuracy of law enforcement efforts.
Open challenges: Ensuring the accuracy and reliability of object detection in complex real-world environments, where lighting, weather, and object occlusions can affect results, remains a significant hurdle. Second, handling privacy and ethical concerns related to the use of image and video data in crime prediction systems is essential. Third, the scalability and real-time processing capabilities of such systems must be addressed to make them practical for law enforcement agencies.
[7] A topological solution to object segmentation and tracking
Summary: The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: segmenting visual input into discrete units, and tracking the identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion.
Advantages: Topological solutions for object segmentation and tracking offer several advantages, which make them a promising approach in the field of computer vision and image analysis.
Open challenges: The paper addresses the current challenges in topological object segmentation and tracking, and suggests potential future directions for research in this field, such as combining TDA with deep learning techniques or developing real-time topological solutions.
[8] Deep learning for video object segmentation: a review
Summary: The paper introduces the significance of video object segmentation and its applications in computer vision, robotics, and autonomous systems; explains the importance of deep learning techniques in advancing video object segmentation; and provides an overview of the foundational concepts of deep learning, including neural networks and convolutional neural networks (CNNs).
Advantages: Deep learning for video object segmentation offers several advantages, making it a prominent approach in computer vision and video analysis. Deep models can learn complex features and temporal dependencies in video sequences, leading to precise segmentation results.
Open challenges: The paper describes commonly used datasets for video object segmentation, such as DAVIS, SegTrack, and YouTube-Objects, and highlights the benchmark challenges that have driven the development of deep learning algorithms in this domain.
[9] Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging
Summary: The paper introduces a promising self-supervised approach for video object segmentation, offering a valuable contribution to the field of computer vision. The combination of cutout prediction, tagging, and a novel loss function enhances the model's ability to segment objects accurately in videos, opening up opportunities for more efficient and automated video analysis in various applications.
Advantages: By integrating cutout prediction and tagging mechanisms, the method enhances object boundary understanding and tracking performance, making it robust across diverse video scenarios. These innovations hold the potential to significantly improve video analysis applications, including video editing, robotics, and surveillance, by automating and enhancing object segmentation tasks.
Open challenges: Achieving a balance between segmentation accuracy and computational efficiency is essential, particularly in robotics and autonomous systems demanding low-latency processing. Scaling the approach to handle extensive video datasets and deploying it in resource-constrained environments poses challenges in data management, model optimization, and hardware constraints.
[10] Global video object segmentation with spatial constraint module
Summary: The paper presents a lightweight and effective semi-supervised video object segmentation network based on the space-time memory framework. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation.
Advantages: The CNN extracts features automatically in a single feature extraction pass, and the object detection rate is high. Adding a feature pyramid model makes it effective for small object detection, and being based on GoogLeNet, it is fast.
Open challenges: Overall, the solution is efficient and compatible, and the authors hope it will set a strong baseline for other real-time video object segmentation solutions in the future.
[11] Future Video Synthesis with Object Motion Prediction
Summary: The paper presents an approach to predict future video frames given a sequence of continuous past video frames. With this procedure, the system exhibits much less tearing or deformation artefact compared to other approaches.
Advantages: Such systems can be used in interior and exterior passage areas. While they are particularly useful in spaces with many people, they can also be used in certain areas of homes or residential areas such as entryways or gardens.
Open challenges: For future frame prediction, the method produces future frames by first decomposing possible moving objects into currently-moving or static objects.
[12] A Review of Video Object Detection: Datasets, Metrics and Methods
Summary: Although there are well-established object detection methods based on static images, their application to video data on a frame-by-frame basis faces two shortcomings: a lack of computational efficiency, due to redundancy across image frames or to not using the temporal and spatial correlation of features across image frames, and a lack of robustness to real-world conditions such as motion blur and occlusion.
Advantages: Object detection allows us to be more efficient and to focus on other tasks while the machine works on its own. It is a key task of computer vision: thanks to it, the machine is able to locate, identify, and classify an object in an image.
Open challenges: Challenges remain for further improving the accuracy and speed of video object detection methods; the paper presents the major challenges and possible future trends related to video object detection.
IV. CONCLUSION
The YOLOv8-CVAT framework, encapsulated within an Android application, stands as a technological innovation: its successful implementation has not only pushed the boundaries of wildlife monitoring but has also laid the foundation for a transformative paradigm in conservation efforts [8]. Although initially tailored for species exclusive to India, the framework reveals its potential for cross-domain application by integrating seamlessly with various other datasets. This work invites further collaborations, adaptations, and innovations, leading towards a future where advanced technology becomes an integral ally in the ceaseless effort to protect and preserve the diverse and unique wildlife that graces our planet.
REFERENCES
[1] Kumar, N.; Nagarathna; Flammini, F., "YOLO-Based Light-Weight Deep Learning Models for Insect Detection System with Field Adaption", Agriculture 2023, 13, 741. https://doi.org/10.3390/agriculture13030741
[2] Sergejs Kodors, Marks Sondors, Gunārs Lācis, Edgars Rubauskis, Ilmārs Apeināns, Imants Zarembo, "Rapid Prototyping of Pear Detection Neural Network with YOLO Architecture in Photographs", Environment. Technology. Resources: Proceedings of the 14th International Scientific and Practical Conference, Rezekne, Latvia, Volume 1, 81-85, 2023
[3] Tausif Diwan, G. Anirudh, Jitendra V. Tembhurne, "Object detection using YOLO: challenges, architectural successors, datasets and applications", Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India, and Department of Data Science and Analytics, Central University of Rajasthan, Jaipur, Rajasthan, India, 2022
[4] Sita M. Yadav, Sandeep M. Chaware, "Video Object Detection through Traditional and Deep Learning Methods", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958 (Online), Volume-9, Issue-4, April 2020
[5] Zhuo Bian, Liangliang Wang, "Detecting Moving Object via Projection of Forward-Backward Frame Difference", International Journal of Future Generation Communication and Networking, Vol. 9, No. 5 (2016), pp. 143-152. http://dx.doi.org/10.14257/ijfgcn.2016.9.5.14
[6] Mohammed Boukabous, Mostafa Azizi, "Image and video-based crime prediction using object detection and deep learning", Bulletin of Electrical Engineering and Informatics, Vol. 12, No. 3, June 2023, pp. 1630-1638
[7] Thomas Tsao, Doris Y. Tsao, "A topological solution to object segmentation and tracking", Proceedings of the National Academy of Sciences, 2020
[8] Mingqi Gao, Feng Zheng, James J. Q. Yu, Caifeng Shan, Guiguang Ding, Jungong Han, "Deep learning for video object segmentation: a review", Artificial Intelligence Review (2023) 56:457-531, April 2022
[9] Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah, "Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging", MBZ University of Artificial Intelligence, UAE, 22 April 2022
[10] Yadang Chen, Duolin Wang, Zhiguo Chen, Zhi-Xin Yang, Enhua Wu, "Global video object segmentation with spatial constraint module", Computational Visual Media, 2022. https://doi.org/10.1007/s41095-022-0282-8
[11] Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen, "Future Video Synthesis with Object Motion Prediction", arXiv:2004.00542v2 [cs.CV], 15 Apr 2020
[12] Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan, Nasser Kehtarnavaz, "A Review of Video Object Detection: Datasets, Metrics and Methods", Appl. Sci. 2020, 10, 7834

COP : TARGET RECKS USING YOLOv8 Android

  • 1. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 6 (2023) ISSN(Online): 2581-7280 Article No. 20 PP 01-08 www.viva-technology.org/New/IJRI 1 COP : TARGET RECKS USING YOLOv8 Abhishek Mandavkar¹, Dishant Save², Yash Patil³, Prof. Janhavi Sangoi⁴ ¹ (Department of Computer Engineering, Mumbai University, Maharashtra, India) ² (Department of Computer Engineering, Mumbai University, Maharashtra, India) ³ (Department of Computer Engineering, Mumbai University, Maharashtra, India) ⁴ (Professor, Department of Computer Engineering, Mumbai University, Maharashtra, India) Abstract: With the increasing need for effective wildlife monitoring and conservation efforts, computer vision technologies have emerged as powerful tools for automating animal detection in diverse environments. This paper introduces an innovative framework for the detection of Indian exclusive animals—species found exclusively in India—employing the YOLOv8 (You Only Look One-level) object detection model. The proposed system is reinforced by a meticulously annotated dataset created through the Computer Vision Annotation Tool (CVAT), focusing specifically on the distinctive fauna inhabiting the Indian subcontinent. The YOLOv8 model, renowned for its speed and accuracy, is employed to detect animals in images and video frames. The YOLOv8 model is tailored to detect and classify indigenous animal species, ensuring its adaptability to the unique ecological contexts of India. By harnessing the real-time capabilities of YOLOv8, the system enables efficient and timely monitoring of exclusive wildlife populations, addressing the urgent need for accurate and scalable solutions in conservation efforts. The CVAT annotated dataset encapsulates a diverse array of Indian endemic species, encompassing various habitats and environmental conditions. 
The manual annotation process ensures precision in delineating bounding boxes around animals, contributing significantly to the enhancement of the model's detection accuracy for region-specific fauna. Addressing challenges such as diverse animal poses, complex backgrounds, and varying lighting conditions, our framework demonstrates its adaptability to the specific conditions prevalent in India. This work contributes to the growing body of research in wildlife conservation and monitoring, providing a scalable and accurate solution for automated animal detection. The proposed framework stands as a valuable tool for researchers, conservationists, and wildlife managers dedicated to safeguarding the unique biodiversity of India and its integral role in global ecological balance. Keywords- Android, Annotations, CVAT, Detection, Endemic species, YOLOv8. I. INTRODUCTION From self-driving cars and surveillance systems to augmented reality and healthcare, the ability to precisely identify and locate objects within an image or video stream has revolutionized various industries. Object detection, the process of identifying and locating objects within images or video frames, is a pivotal technology with applications ranging from autonomous vehicles to surveillance systems and beyond. Among the myriad object detection methods available, YOLO (You Only Look Once) stands out as a groundbreaking approach known for its speed and accuracy. In this comprehensive guide, we delve into the exciting world of custom object detection using YOLO. YOLO, first introduced in 2015 and improved in subsequent versions, offers real-time object detection capabilities. It is particularly renowned for its ability to simultaneously locate and classify objects within an image with remarkable speed. However, while the standard YOLO model is impressive, customizing it to detect specific objects tailored to your unique requirements is where its true potential shines. 
Establishing custom object detection with YOLO opens up a multitude of possibilities. Whether it's recognizing specialized industrial components, unique species in ecological research, or custom products in the retail sector, this project demonstrates the versatility and potential of YOLO to adapt and excel in diverse contexts. This endeavor has delved into the intricacies of data preparation, model training, and fine-tuning to empower YOLO to identify custom objects. The journey of establishing custom object detection using YOLO is one marked by technical challenges, innovative solutions, and a commitment to pushing the boundaries of computer vision.In this report, we will share the methodologies, insights, and outcomes of our journey in establishing custom object detection using YOLO. Our success underscores the transformative power of this technology, providing solutions for various industries and serving as a testament to the ever-evolving landscape of artificial intelligence and
  • 2. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 6 (2023) ISSN(Online): 2581-7280 Article No. 20 PP 01-08 www.viva-technology.org/New/IJRI 2 computer vision. This project has encompassed every facet of custom object detection, from the meticulous curation and preparation of datasets to the fine-tuning of YOLO's neural networks. The outcomes of this endeavour not only provide practical solutions but also serve as a testament to the boundless possibilities of modern artificial intelligence. This project will not only inspire further exploration but also empower others to embark on their own journeys in the ever-evolving field of computer vision. II. REVIEW OF LITERATURE SURVEY Nithin Kumar ,Nagarathna and Francesco Flammini [1], A zoology subfield called entomology covers nonentity- related exploration. We discovered that a further thorough disquisition is needed to identify the species position of insects due to the vast number of dangerous nonentity populations. Entomology exploration is pivotal because it opens new avenues and benefits for chemistry, drug, engineering, and the medicinal inspired by insects and nature. Insects rob and annihilate a third of the world’s crops, performing in the loss of multitudinous products, and businesses suffer losses. Quick and accurate identification of insects is essential to avoid fiscal losses and progress the study of entomology. Scientists are also inspired by insects while developing robotics, detectors, mechanical structures, aerodynamics, and intelligent systems. These factors make scholarly exploration on nonentity discovery pivotal for demonstrating biodiversity. Relating the order position an nonentity belongs to is vital in determining insects. Knowing the order position is necessary to separate the type. Nonentity orders dating back to 2002 were discovered. 
Sergejs Kodors, Edgars Rubauskis, Marks Sondors, lmars Apeinans, unars Lacis, Imants Zarembo [2], Pears are the third most economically important fruit crop encyclopaedically reaching25.7 million tons in 2021. Although it isn’t the most important fruit crop in Latvia, it forms a veritably important niche product with high added value, and the area of pear growing is about 200. Timely and accurate vaticinator of fruit yield is also of great profitable significance to optimally plan post-harvest conditioning, storehouse installations and deals. Developing similar yield soothsaying systems has been going on for a long time for colourful fruit factory species. Still, the disadvantage of all these soothsaying systems is the need for high- quality, large- scale data because the delicacy of the developed model and the correspondence of the read and real gathered yield depend on it. Tausif Diwan1 & G. Anirudh2 & Jitendra V. Tembhurne1 [3], provides a comprehensive exploration of the You Only Look Once (YOLO) object detection framework. The paper covers various aspects related to YOLO, including challenges in object detection, architectural advancements, datasets commonly used for evaluation, and practical applications. It addresses the inherent challenges in object detection tasks, such as handling objects of varying sizes, dealing with class imbalance, and addressing the need for large annotated datasets. The paper explores the evolution and improvements made in YOLO's architecture and its successors. These datasets are essential for benchmarking and validating the effectiveness of detection algorithms. The paper likely explores real-world applications where YOLO and related models are employed for object detection. The authors likely employed key components of the deep learning tech stack. 
Overall, the paper offers valuable insights into the challenges, advancements, datasets, and real-world applications associated with YOLO-based object detection, making it a valuable resource for researchers and practitioners in the field of computer vision. Sita M. Yadav, Sandeep M. Chaware [4], it explores video object detection techniques, including both traditional computer vision methods and deep learning approaches. The paper likely discusses various methods and algorithms used for detecting objects within videos. It may compare the effectiveness of traditional computer vision techniques, which include methods like feature extraction and tracking, with modern deep learning methods, such as convolutional neural networks (CNNs) and their variants, for video object detection tasks. Overall, this paper likely provides insights into the evolving landscape of video object detection, highlighting the advantages and limitations of both traditional and deep learning-based approaches in the context of this critical computer vision task. It may also discuss applications and challenges associated with video object detection. Zhuo Bian, And LiangliangWang. [5] Frame difference is a quick diversity grounded segmentation approach for object discovery, unfortunately, it gets trapped in over-segmented when the pixels of interest over time lap each other. This paper presents a rather fast visual object discovery approach able of approaching the position of moving object under heavy background noise or big imbrication caused by negative similarity. Object discovery is a introductory but critical content confederated nearly with image segmentation, object shadowing and recognition in computer vision, all of which have considerable implicit demand in. the field of videotape surveillance, mortal- computer commerce, virtual reality, robotics, intelligent transportation system and others. 
It’s no wonder that a large number of attempts have been made to exploit what features can be used to represent the interest area from noninterest corridor which plays the crucial part, despite central challenges from script complexity, scale variation, occlusion, illumination changing and others.
  • 3. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 6 (2023) ISSN(Online): 2581-7280 Article No. 20 PP 01-08 www.viva-technology.org/New/IJRI 3 Mohammed Boukabous, Mostafa Azizi [6], this journal presents an innovative approach to crime prediction through the integration of object detection and deep learning techniques. The research is conducted at the Mathematics, Signal and Image Processing, and Computing Research Laboratory (MATSI) at the Superior School of Technology (ESTO), Mohammed First University in Oujda, Morocco. In this study, the authors leverage the power of computer vision and deep learning to develop a system capable of predicting crimes based on image and video data. This research holds the potential to significantly contribute to the field of law enforcement and public safety by automating the process of crime prediction through advanced technology. The paper's findings and methodologies could prove valuable for both researchers and practitioners working on enhancing crime prevention strategies. Thomas Tsao and Doris Y. Tsao[7] Through a process of perceptual association that’s still not well understood, the primate visual system transforms visual input conforming of a sluice of retinal images into a percept of stable, separate objects. This process has traditionally been broken down into two separate problems the segmentation problem, which addresses how visual pixels can be grouped into distinct objects within a single image and the shadowing problem, which addresses how objects can be linked across images despite changing appearance. Both problems are largely gruelling. In this paper, we explore the computational origin of the capability to member and invariantly track objects and show that this problem can in principle be answered without literacy, supervised or unsupervised. 
Completing image grounded approaches to segmentation and shadowing, a figure grounded approach considers vision as an inverse plates problem. In this paper, we show that the problem of inferring 3D shells from images is in fact completely constrained, if the input is in the form of a sequence of images of a scene in which either the bystander or objects are moving. Mingqi Gao, Feng Zheng, James J. Q. Yu, Caifeng Shan, Guiguang Ding ,Jungong Han [8] As one of the abecedarian problems in the Feld of videotape understanding, videotape object segmentation aims at segmenting objects of interest throughout the given videotape sequence. Lately, with the advancements of deep literacy ways, deep neural networks have shown outstanding performance advancements in numerous computer vision operations, with videotape object segmentation being one of the most supported and intensely delved. Latterly, we summarise the datasets for training and testing a videotape object segmentation algorithm, as well as common challenges and evaluation criteria. Next, former workshop are grouped and reviewed grounded on how they prize and use spatial and temporal features, where their infrastructures, benefactions and the differences among each other are developed. This composition is anticipated to serve as a tutorial and source of reference for learners intended to snappily grasp the current progress in this exploration area and interpreters interested in applying the videotape object segmentation styles to their problems. Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah [9], this journal presents an innovative approach to self-supervised video object segmentation. The research addresses the growing need for automated video object segmentation, a fundamental task in computer vision with applications ranging from video editing to autonomous navigation. 
In this paper, the authors propose a novel method that leverages self-supervised learning techniques to achieve accurate and robust video object segmentation. This innovative approach allows the model to learn from unlabeled video data, reducing the need for extensive manual annotations, which is a critical advantage in practical applications. The proposed method is extensively evaluated on various benchmark datasets, demonstrating impressive results in terms of video object segmentation accuracy and generalization across diverse scenarios. The authors also highlight the potential real-world applications of their approach, emphasizing its relevance in fields such as video editing, robotics, and surveillance. Yadang Chen, Duolin Wang, Zhiguo Chen, Zhi-Xin Yang, and Enhua Wu [10], videotape object segmentation, which aims to draw a detailed object mask on videotape frames, is extensively applicable to colourful fields similar as autopilots, videotape editing, and videotape conflation. Propagation- grounded styles use the target’s temporal consonance, and calculate on the mask from former frames. For illustration, Mask Track combines the segmentation mask of the former frame with the current frame to form the mask of the current frame. Still, these styles suffer from occlusion problems and error drift. Matching- grounded styles uses the first frame of a given videotape as a reference frame and descry the segmented object singly in each frame. These styles are more robust and reduce the impact of occlusion, but don’t take full advantage of spatiotemporal information. Consequently, the performance and delicacy of some mongrel algorithms are bettered on the former two classes. Yue Wu HKUST, Rongrong Gao HKUST, Jaesik Park POSTECH, Qifeng Chen HKUST [11], An artificial intelligence system prognosticate a phot ore aplastic videotape conditioned on once visual observation? 
With an accurate videotape vaticinator model, an intelligent agent can plan its stir according to the prognosticated videotape. Unborn video generation ways can also be used to synthesize a long videotape by constantly extending
  • 4. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 6 (2023) ISSN(Online): 2581-7280 Article No. 20 PP 01-08 www.viva-technology.org/New/IJRI 4 the future of the videotape. Utmost being styles attack the videotape vaticinator task by generating unborn videotape frames one by one in an unsupervised fashion. These approaches synthesize unborn frames at the pixel position without unequivocal modelling of the movements or semantics of the scene. Therefore, it’s difficult for the model to grasp the conception of object boundaries to produce different movements for different objects. Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan and Nasser Kehtarnavaz [12], Videotape object discovery involves detecting objects using videotape data as compared to conventional object discovery using static images. Two operations that have played a major part in the growth of videotape object discovery are independent driving and videotape surveillance. In 2015, videotape object discovery came a new task of the Image Net Large Scale Visual Recognition Challenge (ILSVRC2015). In general, object discovery approaches can be grouped into two major orders one- stage sensors and two- stage sensors. One- stage sensors are frequently more computationally effective than two- stage sensors. Still, two- stage sensors are shown to produce advanced rigor compared to one- stage sensors. Still, using object discovery on each image frame doesn’t take into consideration the following attributes in videotape data Since there live both spatial and temporal correlations between image frames, there are point birth redundancies between conterminous frames. Detecting objects from poor quality frames leads to low rigor. Videotape object discovery approaches attempt to address the above challenges. Some approaches make use of the spatial-temporal information to ameliorate delicacy, similar as fusing features on different situations. III. 
ANALYSIS TABLE Table 1 Analysis Table Title Summary Advantages Open Challenges YOLO-Based Light-Weight Deep Learning Models for Insect Detection System with Field Adaption [1] The most inconceivable diversity, cornucopia, spread, and rigidity in biology are set up in insects. The foundation of nonentity study and pest operation is nonentity recognition. Still, utmost of the current nonentity recognition exploration depends on a small number of nonentity taxonomic experts. We can use computers to separate insects directly rather of professionals because of the quick advancement of computer technology. The deep feature extraction function makes it perform well in image, audio, and text data. Easy to update data through back propagation. Different architectures are suitable for different problems. The hidden layer reduces the dependence of algorithm on feature engineering. One of the main challenges of deep learning is the need for large amounts of data and computational resources. Neural networks learn from data by adjusting their parameters to minimize a loss function, which measures how well they fit the data. Rapid Prototyping of Pear Detection Neural Network with YOLO Architecture in Photographs [2] Fruit yield estimation and soothsaying are essential processes for data- grounded decision- making in agribusiness to optimise fruit- growing and selling operations. The yield soothsaying is grounded on the operation of literal data, which was collected in the result of periodic yield estimation. One of the main advantages of YOLO v7 is its speed. It can process images at a rate of 155 frames per second, much faster than other state-of- the-art object detection algorithms. Even the original baseline YOLO model was capable of processing at a maximum rate of 45 frames per second. YOLO may struggle with small or overlapping objects, as it can only predict a fixed number of boxes per cell. 
It may also miss some objects that do not fit well into the grid, or produce false positives for background regions.

Object detection using YOLO: challenges, architectural successors, datasets and applications [3]. The paper provides a comprehensive overview of object detection using the YOLO (You Only Look Once) framework, covering the challenges faced in object detection tasks, the evolution of the YOLO architecture, available datasets, and real-world applications. It addresses key challenges such as handling objects of different sizes, dealing with class imbalance, and the need for extensive annotated datasets, and it explores how subsequent YOLO versions have enhanced detection capabilities. Handling diverse object sizes, class imbalance, and the demand for large annotated datasets remain open challenges in this context.

Video Object Detection through Traditional and Deep Learning Methods [4]. The paper delivers a comprehensive overview of video object detection techniques, encompassing both traditional computer vision methods and modern deep learning approaches. It highlights the pros and cons of both families of methods and discusses practical applications, enabling readers to develop a holistic understanding of the field and make informed decisions about methodology selection. Video object detection still presents several challenges: achieving real-time performance, maintaining object tracking consistency amid occlusions and motion, handling scale and viewpoint variations, and detecting multiple objects simultaneously.

Detecting Moving Object via Projection of Forward-Backward Frame Difference [5]. The paper introduces the importance of moving object detection in computer vision and video analysis, explains the specific approach of projecting the forward-backward frame difference and its significance, and surveys the research papers, projects, and case studies that have contributed to the development and understanding of this technique. It also discusses the challenges and limitations associated with the method, such as handling complex scenes or dynamic lighting conditions.

Image and video-based crime prediction using object detection and deep learning [6]. By harnessing computer vision and state-of-the-art object detection methods, the authors aim to automatically predict crimes from image and video data. The work represents an innovative and advanced approach to crime prediction, and the research conducted at MATSI in Morocco demonstrates the potential to automate the crime prediction process, significantly improving the efficiency and accuracy of law enforcement efforts. Several hurdles remain: ensuring the accuracy and reliability of object detection in complex real-world environments, where lighting, weather, and object occlusions can affect results; handling the privacy and ethical concerns raised by using image and video data for crime prediction; and achieving the scalability and real-time processing needed to make such systems practical for law enforcement agencies.

A topological solution to object segmentation and tracking [7]. The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: segmenting visual input into separate units, and tracking the identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion. Topological solutions for object segmentation and tracking offer several advantages that make them a promising approach in computer vision and image analysis. The paper addresses the current challenges in topological segmentation and tracking and suggests future research directions, such as combining topological data analysis (TDA) with deep learning techniques or developing real-time topological solutions.

Deep learning for video object segmentation: a review [8]. The paper introduces the significance of video object segmentation and its applications in computer vision, robotics, and autonomous systems, and explains the role of deep learning in advancing the field, starting from foundational concepts such as neural networks and convolutional neural networks (CNNs). Deep learning approaches can learn complex features and temporal dependencies in video sequences, leading to precise segmentation results. The review also describes commonly used datasets, such as DAVIS, SegTrack, and YouTube-Objects, and highlights the benchmark challenges that have driven the development of deep learning algorithms in this domain.
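The forward-backward frame-difference idea summarized for [5] can be illustrated with a minimal sketch in plain Python. This is an assumption-laden illustration, not the paper's method: frames are taken to be small grayscale 2D lists, the threshold is an arbitrary choice, and the projection step of the original paper is not reproduced.

```python
def frame_difference(prev_frame, curr_frame, next_frame, threshold=25):
    """Forward-backward frame difference (illustrative sketch).

    A pixel is marked as moving only if it differs from BOTH the
    previous frame (forward difference) and the next frame
    (backward difference) by more than the threshold.
    """
    h, w = len(curr_frame), len(curr_frame[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            forward = abs(curr_frame[y][x] - prev_frame[y][x])
            backward = abs(next_frame[y][x] - curr_frame[y][x])
            # motion must be visible in both directions
            if min(forward, backward) > threshold:
                mask[y][x] = 1
    return mask
```

Requiring motion in both directions suppresses the ghosting that a single-direction difference leaves behind a moving object, which is the intuition behind combining the two differences.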
Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [9]. This paper introduces a promising self-supervised approach for video object segmentation, a valuable contribution to computer vision. The combination of cutout prediction, tagging, and a novel loss function enhances the model's ability to segment objects accurately in videos; integrating the cutout prediction and tagging mechanisms improves object boundary understanding and tracking performance, making the method robust across diverse video scenarios. These innovations hold the potential to improve video analysis applications such as video editing, robotics, and surveillance by automating and enhancing object segmentation tasks. Balancing segmentation accuracy against computational efficiency remains essential, particularly in robotics and autonomous systems demanding low-latency processing, and scaling the approach to extensive video datasets and resource-constrained deployments raises challenges in data management, model optimization, and hardware constraints.

Global video object segmentation with spatial constraint module [10]. The paper presents a lightweight and effective semi-supervised video object segmentation network based on a space-time memory frame. The algorithm uses a global context (GC) module to achieve high-performance, real-time segmentation; a CNN extracts features automatically in a single pass, the object detection rate is high, and an added feature pyramid model improves small object detection. Based on GoogLeNet, the network is fast. The authors present their solution as efficient and compatible, and hope it will set a strong baseline for future real-time video object segmentation work.

Future Video Synthesis with Object Motion Prediction [11]. The paper presents an approach to predict future video frames given a sequence of continuous past frames. The method first decomposes the scene into currently-moving and static objects before synthesizing future frames, and it exhibits much less tearing or deformation artefact than competing approaches. Such prediction systems can be used in interior and exterior passage areas; while they are particularly useful in spaces with many people, they can also be applied in areas of homes or residences such as entryways or gardens.

A Review of Video Object Detection: Datasets, Metrics and Methods [12]. Although there are well-established object detection methods based on still images, applying them to video data frame by frame faces two shortcomings: a lack of computational efficiency, due to redundancy across image frames or failure to exploit the temporal and spatial correlation of features across frames, and a lack of robustness to real-world conditions such as motion blur and occlusion. Object detection is a key task of computer vision: it enables a machine to locate, identify, and classify objects in an image, letting us work more efficiently and focus on other tasks while the machine works on its own. Challenges still remain in further improving the accuracy and speed of video object detection methods.
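Most of the detection metrics surveyed in reviews such as [12] build on intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples (the tuple layout is an assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, commonly 0.5.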
The paper closes by presenting the major challenges and possible future trends in video object detection.

IV. CONCLUSION

The YOLOv8-CVAT framework, encapsulated within an Android application, stands as a beacon of technological innovation: its successful implementation has not only pushed the boundaries of wildlife monitoring but has also laid the foundation for a transformative paradigm in conservation efforts [8]. Although initially tailored to species exclusive to India, the framework reveals its potential for cross-domain application by integrating seamlessly with other datasets. This work invites further collaborations, adaptations, and innovations, leading toward a future where advanced technology becomes an integral ally in the ceaseless effort to protect and preserve the diverse and unique wildlife that graces our planet.

REFERENCES

[1] N. Kumar, Nagarathna, and F. Flammini, "YOLO-Based Light-Weight Deep Learning Models for Insect Detection System with Field Adaption", Agriculture, 2023, 13, 741. https://doi.org/10.3390/agriculture13030741
[2] S. Kodors, M. Sondors, G. Lācis, E. Rubauskis, I. Apeināns, and I. Zarembo, Environment. Technology. Resources. Proceedings of the 14th International Scientific and Practical Conference, Rezekne, Latvia, Volume 1, pp. 81-85, 2023.
[3] T. Diwan, G. Anirudh, and J. V. Tembhurne, "Object detection using YOLO: challenges, architectural successors, datasets and applications", Indian Institute of Information Technology, Nagpur, India, and Central University of Rajasthan, Jaipur, India, 2022.
[4] S. M. Yadav and S. M. Chaware, "Video Object Detection through Traditional and Deep Learning Methods", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958 (Online), Volume-9, Issue-4, April 2020.
[5] Z. Bian and L. Wang, "Detecting Moving Object via Projection of Forward-Backward Frame Difference", International Journal of Future Generation Communication and Networking, Vol. 9, No. 5 (2016), pp. 143-152. http://dx.doi.org/10.14257/ijfgcn.2016.9.5.14
[6] M. Boukabous and M. Azizi, "Image and video-based crime prediction using object detection and deep learning", Bulletin of Electrical Engineering and Informatics, Vol. 12, No. 3, June 2023, pp. 1630-1638.
[7] T. Tsao and D. Y. Tsao, "A topological solution to object segmentation and tracking", Proceedings of the National Academy of Sciences, 2020.
[8] M. Gao, F. Zheng, J. J. Q. Yu, C. Shan, G. Ding, and J. Han, "Deep learning for video object segmentation: a review", Artificial Intelligence Review (2023) 56:457-531, April 2022.
[9] J. Kini, F. S. Khan, S. Khan, and M. Shah, "Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging", MBZ University of Artificial Intelligence, UAE, 22 April 2022.
[10] Y. Chen, D. Wang, Z. Chen, Z.-X. Yang, and E. Wu, "Global video object segmentation with spatial constraint module", Computational Visual Media, 2022. https://doi.org/10.1007/s41095-022-0282-8
[11] Y. Wu, R. Gao, J. Park, and Q. Chen, "Future Video Synthesis with Object Motion Prediction", arXiv:2004.00542v2 [cs.CV], 15 Apr 2020.
[12] H. Zhu, H. Wei, B. Li, X. Yuan, and N. Kehtarnavaz, "A Review of Video Object Detection: Datasets, Metrics and Methods", Appl. Sci., 2020, 10, 7834.