MONTHLY REPORT
GENERATIVE ADVERSARIAL DOMAIN ADAPTATION FOR OBJECT DETECTION
Period: April 2024
Report Date: May 1st 2024
HIGHLIGHTS
• METHOD OVERVIEW: Description of the proposed method and how the “limiting prompt” fits into it.
• PRIOR COMPARISON: Description of the contributions and the technical differences from the prior art.
• LIMITING PROMPT PRE-LABELING: The part of the proposed method that serves as a teacher model or pre-labeling tool: a zero-shot object detector that combines state-of-the-art object detection and natural language processing (NLP).
• FUTURE TASKS: Description of pending action items based on the scheduled material, as well as other action items that may arise from the discussion.
2024 Monthly Report 2
[METHOD OVERVIEW]
PURPOSE
• A method that uses a generative network to translate the environment of an image and add objects based on bounding-box-based prompts, which limit the location and types of the added objects.
• An object detector for the target domain, trained with <20% of the labeled target-domain dataset, 100% of the labeled source-domain dataset, and the corresponding generated dataset, that reaches accuracy comparable to training with 100% of the labeled target-domain dataset.
[METHOD OVERVIEW]
PROPOSED METHOD
• The proposed method uses a generative network to generate images in the style of the target domain, while the content of each image comes from a prompt that is limited by the source-domain labels. This eliminates the need for manual labeling.
• The object detector is trained on the generated target-domain images, 20% of the target-domain dataset, and 100% of the source-domain dataset, then benchmarked on the target test set against a detector trained on 100% of the target-domain dataset.
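The training mix described above can be sketched as follows. This is a minimal sketch, not the report's implementation: `build_training_set`, the sample-ID lists, and the fixed seed are hypothetical, and only the 20% / 100% proportions come from the slide.

```python
import random

def build_training_set(source, target, generated, target_fraction=0.2, seed=0):
    """Combine all source samples, a labeled subset of target samples,
    and all generated samples into one training list."""
    rng = random.Random(seed)
    n_target = int(len(target) * target_fraction)
    target_subset = rng.sample(target, n_target)
    return list(source) + target_subset + list(generated)

# Hypothetical sample IDs standing in for labeled images.
source = [f"src_{i}" for i in range(10)]
target = [f"tgt_{i}" for i in range(10)]
generated = [f"gen_{i}" for i in range(10)]

train = build_training_set(source, target, generated)
print(len(train), "training samples")
```

The baseline for comparison would use `target_fraction=1.0` with no generated samples; both detectors are then evaluated on the same target test set.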
PRIOR COMPARISONS
Attributes | Proposed Method | Fine-grained Feature Imitation [1] | Teacher-student Network [2] | Label Smoothing Regularization [3]
Training Paradigm | KD with self-supervised teacher | KD with fine-grained feature imitation | KD with supervised teacher | KD with smoothing regularization
Dataset | Combination of real and synthetic dataset | Real dataset | Real dataset | Real dataset
Teacher Model | Unsupervised | Supervised | Weakly-Supervised | Supervised
Incremental Learning Adaptability | Yes | No | Yes | No
[1] T. Wang et al., Distilling object detectors with fine-grained feature imitation, CVPR, 2019. https://arxiv.org/abs/1906.03609
[2] A. Banitalebi-Dehkordi, Revisiting knowledge distillation for object detection, 2021. https://arxiv.org/pdf/2105.10633.pdf
[3] L. Yuan et al., Revisiting knowledge distillation via label smoothing regularization, 2020. https://arxiv.org/abs/1909.11723
LIMITING PROMPT PRE-LABELING
A self-supervised learning framework built on a transformer-based architecture, with a zero-shot detector that combines DETR [4] for visual cues and GLIP [5] for text cues.
[4] N. Carion et al. End-to-end object detection with transformers. European Conference on Computer Vision, pages 213–229. Springer, 2020. https://arxiv.org/abs/2005.12872
[5] L.H. Li et al. Grounded language-image pre-training. arXiv, 2022. https://arxiv.org/pdf/2112.03857.pdf
[LIMITING PROMPT PRE-LABELING]
WHAT IS ZERO-SHOT OBJECT DETECTION?
Suppose we’d like to train an object detector for specific object classes: pedestrian, traffic sign, and car. Then there is a list of actions to complete prior to training:
• data collection (10%),
• data preparation (5%),
• data annotation (70%),
• data augmentation (10%), and
• data preprocessing (5%).
A zero-shot object detector is a powerful tool: a large (and therefore computationally expensive) detector architecture that can detect a vast range of classes without being trained on labeled examples of those classes.
Sample of a zero-shot object detector [6]
[6] H. Zhang et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection, 2022. https://arxiv.org/abs/2203.03605
[LIMITING PROMPT PRE-LABELING]
WHY CAN’T WE USE ANY PRE-TRAINED MODEL?
• “Any pre-trained model” here refers to a conventional object detector that has been trained on a fixed dataset, for example MS-COCO [7].
• Although such a detector may give highly accurate results, it can only detect the classes in its training label set, so it cannot cover a vast range of classes.
• For example, MS-COCO has no “traffic sign” class [7, 8], which means that the model cannot assist with labeling this class, or any other class chosen at will.
Sample of a detector on MS-COCO
[7] T.Y. Lin et al. Microsoft COCO: Common objects in context, 2014. https://arxiv.org/abs/1405.0312
[8] https://github.com/matlab-deep-learning/Object-Detection-Using-Pretrained-YOLO-v2/blob/main/+helper/coco-classes.txt
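The closed label set problem can be illustrated with a small coverage check. This is a minimal sketch: `COCO_ROAD_EXCERPT` is only a road-related excerpt of the MS-COCO class names (not the full 80-class list), and the `ALIASES` mapping and `uncovered_classes` helper are hypothetical.

```python
# Road-related excerpt of MS-COCO class names (not the full list).
COCO_ROAD_EXCERPT = {
    "person", "bicycle", "car", "motorcycle", "bus", "train", "truck",
    "traffic light", "fire hydrant", "stop sign", "parking meter",
}

# Hypothetical mapping from project class names to the closest COCO name.
ALIASES = {"pedestrian": "person"}

def uncovered_classes(wanted, label_set, aliases=None):
    """Return the wanted classes that the detector's label set cannot provide."""
    aliases = aliases or {}
    return [c for c in wanted if aliases.get(c, c) not in label_set]

missing = uncovered_classes(
    ["pedestrian", "traffic sign", "car"], COCO_ROAD_EXCERPT, ALIASES
)
print(missing)
```

Any class reported as missing would need manual annotation or a detector that is not tied to a fixed label set.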
[LIMITING PROMPT PRE-LABELING]
ZERO-SHOT DETECTOR AS PRE-LABELER
• A transformer-based detector from the DETR family [6] with grounded pre-training, so that the text prompt limits what is detected.
• It can detect arbitrary objects from text-based prompts such as class names or referring expressions.
• It is a large network with two parts: DETR as the detector and GLIP as the NLP processor. Together they form a zero-shot detector covering a vast range of classes [9].
[6] H. Zhang et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection, 2022. https://arxiv.org/abs/2203.03605
[9] S. Liu et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection, 2023. https://arxiv.org/pdf/2303.05499.pdf
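A text prompt built from class names might look like the sketch below. The `build_prompt` helper is hypothetical, and the lowercase " . " separated format is an assumption modelled on the convention commonly used with open-set detectors; the pre-labeling tool's own documentation should be checked for the exact format it expects.

```python
def build_prompt(class_names):
    """Join class names into one lowercase caption, separated by ' . '.

    The separator format is an assumption, not taken from the report.
    """
    cleaned = [name.strip().lower() for name in class_names if name.strip()]
    if not cleaned:
        return ""
    return " . ".join(cleaned) + " ."

prompt = build_prompt(["Pedestrian", "traffic sign", "car"])
print(prompt)
```

Restricting the prompt to the classes of interest is what makes it a limiting prompt: classes that are not named are not requested from the detector.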
[LIMITING PROMPT PRE-LABELING]
HOW DOES IT WORK? (1/5)
• It is an end-to-end architecture consisting of a backbone, a transformer encoder-decoder, and multiple prediction heads.
• The backbone, like any convolutional network, extracts features from the input image, specifically hierarchical, high-level features that retain spatial information.
• The transformer encoder processes the feature maps through transformer layers, each consisting of patch-wise multi-head self-attention and a feed-forward network, to capture contextual information between feature patches.
• The transformer decoder takes a set of object queries and the feature vectors from the encoder and passes them through transformer layers (multi-head self-attention, cross-attention to the encoder features, and feed-forward networks) to turn each query into a prediction.
• The loss is a bipartite matching loss: predictions are matched one-to-one with ground truths, and the localization and classification losses of the matched pairs are summed.
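The matching step behind the bipartite matching loss can be sketched as below. This is a simplified illustration under stated assumptions: the cost is just (1 - class probability) plus an L1 box distance, and the assignment is found by brute force over permutations, which is only practical for tiny examples. DETR-style detectors typically use the Hungarian algorithm and richer box costs; all names here are hypothetical.

```python
from itertools import permutations

def l1_box_cost(pred_box, gt_box):
    """L1 distance between two (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))

def pair_cost(pred, gt):
    """Cost of assigning one prediction to one ground truth.
    pred = (class_probs dict, box), gt = (label, box)."""
    probs, pred_box = pred
    label, gt_box = gt
    return (1.0 - probs.get(label, 0.0)) + l1_box_cost(pred_box, gt_box)

def match(preds, gts):
    """Minimum-cost one-to-one assignment; requires len(preds) >= len(gts).
    Returns (assignment, cost) where assignment[g] is the prediction index for gts[g]."""
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_assign, best_cost

preds = [
    ({"car": 0.9}, (0.5, 0.5, 0.2, 0.2)),
    ({"pedestrian": 0.8}, (0.1, 0.1, 0.05, 0.1)),
    ({"car": 0.3}, (0.9, 0.9, 0.1, 0.1)),
]
gts = [
    ("pedestrian", (0.1, 0.1, 0.05, 0.1)),
    ("car", (0.5, 0.5, 0.2, 0.2)),
]
assignment, total_cost = match(preds, gts)
print(assignment, round(total_cost, 3))
```

Predictions left unmatched (here the third one) are trained toward a "no object" class in DETR-style detectors.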
[LIMITING PROMPT PRE-LABELING]
HOW DOES IT WORK? (2/5)
• During bounding box matching (before the loss is computed), predictions are often matched to the wrong ground truths, and the assignment can change from one training step to the next, which produces an erroneous loss value.
• The detector uses contrastive denoising training [6]: noised copies of the ground-truth boxes are fed to the decoder as extra queries and trained to reconstruct the original boxes, so these queries have known targets and need no matching. An attention mask keeps the denoising queries separate from the regular matching queries. This is intended to mitigate the erroneous loss value.
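As a rough illustration of denoising training as commonly described for DETR-style detectors: noised copies of the ground-truth boxes are given to the decoder as extra queries whose targets are known in advance, so no matching is needed for them. The sketch below only builds such queries; the noise ranges, the function names, and the dictionary layout are hypothetical, and no model is trained here.

```python
import random

def _signed_offset(lo, hi, rng):
    """Random offset whose magnitude lies in [lo, hi], with a random sign."""
    return rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi)

def noise_box(box, lo, hi, rng):
    """Perturb a (cx, cy, w, h) box; offsets are relative to its width and height.
    Assumes hi < 1 so that width and height stay positive."""
    cx, cy, w, h = box
    return (
        cx + _signed_offset(lo, hi, rng) * w,
        cy + _signed_offset(lo, hi, rng) * h,
        w * (1.0 + _signed_offset(lo, hi, rng)),
        h * (1.0 + _signed_offset(lo, hi, rng)),
    )

def make_denoising_queries(gt_boxes, pos_max=0.2, neg_max=0.8, seed=0):
    """For each ground-truth box, build one positive and one negative query."""
    rng = random.Random(seed)
    queries = []
    for idx, box in enumerate(gt_boxes):
        # Positive: small noise, target is the original box with index idx.
        queries.append({"box": noise_box(box, 0.0, pos_max, rng), "target": idx})
        # Negative: larger noise, target None stands for "no object".
        queries.append({"box": noise_box(box, pos_max, neg_max, rng), "target": None})
    return queries

gt_boxes = [(0.5, 0.5, 0.2, 0.4), (0.2, 0.3, 0.1, 0.1)]
queries = make_denoising_queries(gt_boxes)
for q in queries:
    print(q["target"], tuple(round(v, 3) for v in q["box"]))
```

The "contrastive" part is the pairing of a lightly noised positive with a heavily noised negative for the same ground-truth box.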
[LIMITING PROMPT PRE-LABELING]
HOW DOES IT WORK? (3/5)
• This method turns DETR from an ordinary detector into a zero-shot detector by combining it with a parallel text stream based on GLIP, which acts as the NLP model.
• The visual stream is processed by DETR; the text stream is processed by GLIP.
• Essentially, the text stream serves as a limiting prompt that cues the object detector to regress boxes in the visual stream only for the classes of interest.
[LIMITING PROMPT PRE-LABELING]
HOW DOES IT WORK? (4/5)
• The feature enhancer bridges the visual-stream and text-stream features.
• To unify these features, it stacks multiple feature enhancer layers, which include a deformable self-attention layer for the visual-stream features and a regular self-attention layer for the text-stream features.
• Deformable self-attention enhances the visual features by letting the model focus on specific regions of interest within the image.
• Regular self-attention is applied to the text features, enabling the model to capture semantic relationships and context within the text.
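The regular self-attention applied to the text features can be sketched in plain Python. This is a minimal single-head version under simplifying assumptions: queries, keys, and values are the token vectors themselves (no learned projection matrices), and the toy 2-dimensional token vectors are hypothetical. Deformable attention, which samples only a few locations per query, is not shown.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.
    Returns (outputs, weights); weights[i][j] is how much token i attends to token j."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, tokens)) for d in range(dim)])
    return outputs, weights

# Toy embeddings for three text tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output vector is a weighted mix of all token vectors, which is how context from the whole prompt reaches every token.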
[LIMITING PROMPT PRE-LABELING]
HOW DOES IT WORK? (5/5)
• A cross-modality decoder is then used to integrate text and
image modality features.
• The cross-modality decoder operates by processing the fused
features and decoder queries through a series of attention
layers and feed-forward networks.
• These layers allow the decoder to effectively capture the
relationships between the visual and textual information,
enabling it to refine the object detections and assign
appropriate labels.
• After this step, the model proceeds with the final steps of object detection: bounding box prediction, class-specific confidence filtering, and label assignment.
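The final filtering and label assignment can be sketched as follows. This is a minimal sketch, assuming each decoder query comes with one similarity score per prompt phrase; the `assign_labels` helper, the score layout, and the 0.35 threshold are hypothetical.

```python
def assign_labels(queries, phrases, min_score=0.35):
    """Keep queries whose best phrase score reaches min_score and
    label each kept box with its best-scoring phrase."""
    detections = []
    for q in queries:
        best = max(q["scores"])
        if best < min_score:
            continue  # not confident about any phrase in the prompt
        label = phrases[q["scores"].index(best)]
        detections.append({"box": q["box"], "label": label, "score": best})
    return detections

phrases = ["pedestrian", "traffic sign", "car"]
queries = [
    {"box": (0.5, 0.5, 0.2, 0.2), "scores": [0.05, 0.10, 0.80]},
    {"box": (0.3, 0.3, 0.1, 0.1), "scores": [0.10, 0.20, 0.15]},
    {"box": (0.1, 0.4, 0.05, 0.2), "scores": [0.60, 0.05, 0.30]},
]
detections = assign_labels(queries, phrases)
for d in detections:
    print(d["label"], d["score"], d["box"])
```

When the detector is used as a pre-labeler, the threshold trades missed objects against false labels, so it would need tuning on a small manually checked sample.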
[LIMITING PROMPT PRE-LABELING]
EXPERIMENTAL RESULTS (1/2)
To test the performance of limiting prompt pre-labeling, a sample image was used. Note that the text prompt determines which objects the detector predicts, based on the semantics and context of the text.
[LIMITING PROMPT PRE-LABELING]
EXPERIMENTAL RESULTS (2/2)
[FUTURE TASKS]
PIPELINE SCHEDULE
[Gantt chart, January to December 2024. Tasks: preliminary studies and proposal; limiting prompt; dataset benchmark; generative network design; object detector design; manuscript draft; feedback and refinement; arXiv submission & handover; DAC 2025 submission.]
[FUTURE TASKS]
ACTION ITEMS
• Train a generative adversarial network on the BDD-100K dataset (U.S. dataset) with clear (1st domain) and snowy (2nd domain) images.
• Run inference with this generative adversarial network on the IDD dataset (India dataset) to produce snowy-domain images.
• Use the limited labels from the pre-labeling tool to ensure that the generative adversarial network converts only the background: it must not alter any object or class of interest, and it must not generate false objects.
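One simple way to check, and if necessary enforce, that labeled objects survive the translation is sketched below. This is an illustrative stand-in rather than the planned method: it pastes the original pixels back inside the pre-labeled boxes, whereas the action item describes constraining the generative network itself. Images are nested lists of grayscale values, boxes are (x0, y0, x1, y1) with exclusive upper bounds, and all names are hypothetical.

```python
def box_mask(height, width, boxes):
    """Boolean mask that is True inside any (x0, y0, x1, y1) box."""
    mask = [[False] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = True
    return mask

def object_change(original, translated, boxes):
    """Mean absolute pixel difference inside the labeled boxes (0.0 if no box pixels)."""
    h, w = len(original), len(original[0])
    mask = box_mask(h, w, boxes)
    diffs = [abs(original[y][x] - translated[y][x])
             for y in range(h) for x in range(w) if mask[y][x]]
    return sum(diffs) / len(diffs) if diffs else 0.0

def preserve_objects(original, translated, boxes):
    """Keep original pixels inside labeled boxes; use translated pixels elsewhere."""
    h, w = len(original), len(original[0])
    mask = box_mask(h, w, boxes)
    return [[original[y][x] if mask[y][x] else translated[y][x] for x in range(w)]
            for y in range(h)]

original = [[10] * 4 for _ in range(4)]     # toy "clear" image
translated = [[200] * 4 for _ in range(4)]  # toy "snowy" output
boxes = [(1, 1, 3, 3)]                      # one pre-labeled object

print(object_change(original, translated, boxes))
composite = preserve_objects(original, translated, boxes)
print(object_change(original, composite, boxes))
```

A large `object_change` value on real outputs would indicate that the network is altering objects of interest instead of only the background.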
