Intelligence Machine Vision Lab
Strictly Confidential
2019 CVPR paper overview
SUALAB
Ho Seong Lee
Contents
• CVPR 2019 Statistics
• One-page summaries of 20 papers
CVPR 2019 Statistics
• What is CVPR?
• Conference on Computer Vision and Pattern Recognition (CVPR)
• CVPR was first held in 1983 and has been held annually
• CVPR 2019: June 16th – June 20th in Long Beach, CA
CVPR 2019 Statistics
• CVPR 2019 statistics
• The total number of papers has grown every year, and 2019 saw a particularly large increase
• The main topics can be visualized from paper titles with a simple Python script
• https://github.com/hoya012/CVPR-Paper-Statistics
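A minimal sketch of how title keywords could be counted is shown here. It is not the script from the repository above: the function name `count_title_keywords`, the stop-word list, and the three sample titles (taken from this deck) are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original script may filter words differently.
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "on",
              "the", "to", "using", "via", "with"}

def count_title_keywords(titles, top_k=10):
    """Count how often each non-stop word appears across paper titles."""
    counter = Counter()
    for title in titles:
        # Lowercase and split into alphanumeric tokens (keeps tokens such as "3d").
        words = re.findall(r"[a-z0-9]+", title.lower())
        counter.update(w for w in words if w not in STOP_WORDS)
    return counter.most_common(top_k)

titles = [
    "Learning to Synthesize Motion Blur",
    "Precise Detection in Densely Packed Scenes",
    "Bounding Box Regression with Uncertainty for Accurate Object Detection",
]
top_keywords = count_title_keywords(titles, top_k=3)
print(top_keywords)
```

Running the same counter on the full title lists of two years would give per-keyword counts that can be compared year over year.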
[Figure: CVPR paper counts per year, with percentage labels 28.4%, 30%, 29.9%, 29.6%, 25.1%]
CVPR 2019 Statistics
[Figure: 2019 CVPR paper statistics]
CVPR 2018 Statistics
[Figure: 2018 CVPR paper statistics, shown for comparison with 2019]
CVPR 2018 vs CVPR 2019 Statistics
• Most of the top keywords stayed the same
• Image, detection, 3d, object, video, segmentation, adversarial, recognition, visual …
• “graph”, “cloud”, and “representation” became roughly two to three times as frequent
• graph: 15 → 45
• representation: 25 → 48
• cloud: 16 → 35
Before we begin
• A paper's absence from this list does not mean it is uninteresting.
• Since I have mainly studied computer vision, most of the papers discussed today are computer vision papers.
• Topics not covered today
• Natural Language Processing
• Reinforcement Learning
• Robotics
• Etc.
1. Learning to Synthesize Motion Blur (oral)
• Synthesizing a motion blurred image from a pair of unblurred sequential images
• Motion blur is important in cinematography and artistic photography
• Generate a large-scale synthetic training dataset of motion blurred images
Recommended reference: “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation”, 2018 CVPR
2. Semantic Image Synthesis with Spatially-Adaptive Normalization (oral)
• Synthesizing photorealistic images given an input semantic layout
• Spatially-adaptive normalization preserves the semantic information of the input layout
• The model gives users control over both semantics and style when synthesizing images
Demo Code: https://github.com/NVlabs/SPADE
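The idea of spatially-adaptive normalization can be sketched in NumPy as follows: activations are normalized per channel, then scaled and shifted by parameters that vary per pixel and are computed from the semantic layout. This is a simplified illustration, not the official SPADE code. The function name `spade_sketch`, the single-image per-channel statistics, and the per-pixel linear map (a 1x1 convolution standing in for the learned convolutional layers) are all assumptions.

```python
import numpy as np

def spade_sketch(features, seg_onehot, w_gamma, w_beta, eps=1e-5):
    """Modulate normalized features with layout-dependent scale and shift.

    features:   (C, H, W) activations of one image
    seg_onehot: (K, H, W) one-hot semantic layout with K classes
    w_gamma, w_beta: (C, K) hypothetical weights of a 1x1 convolution
    """
    # Parameter-free normalization per channel (statistics over H and W only,
    # a simplification for a single image).
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    # Per-pixel scale and shift derived from the layout, so each semantic
    # class keeps its own modulation after normalization.
    gamma = np.einsum("ck,khw->chw", w_gamma, seg_onehot)
    beta = np.einsum("ck,khw->chw", w_beta, seg_onehot)
    return gamma * normalized + beta

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))
labels = rng.integers(0, 3, size=(8, 8))
seg_onehot = np.eye(3)[labels].transpose(2, 0, 1)  # (K, H, W)
w_gamma = rng.normal(size=(4, 3))
w_beta = rng.normal(size=(4, 3))
out = spade_sketch(features, seg_onehot, w_gamma, w_beta)
print(out.shape)
```

Because the scale and shift depend on the class at each pixel, the layout still influences the output even where the normalized activations are uniform.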
3. SiCloPe: Silhouette-Based Clothed People (Oral)
• Reconstruct a complete and textured 3D model of a person from a single image
• Use 2D Silhouettes and 3D joints of a body pose to reconstruct 3D mesh
• An effective two-stage 3D shape reconstruction pipeline
• Predicting multi-view 2D silhouettes from single input segmentation
• Deep visual hull based mesh reconstruction technique
Recommended reference: “BodyNet: Volumetric Inference of 3D Human Body Shapes”, 2018 ECCV
4. Im2Pencil: Controllable Pencil Illustration from Photographs
• Propose a controllable photo-to-pencil translation method
• Model the pencil outline (rough, clean) and pencil shading (4 types)
• Create training data pairs from online websites (e.g., Pinterest) using image filtering techniques
Demo Code: https://github.com/Yijunmaverick/Im2Pencil
5. End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image
• End-to-end solution for synthesizing a time-lapse video from a single image
• Use time-lapse videos and image sequences during training
• Use only a single image during inference
[Figure label: single input image]
6. StoryGAN: A Sequential Conditional GAN for Story Visualization
• Propose a new task, story visualization, and address it with a GAN
• StoryGAN is based on a sequential conditional GAN
• Story Encoder – stochastic mapping from the story to a low-dimensional embedding vector
• Context Encoder – captures contextual information during sequential image generation
• Two discriminators – Image Discriminator & Story Discriminator
[Figure label: Context Encoder]
7. Image Super-Resolution by Neural Texture Transfer (oral)
• Improve “RefSR” even when irrelevant reference images are provided
• Traditional single image super-resolution is extremely challenging (ill-posed problem)
• Reference-based SR (RefSR) utilizes rich texture from HR references, but only works well when the Ref images are similar
• Adaptively transfer texture from the Ref images according to their texture similarity
Recommended reference: “CrossNet: An end-to-end reference-based super resolution network using cross-scale warping”, 2018 ECCV
[Figure labels: Similar, Different]
8. DVC: An End-to-end Deep Video Compression Framework (oral)
• Propose the first end-to-end deep video compression model
• Conventional video compression uses a predictive coding architecture and encodes the corresponding motion information and residual information
• Take advantage of both the classical compression architecture and neural networks
• Use learning-based optical flow estimation
9. Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search (oral)
• Defend against adversarial attacks using web-scale data and the image manifold
• Assume that adversarial attacks move the image away from the image manifold
• A successful defense mechanism should aim to project the image back onto the image manifold
• Search a database of tens of billions of images for the nearest neighbors (K=50) and use them
• Also propose two novel attack methods to break nearest-neighbor defenses
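A toy sketch of the nearest-neighbor idea follows: instead of trusting the classifier on the (possibly adversarial) query alone, look up its K nearest database images in feature space and aggregate the class probabilities of those neighbors. The function name `knn_defense_predict`, the uniform averaging, the Euclidean distance, and the synthetic two-cluster database are assumptions for illustration; the paper's feature extractor, database, and weighting scheme are not reproduced here.

```python
import numpy as np

def knn_defense_predict(query_feature, db_features, db_probs, k=50):
    """Average the class probabilities of the k nearest database images.

    query_feature: (D,) feature vector of the possibly perturbed query
    db_features:   (N, D) feature vectors of clean database images
    db_probs:      (N, num_classes) classifier outputs for those images
    """
    distances = np.linalg.norm(db_features - query_feature, axis=1)
    k = min(k, len(db_features))
    # argpartition finds the k smallest distances without a full sort.
    nearest = np.argpartition(distances, k - 1)[:k]
    avg_probs = db_probs[nearest].mean(axis=0)
    return int(np.argmax(avg_probs)), avg_probs

rng = np.random.default_rng(0)
# Synthetic database: two well-separated clusters standing in for two classes.
db_features = np.concatenate([
    rng.normal(0.0, 0.1, size=(100, 8)),
    rng.normal(5.0, 0.1, size=(100, 8)),
])
db_probs = np.concatenate([
    np.tile([0.9, 0.1], (100, 1)),
    np.tile([0.1, 0.9], (100, 1)),
])
query = np.full(8, 0.3)  # a perturbed point that is still close to cluster 0
label, probs = knn_defense_predict(query, db_features, db_probs, k=50)
print(label, probs)
```

At web scale an exhaustive distance computation like this would be replaced by an approximate nearest-neighbor index.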
10. Bag of Tricks for Image Classification with Convolutional Neural Networks
• Examine a collection of training refinements and empirically evaluate their impact
• Improve ResNet-50’s top-1 accuracy from 75.3% to 79.29% on ImageNet with these refinements
• Efficient training
• FP32 with BS=256 → FP16 with BS=1024, using linear scaling of the LR, LR warmup, zero γ initialization in BN, and no bias decay
• ResNet tweaks
• Training refinements:
• Cosine learning rate decay / label smoothing / knowledge distillation / mixup training
• Transfer from classification to object detection and semantic segmentation
[Figures: results of efficient training; results of training refinements]
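Three of the listed tricks (linear LR scaling, LR warmup, cosine decay) and label smoothing can be written in a few lines. This is a minimal sketch under stated assumptions: the function names, the step-based interface, and the default values (base LR 0.1 at batch size 256, ε = 0.1) are illustrative choices, not code from the paper.

```python
import math

def learning_rate(step, total_steps, warmup_steps, batch_size,
                  base_lr=0.1, base_batch_size=256):
    """Linear LR scaling + linear warmup + cosine decay (zero-indexed steps)."""
    # Linear scaling: grow the peak LR in proportion to the batch size.
    peak_lr = base_lr * batch_size / base_batch_size
    if step < warmup_steps:
        # Linear warmup from a small value up to the peak LR.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from the peak LR down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Label smoothing: 1 - epsilon on the true class, the rest spread evenly."""
    off_value = epsilon / (num_classes - 1)
    target = [off_value] * num_classes
    target[true_class] = 1.0 - epsilon
    return target

schedule = [learning_rate(s, total_steps=100, warmup_steps=5, batch_size=1024)
            for s in range(101)]
print(schedule[0], schedule[4], schedule[100])
print(smooth_labels(true_class=1, num_classes=5))
```

With a batch size of 1024 the peak learning rate becomes 0.4, reached at the end of warmup and then decayed along a half cosine.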
11. Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
• Automatically learn the group structure during training in an end-to-end manner
• Outperforms standard group convolution
• Propose an efficient strategy for index re-ordering
12. ScratchDet: Exploring to Train Single-Shot Object Detectors from Scratch (oral)
• Explore how to robustly train object detectors from scratch
• Almost all SOTA detectors are fine-tuned from a pretrained CNN (e.g., on ImageNet)
• Classification and detection have different degrees of sensitivity to translation
• The architecture is limited by the classification network (backbone), which is inconvenient
• Find that one of the overlooked points is BatchNorm
Recommended reference: “DSOD: Learning Deeply Supervised Object Detectors from Scratch”, 2017 ICCV
13. Precise Detection in Densely Packed Scenes
• Propose a method for precise detection in densely packed scenes
• In the real world, object detection has many applications (e.g., detecting and counting objects)
• In densely packed scenes, SOTA detectors cannot detect accurately
• Contributions: (1) a layer for estimating the Jaccard index, (2) a novel EM merging unit, (3) release of the SKU-110K dataset
14. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
• Present a large-scale dataset and establish a baseline for security inspection X-ray
• 1,059,231 X-ray images in total, containing 8,929 prohibited items from 6 classes
• Propose an approach named class-balanced hierarchical refinement (CHR) and a class-balanced loss function
15. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
• Address the weaknesses of IoU and introduce a generalized version (GIoU)
• Intersection over Union (IoU) is the most popular evaluation metric in object detection
• However, there is a gap between optimizing distance losses and maximizing IoU
• Introduce generalized IoU as both a new loss and a new metric
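For two axis-aligned boxes A and B with smallest enclosing box C, GIoU is IoU minus the fraction of C not covered by A ∪ B, and the corresponding loss is 1 − GIoU. A minimal sketch follows; the function name `giou` and the (x1, y1, x2, y2) box format are assumptions for illustration.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection is zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box C enclosing both A and B.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # Penalize the part of C that is covered by neither box.
    return iou - (area_c - union) / area_c

same = giou((0, 0, 1, 1), (0, 0, 1, 1))
near = giou((0, 0, 1, 1), (2, 0, 3, 1))
far = giou((0, 0, 1, 1), (9, 0, 10, 1))
print(same, near, far)
```

Plain IoU is 0 for both non-overlapping pairs, so it gives no signal about how far apart the boxes are; GIoU is more negative for the farther pair, which is what makes 1 − GIoU usable as a regression loss.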
16. Bounding Box Regression with Uncertainty for Accurate Object Detection
• Propose a novel bounding box regression loss with uncertainty
• Most datasets have ambiguities and labeling noise in bounding box coordinates
• The network learns to predict a localization variance for each coordinate
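One common way to let a network predict a variance per coordinate is a Gaussian negative log-likelihood, sketched below with the log-variance as the predicted quantity. This is an illustrative assumption, not necessarily the exact loss of the paper (which may, for example, treat large errors differently); the function name is hypothetical.

```python
import math

def uncertainty_regression_loss(predicted, target, log_variance):
    """Gaussian NLL (up to a constant) for one box coordinate.

    log_variance is the network's predicted log(sigma^2); predicting the log
    keeps the variance positive without any constraint on the output.
    """
    squared_error = (target - predicted) ** 2
    return 0.5 * math.exp(-log_variance) * squared_error + 0.5 * log_variance

# Same localization error (2.0), two different predicted variances.
confident = uncertainty_regression_loss(0.0, 2.0, log_variance=0.0)
uncertain = uncertainty_regression_loss(0.0, 2.0, log_variance=math.log(4.0))
print(confident, uncertain)
```

For an ambiguous or noisy label the network can lower its loss by predicting a larger variance, while the log-variance term stops it from claiming high uncertainty everywhere.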
17. UPSNet: A Unified Panoptic Segmentation Network (oral)
• Propose a unified panoptic segmentation network (UPSNet)
• Semantic segmentation + instance segmentation = panoptic segmentation
• Countable objects → things; uncountable objects → stuff
• Semantic head + instance head + panoptic head → trained in an end-to-end manner
Recommended reference: “Panoptic Segmentation”, 2018 arXiv
[Figure labels: Deformable Conv, Mask R-CNN, Parameter-free]
18. SFNet: Learning Object-aware Semantic Correspondence (Oral)
• Propose SFNet for the semantic correspondence problem
• Propose training with images annotated with binary foreground masks and subjected to synthetic geometric deformations
• Manually selecting point correspondences is very expensive
• Outperforms SOTA on standard benchmarks by a significant margin
19. Fast Interactive Object Annotation with Curve-GCN
• Propose an end-to-end, fast interactive object annotation tool (Curve-GCN)
• Predict all vertices simultaneously using a Graph Convolutional Network (unlike Polygon-RNN, which predicts them sequentially)
• A human annotator can correct any wrong point, and only the neighboring points are affected
Recommended reference: “Efficient interactive annotation of segmentation datasets with polygon-rnn++”, 2018 CVPR
Code: https://github.com/fidler-lab/curve-gcn
[Figure label: Correction]
20. FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
• Propose an image-level weakly supervised semantic segmentation (WSSS) method using stochastic inference (dropout)
• Localization maps (CAMs) focus only on small parts of objects → problem
• FickleNet allows a single network to generate multiple CAMs from a single image
• Does not require any additional training steps and only adds a simple layer
[Figure label: fully stochastic in both training and inference]
Related posts
• My personal blog has similar overviews:
• SIGGRAPH 2018
• NeurIPS 2018
• ICLR 2019
https://hoya012.github.io/
Thank you
