Keeping Up with Recent Research Trends - With a Focus on Deep Learning -
Lecture 1 - Fei-Fei Li & Justin Johnson & Serena Yeung (4/4/2017)
[Figure: Computer Vision and its related fields. Diagram labels: Neuroscience; Machine learning; Speech, NLP; Information retrieval; Mathematics; Computer Science; Biology; Engineering; Physics; Robotics; Cognitive sciences; Psychology; Image processing; graphics, algorithms, theory, …; systems, architecture, …; optics.]
R-CNN
Piotr Dollár, Ross Girshick
search (FAIR)

[Figure 1. The Mask R-CNN framework for instance segmentation. Diagram labels: RoIAlign, conv, class, box.]

a fixed set of categories without differentiating object instances. Given this, one might expect a complex method is required to achieve good results. However, we show that a surprisingly simple, flexible, and fast system can surpass
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals (Google, vinyals@google.com)
Alexander Toshev (Google, toshev@google.com)
Samy Bengio (Google, bengio@google.com)
Dumitru Erhan (Google, dumitru@google.com)

Abstract
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the

[Figure 1. NIC, our model, is based end-to-end on a neural network consisting of a vision CNN followed by a language generating RNN. It generates complete sentences in natural language from an input image, as shown on the example above. Diagram labels: Vision Deep CNN; Language Generating RNN. Example captions: "A group of people shopping at an outdoor market." and "There are many vegetables at the fruit stand."]

existing solutions of the above sub-problems, in order to go from an image to its description [6, 16]. In contrast, we
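
The training objective stated in the abstract above (maximize the likelihood of the target sentence given the image) can be written out as a short formula. This is a sketch only: the symbols are assumptions introduced here for illustration, with I an image, S = (S_0, …, S_N) its target sentence as a word sequence, and θ the model parameters. The second line is the usual chain-rule factorization that a recurrent language generator models one word at a time.

```latex
\theta^{\star} = \arg\max_{\theta} \sum_{(I,\,S)} \log p(S \mid I;\, \theta)

\log p(S \mid I;\, \theta) = \sum_{t=0}^{N} \log p(S_t \mid I,\, S_0, \ldots, S_{t-1};\, \theta)
```

Read this way, the CNN supplies the conditioning on I and the RNN supplies each per-word term p(S_t | I, S_0, …, S_{t-1}).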
Perceptual Generative Adversarial Networks for Small Object Detection
Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, Shuicheng Yan

Abstract
Detecting small objects is notoriously challenging due to their low resolution and noisy representation. Existing object detection pipelines usually detect small objects through learning representations of all the objects at multiple scales. However, the performance gain of such ad hoc architectures is usually limited to pay off the computational cost. In this work, we address the small object detection problem by developing a single architecture that internally lifts representations of small objects to "super-resolved" ones, achieving similar characteristics as large objects and thus more discriminative for detection. For this purpose, we propose a new Perceptual Generative Adversarial Network (Perceptual GAN) model that improves small object

[Figure 1. Large and small objects exhibit different representations from high-level convolutional layers of a CNN detector. The representations of large objects are discriminative while those of small objects are of low resolution, which hurts the detection accuracy. In this work, we introduce the Perceptual GAN model to enhance the representations for small objects to be similar to real large objects, thus improve detection performance on the small objects. Diagram labels: Features For Small Instance; Perceptual GAN; Super-resolved Features ≈ Features For Large Instance.]

cs.CV] 20 Jun 2017
O (top) and Cityscapes (bottom) using a single ResNet-101-FPN network.

Method        PQ    PQ^Th  PQ^St  mIoU  AP
DIN [1]       53.8  42.5   62.1   -     28.6
Panoptic FPN  58.1  52.0   62.5   75.7  33.0
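
The excerpt above reports PQ without defining it. As a hedged reminder, panoptic quality is commonly defined as below; the notation is an assumption introduced here and is not taken from the slide. TP is the set of matched (predicted, ground-truth) segment pairs, FP the unmatched predicted segments, and FN the unmatched ground-truth segments.

```latex
\mathrm{PQ} = \frac{\sum_{(p,\,g) \in \mathit{TP}} \mathrm{IoU}(p, g)}
                   {|\mathit{TP}| + \tfrac{1}{2}\,|\mathit{FP}| + \tfrac{1}{2}\,|\mathit{FN}|}
```

Under this reading, the Th and St variants are the same quantity averaged over "thing" classes and "stuff" classes respectively.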
Features for Amodal 3D Object Detection
Zhixin Wang and Kui Jia

Abstract
In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use of its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment, and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. We will make the code of F-ConvNet publicly available.

I. INTRODUCTION
Detection of object instances in 3D sensory data has tremendous importance in many applications including autonomous driving, robotic object manipulation, and augmented reality. Among others, RGB-D images and LiDAR point clouds are the most representative formats of 3D

[Fig. 1: Illustration for how a sequence of frustums are generated for a region proposal in an RGB image.]

or volumes, these methods suffer from loss of critical 3D information in the projection or quantization process.
With the progress of point set deep learning [11], [12], recent methods [13], [14] resort to learning features directly from raw point clouds. For example, the seminal work of F-PointNet [13] first finds local points corresponding to pixels inside a 2D region proposal, and then uses PointNet [11] to segment from these local points the foreground ones; the amodal 3D box is finally estimated from the foreground points. Performance of this method is limited due to the reasons that (1) it is not of end-to-end learning,

.01864v1 [cs.CV] 5 Mar 2019
[TABLE: only the Method column survives. Methods listed: MV3D [5], VoxelNet [14], F-PointNet [13], AVOD-FPN [6], SECOND [15], IPOD [22], PointPillars [16], PointRCNN-v1.1 [23], Ours.]
Fig. 7: Qualitative results on the different categories, with green f
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
Chen Wang (2), Danfei Xu (1), Yuke Zhu (1), Roberto Martín-Martín (1), Cewu Lu (2), Li Fei-Fei (1), Silvio Savarese (1)
(1) Department of Computer Science, Stanford University
(2) Department of Computer Science, Shanghai Jiao Tong University

Abstract
A key technical challenge in performing 6D object pose estimation from RGB-D image is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and depth separately or use costly post-processing steps, limiting their performances in highly cluttered scenes and real-time applications. In this work, we present DenseFusion, a generic framework for estimating 6D pose of a set of known objects from RGB-D images. DenseFusion is a heterogeneous architecture that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embedding, from which the pose is estimated. Furthermore, we integrate an end-to-end iterative pose refinement

[Diagram labels: RGB-D, DenseFusion]
Figure 1. We develop an end-to-end deep network model for 6D

1 [cs.CV] 15 Jan 2019
Deep Learning for Generic Object Detection: A Survey
Li Liu (1,2) · Wanli Ouyang (3) · Xiaogang Wang (4) · Paul Fieguth (5) · Jie Chen (2) · Xinwang Liu (1) · Matti Pietikäinen (2)
Received: 12 September 2018

Li Liu (li.liu@oulu.fi)
Wanli Ouyang (wanli.ouyang@sydney.edu.au)
Xiaogang Wang (xgwang@ee.cuhk.edu.hk)
Paul Fieguth (pfieguth@uwaterloo.ca)
Jie Chen (jie.chen@oulu.fi)
Xinwang Liu (xinwangliu@nudt.edu.cn)
Matti Pietikäinen (matti.pietikainen@oulu.fi)
1 National University of Defense Technology, China
2 University of Oulu, Finland
3 University of Sydney, Australia
4 Chinese University of Hong Kong, China
5 University of Waterloo, Canada

Abstract Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.

Keywords Object detection · deep learning · convolutional neural networks · object recognition

1 Introduction
As a longstanding, fundamental and challenging problem in computer vision, object detection has been an active area of research for several decades. The goal of object detection is to determine whether or not there are any instances of objects from the given categories (such as humans, cars, bicycles, dogs and cats) in some given image and, if present, to return the spatial location and extent of each object instance (e.g., via a bounding box [53, 179]). As the cornerstone of image understanding and computer vision, object detection forms the basis for solving more complex or high level vision tasks such as segmentation, scene understanding, object tracking, image captioning, event detection, and activity recognition. Object detection has a wide range of applications in many areas of artificial intelligence and information technologies, including robot vision, consumer electronics, security, autonomous driving, human computer interaction, content based image retrieval, intelligent video surveillance, and augmented reality.
Recently, deep learning techniques [81, 116] have emerged as powerful methods for learning feature representations automatically from data. In particular, these techniques have provided significant improvement for object detection, a problem which has attracted enormous attention in the last five years, even though it has been studied for decades by psychophysicists, neuroscientists, and engineers.
Object detection can be grouped into one of two types [69, 240]: detection of specific instance and detection of specific categories. The first type aims at detecting instances of a particular object (such as Donald Trump's face, the Pentagon building, or my dog Penny), whereas the goal of the second type is to detect different instances of predefined object categories (for example humans,

[Fig. 1 Recent evolution of object detection performance. We can observe significant performance (mean average precision) improvement since deep learning entered the scene in 2012. The performance of the best detector has been steadily increasing by a significant amount on a yearly basis. (a) Results on the PASCAL VOC datasets: Detection results of winning entries in the VOC2007-2012 competitions (using only provided training data). (b) Top object detection competition results in ILSVRC2013-2017 (using only provided training data). Plot labels: VOC year; ILSVRC year; Results on VOC2012 Data; Turning Point in 2012: Deep Learning Achieved Record Breaking Image Classification Result.]

arXiv:1809.02165v1 [cs.CV] 6 Sep 2018
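
The survey excerpt above describes returning each object's extent as a bounding box and scoring detectors by mean average precision. A building block commonly used when matching predicted boxes to ground-truth boxes for such metrics is intersection over union (IoU). The following is a minimal sketch, not code from the survey: the `Box` alias, the `(x_min, y_min, x_max, y_max)` corner convention, and the function name `iou` are all assumptions made for illustration.

```python
from typing import Tuple

# Assumed convention for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as having no overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
```

A detection is then typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold; the threshold value depends on the benchmark and is not specified in the excerpt.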

  • 15. Lecture 1 -Fei-Fei Li & Justin Johnson & Serena Yeung Computer Vision Neuroscience Machine learning Speech, NLP Information retrieval Mathematics Computer
 Science Biology Engineering Physics Robotics Cognitive sciences Psychology graphics, algorithms, theory,… Image processing 4/4/20174 systems, architecture, … optics
  • 17. R-CNN Piotr Doll´ar Ross Girshick search (FAIR) RoIAlignRoIAlign class box convconv convconv Figure 1. The MaskR-CNN framework for instance segmentation. a fixed set of categories without differentiating object in- stances.1 Given this, one might expect a complex method is required to achieve good results. However, we show that a surprisingly simple, flexible, and fast system can surpass Show and Tell: A Neural Image Caption Generator Oriol Vinyals Google vinyals@google.com Alexander Toshev Google toshev@google.com Samy Bengio Google bengio@google.com Dumitru Erhan Google dumitru@google.com Abstract Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep re- current architecture that combines recent advances in com- puter vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target de- scription sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descrip- tions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the A group of people shopping at an outdoor market. ! There are many vegetables at the fruit stand. Vision! Deep CNN Language ! Generating! RNN Figure 1. NIC, our model, is based end-to-end on a neural net- work consisting of a vision CNN followed by a language gener- ating RNN. It generates complete sentences in natural language from an input image, as shown on the example above. existing solutions of the above sub-problems, in order to go from an image to its description [6, 16]. 
In contrast, we Perceptual Generative Adversarial Networks for Small Object Detection Jianan Li Xiaodan Liang Yunchao Wei Tingfa Xu Jiashi Feng Shuicheng Yan Abstract Detecting small objects is notoriously challenging due to their low resolution and noisy representation. Exist- ing object detection pipelines usually detect small objects through learning representations of all the objects at multi- ple scales. However, the performance gain of such ad hoc architectures is usually limited to pay off the computational cost. In this work, we address the small object detection problem by developing a single architecture that internally lifts representations of small objects to “super-resolved” ones, achieving similar characteristics as large objects and thus more discriminative for detection. For this purpose, we propose a new Perceptual Generative Adversarial Net- work (Perceptual GAN) model that improves small object Perceptual GAN Features For Small Instance Super-resolved Features Features For Large Instance ≈ Figure 1. Large and small objects exhibit different representation from high-level convolutional layers of a CNN detector. The repr sentations of large objects are discriminative while those of sma objects are of low resolution, which hurts the detection accurac In this work, we introduce the Perceptual GAN model to enhanc the representations for small objects to be similar to real large ob jects, thus improve detection performance on the small objects. cs.CV]20Jun2017
  • 18. [...] (top) and Cityscapes (bottom) using a single ResNet-101-FPN network.

    Method         PQ     PQ^Th   PQ^St   mIoU   AP
    DIN [1]        53.8   42.5    62.1    -      28.6
    Panoptic FPN   58.1   52.0    62.5    75.7   33.0

    [...] Features for Amodal 3D Object Detection
    Zhixin Wang and Kui Jia
    Abstract: In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in a RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use of its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including a FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment, and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. We will make the code of F-ConvNet publicly available.
    I. INTRODUCTION: Detection of object instances in 3D sensory data has tremendous importance in many applications including autonomous driving, robotic object manipulation, and augmented reality. Among others, RGB-D images and LiDAR point clouds are the most representative formats of 3D [...]
    [Fig. 1: Illustration for how a sequence of frustums are generated for a region proposal in a RGB image.]
    [...] or volumes, these methods suffer from loss of critical 3D information in the projection or quantization process. With the progress of point set deep learning [11], [12], recent methods [13], [14] resort to learning features directly from raw point clouds. For example, the seminal work of F-PointNet [13] first finds local points corresponding to pixels inside a 2D region proposal, and then uses PointNet [11] to segment from these local points the foreground ones; the amodal 3D box is finally estimated from the foreground points. Performance of this method is limited due to the reasons that (1) it is not of end-to-end learning, [...]
    [...].01864v1 [cs.CV] 5 Mar 2019
    [Table (values not recovered). Methods compared: MV3D [5], VoxelNet [14], F-PointNet [13], AVOD-FPN [6], SECOND [15], IPOD [22], PointPillars [16], PointRCNN-v1.1 [23], Ours.]
    [Fig. 7: Qualitative results on the different categories, with green [...]]

    DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
    Chen Wang 2, Danfei Xu 1, Yuke Zhu 1, Roberto Martín-Martín 1, Cewu Lu 2, Li Fei-Fei 1, Silvio Savarese 1
    1 Department of Computer Science, Stanford University
    2 Department of Computer Science, Shanghai Jiao Tong University
    Abstract: A key technical challenge in performing 6D object pose estimation from RGB-D image is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and depth separately or use costly post-processing steps, limiting their performances in highly cluttered scenes and real-time applications. In this work, we present DenseFusion, a generic framework for estimating 6D pose of a set of known objects from RGB-D images. DenseFusion is a heterogeneous architecture that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embedding, from which the pose is estimated. Furthermore, we integrate an end-to-end iterative pose refinement [...]
    [Figure 1. We develop an end-to-end deep network model for 6D [...] Diagram labels: RGB-D, DenseFusion.]
    [...] [cs.CV] 15 Jan 2019
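The first step described in the Frustum ConvNet abstract (generate a sequence of frustums for a 2D region proposal and use them to group local points) can be sketched in Python. This is a simplified illustration, not the authors' implementation: it assumes a pinhole camera with a single focal length, slices the frustum along the camera depth axis rather than along the true frustum axis, and uses hypothetical parameter names (`height`, `stride`, `near`, `far`) that do not come from the slide text.

```python
def project(point, focal, center):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def group_points_into_frustums(points, box, focal, center,
                               near, far, height, stride):
    """Group points that fall inside a 2D box into overlapping depth slices.

    box: (u_min, v_min, u_max, v_max) region proposal in pixels.
    Returns a list of ((z_start, z_end), points_in_slice) pairs.
    """
    u_min, v_min, u_max, v_max = box
    inside = []
    for p in points:
        if p[2] <= 0:  # skip points behind the camera
            continue
        u, v = project(p, focal, center)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            inside.append(p)

    # Slide a window of the given height along the depth axis.
    count = int((far - near - height) / stride) + 1
    frustums = []
    for i in range(count):
        start = near + i * stride
        end = start + height
        group = [p for p in inside if start <= p[2] < end]
        frustums.append(((start, end), group))
    return frustums


# Toy scene: two points project inside the proposal, one projects outside
# it, and one lies behind the camera.
toy_points = [(0.0, 0.0, 1.5), (0.1, 0.0, 2.5), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
frustums = group_points_into_frustums(
    toy_points, box=(40, 40, 60, 60), focal=100.0, center=(50.0, 50.0),
    near=0.0, far=4.0, height=2.0, stride=1.0,
)
points_per_frustum = [len(group) for _, group in frustums]
```

Because the stride is smaller than the slice height, neighbouring slices overlap and a point can belong to more than one group. Each group would then be reduced to one frustum-level feature vector before the fully convolutional stage the abstract mentions.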
  • 19. Deep Learning for Generic Object Detection: A Survey
    Li Liu 1,2 · Wanli Ouyang 3 · Xiaogang Wang 4 · Paul Fieguth 5 · Jie Chen 2 · Xinwang Liu 1 · Matti Pietikäinen 2
    Received: 12 September 2018
    Contacts: Li Liu (li.liu@oulu.fi), Wanli Ouyang (wanli.ouyang@sydney.edu.au), Xiaogang Wang (xgwang@ee.cuhk.edu.hk), Paul Fieguth (pfieguth@uwaterloo.ca), Jie Chen (jie.chen@oulu.fi), Xinwang Liu (xinwangliu@nudt.edu.cn), Matti Pietikäinen (matti.pietikainen@oulu.fi)
    Affiliations: 1 National University of Defense Technology, China; 2 University of Oulu, Finland; 3 University of Sydney, Australia; 4 Chinese University of Hong Kong, China
    Abstract: Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.
    Keywords: Object detection · deep learning · convolutional neural networks · object recognition
    1 Introduction: As a longstanding, fundamental and challenging problem in computer vision, object detection has been an active area of research for several decades. The goal of object detection is to determine whether or not there are any instances of objects from the given categories (such as humans, cars, bicycles, dogs and cats) in some given image and, if present, to return the spatial location and extent of each object instance (e.g., via a bounding box [53, 179]). As the cornerstone of image understanding and computer vision, object detection forms the basis for solving more complex or high level vision tasks such as segmentation, scene understanding, object tracking, image captioning, event detection, and activity recognition. Object detection has a wide range of applications in many areas of artificial intelligence and information technologies, including robot vision, consumer electronics, security, autonomous driving, human computer interaction, content based image retrieval, intelligent video surveillance, and augmented reality.
    Recently, deep learning techniques [81, 116] have emerged as powerful methods for learning feature representations automatically from data. In particular, these techniques have provided significant improvement for object detection, a problem which has attracted enormous attention in the last five years, even though it has been studied for decades by psychophysicists, neuroscientists, and engineers.
    Object detection can be grouped into one of two types [69, 240]: detection of specific instance and detection of specific categories. The first type aims at detecting instances of a particular object (such as Donald Trump's face, the Pentagon building, or my [...]
    [Fig. 1 Recent evolution of object detection performance. We can observe significant performance (mean average precision) improvement since deep learning entered the scene in 2012. The performance of the best detector has been steadily increasing by a significant amount on a yearly basis. (a) Results on the PASCAL VOC datasets: Detection results of winning entries in the VOC2007-2012 competitions (using only provided training data). (b) Top object detection competition results in ILSVRC2013-2017 (using only provided training data). Panel labels: Results on VOC2012 Data; Turning Point in 2012: Deep Learning Achieved Record Breaking Image Classification Result. Axes: VOC year; ILSVRC year.]
    arXiv:1809.02165v1 [cs.CV] 6 Sep 2018
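The survey excerpt describes a detector's output as the spatial location and extent of each instance, for example a bounding box, and reports performance as mean average precision. A minimal Python sketch of the box representation and of intersection over union (IoU) follows. Using IoU to decide whether a predicted box matches a ground-truth box is an assumption added here for illustration; the excerpt itself does not name the matching criterion. The function names and the (x_min, y_min, x_max, y_max) corner convention are likewise choices made for this sketch.

```python
def box_area(box):
    """Area of an axis-aligned box (x_min, y_min, x_max, y_max); 0 if empty."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_over_union(box_a, box_b):
    """Overlap ratio of two boxes: intersection area divided by union area."""
    overlap = (
        max(box_a[0], box_b[0]),
        max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]),
    )
    intersection = box_area(overlap)
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union > 0 else 0.0


# A predicted box and a ground-truth box that overlap in a 5 x 5 square.
predicted = (0.0, 0.0, 10.0, 10.0)
ground_truth = (5.0, 5.0, 15.0, 15.0)
iou = intersection_over_union(predicted, ground_truth)  # 25 / 175
```

A detection is usually counted as correct when this ratio exceeds a chosen threshold, and precision and recall over ranked detections are then averaged per category to obtain mean average precision.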
  • 23. Deep Learning for Generic Object Detection: A Survey
    Li Liu 1,2 · Wanli Ouyang 3 · Xiaogang Wang 4 · Paul Fieguth 5 · Jie Chen 2 · Xinwang Liu 1 · Matti Pietikäinen 2
    Received: 12 September 2018
    Contacts: Li Liu (li.liu@oulu.fi), Wanli Ouyang (wanli.ouyang@sydney.edu.au), Xiaogang Wang (xgwang@ee.cuhk.edu.hk), Paul Fieguth (pfieguth@uwaterloo.ca), Jie Chen (jie.chen@oulu.fi), Xinwang Liu (xinwangliu@nudt.edu.cn), Matti Pietikäinen (matti.pietikainen@oulu.fi)
    Affiliations: 1 National University of Defense Technology, China; 2 University of Oulu, Finland; 3 University of Sydney, Australia; 4 Chinese University of Hong Kong, China; 5 University of Waterloo, Canada
    Abstract: Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.
    Keywords: Object detection · deep learning · convolutional neural networks · object recognition
    1 Introduction: As a longstanding, fundamental and challenging problem in computer vision, object detection has been an active area of research for several decades. The goal of object detection is to determine whether or not there are any instances of objects from the given categories (such as humans, cars, bicycles, dogs and cats) in some given image and, if present, to return the spatial location and extent of each object instance (e.g., via a bounding box [53, 179]). As the cornerstone of image understanding and computer vision, object detection forms the basis for solving more complex or high level vision tasks such as segmentation, scene understanding, object tracking, image captioning, event detection, and activity recognition. Object detection has a wide range of applications in many areas of artificial intelligence and information technologies, including robot vision, consumer electronics, security, autonomous driving, human computer interaction, content based image retrieval, intelligent video surveillance, and augmented reality.
    Recently, deep learning techniques [81, 116] have emerged as powerful methods for learning feature representations automatically from data. In particular, these techniques have provided significant improvement for object detection, a problem which has attracted enormous attention in the last five years, even though it has been studied for decades by psychophysicists, neuroscientists, and engineers.
    Object detection can be grouped into one of two types [69, 240]: detection of specific instance and detection of specific categories. The first type aims at detecting instances of a particular object (such as Donald Trump's face, the Pentagon building, or my dog Penny), whereas the goal of the second type is to detect different instances of predefined object categories (for example humans, [...]
    [Fig. 1 Recent evolution of object detection performance. We can observe significant performance (mean average precision) improvement since deep learning entered the scene in 2012. The performance of the best detector has been steadily increasing by a significant amount on a yearly basis. (a) Results on the PASCAL VOC datasets: Detection results of winning entries in the VOC2007-2012 competitions (using only provided training data). (b) Top object detection competition results in ILSVRC2013-2017 (using only provided training data). Panel labels: Results on VOC2012 Data; Turning Point in 2012: Deep Learning Achieved Record Breaking Image Classification Result. Axes: VOC year; ILSVRC year.]
    arXiv:1809.02165v1 [cs.CV] 6 Sep 2018