ImageNet Classification with Deep
Convolutional Neural Networks
신우철
Introduction
1. Trained one of the largest CNNs to date on ImageNet data. CNNs offer two advantages: 1) strong prior knowledge, namely the stationarity of image statistics and the locality of pixel dependencies, and 2) capacity that is easy to control by varying depth and breadth, yielding far fewer parameters and easier training than comparably sized standard feed-forward networks.
2. Wrote a highly optimized GPU implementation to make training large CNNs on high-resolution images practical.
3. Introduced new features that improve performance, reduce training time, and prevent overfitting.
Dataset
• Down-sampled ImageNet images to a fixed 256 x 256 resolution and trained on the (mean-centered) raw RGB pixel values:
1) Rescaled each image so that its shorter side had length 256.
2) Cropped the central 256 x 256 patch from the rescaled image.
3) Subtracted the mean activity over the training set from each pixel.
cf) Earlier benchmark datasets such as NORB, MNIST, and LabelMe are far smaller than ImageNet.
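As a concrete illustration, here is a minimal preprocessing sketch in Python (my code, assuming Pillow and NumPy; the function and argument names are mine, not from the paper):

```python
import numpy as np
from PIL import Image

def preprocess(path, mean_image, size=256):
    """Rescale shorter side to `size`, center-crop, subtract the training-set mean."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = size / min(w, h)                      # 1) shorter side -> 256
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2  # 2) central 256 x 256 patch
    img = img.crop((left, top, left + size, top + size))
    x = np.asarray(img, dtype=np.float32)
    return x - mean_image                         # 3) per-pixel mean subtraction
```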
Architecture
• 8 layers = 5 Convolutional + 3 Fully-connected
• Newly introduced features
1) ReLU Nonlinearity
• Much faster to train because it does not saturate
• Saturating nonlinearities such as |tanh(x)| were chosen mainly to help prevent overfitting, whereas ReLU targets fast learning of large models on large datasets
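A toy illustration of why saturation matters (my example, not from the paper): the gradient of tanh vanishes for large inputs, while the ReLU gradient stays 1 for any positive input, so large-model training does not stall.

```python
import numpy as np

x = np.array([-4.0, -1.0, 0.5, 4.0])
relu = np.maximum(0.0, x)            # ReLU activation: max(0, x)
relu_grad = (x > 0).astype(float)    # gradient is 1 wherever x > 0
tanh_grad = 1.0 - np.tanh(x) ** 2    # ~0.001 at |x| = 4: the unit has saturated

print(relu_grad)   # [0. 0. 1. 1.]
print(tanh_grad)   # approx. [0.001 0.42 0.786 0.001]
```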
2) Training on two GPUs
• Cross-GPU parallelization in which the GPUs communicate only in certain layers, making it possible to tune the amount of communication relative to computation.
3) Local Response Normalization
• Because ReLU activations are unbounded, a single large activation can dominate the responses of adjacent kernels at the same position. LRN therefore normalizes each activity across neighboring kernel maps:

$$b^i_{x,y} = \frac{a^i_{x,y}}{\left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^j_{x,y}\right)^2 \right)^{\beta}}$$

$a^i_{x,y}$ : the activity of a neuron computed by applying kernel $i$ at position $(x, y)$
$b^i_{x,y}$ : the response-normalized activity
$N$ : total number of kernels in the layer
$n$ : number of adjacent kernel maps to normalize over at position $(x, y)$
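A direct NumPy transcription of this formula (a sketch I've added; `lrn` and its argument names are mine, with defaults set to the paper's values k = 2, α = 10⁻⁴, β = 0.75, n = 5):

```python
import numpy as np

def lrn(a, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """Local response normalization across the kernel (channel) axis.

    a: activations of shape (N, H, W), where N is the number of kernels.
    Returns b of the same shape, normalized per the formula above.
    """
    N = a.shape[0]
    b = np.empty_like(a, dtype=np.float64)
    for i in range(N):
        lo = max(0, i - n // 2)                  # clamp the window at the edges
        hi = min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b
```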
Worked example with k = 0, α = 1, β = 1, n = 2, N = 4.

Input activities $a^i$ (one 3 x 3 map per kernel):

filter 0    filter 1    filter 2    filter 3
1 2 3       1 2 1       2 1 2       4 2 1
4 5 6       2 3 2       3 2 3       5 2 1
7 8 9       3 4 3       4 3 4       2 2 4

Response-normalized activities $b^i$:

filter 0          filter 1          filter 2          filter 3
0.50 0.25 0.30    0.17 0.22 0.07    0.10 0.11 0.33    0.20 0.40 0.20
0.20 0.15 0.15    0.07 0.08 0.04    0.08 0.12 0.21    0.15 0.25 0.10
0.12 0.10 0.10    0.04 0.04 0.03    0.14 0.10 0.10    0.10 0.15 0.13

For example, for filter 2 at position (0, 0), the sum runs over filters 1–3:

$$b^2_{0,0} = \frac{2}{\{0 + 1 \times (1^2 + 2^2 + 4^2)\}^1} \approx 0.10$$
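The `lrn` sketch above reproduces this table when called with the example's hyperparameters:

```python
a = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # filter 0
    [[1, 2, 1], [2, 3, 2], [3, 4, 3]],  # filter 1
    [[2, 1, 2], [3, 2, 3], [4, 3, 4]],  # filter 2
    [[4, 2, 1], [5, 2, 1], [2, 2, 4]],  # filter 3
], dtype=np.float64)

b = lrn(a, k=0.0, alpha=1.0, beta=1.0, n=2)
print(round(b[2, 0, 0], 2))  # 0.1  (= 2 / 21, as computed above)
print(np.round(b[0], 2))     # [[0.5 0.25 0.3] [0.2 0.15 0.15] [0.12 0.1 0.1]]
```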
4) Overlapping Pooling
• Pooling with stride s = 2 smaller than the window size z = 3 makes adjacent pooling windows overlap, which slightly reduces overfitting compared to non-overlapping pooling (s = z).
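A small 1-D sketch of the difference (my example, not from the paper): with z = 3, s = 2 each output shares pixels with its neighbors; with z = 2, s = 2 the windows tile without overlap.

```python
import numpy as np

def max_pool_1d(x, z, s):
    """1-D max pooling with window z, stride s; output length (len(x) - z) / s + 1."""
    return np.array([x[i:i + z].max() for i in range(0, len(x) - z + 1, s)])

x = np.array([1, 5, 2, 8, 3, 0, 7])
print(max_pool_1d(x, z=3, s=2))  # overlapping:     [5 8 8]  (windows share pixels)
print(max_pool_1d(x, z=2, s=2))  # non-overlapping: [5 8 3]
```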
5) Overall architecture (input 227 x 227 x 3; output size = (input + 2 x padding − kernel) / stride + 1)

Convolutional layer: kernel size = 11, stride = 4, filters = 96, zero-padding = 0 → (227 − 11) / 4 + 1 = 55
Local response normalization
Max pooling: kernel size (z) = 3, stride (s) = 2 → (55 − 3) / 2 + 1 = 27
Convolutional layer: kernel size = 5, stride = 1, filters = 256, zero-padding = 2 → (27 + 2 x 2 − 5) / 1 + 1 = 27
Local response normalization
Max pooling: z = 3, s = 2 → (27 − 3) / 2 + 1 = 13
Convolutional layer: kernel size = 3, stride = 1, filters = 384, zero-padding = 1 → (13 + 1 x 2 − 3) / 1 + 1 = 13
Convolutional layer: kernel size = 3, stride = 1, filters = 384, zero-padding = 1 → (13 + 1 x 2 − 3) / 1 + 1 = 13
Convolutional layer: kernel size = 3, stride = 1, filters = 256, zero-padding = 1 → (13 + 1 x 2 − 3) / 1 + 1 = 13
Max pooling: z = 3, s = 2 → (13 − 3) / 2 + 1 = 6
Flatten: 6 x 6 x 256 = 9216
Fully connected: 4096
Fully connected: 4096
Fully connected: 1000 (softmax)
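This pipeline can be transcribed nearly line for line into a single-GPU PyTorch module (a sketch I've added for reference; the paper's actual implementation splits the kernels across two GPUs that communicate only in certain layers):

```python
import torch.nn as nn

# Minimal single-GPU sketch of the pipeline above (not the paper's two-GPU code).
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),       # 227 -> 55
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # 55 -> 27
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),     # 27 -> 27
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # 27 -> 13
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),    # 13 -> 13
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),    # 13 -> 13
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),    # 13 -> 13
    nn.MaxPool2d(kernel_size=3, stride=2),                       # 13 -> 6
    nn.Flatten(),                                                # 6 * 6 * 256 = 9216
    nn.Linear(9216, 4096), nn.ReLU(), nn.Dropout(0.5),           # dropout: first two
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),           # FC layers (see below)
    nn.Linear(4096, 1000),                                       # softmax via the loss
)
```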
Reducing Overfitting
1) Data Augmentation
(1) Image translations and horizontal reflections
Train set
• Random 224 x 224 crops of the 256 x 256 images (x (256 − 224) x (256 − 224) = 1024 possible translations)
• Horizontal reflections (x 2)
• Total: (256 − 224) x (256 − 224) x 2 = 2048 augmentations per image (see the sketch below)
Test set
• Five 224 x 224 crops: the four corner patches and the center patch (x 5)
• Horizontal reflections (x 2)
• Total: 5 x 2 = 10 patches, whose softmax predictions are averaged
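A minimal sketch of the train-time augmentation (my code; it assumes a 256 x 256 NumPy image, and the names are hypothetical):

```python
import numpy as np

def random_crop_flip(img, crop=224):
    """Random 224 x 224 translation plus a coin-flip horizontal reflection."""
    h, w = img.shape[:2]                      # e.g. 256 x 256
    top = np.random.randint(0, h - crop)      # 32 vertical offsets
    left = np.random.randint(0, w - crop)     # 32 horizontal offsets
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:                # horizontal reflection (x 2)
        patch = patch[:, ::-1]
    return patch
```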
(2) Altering intensities of RGB channels by performing PCA on the set of RGB pixel values over the training set, then adding random multiples of the principal components to each pixel $[I^R_{xy}, I^G_{xy}, I^B_{xy}]^T$:

$$[I^R_{xy}, I^G_{xy}, I^B_{xy}]^T \mathrel{+}= [\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\,[\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$$

$\mathbf{p}_i$ : $i$-th eigenvector of the 3 x 3 covariance matrix of RGB pixel values
$\lambda_i$ : $i$-th eigenvalue
$\alpha_i$ : random variable drawn from $N(0, 0.1^2)$ once per presentation of each training image

2) Dropout
• Applied dropout to the first two FC layers with p = 0.5
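A sketch of this PCA color augmentation (my code, assuming `img` is an H x W x 3 float array and the eigendecomposition is precomputed over training-set pixels):

```python
import numpy as np

def pca_color_jitter(img, eigvecs, eigvals, sigma=0.1):
    """Add random multiples of the RGB principal components to every pixel."""
    alpha = np.random.normal(0.0, sigma, size=3)   # one draw per image
    shift = eigvecs @ (alpha * eigvals)            # 3-vector RGB offset
    return img + shift                             # broadcast over H x W

# Precompute once from training pixels (an N x 3 matrix of RGB values):
# cov = np.cov(pixels, rowvar=False)
# eigvals, eigvecs = np.linalg.eigh(cov)           # columns of eigvecs are p_i
```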
Details of Learning
• SGD with batch size of 128 examples
• Momentum = 0.9
• Weight decay = 0.0005
• Weight initialization: N(0, 0.01²)
• Neuron bias initialization: 1 for the 2nd, 4th, and 5th conv layers and the FC hidden layers; 0 for the remaining layers
• Learning rate
Initialized at 0.01 and reduced three times prior to termination; each reduction divided the learning rate by 10 when the validation error rate stopped improving at the current rate.
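These settings map directly onto a standard PyTorch training configuration (a sketch I've added; `model` is assumed to exist, e.g. the Sequential module above):

```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,                  # initial learning rate
    momentum=0.9,
    weight_decay=0.0005,
)
# Divide the learning rate by 10 when validation error plateaus (three times total).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
# After each epoch: scheduler.step(val_error)
```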
Results
• The restricted connectivity between the two GPUs results in specialization: the kernels on GPU 1 are largely color-agnostic, while the kernels on GPU 2 are largely color-specific.
• Retrieving nearest neighbors (kNN) in the 4096-dimensional activations of the last hidden layer returns images that are semantically similar to the query image, as sketched below.
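A sketch of that retrieval (my code; `features` is a hypothetical matrix of last-hidden-layer activations, one 4096-d row per image):

```python
import numpy as np

def nearest_images(features, query_idx, k=5):
    """Indices of the k images closest (Euclidean) to the query
    in the 4096-dimensional last-hidden-layer feature space."""
    d = np.linalg.norm(features - features[query_idx], axis=1)
    d[query_idx] = np.inf          # exclude the query itself
    return np.argsort(d)[:k]
```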