Basic of DCNN : AlexNet and VggNet
ISL lab Seminar
Han-Sol Kang
08 May 2017
Contents
Introduction · DCNN · VGG Net · Implementation
Introduction
Machine Learning
[Diagram: machine learning lineage: Perceptron, MLP, BP/DL, CNN, DCNN, and SVM]
Introduction
CNN
Output size of a convolution layer (input H x W, filter F_H x F_W, padding P, stride S):
  O_H = (H + 2P − F_H)/S + 1
  O_W = (W + 2P − F_W)/S + 1

Example: 4x4 input, 3x3 filter, padding 1, stride 1 → 4x4 output

  Input          Filter        Output
  1 2 3 0        2 0 1          7 12 10  2
  0 1 2 3   *    0 1 2    =     4 15 16 10
  3 0 1 2        1 0 2         10  6 15  6
  2 3 0 1                       8 10  4  3

[Figure: LeNet architecture]
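A minimal NumPy sketch of the output-size formula and the padded convolution example above (single channel, stride 1; the helper names are mine, for illustration only):

import numpy as np

def conv_output_size(n, f, p, s):
    # O = (N + 2P - F) / S + 1, applied to each spatial dimension
    return (n + 2 * p - f) // s + 1

def conv2d(x, k, pad=1, stride=1):
    # Single-channel cross-correlation, as used in CNN conv layers
    xp = np.pad(x, pad, mode='constant')
    fh, fw = k.shape
    oh = conv_output_size(x.shape[0], fh, pad, stride)
    ow = conv_output_size(x.shape[1], fw, pad, stride)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(xp[i*stride:i*stride+fh, j*stride:j*stride+fw] * k)
    return out

x = np.array([[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]])
k = np.array([[2, 0, 1], [0, 1, 2], [1, 0, 2]])
print(conv2d(x, k, pad=1, stride=1))   # reproduces the 4x4 output above (top-left element 7)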
DCNN
AlexNet
“We used the wrong type of non-linearity”
Geoffrey Hinton
ReLU (Rectified Linear Unit)
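For reference, a tiny NumPy sketch of the ReLU non-linearity (illustration only, not from the slides):

import numpy as np

def relu(x):
    # max(0, x): passes positive activations unchanged and zeroes the rest,
    # so the gradient does not saturate for x > 0 (unlike sigmoid/tanh)
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]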
VGG Net*
Convolutional network hit
[Chart: classification accuracy vs. network depth]
* Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
VGG Net
ConvNet Configuration
All configurations share the same skeleton: Input (224x224 RGB image) → five blocks of conv layers, each followed by a maxpool → FC-4096 → FC-4096 → FC-1000 → soft-max. The conv blocks per configuration (a maxpool sits between bracketed blocks):

A (11 weight layers):     [conv3-64] [conv3-128] [conv3-256, conv3-256] [conv3-512, conv3-512] [conv3-512, conv3-512]
A-LRN (11 weight layers): [conv3-64, LRN] [conv3-128] [conv3-256, conv3-256] [conv3-512, conv3-512] [conv3-512, conv3-512]
B (13 weight layers):     [conv3-64, conv3-64] [conv3-128, conv3-128] [conv3-256, conv3-256] [conv3-512, conv3-512] [conv3-512, conv3-512]
C (16 weight layers):     [conv3-64, conv3-64] [conv3-128, conv3-128] [conv3-256, conv3-256, conv1-256] [conv3-512, conv3-512, conv1-512] [conv3-512, conv3-512, conv1-512]
D (16 weight layers):     [conv3-64, conv3-64] [conv3-128, conv3-128] [conv3-256, conv3-256, conv3-256] [conv3-512, conv3-512, conv3-512] [conv3-512, conv3-512, conv3-512]
E (19 weight layers):     [conv3-64, conv3-64] [conv3-128, conv3-128] [conv3-256, conv3-256, conv3-256, conv3-256] [conv3-512, conv3-512, conv3-512, conv3-512] [conv3-512, conv3-512, conv3-512, conv3-512]
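As a reading aid for configuration D (VGG-16), here is a minimal sketch of the conv/FC stack in the same TensorFlow 1.x style as the implementation slides, built with the tf.layers helpers; the block table, helper choices, and layer names are mine for illustration, not the released VGG model or weights.

import tensorflow as tf

# Configuration D (VGG-16): (number of 3x3 conv layers, output channels) per block
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16(images, num_classes=1000):
    # images: [batch, 224, 224, 3]
    net = images
    for block_idx, (num_convs, channels) in enumerate(VGG16_BLOCKS):
        for conv_idx in range(num_convs):
            net = tf.layers.conv2d(net, channels, 3, padding='same',
                                   activation=tf.nn.relu,
                                   name='conv{}_{}'.format(block_idx + 1, conv_idx + 1))
        net = tf.layers.max_pooling2d(net, 2, 2, name='pool{}'.format(block_idx + 1))
    net = tf.layers.flatten(net)                    # 7x7x512 after the five poolings
    net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='fc6')
    net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='fc7')
    return tf.layers.dense(net, num_classes, name='fc8')   # FC-1000; soft-max is applied in the loss

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
logits = vgg16(x)

Counting the 13 conv layers plus the 3 FC layers gives the 16 weight layers of configuration D.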
VGG Net
Training
• Mini-batch gradient descent with momentum (batch size: 256, momentum: 0.9)
  v ← μ·v − η·∂L/∂W,  W ← W + v  (μ = 0.9, η: learning rate)
• Weight decay (L2 penalty, 5·10⁻⁴) & dropout (0.5) regularization
• Initial learning rate set to 10⁻²
• Network A is trained first; the deeper configurations are then initialized from it (the first 4 conv layers and the 3 FC layers)
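A minimal sketch of these training settings in the same TensorFlow 1.x style; the stand-in logits/labels, the bias exclusion from weight decay, and the learning-rate placeholder are assumptions for illustration, not the authors' training script.

import tensorflow as tf

# Stand-in logits/labels so the snippet is self-contained (shapes are illustrative).
features = tf.placeholder(tf.float32, [None, 4096])
labels = tf.placeholder(tf.float32, [None, 1000])
logits = tf.layers.dense(features, 1000, name='fc8')

learning_rate = tf.placeholder(tf.float32)   # fed as 1e-2 at first, lowered later by hand
weight_decay = 5e-4                          # L2 penalty from the slide

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
l2_penalty = weight_decay * tf.add_n(
    [tf.nn.l2_loss(w) for w in tf.trainable_variables() if 'bias' not in w.name])
total_loss = cross_entropy + l2_penalty

# Mini-batch gradient descent with momentum 0.9 (batch size 256 in the paper);
# dropout (0.5) would be applied inside the FC layers of the network itself.
train_op = tf.train.MomentumOptimizer(learning_rate, momentum=0.9).minimize(total_loss)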
VGG Net
Training
• Data augmentation (random horizontal flip, RGB color shift, rescaling)
  - Single-scale training: S is fixed (256 and 384)
  - Multi-scale training: S is sampled at random from [S_min, S_max] (S_min = 256, S_max = 512)
* S: the training scale, i.e. the smallest side of the image after rescaling that preserves the aspect ratio.
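A rough sketch of the multi-scale ("scale jittering") preprocessing in TensorFlow 1.x; the function name is mine and the RGB color shift is omitted for brevity.

import tensorflow as tf

def vgg_train_preprocess(image, s_min=256, s_max=512):
    # Pick a random training scale S in [s_min, s_max], resize so the smallest
    # image side equals S (isotropic rescaling), then take a random 224x224 crop
    # and a random horizontal flip.
    s = tf.cast(tf.random_uniform([], s_min, s_max + 1, dtype=tf.int32), tf.float32)
    shape = tf.cast(tf.shape(image), tf.float32)
    scale = s / tf.minimum(shape[0], shape[1])
    new_size = tf.cast(tf.round(shape[:2] * scale), tf.int32)
    image = tf.image.resize_images(image, new_size)
    image = tf.random_crop(image, [224, 224, 3])
    return tf.image.random_flip_left_right(image)

image = tf.placeholder(tf.float32, [None, None, 3])
train_image = vgg_train_preprocess(image)   # single-scale training would instead fix S to 256 or 384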
VGG Net
Testing
• A separate test scale Q is used
• The first FC layer is converted to a 7x7 conv layer, and the last two FC layers to 1x1 conv layers
• Dense evaluation is used (accuracy improves further when combined with multi-crop evaluation)
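A hedged sketch of the FC-to-conv conversion behind dense evaluation, again in tf.layers style; the placeholder shape (the 512-channel feature map from the last conv block) and the layer names are illustrative assumptions.

import tensorflow as tf

# Feature map from the last conv block; the spatial size is left free, so whole
# (uncropped) test images of any scale Q can be pushed through the network.
features = tf.placeholder(tf.float32, [None, None, None, 512])

fc6 = tf.layers.conv2d(features, 4096, 7, padding='valid',
                       activation=tf.nn.relu, name='fc6_conv')                 # first FC layer as a 7x7 conv
fc7 = tf.layers.conv2d(fc6, 4096, 1, activation=tf.nn.relu, name='fc7_conv')   # FC-4096 as a 1x1 conv
score_map = tf.layers.conv2d(fc7, 1000, 1, name='fc8_conv')                    # FC-1000 as a 1x1 conv

# The resulting spatial class-score map is averaged to one score vector per image.
class_scores = tf.reduce_mean(score_map, axis=[1, 2])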
VGG Net
Classification experiments
Single test scale:
  ConvNet config. | smallest image side: train (S) | test (Q) | top-1 val. error (%) | top-5 val. error (%)
  A     | 256       | 256 | 29.6 | 10.4
  A-LRN | 256       | 256 | 29.7 | 10.5
  B     | 256       | 256 | 28.7 | 9.9
  C     | 256       | 256 | 28.1 | 9.4
  C     | 384       | 384 | 28.1 | 9.3
  C     | [256;512] | 384 | 27.3 | 8.8
  D     | 256       | 256 | 27.0 | 8.8
  D     | 384       | 384 | 26.8 | 8.7
  D     | [256;512] | 384 | 25.6 | 8.1
  E     | 256       | 256 | 27.3 | 9.0
  E     | 384       | 384 | 26.9 | 8.7
  E     | [256;512] | 384 | 25.5 | 8.0

Multiple test scales:
  ConvNet config. | smallest image side: train (S) | test (Q) | top-1 val. error (%) | top-5 val. error (%)
  B | 256       | 224, 256, 288 | 28.2 | 9.6
  C | 256       | 224, 256, 288 | 27.7 | 9.2
  C | 384       | 352, 384, 416 | 27.8 | 9.2
  C | [256;512] | 256, 384, 512 | 26.3 | 8.2
  D | 256       | 224, 256, 288 | 26.6 | 8.6
  D | 384       | 352, 384, 416 | 26.5 | 8.6
  D | [256;512] | 256, 384, 512 | 24.8 | 7.5
  E | 256       | 224, 256, 288 | 26.9 | 8.7
  E | 384       | 352, 384, 416 | 26.7 | 8.6
  E | [256;512] | 256, 384, 512 | 24.8 | 7.5
VGG Net
Classification experiments
ConvNet config. | Evaluation method | top-1 val. error (%) | top-5 val. error (%)
D | dense              | 24.8 | 7.5
D | multi-crop         | 24.6 | 7.5
D | multi-crop & dense | 24.4 | 7.2
E | dense              | 24.8 | 7.5
E | multi-crop         | 24.6 | 7.4
E | multi-crop & dense | 24.4 | 7.1
Method | top-1 val. error (%) | top-5 val. error (%) | top-5 test error (%)
VGG (2 nets, multi-crop & dense eval.)       | 23.7 | 6.8  | 6.8
VGG (1 net, multi-crop & dense eval.)        | 24.4 | 7.1  | 7.0
VGG (ILSVRC submission, 7 nets, dense eval.) | 24.7 | 7.5  | 7.3
GoogLeNet (1 net)        | -    | -    | 7.9
GoogLeNet (7 nets)       | -    | -    | 6.7
MSRA (11 nets)           | -    | -    | 8.1
MSRA (1 net)             | 27.9 | 9.1  | 9.1
Clarifai (multiple nets) | -    | -    | 11.7
Clarifai (1 net)         | -    | -    | 12.5
ZF Net (6 nets)          | 36.0 | 14.7 | 14.8
ZF Net (1 net)           | 37.5 | 16.0 | 16.1
OverFeat (7 nets)        | 34.0 | 13.2 | 13.6
OverFeat (1 net)         | 35.7 | 14.2 | -
AlexNet (5 nets)         | 38.1 | 16.4 | 16.4
AlexNet (1 net)          | 40.7 | 18.2 | -
VGG Net
Conclusion
• Evaluated deep network architectures built from very small 3x3 convolution filters.
• Confirmed that increasing network depth improves classification accuracy.
• Achieved strong performance by increasing the depth of a conventional ConvNet architecture.
• Released the VGG-16 & VGG-19 models.
Implementation
Epoch: 0001 cost = 0.366208560
Epoch: 0002 cost = 0.091067037
Epoch: 0003 cost = 0.067395312
Epoch: 0004 cost = 0.054241491
Epoch: 0005 cost = 0.046002268
Epoch: 0006 cost = 0.039577450
Epoch: 0007 cost = 0.034572003
Epoch: 0008 cost = 0.030414227
Epoch: 0009 cost = 0.026961391
Epoch: 0010 cost = 0.024227326
Epoch: 0011 cost = 0.020874776
Epoch: 0012 cost = 0.018590417
Epoch: 0013 cost = 0.016660221
Epoch: 0014 cost = 0.014668066
Epoch: 0015 cost = 0.012948724
Learning finished
Accuracy 0.9884
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

learning_rate = 0.001
training_epochs = 15
batch_size = 100

X = tf.placeholder(tf.float32, [None, 784])
X_img = tf.reshape(X, [-1, 28, 28, 1])
Y = tf.placeholder(tf.float32, [None, 10])

# First layer: 3x3 conv (32 filters) + ReLU + 2x2 max pool
W1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')
L1 = tf.nn.relu(L1)
L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Second layer: 3x3 conv (64 filters) + ReLU + 2x2 max pool
W2 = tf.Variable(tf.random_normal([3, 3, 32, 64], stddev=0.01))
L2 = tf.nn.conv2d(L1, W2, strides=[1, 1, 1, 1], padding='SAME')
L2 = tf.nn.relu(L2)
L2 = tf.nn.max_pool(L2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
L2 = tf.reshape(L2, [-1, 7*7*64])  # flatten to a vector for the FC layer

W3 = tf.get_variable("W3", shape=[7*7*64, 10], initializer=tf.contrib.layers.xavier_initializer())
b = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L2, W3) + b

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print('Learning started. It takes sometime')
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print('Epoch:', '%04d' % (epoch+1), 'cost =', '{:.9f}'.format(avg_cost))
print('Learning finished')

correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy', sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
Implementation
Epoch: 0001 cost = 0.409386985
Epoch: 0002 cost = 0.100627775
Epoch: 0003 cost = 0.072903002
Epoch: 0004 cost = 0.060526004
Epoch: 0005 cost = 0.052039743
Epoch: 0006 cost = 0.047962842
Epoch: 0007 cost = 0.042300057
Epoch: 0008 cost = 0.039930305
Epoch: 0009 cost = 0.034254246
Epoch: 0010 cost = 0.033424444
Epoch: 0011 cost = 0.032899911
Epoch: 0012 cost = 0.031550007
Epoch: 0013 cost = 0.028447655
Epoch: 0014 cost = 0.028178741
Epoch: 0015 cost = 0.027132071
Learning Finished!
Accuracy: 0.9939
import tensorflow as tf
import random
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

tf.set_random_seed(777)  # reproducibility
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# hyper parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100

# dropout (keep_prob) rate 0.7~0.5 on training, but should be 1 for testing
keep_prob = tf.placeholder(tf.float32)

# input place holders
X = tf.placeholder(tf.float32, [None, 784])
X_img = tf.reshape(X, [-1, 28, 28, 1])  # img 28x28x1 (black/white)
Y = tf.placeholder(tf.float32, [None, 10])

# L1 ImgIn shape=(?, 28, 28, 1)
W1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')
L1 = tf.nn.relu(L1)
L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1], padding='SAME')
L1 = tf.nn.dropout(L1, keep_prob=keep_prob)

…

# L5 Final FC 625 inputs -> 10 outputs
W5 = tf.get_variable("W5", shape=[625, 10],
                     initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
logits = tf.matmul(L4, W5) + b5

# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# train my model
print('Learning started. It takes sometime.')
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
print('Learning Finished!')

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={
    X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1}))
Q & A