Caffe studying 2017

工業技術研究院機密資料禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
Caffe Study
Information and Communications Research Laboratories
劉得彥
Danny Liu

concept of view in Blob

Blob
• Blob provides a number of helper functions, such as:
▪ asum_diff(), sumsq_data(), Update(), asum_data()…
• The underlying math functions are implemented for both CPU and GPU
▪ CPU: for example, caffe_axpy
a. Uses the CBLAS library
▪ GPU: for example, caffe_gpu_axpy
a. Uses the cuBLAS library
• The SyncedMemory class keeps the data in sync between CPU and GPU
▪ Always use {cpu,gpu}_data() or mutable_{cpu,gpu}_data() to get the data pointer
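The reason for always going through the accessors can be illustrated with a minimal sketch of lazy synchronization. This is a toy model, not Caffe's implementation: ToySyncedMemory is a hypothetical class, two host vectors stand in for the CPU and GPU copies so that it runs without CUDA, and the head states follow the usual "where is the freshest copy" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of lazy CPU/GPU synchronization. Two host vectors stand in for
// the CPU and GPU copies so the sketch runs without CUDA.
class ToySyncedMemory {
 public:
  enum Head { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

  explicit ToySyncedMemory(std::size_t n)
      : cpu_(n, 0.0f), gpu_(n, 0.0f), head_(UNINITIALIZED), copies_(0) {
    assert(n > 0);
  }

  // Read-only access: syncs if needed and leaves the other copy valid.
  const float* cpu_data() { to_cpu(); return cpu_.data(); }
  const float* gpu_data() { to_gpu(); return gpu_.data(); }

  // Writable access: syncs if needed, then marks the other copy as stale.
  float* mutable_cpu_data() { to_cpu(); head_ = HEAD_AT_CPU; return cpu_.data(); }
  float* mutable_gpu_data() { to_gpu(); head_ = HEAD_AT_GPU; return gpu_.data(); }

  Head head() const { return head_; }
  int copies() const { return copies_; }  // number of simulated transfers

 private:
  void to_cpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_CPU;
    } else if (head_ == HEAD_AT_GPU) {
      cpu_ = gpu_;  // stands in for a device-to-host copy
      ++copies_;
      head_ = SYNCED;
    }
  }
  void to_gpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_GPU;
    } else if (head_ == HEAD_AT_CPU) {
      gpu_ = cpu_;  // stands in for a host-to-device copy
      ++copies_;
      head_ = SYNCED;
    }
  }

  std::vector<float> cpu_, gpu_;
  Head head_;
  int copies_;
};
```

A transfer happens only when the requested side is stale, which is why a raw pointer cached across calls could silently point at outdated data.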

Layer
• caffe::Layer is a base class
• All of the following layer types inherit from caffe::Layer
▪ Data, Vision, Recurrent, Common, Normalization, Activation, Loss layers, and so on.
▪ http://caffe.berkeleyvision.org/tutorial/layers.html

GPU-implemented code for layers
• src/caffe/layers/
▪ *_layer.cu
▪ cudnn_*_layer.cu
• src/caffe/util/
▪ math_functions.cu
▪ im2col.cu
• include/caffe/util/
▪ device_alternate.hpp: CUDA macro definitions
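As an illustration of what such CUDA macros typically do, here is a host-only sketch of a block-count helper and a grid-stride loop. It assumes the macros follow the common pattern (ceiling division for the number of blocks, each thread striding over the index range); the names toy_get_blocks, toy_kernel_loop_visits and the value 512 are assumptions for this example, not taken from the slides.

```cpp
#include <cassert>
#include <vector>

const int kToyNumThreads = 512;  // assumed threads per block

// Number of blocks needed so that blocks * threads covers n elements.
int toy_get_blocks(int n) {
  assert(n >= 0);
  return (n + kToyNumThreads - 1) / kToyNumThreads;  // ceiling division
}

// Emulates a grid-stride loop on the host: every (block, thread) pair starts
// at its global index and advances by the total number of threads.
// Returns how many times each index in [0, n) was visited.
std::vector<int> toy_kernel_loop_visits(int n, int blocks, int threads) {
  assert(n >= 0 && blocks > 0 && threads > 0);
  std::vector<int> visits(n, 0);
  for (int block = 0; block < blocks; ++block) {
    for (int thread = 0; thread < threads; ++thread) {
      for (int i = block * threads + thread; i < n; i += threads * blocks) {
        ++visits[i];
      }
    }
  }
  return visits;
}
```

The stride makes the loop correct even when fewer threads than elements are launched: each element is still processed exactly once.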

Layer
• SetUp()
▪ Initializes the layer
• Forward()
▪ Uses the bottom blob’s data as input to the layer and computes the output/loss into the top blob’s data.
• Backward()
▪ Uses the top blob’s diff as input to the layer and computes the diff/gradient into the bottom blob’s diff.
▪ By the chain rule, the diff/gradient is roughly bottom_diff = top_diff · ∂top_data/∂bottom_data
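The three phases can be sketched with a toy base class and one derived layer. This is a simplified illustration, not Caffe's interface: ToyBlob, ToyLayer and ToyReLULayer are hypothetical names, each layer has exactly one bottom and one top, and everything runs on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A blob holds values (data) and their gradients (diff).
struct ToyBlob {
  std::vector<float> data;
  std::vector<float> diff;
  explicit ToyBlob(std::size_t n = 0) : data(n, 0.0f), diff(n, 0.0f) {}
};

// Base class: every layer implements Forward() and Backward().
class ToyLayer {
 public:
  virtual ~ToyLayer() {}
  // Shapes the top blob to match the bottom blob.
  virtual void SetUp(const ToyBlob& bottom, ToyBlob* top) {
    *top = ToyBlob(bottom.data.size());
  }
  virtual void Forward(const ToyBlob& bottom, ToyBlob* top) = 0;
  virtual void Backward(const ToyBlob& top, ToyBlob* bottom) = 0;
};

// ReLU: top = max(0, bottom).
class ToyReLULayer : public ToyLayer {
 public:
  void Forward(const ToyBlob& bottom, ToyBlob* top) override {
    assert(top->data.size() == bottom.data.size());
    for (std::size_t i = 0; i < bottom.data.size(); ++i) {
      top->data[i] = bottom.data[i] > 0.0f ? bottom.data[i] : 0.0f;
    }
  }
  // Chain rule: bottom_diff = top_diff * d(top)/d(bottom),
  // where the local derivative is 1 for positive inputs and 0 otherwise.
  void Backward(const ToyBlob& top, ToyBlob* bottom) override {
    assert(top.diff.size() == bottom->diff.size());
    for (std::size_t i = 0; i < top.diff.size(); ++i) {
      bottom->diff[i] = top.diff[i] * (bottom->data[i] > 0.0f ? 1.0f : 0.0f);
    }
  }
};
```

Forward() reads only bottom data and writes top data; Backward() reads top diff and writes bottom diff, so gradients flow in the opposite direction of the data.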

Solver using NCCL

NCCL::Run()
• boost::barrier
▪ A synchronization point between multiple threads.
• Worker
▪ class Worker : public InternalThread
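What a barrier guarantees can be shown with a small sketch. It swaps in the C++ standard library for boost (std::thread, std::mutex, std::condition_variable); ToyBarrier and run_workers are hypothetical names, and the workers here only write and sum integers instead of driving GPUs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: wait() returns only after `count` threads have called it.
class ToyBarrier {
 public:
  explicit ToyBarrier(int count) : threshold_(count), waiting_(0), generation_(0) {
    assert(count > 0);
  }
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const int gen = generation_;
    if (++waiting_ == threshold_) {
      waiting_ = 0;      // last arrival resets the barrier
      ++generation_;     // and releases everyone waiting on this round
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int threshold_;
  int waiting_;
  int generation_;
};

// Each worker writes its own slot, waits at the barrier, then sums all slots.
// The barrier ensures no worker reads before every slot has been written.
std::vector<int> run_workers(int n) {
  ToyBarrier barrier(n);
  std::vector<int> slots(n, 0), sums(n, 0);
  std::vector<std::thread> threads;
  for (int id = 0; id < n; ++id) {
    threads.emplace_back([&, id] {
      slots[id] = id + 1;
      barrier.wait();
      int s = 0;
      for (int v : slots) s += v;
      sums[id] = s;
    });
  }
  for (std::thread& t : threads) t.join();
  return sums;
}
```

Without the barrier, a fast worker could sum the slots before slower workers had written theirs.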

Caffe training with NCCL

Broadcast and All-Reduce in Caffe
• Each Worker is an internal thread that serves one GPU
• The picture illustrates the broadcast and all-reduce operations
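The semantics of the two collectives can be sketched sequentially on the host. This only shows what each rank ends up holding; it says nothing about how NCCL moves the data between devices. Buffer, toy_broadcast and toy_all_reduce_sum are hypothetical names, and each inner vector stands in for one GPU's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Buffer;  // one buffer per simulated GPU

// Broadcast: every rank receives a copy of the root's buffer
// (for example, the initial weights).
void toy_broadcast(std::vector<Buffer>* bufs, std::size_t root) {
  assert(root < bufs->size());
  for (std::size_t r = 0; r < bufs->size(); ++r) {
    if (r != root) (*bufs)[r] = (*bufs)[root];
  }
}

// All-reduce (sum): every rank ends up with the element-wise sum over all
// ranks (for example, the summed gradients).
void toy_all_reduce_sum(std::vector<Buffer>* bufs) {
  assert(!bufs->empty());
  Buffer sum((*bufs)[0].size(), 0.0f);
  for (const Buffer& b : *bufs) {
    assert(b.size() == sum.size());
    for (std::size_t i = 0; i < b.size(); ++i) sum[i] += b[i];
  }
  for (Buffer& b : *bufs) b = sum;
}
```

After the all-reduce, every rank holds identical gradients, so applying the same update on each GPU keeps the weights identical across ranks.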

The Data Layer

Prototxt: Define Net

Prototxt: Define Net

LeNet
LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]
• LeNet-5
▪ https://world4jason.gitbooks.io/research-log/content/deepLearning/CNN/Model%20&%20ImgNet/lenet.html

Lenet.prototxt
name: "LeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 64 dim: 1 dim: 28 dim: 28 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
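To see how the 28x28 input shrinks through this net, the spatial sizes can be computed from kernel_size and stride. The sketch assumes the usual output-size formulas (round down for convolution, round up for pooling) and no padding, since the prototxt sets none; conv_out, pool_out and lenet_ip1_inputs are hypothetical helper names.

```cpp
#include <cassert>

// Convolution output size: floor((in + 2*pad - kernel) / stride) + 1.
int conv_out(int in, int kernel, int stride, int pad) {
  assert(stride > 0 && in + 2 * pad >= kernel);
  return (in + 2 * pad - kernel) / stride + 1;
}

// Pooling output size: ceil((in + 2*pad - kernel) / stride) + 1.
int pool_out(int in, int kernel, int stride, int pad) {
  assert(stride > 0 && in + 2 * pad >= kernel);
  return (in + 2 * pad - kernel + stride - 1) / stride + 1;
}

// Number of inputs seen by ip1: channels * height * width of pool2's output.
int lenet_ip1_inputs() {
  int s = 28;                 // data:  1 x 28 x 28
  s = conv_out(s, 5, 1, 0);   // conv1: 20 x 24 x 24
  s = pool_out(s, 2, 2, 0);   // pool1: 20 x 12 x 12
  s = conv_out(s, 5, 1, 0);   // conv2: 50 x 8 x 8
  s = pool_out(s, 2, 2, 0);   // pool2: 50 x 4 x 4
  return 50 * s * s;
}
```

With these assumptions ip1 maps 800 inputs to 500 outputs, and ip2 maps 500 inputs to the 10 digit classes.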

All done on the GPU
Caffe data flow
name: "LogReg"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
[Diagram: data flow of one training iteration through the net above]
• Step()
▪ ForwardBackward()
▪ on_gradients_ready(): NCCL All-Reduce for diff/gradient
▪ ApplyUpdate()
• Forward(), once per layer
▪ Data layer: Load_batch() prefetches data and label; copy data to GPU
▪ Subsequent layers: compute output, then compute loss
• Backward(), layer by layer in reverse order
▪ Compute gradient
• ApplyUpdate()
▪ Normalize(param_id);
▪ Regularize(param_id);
▪ this->net_->Update();
▪ Learnable params are updated with cblas_saxpy(data, diff)
• blocking_queue.cpp:49: the "Waiting for data" log message is emitted while pop() waits on an empty queue
43 template<typename T>
44 T BlockingQueue<T>::pop(const string& log_on_wait) {
45   boost::mutex::scoped_lock lock(sync_->mutex_);
46
47   while (queue_.empty()) {
48     if (!log_on_wait.empty()) {
49       LOG_EVERY_N(INFO, 1000) << log_on_wait;  // logs every 1000th time this line is reached
50     }
51     sync_->condition_.wait(lock);  // releases the lock while blocked
52   }
53
54   T t = queue_.front();
55   queue_.pop();
56   return t;
57 }
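The same pattern can be sketched with the C++ standard library in place of boost. ToyBlockingQueue is a hypothetical name, and push() and try_pop() are added here as assumptions to make the sketch usable on its own; the slide shows only pop().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class ToyBlockingQueue {
 public:
  // Adds an element and wakes up one waiting consumer.
  void push(const T& t) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(t);
    }
    condition_.notify_one();
  }

  // Blocks until an element is available, then removes and returns it.
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate guards against spurious wakeups, like the while loop
    // around the wait in the original code.
    condition_.wait(lock, [this] { return !queue_.empty(); });
    T t = queue_.front();
    queue_.pop();
    return t;
  }

  // Non-blocking variant: returns false if the queue is empty.
  bool try_pop(T* t) {
    assert(t != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) return false;
    *t = queue_.front();
    queue_.pop();
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

 private:
  std::mutex mutex_;
  std::condition_variable condition_;
  std::queue<T> queue_;
};
```

A consumer that calls pop() on an empty queue sleeps inside wait() until a producer calls push(), which is the situation the "Waiting for data" message reports.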

Batch Data Prefetching
First, an InternalThread loads the images into CPU memory; then cudaMemcpyAsync copies them to GPU memory.
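The overlap between loading and computing can be sketched on the host. This example swaps in std::async for the InternalThread and does not model the GPU copy at all; Batch, toy_load_batch and toy_train are hypothetical names, and "training" is reduced to summing the batch values.

```cpp
#include <cassert>
#include <future>
#include <vector>

typedef std::vector<float> Batch;

// Hypothetical loader: batch number `index` is filled with the value `index`.
Batch toy_load_batch(int index, int batch_size) {
  assert(batch_size > 0);
  return Batch(batch_size, static_cast<float>(index));
}

// Consumes num_batches batches; while one batch is being processed,
// the next one is already loading in a background thread.
float toy_train(int num_batches, int batch_size) {
  assert(num_batches > 0);
  float total = 0.0f;
  std::future<Batch> next =
      std::async(std::launch::async, toy_load_batch, 0, batch_size);
  for (int i = 0; i < num_batches; ++i) {
    Batch current = next.get();  // blocks only if the prefetch is not done yet
    if (i + 1 < num_batches) {
      next = std::async(std::launch::async, toy_load_batch, i + 1, batch_size);
    }
    for (float v : current) total += v;  // stands in for forward/backward
  }
  return total;
}
```

As long as loading a batch takes no longer than processing one, the consumer never has to wait for data.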
