ITRI CONFIDENTIAL DOCUMENT: DO NOT COPY OR DISTRIBUTE
Caffe Study
Information and Communications Research Laboratories
劉得彥
Danny Liu
concept of view in Blob
Blob
• Blob implements several helper functions, such as:
▪ asum_diff(), sumsq_data(), Update(), asum_data()…
• Math functions have both CPU and GPU implementations
▪ CPU: for example, caffe_axpy
a. Uses the CBLAS library
▪ GPU: for example, caffe_gpu_axpy
a. Uses the cuBLAS library
• The SyncedMemory class synchronizes data between CPU and GPU
▪ Always use {cpu,gpu}_data() or mutable_{cpu,gpu}_data() to get a data pointer
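The lazy CPU/GPU synchronization behind these accessors can be sketched as a small state machine. This is a CPU-only toy under stated assumptions: `ToySyncedMemory`, `Head`, and `copies()` are hypothetical names, and the "GPU" buffer is simulated with a second host vector. It only illustrates why data should be reached through the accessor functions; it is not Caffe's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CPU/GPU synchronized buffer.
// The "device" side is simulated with a second host vector.
class ToySyncedMemory {
 public:
  enum class Head { kUninitialized, kAtCpu, kAtGpu, kSynced };

  explicit ToySyncedMemory(std::size_t n) : cpu_(n, 0.0f), gpu_(n, 0.0f) {
    assert(n > 0);
  }

  // Read-only access: copy only if the other side holds newer data.
  const float* cpu_data() { ToCpu(); return cpu_.data(); }
  const float* gpu_data() { ToGpu(); return gpu_.data(); }

  // Mutable access: the caller may write, so the other side becomes stale.
  float* mutable_cpu_data() { ToCpu(); head_ = Head::kAtCpu; return cpu_.data(); }
  float* mutable_gpu_data() { ToGpu(); head_ = Head::kAtGpu; return gpu_.data(); }

  Head head() const { return head_; }
  int copies() const { return copies_; }  // number of simulated transfers

 private:
  void ToCpu() {
    if (head_ == Head::kAtGpu) {
      std::copy(gpu_.begin(), gpu_.end(), cpu_.begin());
      ++copies_;
      head_ = Head::kSynced;
    } else if (head_ == Head::kUninitialized) {
      head_ = Head::kAtCpu;
    }
  }
  void ToGpu() {
    if (head_ == Head::kAtCpu) {
      std::copy(cpu_.begin(), cpu_.end(), gpu_.begin());
      ++copies_;
      head_ = Head::kSynced;
    } else if (head_ == Head::kUninitialized) {
      head_ = Head::kAtGpu;
    }
  }

  std::vector<float> cpu_, gpu_;
  Head head_ = Head::kUninitialized;
  int copies_ = 0;
};
```

In this sketch a transfer happens only when one side is read after the other side was written, which is why bypassing the accessors and caching a raw pointer would risk reading stale data.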
Layer
• caffe::Layer is the base class
• All of the following layer types inherit from caffe::Layer
▪ Data, Vision, Recurrent, Common, Normalization, Activation, Loss layers, and so on.
▪ http://caffe.berkeleyvision.org/tutorial/layers.html
Layers with GPU implementations
• src/caffe/layers/
▪ *_layer.cu
▪ cudnn_*_layer.cu
• src/caffe/util/
▪ math_functions.cu
▪ im2col.cu
• include/caffe/util/
▪ device_alternate.hpp: CUDA macro definitions
Layer
• SetUp()
▪ Initializes the layer
• Forward()
▪ Takes the bottom blobs' data as input and computes the output/loss into the top blobs' data.
• Backward()
▪ Takes the top blobs' diff as input and computes the diff/gradient into the bottom blobs' diff.
▪ The diff/gradient follows the chain rule: roughly, bottom_diff = top_diff · ∂top_data/∂bottom_data
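A minimal sketch of the three phases, using a ReLU as the example layer. The types `ToyBlob`, `ToyLayer`, and `ToyReLU` are hypothetical simplifications (one bottom, one top, CPU only) and do not reproduce the real caffe::Layer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified blob: one buffer for values, one for gradients.
struct ToyBlob {
  std::vector<float> data;
  std::vector<float> diff;
  explicit ToyBlob(std::size_t n) : data(n, 0.0f), diff(n, 0.0f) {}
};

// Hypothetical base class mirroring the setup/forward/backward phases.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;
  virtual void SetUp(const ToyBlob& bottom, ToyBlob* top) = 0;
  virtual void Forward(const ToyBlob& bottom, ToyBlob* top) = 0;
  virtual void Backward(const ToyBlob& top, ToyBlob* bottom) = 0;
};

class ToyReLU : public ToyLayer {
 public:
  // Shape the top blob to match the bottom blob.
  void SetUp(const ToyBlob& bottom, ToyBlob* top) override {
    top->data.assign(bottom.data.size(), 0.0f);
    top->diff.assign(bottom.data.size(), 0.0f);
  }

  // top_data = max(bottom_data, 0)
  void Forward(const ToyBlob& bottom, ToyBlob* top) override {
    assert(top->data.size() == bottom.data.size());
    for (std::size_t i = 0; i < bottom.data.size(); ++i) {
      top->data[i] = bottom.data[i] > 0.0f ? bottom.data[i] : 0.0f;
    }
  }

  // Chain rule: bottom_diff = top_diff * d(top_data)/d(bottom_data),
  // where the local derivative of ReLU is 1 for positive inputs, else 0.
  void Backward(const ToyBlob& top, ToyBlob* bottom) override {
    assert(top.diff.size() == bottom->diff.size());
    for (std::size_t i = 0; i < bottom->diff.size(); ++i) {
      bottom->diff[i] = bottom->data[i] > 0.0f ? top.diff[i] : 0.0f;
    }
  }
};
```

Calling SetUp() once, then Forward() and Backward() per iteration, mirrors the order in which a net would drive each layer.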
Solver using NCCL
NCCL::Run()
• boost::barrier
▪ A synchronization point between multiple threads.
• Worker
▪ class Worker : public InternalThread
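The barrier idea can be sketched with the C++ standard library alone. This is an illustration under assumptions: `ToyBarrier` only mimics the wait() behaviour of a barrier, and `RunWorkers` is a hypothetical driver, not the code in NCCL::Run().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier: wait() returns only after `count` threads
// have called it. Hypothetical stand-in for boost::barrier.
class ToyBarrier {
 public:
  explicit ToyBarrier(std::size_t count)
      : threshold_(count), waiting_(0), generation_(0) {
    assert(count > 0);
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++waiting_ == threshold_) {
      // Last thread to arrive: open the barrier for this generation.
      ++generation_;
      waiting_ = 0;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  const std::size_t threshold_;
  std::size_t waiting_;
  std::size_t generation_;
};

// Hypothetical driver: each worker thread writes its own slot, waits at the
// barrier, then sums all slots. Returns the sum each worker observed.
std::vector<int> RunWorkers(int num_workers) {
  assert(num_workers > 0);
  ToyBarrier barrier(static_cast<std::size_t>(num_workers));
  std::vector<int> slots(num_workers, 0);
  std::vector<int> seen(num_workers, 0);
  std::vector<std::thread> threads;
  threads.reserve(num_workers);
  for (int rank = 0; rank < num_workers; ++rank) {
    threads.emplace_back([&, rank] {
      slots[rank] = rank + 1;  // phase 1: per-worker initialization
      barrier.wait();          // nobody proceeds until all are initialized
      int sum = 0;
      for (int v : slots) sum += v;  // phase 2: safe to read every slot
      seen[rank] = sum;
    });
  }
  for (std::thread& t : threads) t.join();
  return seen;
}
```

Without the barrier, a fast worker could read slots that slower workers have not written yet; with it, every worker sees the same complete state.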
Caffe training with NCCL
Broadcast and All-Reduce in Caffe
• Each Worker is an internal thread that serves one GPU
• The slide's figure (not reproduced here) illustrates the broadcast and all-reduce operations
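As a rough illustration of the two collectives, the sketch below simulates them on host vectors in a single thread. `Buffer`, `Broadcast`, and `AllReduceSum` are hypothetical helpers; a real multi-GPU run would use the NCCL library across devices rather than loops like these.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;

// Broadcast: every rank ends up with a copy of the root rank's buffer
// (for example, initial weights sent from one solver to the others).
void Broadcast(std::vector<Buffer>* ranks, std::size_t root) {
  assert(root < ranks->size());
  const Buffer src = (*ranks)[root];
  for (Buffer& b : *ranks) b = src;
}

// All-reduce (sum): every rank ends up with the element-wise sum of all
// ranks' buffers (for example, gradients combined across GPUs).
void AllReduceSum(std::vector<Buffer>* ranks) {
  assert(!ranks->empty());
  Buffer total((*ranks)[0].size(), 0.0f);
  for (const Buffer& b : *ranks) {
    assert(b.size() == total.size());
    for (std::size_t i = 0; i < b.size(); ++i) total[i] += b[i];
  }
  for (Buffer& b : *ranks) b = total;
}
```

With three simulated ranks holding gradients {1, 2}, {3, 4}, and {5, 6}, AllReduceSum leaves {9, 12} on every rank.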
The Data Layer
Prototxt: Define Net
LeNet
LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]
• LeNet-5
▪ https://world4jason.gitbooks.io/research-log/content/deepLearning/CNN/Model%20&%20ImgNet/lenet.html
lenet.prototxt
name: "LeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 64 dim: 1 dim: 28 dim: 28 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
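To see how the kernel sizes and strides in this definition determine the blob shapes, the sketch below traces the spatial size of a 28×28 input through the network and counts the learnable parameters. The helper names (`OutDim`, `LeNetShapes`, `ComputeLeNetShapes`, `CountLeNetParams`) are hypothetical, and the formula assumes zero padding, which matches the definition because no pad is specified.

```cpp
#include <cassert>

// Output size along one spatial dimension for a sliding window.
// Assumes the window tiles the input exactly, so rounding up or down
// gives the same result (true for every layer in this network).
int OutDim(int in, int kernel, int stride, int pad = 0) {
  assert(stride > 0);
  assert(in + 2 * pad >= kernel);
  return (in + 2 * pad - kernel) / stride + 1;
}

struct LeNetShapes {
  int conv1;       // spatial size after conv1
  int pool1;       // spatial size after pool1
  int conv2;       // spatial size after conv2
  int pool2;       // spatial size after pool2
  int ip1_inputs;  // flattened input count of ip1
};

LeNetShapes ComputeLeNetShapes(int input_hw) {
  LeNetShapes s;
  s.conv1 = OutDim(input_hw, 5, 1);  // kernel_size: 5, stride: 1
  s.pool1 = OutDim(s.conv1, 2, 2);   // kernel_size: 2, stride: 2
  s.conv2 = OutDim(s.pool1, 5, 1);
  s.pool2 = OutDim(s.conv2, 2, 2);
  s.ip1_inputs = 50 * s.pool2 * s.pool2;  // conv2 has num_output: 50
  return s;
}

// Learnable parameters: weights plus biases of each parameterized layer.
long CountLeNetParams(const LeNetShapes& s) {
  const long conv1 = 20L * 1 * 5 * 5 + 20;   // 1 input channel
  const long conv2 = 50L * 20 * 5 * 5 + 50;  // 20 input channels
  const long ip1 = 500L * s.ip1_inputs + 500;
  const long ip2 = 10L * 500 + 10;
  return conv1 + conv2 + ip1 + ip2;
}
```

For a 28×28 input the spatial size goes 28 → 24 → 12 → 8 → 4, so ip1 receives 50 × 4 × 4 = 800 values per image.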
Caffe data flow
▪ Figure annotation: all done on the GPU
name: "LogReg"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
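[Figure: call flow of one training iteration; labels recovered from the diagram]
• Step()
▪ ForwardBackward()
a. Forward() on each layer: the Data layer runs Load_batch() (prefetching data and label) and copies the data to the GPU; the following layers compute their output and the loss
b. Backward() on each layer: compute the gradient
▪ on_gradients_ready()
a. NCCL: All-Reduce for diff/gradient
▪ ApplyUpdate()
a. Normalize(param_id);
b. Regularize(param_id);
c. this->net_->Update(); updates the learnable params via cblas_saxpy(data, diff)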
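The update step named in the diagram can be sketched for a single parameter blob. This is a simplified CPU-only illustration: `ToyParam`, `Axpy`, and the free functions below are hypothetical, the learning-rate scaling is plain SGD without momentum, and L2 weight decay is assumed for the regularization step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = alpha * x + y, the same contract as the BLAS axpy routine.
void Axpy(float alpha, const std::vector<float>& x, std::vector<float>* y) {
  assert(x.size() == y->size());
  for (std::size_t i = 0; i < x.size(); ++i) (*y)[i] += alpha * x[i];
}

// Hypothetical learnable parameter: values plus accumulated gradient.
struct ToyParam {
  std::vector<float> data;
  std::vector<float> diff;
};

// Normalize: average gradients accumulated over iter_size passes.
void Normalize(ToyParam* p, int iter_size) {
  assert(iter_size > 0);
  if (iter_size == 1) return;
  for (float& d : p->diff) d /= static_cast<float>(iter_size);
}

// Regularize: L2 weight decay adds weight_decay * data to the gradient.
void Regularize(ToyParam* p, float weight_decay) {
  Axpy(weight_decay, p->data, &p->diff);
}

// Scale the gradient by the learning rate (simplification: no momentum).
void ScaleByLearningRate(ToyParam* p, float lr) {
  for (float& d : p->diff) d *= lr;
}

// Update: data = data - diff, expressed as an axpy with alpha = -1.
void Update(ToyParam* p) { Axpy(-1.0f, p->diff, &p->data); }

void ApplyUpdate(ToyParam* p, float lr, float weight_decay, int iter_size) {
  Normalize(p, iter_size);
  Regularize(p, weight_decay);
  ScaleByLearningRate(p, lr);
  Update(p);
}
```

Expressing the final step as an axpy call with alpha = -1 is what lets the same update run through CBLAS on the CPU or cuBLAS on the GPU.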
• The log message "Waiting for data" comes from blocking_queue.cpp:49, inside BlockingQueue<T>::pop(), which blocks while the queue is empty:
43 template<typename T>
44 T BlockingQueue<T>::pop(const string& log_on_wait) {
45   boost::mutex::scoped_lock lock(sync_->mutex_);
46
47   while (queue_.empty()) {
48     if (!log_on_wait.empty()) {
49       LOG_EVERY_N(INFO, 1000) << log_on_wait;
50     }
51     sync_->condition_.wait(lock);
52   }
53
54   T t = queue_.front();
55   queue_.pop();
56   return t;
57 }
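The same blocking-pop pattern can be written with the C++ standard library only. The sketch below is an assumption-laden illustration: `ToyBlockingQueue` and `ConsumeBatches` are hypothetical names, the logging is omitted, and integer batch ids stand in for real batches.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical queue with a blocking pop(), using std primitives
// in place of the boost ones.
template <typename T>
class ToyBlockingQueue {
 public:
  void push(const T& t) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(t);
    }
    condition_.notify_one();  // wake one consumer waiting in pop()
  }

  // Blocks until an element is available, then removes and returns it.
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    condition_.wait(lock, [this] { return !queue_.empty(); });
    T t = queue_.front();
    queue_.pop();
    return t;
  }

 private:
  std::mutex mutex_;
  std::condition_variable condition_;
  std::queue<T> queue_;
};

// Hypothetical prefetch loop: a producer thread pushes batch ids while the
// calling thread consumes them, blocking whenever the queue runs dry.
std::vector<int> ConsumeBatches(int num_batches) {
  assert(num_batches >= 0);
  ToyBlockingQueue<int> full;
  std::thread producer([&full, num_batches] {
    for (int i = 0; i < num_batches; ++i) full.push(i);
  });
  std::vector<int> out;
  for (int i = 0; i < num_batches; ++i) out.push_back(full.pop());
  producer.join();
  return out;
}
```

If the consumer regularly blocks in pop(), the producer is the bottleneck, which is the situation the "Waiting for data" message points to.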
Batch Data Prefetching
• An InternalThread first loads the images into CPU memory
• cudaMemcpyAsync then copies them to GPU memory


Caffe studying 2017
