MUNICH IN DATA REPLY
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/z6Guvo | WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A | PASS: DesignOffice
GitHub: https://goo.gl/j8LEb9
CONSULTING TEAMS IN DATA REPLY

Teams:
1. Data Science
2. Data Incubator
3. Big Data
4. Ab Initio
5. MicroStrategy
6. Data Strategy

• Different aspects of Data Science are handled by different types of specialists
• Python is the most used language in the Data Science group
• International, fast-growing team: more than 30 nationalities
• Employees from top universities, >30% with a PhD
• Free trainings and certificates
• Travel to conferences: ICML, ML Prague
UNDERSTANDING PYTORCH: PYTORCH IN IMAGE PROCESSING
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/spXV6b | WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A | PASS: DesignOffice
OUTLINE
Automatic differentiation
PyTorch basics
PyTorch in image processing
AUTOMATIC DIFFERENTIATION
THE TRUTH ABOUT TRAINING DEEP NEURAL NETWORKS

Training a network comes down to computing the gradient of the loss with respect to the weights:

$\frac{\partial F(\mathbf{x}, \mathbf{y}, \mathbf{w})}{\partial \mathbf{w}}$
TWO WAYS TO COMPUTE GRADIENTS

Take a one-node graph $x_1 \to \exp \to f(x_1)$.

Forward propagation: a single forward pass computes both the value $f(x_1)$ and the derivative $\frac{\partial f(x_1)}{\partial x_1}$ together.

Backward propagation: the forward pass computes $f(x_1)$; a separate backward pass then computes $\frac{\partial f(x_1)}{\partial x_1}$.
AUTOMATIC DIFFERENTIATION

$f(x_1, x_2) = x_1 x_2 + e^{x_1}$; $\frac{\partial f(x_1, x_2)}{\partial x_1} = ?$; $\frac{\partial f(x_1, x_2)}{\partial x_2} = ?$

The computational graph feeds the weights $x_1, x_2$ through an exp node and a mult node into a + node whose output $f(x_1, x_2)$ is the loss function. In forward propagation of derivative values, every value carries its derivative along: seed $d_1 = 1$ and $d_2 = 0$ at the inputs, propagate through the nodes ($d_3 = e^{x_1}$ after exp, $d_4 = x_2$ after mult), and read off $d_5 = d_3 + d_4 = e^{x_1} + x_2 = \frac{\partial f}{\partial x_1}$ at the output. Use $d_5$ to update the weights: NO BACKPROPAGATION NEEDED.

The catch: one forward pass yields the derivative with respect to a single input. How many outputs does a network have? One or zero, cats or dogs. How many inputs? Inception: 6.7 million weights. Forward mode would therefore need one pass per weight.
BACKPROPAGATION: FORWARD

$f(x_1, x_2) = x_1 x_2 + e^{x_1}$; $\frac{\partial f(x_1, x_2)}{\partial x_1} = ?$; $\frac{\partial f(x_1, x_2)}{\partial x_2} = ?$

In backward propagation of derivative values, the forward pass first stores the intermediate node values: $x_3 = e^{x_1}$ (exp node), $x_4 = x_1 x_2$ (mult node), $x_5 = x_3 + x_4 = f(x_1, x_2)$.
BACKPROPAGATION: BACKWARD

$f(x_1, x_2) = x_1 x_2 + e^{x_1}$; $\frac{\partial f(x_1, x_2)}{\partial x_1} = ?$; $\frac{\partial f(x_1, x_2)}{\partial x_2} = ?$

First store the values $x_3, x_4, x_5$, then backpropagate gradients from the output towards the inputs: seed $d_5 = 1$; the + node gives $d_3 = d_5$ and $d_4 = d_5$; the mult node gives $d_2 = x_1$; collecting both paths into $x_1$ gives $d_1 = e^{x_1} + x_2$. One backward pass yields the gradient with respect to all inputs at once.
PYTORCH LIBRARY
• Dynamic computational graph
• Feels like Python, not C++
• Python library for neural networks
• Implements automatic differentiation
PYTORCH TENSOR
• A tensor is an N-dimensional array
• Operations on tensors are done on a CPU in parallel or on a GPU
• The syntax is similar to NumPy
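The slide's code is not in the transcript; a minimal sketch of these three points:

```python
import torch

# an N-dimensional array with numpy-like syntax
a = torch.ones(2, 3)
b = torch.arange(6, dtype=torch.float32).reshape(2, 3)

c = a + b          # elementwise op, parallelized on the CPU
d = b.sum(dim=1)   # reductions, slicing, broadcasting work as in numpy

# the same operations run on a GPU, if one is available
if torch.cuda.is_available():
    c_gpu = a.cuda() + b.cuda()
```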
PYTORCH TENSOR
• Tensor from a NumPy array and NumPy array from a tensor
• torch.tensor versus torch.as_tensor
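A sketch of the difference (the array values are illustrative):

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

t_copy = torch.tensor(arr)     # torch.tensor always copies the data
t_view = torch.as_tensor(arr)  # torch.as_tensor shares memory when it can

arr[0] = 100.0
print(t_copy[0])       # 1.0   -- the copy did not change
print(t_view[0])       # 100.0 -- the view follows the numpy array

back = t_view.numpy()  # tensor -> numpy array, also shares memory
```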
PYTORCH TENSOR
A tensor has many attributes, among them:
• the data of a tensor is a tensor itself
• the gradient of a tensor is also a tensor, of the same size as the data tensor, or None
• the parameter requires_grad: we need to compute gradients only for weights, not for data
• a function to compute backpropagation
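A minimal sketch of these attributes (the values are illustrative):

```python
import torch

w = torch.tensor([2.0, 3.0], requires_grad=True)  # weights: need gradients
x = torch.tensor([1.0, 4.0])                      # data: no gradients needed

print(w.data)           # the data of a tensor is a tensor itself
print(w.grad)           # None -- nothing has been backpropagated yet
print(w.requires_grad)  # True

loss = (w * x).sum()
loss.backward()         # the function that computes backpropagation
print(w.grad)           # tensor([1., 4.]) -- same size as w.data
```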
COMPUTE GRADIENTS
The backward() function computes gradients with backpropagation.
What is the output here?
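The code on the slide is not in the transcript; a sketch in its spirit, calling backward() on a non-scalar tensor:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2     # y is a vector, not a scalar
y.backward()  # what is the output here?
```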
COMPUTE GRADIENTS
RuntimeError: grad can be implicitly created only for scalar outputs
The backward() function computes gradients with backpropagation, but by default it expects a scalar output.
COMPUTE GRADIENTS
RuntimeError: grad can be implicitly created only for scalar outputs
Input and output are vectors, so the gradient is a matrix (a Jacobian). What to do?
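Two standard ways around this, as a sketch: reduce the output to a scalar, or pass a seed vector to backward():

```python
import torch

x = torch.ones(3, requires_grad=True)

# option 1: reduce the vector output to a scalar, then backpropagate
(2 * x).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

x.grad = None  # reset before trying the second option

# option 2: give backward() a seed vector v; this computes v^T . Jacobian
y = 2 * x
y.backward(torch.ones(3))
print(x.grad)  # tensor([2., 2., 2.])
```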
COMPUTE GRADIENTS
Gradients double after a second call of the backward() function.
COMPUTE GRADIENTS
Gradients always sum up after backpropagation: wherever paths meet in the graph ($x_1$ feeds both the exp node and the mult node), the chain rule adds their contributions:

$d_3 = d_5 \frac{\partial d_5}{\partial d_3} = d_5, \quad d_4 = d_5 \frac{\partial d_5}{\partial d_4} = d_5, \quad d_1 = d_3 \frac{\partial d_3}{\partial d_1} + d_4 \frac{\partial d_4}{\partial d_1} = e^{x_1} + x_2$

So we need to set the gradients to zero before calling the backward() function a second time.
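A sketch of both the accumulation and the fix:

```python
import torch

x = torch.ones(3, requires_grad=True)

(2 * x).sum().backward()
print(x.grad)   # tensor([2., 2., 2.])

(2 * x).sum().backward()
print(x.grad)   # tensor([4., 4., 4.]) -- gradients summed up

x.grad.zero_()  # set gradients to zero before the next backward()
(2 * x).sum().backward()
print(x.grad)   # tensor([2., 2., 2.])
```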
LINEAR REGRESSION

Equation of the orange line: $\hat{y} = w x + b$. Blue dots: the data points $(x_i, y_i)$. Minimize the sum of squared lengths of the green lines: $\sum_i (\hat{y}_i - y_i)^2$.
LINEAR REGRESSION

• features and labels $(x_i, y_i)$
• initialize the weights $w, b$; they need gradients
• train with gradient descent (see the sketch below):
  • compute predictions $\hat{y} = w x + b$
  • backpropagate the loss $\sum_i (\hat{y}_i - y_i)^2$
  • update the weights
  • set the gradients to zero
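The slide's code is not in the transcript; a sketch of the recipe (the toy data and the learning rate are illustrative):

```python
import torch

# features and labels (x_i, y_i): toy data around y = 3x + 1
x = torch.linspace(0, 1, 100)
y = 3 * x + 1 + 0.1 * torch.randn(100)

# initialize weights, need gradient
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 1e-3
for _ in range(1000):
    y_hat = w * x + b                # compute predictions
    loss = ((y_hat - y) ** 2).sum()  # sum of squared residuals
    loss.backward()                  # backpropagate the loss
    with torch.no_grad():
        w -= lr * w.grad             # update weights
        b -= lr * b.grad
    w.grad.zero_()                   # set gradients to zero
    b.grad.zero_()
```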
LINEAR REGRESSION

It is also possible to use an optimizer that accepts the weights as parameters: the optimizer then updates all weights and sets the gradients of all weights to zero (sketch below).
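The same loop with an optimizer, as a sketch:

```python
import torch

x = torch.linspace(0, 1, 100)
y = 3 * x + 1 + 0.1 * torch.randn(100)

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# the optimizer accepts the weights as parameters
optimizer = torch.optim.SGD([w, b], lr=1e-3)

for _ in range(1000):
    y_hat = w * x + b
    loss = ((y_hat - y) ** 2).sum()
    loss.backward()
    optimizer.step()       # updates all weights
    optimizer.zero_grad()  # sets gradients of all weights to zero
```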
LINEAR REGRESSION

Note that the two lines optimizer.step() and optimizer.zero_grad() do the same as the manual weight update and gradient reset in the first sketch.
LINEAR REGRESSION

We can easily plot the results. For tensors with requires_grad=True, you need to call the .detach() function before transforming them to NumPy.
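A sketch, reusing x, y, w and b from the regression sketches above (matplotlib assumed):

```python
import matplotlib.pyplot as plt

y_hat = w * x + b
# y_hat has requires_grad=True, so y_hat.numpy() would raise an error;
# call .detach() before transforming to numpy
plt.scatter(x.numpy(), y.numpy())
plt.plot(x.numpy(), y_hat.detach().numpy(), color="orange")
plt.show()
```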
EXCITING PART: PYTORCH FOR IMAGE PROCESSING
OUTLINE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
STEP 1: BUILD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
BUILD ALEXNET MODEL

Weights are downloaded automatically. The features part is pretrained on ImageNet; it extracts the most useful features from the images. We will not train this part and will use the downloaded weights. The classifier part we will substitute and retrain.
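A sketch of this step with torchvision:

```python
import torchvision

# weights are downloaded automatically on first use
alexnet = torchvision.models.alexnet(pretrained=True)

print(alexnet.features)    # pretrained on ImageNet: we keep these weights
print(alexnet.classifier)  # this is the part we substitute and retrain
```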
BUILD ALEXNET MODEL
do not need gradients for
features-extractorweights
a new modelforclassification
syntaxis is similar to Keras
set classifieras trainable
set features as not trainable
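A sketch of these annotations; the two-class head is a hypothetical example, not necessarily the deck's:

```python
import torch.nn as nn
import torchvision

alexnet = torchvision.models.alexnet(pretrained=True)

# set features as not trainable: no gradients for the feature-extractor weights
for param in alexnet.features.parameters():
    param.requires_grad = False

# a new model for classification; the nn.Sequential syntax is similar to Keras
alexnet.classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 2),  # hypothetical: two classes, e.g. cats vs. dogs
)
# the new classifier is trainable by default (requires_grad=True)
```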
STEP 2: LOAD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
LOAD DATASET

• set a transformation for the data
• create the dataset: no images in memory yet, only their paths and labels
• split train and test: still no images in memory
• balance the dataset and create a generator which yields batches of images
• images are loaded into memory only when iteration happens
(a sketch of the pipeline follows)
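The slide's code is not in the transcript; a sketch with torchvision (the folder path and the uniform sampler weights are illustrative placeholders):

```python
import torch
import torchvision
from torchvision import transforms

# set a transformation for the data
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# create the dataset: only paths and labels, no images in memory yet
dataset = torchvision.datasets.ImageFolder("data/cats_dogs", transform=transform)

# split train and test: still no images in memory
n_test = len(dataset) // 5
train_set, test_set = torch.utils.data.random_split(
    dataset, [len(dataset) - n_test, n_test])

# balance the dataset and create a generator that yields batches of images;
# images are loaded into memory only when we iterate over the loader
weights = torch.ones(len(train_set))  # illustrative: uniform weights
sampler = torch.utils.data.WeightedRandomSampler(weights, len(train_set))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, sampler=sampler)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)
```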
STEP 3: CONVERT
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
CUDA/GPU COMPATIBILITY

Data flows from the solid-state drive (SSD) through random-access memory (RAM) to the graphics processing unit (GPU):
• GPU/RAM: model weights, images, labels
• SSD: images (read by the DataLoader)
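A minimal sketch of the device handling, reusing the alexnet from the build step:

```python
import torch

# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# the model weights move to GPU memory; batches of images and labels
# will be sent there one at a time inside the training loop
alexnet = alexnet.to(device)
```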
STEP 4: TRAIN
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
MODEL TRAINING

Send the images and labels to the GPU, if a GPU is used. non_blocking=True makes the host-to-GPU copies asynchronous, which speeds up CUDA computations. (This and the next three slides are assembled into one loop sketch below.)
MODEL TRAINING

Compute predictions, estimate the loss function, and backpropagate.
MODEL TRAINING

Make one step of gradient descent and set the gradients of the trainable weights in alexnet.classifier to zero. Only the classifier parameters are passed to the optimizer.
MODEL TRAINING

Use tqdm to show a progress bar and report the current average batch loss.
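None of the loop's code survives in the transcript; a sketch assembling the four pieces above, reusing alexnet, device and train_loader from the earlier sketches (the loss, optimizer and epoch count are illustrative choices, not necessarily the deck's):

```python
import torch
import torch.nn as nn
from tqdm import tqdm

criterion = nn.CrossEntropyLoss()  # illustrative loss choice
# only classifier parameters are passed to the optimizer
optimizer = torch.optim.Adam(alexnet.classifier.parameters(), lr=1e-4)

alexnet.train()
for epoch in range(3):             # illustrative epoch count
    running_loss, n_batches = 0.0, 0
    progress = tqdm(train_loader)  # tqdm shows a progress bar
    for images, labels in progress:
        # send images and labels to the GPU; non_blocking=True makes
        # the host-to-GPU copies asynchronous
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)

        # compute predictions, estimate the loss, backpropagate
        outputs = alexnet(images)
        loss = criterion(outputs, labels)
        loss.backward()

        # one step of gradient descent, then zero the classifier gradients
        optimizer.step()
        optimizer.zero_grad()

        # report the current average batch loss
        running_loss += loss.item()
        n_batches += 1
        progress.set_postfix(loss=running_loss / n_batches)
```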
STEP 5: EVALUATE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
SPEED UP IMAGE LOADING

Same picture as before, but now only the model weights and labels sit in GPU/RAM; the images stay on the SSD and are streamed by the DataLoader:
• GPU/RAM: model weights, labels
• SSD: images (read by the DataLoader)
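The slide's code is not in the transcript; a common way to get this behaviour, assuming the deck relied on DataLoader worker processes, as a sketch (reusing train_set from the dataset sketch):

```python
import torch

# several worker processes read and decode images from the SSD in parallel,
# so batches are ready when the GPU asks for them; pin_memory speeds up the
# RAM -> GPU copies
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=32, shuffle=True,
    num_workers=4, pin_memory=True)
```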
MODEL EVALUATION

• iterate over the test_loader DataLoader
• push the labels and probabilities first to the CPU and then to NumPy
• use scikit-learn to show metrics (sketch below)
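A sketch of the evaluation, reusing alexnet, device and test_loader from above (the scikit-learn metric choice is illustrative):

```python
import torch
from sklearn.metrics import classification_report

alexnet.eval()
all_labels, all_preds = [], []
with torch.no_grad():
    # iterate over the test_loader DataLoader
    for images, labels in test_loader:
        outputs = alexnet(images.to(device))
        preds = outputs.argmax(dim=1)
        # push results first to the CPU and then to numpy
        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(preds.cpu().numpy())

# use scikit-learn to show metrics
print(classification_report(all_labels, all_preds))
```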
SAVE/LOAD MODEL

• save the torch model and the state of the optimizer
• when loading weights, initialize the model and the optimizer first
• then load the weights and the state of the optimizer (sketch below)
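A sketch, reusing alexnet and optimizer from the training sketch (the file name is illustrative):

```python
import torch

# save the model weights and the state of the optimizer
torch.save({"model": alexnet.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# when loading, first initialize a model with the same architecture and an
# optimizer over its parameters, then load both states into them
checkpoint = torch.load("checkpoint.pt")
alexnet.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```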
THANK YOU FOR YOUR ATTENTION