From multiplication to 
convolutional networks 
How to do ML with Theano
Today’s Talk 
● A motivating problem 
● Understanding a model-based framework 
● Theano 
○ Linear Regression 
○ Logistic Regression 
○ Net 
○ Modern Net 
○ Convolutional Net
Follow along 
Tutorial code at: 
https://github.com/Newmu/Theano-Tutorials 
Data at: 
http://yann.lecun.com/exdb/mnist/ 
Slides at: 
http://goo.gl/vuBQfe
A motivating problem 
How do we program a computer to recognize a picture of a 
handwritten digit as a 0-9? 
What could we do?
A dataset - MNIST 
What if we have 60,000 of these images and their labels? 
X = images, a 60000 x 784 matrix (list of lists): one row per image, one column per pixel (28 x 28 = 784) 
Y = labels, a vector (list) of length 60000 
Given X as input, predict Y
An idea 
For each image, find the “most similar” image and guess 
that as the label. 
KNearestNeighbors ~95% accuracy
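The "most similar image" idea can be sketched as a 1-nearest-neighbor classifier. This is a minimal plain-Python illustration, not the code behind the ~95% figure: the squared Euclidean distance, the function names, and the tiny 4-pixel toy "images" are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length pixel lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_neighbor_predict(train_x, train_y, image):
    """Return the label of the training image closest to `image`."""
    best_index = min(
        range(len(train_x)),
        key=lambda i: squared_distance(train_x[i], image),
    )
    return train_y[best_index]


# Toy stand-in for MNIST: 4-pixel "images" instead of 784-pixel ones.
train_x = [
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 0.8, 0.1, 0.0],
]
train_y = [7, 1, 1]

prediction = nearest_neighbor_predict(train_x, train_y, [0.1, 0.0, 0.9, 1.0])
print(prediction)  # prints 7, the label of the closest training image
```

Every prediction scans the whole training set, which is one reason this approach gets expensive at 60,000 images.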
Trying things 
Make some functions computing relevant information for 
solving the problem
What we can code 
Make some functions computing relevant information for 
solving the problem 
feature engineering
What we can code 
Hard-coded rules are brittle, and for many problems they 
aren’t obvious.
A Machine Learning Framework 
Inputs -> Computation (Model) -> Outputs 
[Figure: a model box between inputs and outputs, illustrated with the digit 8]
A … model? - GoogLeNet 
[Figure: GoogLeNet architecture, from arXiv:1409.4842v1 [cs.CV] 17 Sep 2014]
A very simple model 
Input -> Computation -> Output 
3 -> mult by x -> 12
Theano intro 
● imports 
● theano symbolic variable initialization 
● our model 
● compiling to a python function 
● usage
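Theano's workflow is define-then-run: declare symbolic variables, build an expression from them, compile it into a callable, then call it with real numbers. The sketch below mimics those steps in plain Python so it runs without Theano installed. The `Scalar`, `Mul`, and `compile_function` names are hypothetical stand-ins and not Theano's API (Theano's own entry points are `theano.tensor` and `theano.function`).

```python
class Scalar:
    """A symbolic placeholder: it has a name but no value yet."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        return Mul(self, other)

    def evaluate(self, env):
        return env[self.name]


class Mul:
    """A node in the expression graph representing left * right."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)


def compile_function(inputs, output):
    """Turn a symbolic expression into an ordinary Python function."""

    def compiled(*values):
        env = {var.name: value for var, value in zip(inputs, values)}
        return output.evaluate(env)

    return compiled


# symbolic variable initialization (hypothetical stand-ins, not Theano)
a = Scalar("a")
b = Scalar("b")

# our model
y = a * b

# compiling to a python function
multiply = compile_function(inputs=[a, b], output=y)

# usage
print(multiply(3, 4))  # prints 12
```

Nothing is computed when `y = a * b` runs; the multiplication only happens once `multiply` is called with concrete values.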
Theano - Linear Regression 
● imports 
● training data generation 
● symbolic variable initialization 
● our model 
● model parameter initialization 
● metric to be optimized by model 
● learning signal for parameter(s) 
● how to change parameter based on learning signal 
● compiling to a python function 
● iterate through data 100 times and train model on each input, output pair
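The annotated steps above amount to fitting a one-parameter linear model with stochastic gradient descent. The sketch below follows the same sequence in plain Python, with the gradient written out by hand where Theano would derive it symbolically. The synthetic data (y = 2x plus noise), the learning rate of 0.01, and the random seed are assumptions chosen for illustration.

```python
import random

# training data generation: y = 2x plus a little Gaussian noise
random.seed(0)
train_x = [-1.0 + 2.0 * i / 100 for i in range(101)]
train_y = [2.0 * x + random.gauss(0.0, 0.33) for x in train_x]


def model(x, w):
    """Our model: a single multiplication."""
    return x * w


def cost(x, y, w):
    """Metric to be optimized: squared error on one example."""
    return (model(x, w) - y) ** 2


def gradient(x, y, w):
    """Learning signal: derivative of the squared error with respect to w."""
    return 2.0 * (model(x, w) - y) * x


# model parameter initialization
w = 0.0
learning_rate = 0.01

# iterate through the data 100 times, updating on each (input, output) pair
for epoch in range(100):
    for x, y in zip(train_x, train_y):
        # how to change the parameter based on the learning signal
        w = w - learning_rate * gradient(x, y, w)

print(w)  # should land close to 2, the slope used to generate the data
```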
Theano doing its thing
Logistic Regression 
Zero  One  Two  Three  Four  Five  Six  Seven  Eight  Nine
0.1   0.   0.   0.1    0.    0.    0.   0.     0.7    0.1
Computation: T.dot(X, w), then softmax
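As a rough illustration of how `T.dot(X, w)` followed by softmax yields one probability per class, here is a plain-Python sketch for a single input. The 3-pixel input and 3-class weights are made-up numbers (MNIST would use 784 pixels and 10 classes), and `dot` and `softmax` are local helper names, not Theano functions.

```python
import math


def dot(x, w):
    """Vector-matrix product: x has n entries, w has n rows and k columns."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softmax(scores):
    """Exponentiate each score and normalize so the outputs sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, 0.0, 2.0]             # one input with 3 "pixels"
w = [[0.5, -0.2, 0.1],          # weights: 3 inputs x 3 classes
     [0.3, 0.8, -0.5],
     [-0.4, 0.1, 0.9]]

probabilities = softmax(dot(x, w))
prediction = probabilities.index(max(probabilities))  # most probable class
print(probabilities, prediction)
```

The class scores here are -0.3, 0.0, and 1.9, so the third class (index 2) gets the largest probability.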
Back to Theano 
● convert to correct dtype 
● initialize model parameters 
● our model in matrix format 
● loading data matrices 
● now matrix types 
● probability outputs and maxima predictions 
● classification metric to optimize 
● compile prediction function 
● train on mini-batches of 128 examples
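Two of the steps above, the classification metric and mini-batch training, can be sketched without Theano. The code below is an illustrative assumption rather than the tutorial's code: `minibatches` and `cross_entropy` are hypothetical helper names, and the final partial batch is kept here, which the original training loop may or may not do.

```python
import math


def minibatches(n_examples, batch_size=128):
    """Yield (start, end) index pairs that cover n_examples in order."""
    for start in range(0, n_examples, batch_size):
        yield start, min(start + batch_size, n_examples)


def cross_entropy(probabilities, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[label])


# 60,000 training images split into batches of 128
batches = list(minibatches(60000))
print(len(batches), batches[0], batches[-1])

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(cross_entropy([0.1, 0.7, 0.2], label=1))
print(cross_entropy([0.1, 0.7, 0.2], label=0))
```

Each `(start, end)` pair would be used to slice rows out of X and Y for one parameter update.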
What it learns 
[Figure: one panel per digit class, labeled 0 to 9] 
Test Accuracy: 92.5%
An “old” net (circa 2000) 
Zero  One  Two  Three  Four  Five  Six  Seven  Eight  Nine
0.0   0.   0.   0.1    0.    0.    0.   0.     0.9    0.
h = T.nnet.sigmoid(T.dot(X, wh)) 
y = softmax(T.dot(h, wo))
Understanding SGD 
[Figure: 2D moons dataset, courtesy of scikit-learn]
Understanding Sigmoid Units
An “old” net in Theano 
● generalize to compute gradient descent on all model parameters 
● 2 layers of computation: input -> hidden (sigmoid), hidden -> output (softmax) 
● initialize both weight matrices 
● updated version of updates (now covering all parameters)
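A minimal plain-Python sketch of the two pieces named above: a two-layer forward pass (sigmoid hidden layer, softmax output) and an SGD update applied uniformly to every parameter. The helper names, the tiny layer sizes, and the learning rate are assumptions for illustration. In Theano the gradients would come from `T.grad` rather than being passed in by hand.

```python
import math


def dot(x, w):
    """Vector-matrix product for a single input vector."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def model(x, w_h, w_o):
    """input -> hidden (sigmoid) -> output (softmax)"""
    h = [sigmoid(z) for z in dot(x, w_h)]
    return softmax(dot(h, w_o))


def sgd(params, grads, lr=0.05):
    """Apply the same gradient-descent rule to every weight matrix."""
    return [
        [[p - lr * g for p, g in zip(p_row, g_row)]
         for p_row, g_row in zip(param, grad)]
        for param, grad in zip(params, grads)
    ]


# initialize both weight matrices (2 inputs -> 3 hidden -> 2 outputs)
w_h = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
w_o = [[0.2, -0.3], [0.1, 0.1], [-0.4, 0.5]]

output = model([1.0, 2.0], w_h, w_o)
print(output)

# Placeholder gradients of all ones, only to show the shape of the update.
ones_h = [[1.0] * 3 for _ in range(2)]
ones_o = [[1.0] * 2 for _ in range(3)]
new_w_h, new_w_o = sgd([w_h, w_o], [ones_h, ones_o])
print(new_w_h[0][0])
```

Because `sgd` loops over a list of parameters, adding a layer only means adding its weight matrix to that list.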
What an “old” net learns 
Test Accuracy: 98.4%
A “modern” net - 2012+ 
Zero  One  Two  Three  Four  Five  Six  Seven  Eight  Nine
0.0   0.   0.   0.1    0.    0.    0.   0.     0.9    0.
h = rectify(T.dot(X, wh)) 
h2 = rectify(T.dot(h, wh2)) 
y = softmax(T.dot(h2, wo)) 
Noise (or augmentation) is injected at three points in the network.
Understanding rectifier units
Understanding RMSprop 
[Figure: 2D moons dataset, courtesy of scikit-learn]
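The idea behind RMSprop, in this deck's own wording, is to keep a running average of the magnitude of the gradient and scale each step by it. A minimal single-parameter sketch follows. The decay rate `rho`, the `epsilon` term, the learning rate, and the toy objective are assumed values for illustration rather than settings taken from the talk.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.01, rho=0.9, epsilon=1e-6):
    """One RMSprop update for a single parameter.

    avg_sq is the running average of the squared gradient.
    """
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    scaled_grad = grad / math.sqrt(avg_sq + epsilon)
    return w - lr * scaled_grad, avg_sq


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, avg_sq = 0.0, 0.0
for step in range(1000):
    grad = 2.0 * (w - 3.0)
    w, avg_sq = rmsprop_step(w, grad, avg_sq)

print(w)  # should end close to 3
```

Because the gradient is divided by its own recent magnitude, the step size stays near `lr` whether the raw gradient is large or small.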
A “modern” net in Theano 
● rectifier 
● numerically stable softmax 
● a running average of the magnitude of the gradient 
● scale the gradient based on running average 
● randomly drop values and scale rest 
● Noise injected into model 
● rectifiers now used 
● 2 hidden layers
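Three of the ingredients listed above are small enough to sketch in plain Python: the rectifier, a softmax made numerically stable by subtracting the maximum score before exponentiating, and dropout that randomly zeroes values and rescales the survivors. The function names, the drop probability, and the seed are assumptions for illustration, not the tutorial's exact code.

```python
import math
import random


def rectify(z):
    """Rectifier: pass positive values through, clamp negatives to zero."""
    return max(z, 0.0)


def stable_softmax(scores):
    """Softmax that subtracts the max score first to avoid overflow in exp."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, p_drop, rng):
    """Zero each value with probability p_drop; scale the rest by 1 / (1 - p_drop)."""
    if p_drop <= 0.0:
        return list(values)  # no noise, e.g. at prediction time
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rectified = [rectify(z) for z in [-2.0, 0.5, 3.0]]
print(rectified)  # [0.0, 0.5, 3.0]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
stable = stable_softmax([1000.0, 1001.0, 1002.0])
print(stable)

rng = random.Random(0)
noisy = dropout([1.0] * 10, p_drop=0.5, rng=rng)
print(noisy)  # every entry is either 0.0 or 2.0
```

Scaling the kept values by 1 / (1 - p_drop) keeps the expected activation the same with and without dropout, so the noise can simply be switched off for prediction.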
What a “modern” net learns 
Test Accuracy: 99.0%
Quantifying the difference
What a “modern” net is doing
Convolutional Networks 
[Figure from deeplearning.net]
A convolutional network in Theano 
● a “block” of computation: conv -> activate -> pool -> noise 
● convert from 4tensor to normal matrix 
● reshape into conv 4tensor (b, c, 0, 1) format 
● now 4tensor for conv instead of matrix 
● conv weights (n_kernels, n_channels, kernel_w, kernel_h) 
● highest conv layer has 128 filters and a 3x3 grid of responses 
● noise during training 
● no noise for prediction
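To make the conv -> activate -> pool block concrete, here is a single-channel plain-Python sketch. It makes several assumptions not stated in the slides: "valid" borders, no kernel flipping (that is, cross-correlation, a simplification), 2x2 max pooling, and a made-up 3x3 kernel. Noise is left out, matching prediction time.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output side shrinks by kernel size - 1."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_h) for j in range(k_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def rectify_map(feature_map):
    """Apply the rectifier to every response."""
    return [[max(v, 0.0) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def conv_block(image, kernel):
    """conv -> activate -> pool (noise omitted, as at prediction time)"""
    return max_pool_2x2(rectify_map(conv2d_valid(image, kernel)))


# A 6x6 image with a vertical edge, and a 3x3 vertical-edge kernel.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
kernel = [[-1.0, 0.0, 1.0]] * 3

output = conv_block(image, kernel)
print(output)  # [[3.0, 3.0], [3.0, 3.0]]
```

The 6x6 input becomes a 4x4 map after the 3x3 convolution and a 2x2 map after pooling, which is the same kind of shrinking that leaves the top layer with a small grid of responses.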
What a convolutional network learns 
Test Accuracy: 99.5%
Takeaways 
● A few tricks are needed to get good results 
○ Noise is important for regularization 
○ Rectifiers for faster, better learning 
○ Don’t use plain SGD: there are lots of cheap, simple improvements 
● Models need room to compute. 
● If your data has structure, your model should respect it.
Resources 
● More in-depth Theano tutorials 
○ http://www.deeplearning.net/tutorial/ 
● Theano docs 
○ http://www.deeplearning.net/software/theano/library/ 
● Community 
○ http://www.reddit.com/r/machinelearning
A plug 
Keep up to date with indico: 
https://indico1.typeform.com/to/DgN5SP
Questions?
