Introduction to Deep Learning
M S Ram
Dept. of Computer Science & Engg.
Indian Institute of Technology Kanpur
Reading of Chap. 1 from “Learning Deep Architectures for AI”; Yoshua Bengio; FTML Vol. 2, No. 1 (2009) 1–127
Date: 12 Nov, 2015
A Motivational Task: Percepts → Concepts
• Create algorithms
  • that can understand scenes and describe them in natural language
  • that can infer semantic concepts to allow machines to interact with humans using these concepts
• Requires creating a series of abstractions
  • Image (Pixel Intensities) → Objects in Image → Object Interactions → Scene Description
• Deep learning aims to automatically learn these abstractions with little supervision
Courtesy: Yoshua Bengio, Learning Deep Architectures for AI
Deep Visual-Semantic Alignments for Generating Image Descriptions (Karpathy, Fei-Fei; CVPR 2015)
• "boy is doing backflip on wakeboard."
• "two young girls are playing with lego toy."
• "man in black shirt is playing guitar."
• "construction worker in orange safety vest is working on road."
http://cs.stanford.edu/people/karpathy/deepimagesent/
Challenge in Modelling Complex Behaviour
• Too many concepts to learn
  • Too many object categories
  • Too many ways of interaction between object categories
• Behaviour is a highly varying function of underlying factors
  • f: L → V
    • L: latent factors of variation (low-dimensional latent factor space)
    • V: visible behaviour (high-dimensional observable space)
    • f: a highly non-linear function
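To make the latent-to-visible picture concrete, below is a minimal, hypothetical numpy sketch (assumed for illustration, not from the slides): a one-dimensional latent factor is pushed through a highly non-linear map f into a high-dimensional observation.

```python
# Hypothetical sketch: a 1-D latent factor (an object's "angle") is mapped by a
# highly non-linear function f into a high-dimensional observation (a short
# strip of pixel intensities).
import numpy as np

def f(theta, n_pixels=64):
    # Render a bump whose position depends non-linearly on the latent angle.
    centers = np.linspace(0.0, 1.0, n_pixels)
    position = 0.5 + 0.5 * np.sin(3.0 * theta)        # highly varying in theta
    return np.exp(-((centers - position) ** 2) / 0.002)

latent = np.random.uniform(0.0, 2 * np.pi, size=100)  # low-dimensional space L
visible = np.stack([f(t) for t in latent])             # high-dimensional space V
print(latent.shape, visible.shape)                     # (100,) (100, 64)
```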
Example: Learning the Configuration Space of a Robotic Arm
C-Space Discovery using Isomap
How do We Train Deep Architectures?
• Inspiration from the mammal brain
• Multiple layers of "neurons" (Rumelhart et al. 1986)
• Train each layer to compose the representations of the previous layer to learn a higher-level abstraction
  • Ex: Pixels → Edges → Contours → Object parts → Object categories
  • Local Features → Global Features
• Train the layers one-by-one (Hinton et al. 2006)
  • Greedy strategy
Multilayer Perceptron with Back-propagation
First deep learning model (Rumelhart, Hinton, Williams 1986)
[Figure: an input vector feeds through hidden layers to outputs; the outputs are compared with the correct answer to get an error signal, which is back-propagated to get derivatives for learning.]
Source: Hinton’s 2009 tutorial on Deep Belief Networks
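The loop below is a minimal numpy sketch of that recipe (illustrative only, not the original 1986 implementation): run a forward pass, compare the outputs with the correct answers to get an error signal, and back-propagate it to obtain the derivatives used for learning.

```python
# Minimal back-propagation loop for a 784-500-10 network with sigmoid units
# and squared error (sizes chosen to match the MNIST example later on).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 784))                 # toy batch of input vectors
Y = np.eye(10)[rng.integers(0, 10, size=32)]   # one-hot "correct answers"

W1 = rng.normal(scale=0.01, size=(784, 500)); b1 = np.zeros(500)
W2 = rng.normal(scale=0.01, size=(500, 10));  b2 = np.zeros(10)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    h = sigmoid(X @ W1 + b1)                   # hidden layer activations
    out = sigmoid(h @ W2 + b2)                 # network outputs
    err = out - Y                              # error signal at the outputs
    d_out = err * out * (1.0 - out)            # back-propagate the error ...
    d_h = (d_out @ W2.T) * h * (1.0 - h)       # ... through the hidden layer
    W2 -= 0.1 * h.T @ d_out;  b2 -= 0.1 * d_out.sum(axis=0)
    W1 -= 0.1 * X.T @ d_h;    b1 -= 0.1 * d_h.sum(axis=0)
```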
Drawbacks of Back-propagation based Deep Neural Networks
• They are discriminative models
  • They get all their information from the labels
  • and the labels do not carry much information
• Need a substantial amount of labeled data
• Gradient descent with random initialization leads to poor local minima
Hand-written digit recognition
• Classification of MNIST hand-written digits
• 10 digit classes
• Input image: 28×28 grayscale
  • 784-dimensional input
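As a hedged aside (the slides do not say how the data were loaded), one common way to obtain MNIST in exactly this flattened 784-dimensional form is scikit-learn's OpenML fetcher:

```python
# Fetch MNIST as 70000 x 784 pixel-intensity vectors with string labels "0".."9".
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X = mnist.data / 255.0        # shape (70000, 784): 28x28 images flattened
y = mnist.target              # 10 digit classes
print(X.shape, sorted(set(y)))
```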
A Deeper Look at the Problem
• One hidden layer with 500 neurons
  => 784 × 500 + 500 × 10 ≈ 0.4 million weights
• Fitting a model that best explains the training data is an optimization problem in a 0.4-million-dimensional space
• It is almost impossible for gradient descent with random initialization to arrive at the global optimum
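A two-line check of the slide's arithmetic (weight matrices only; the 510 bias terms do not change the rough figure):

```python
# 784 inputs -> 500 hidden units -> 10 outputs.
n_weights = 784 * 500 + 500 * 10
print(n_weights)   # 397000, i.e. roughly 0.4 million parameters
```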
A Solution – Deep Belief Networks (Hinton et al. 2006)
[Figure: in a very high-dimensional parameter space, fast unsupervised pre-training moves the network from a random initial position to pre-trained network weights; slow fine-tuning (using back-propagation) then reaches a good solution. Back-propagation alone, started from the random initial position, is very slow and often gets stuck at poor local minima.]
A Solution – Deep Belief Networks (Hinton et al. 2006)
• Before applying back-propagation, pre-train the network as a series of generative models
• Use the weights of the pre-trained network as the initial point for traditional back-propagation
• This leads to quicker convergence to a good solution
• Pre-training is fast; fine-tuning can be slow
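A schematic sketch of the two-phase recipe, under stated assumptions: each layer is pre-trained here with scikit-learn's BernoulliRBM (Hinton et al. 2006 used RBMs trained with contrastive divergence; the hyperparameters and the two 500-unit layers are illustrative, not the original setup), and the learned weights then initialize an MLP for back-propagation fine-tuning.

```python
# Greedy layer-wise pre-training followed by supervised fine-tuning (sketch).
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.random.rand(1000, 784)          # stand-in for (binarized) MNIST images

# Phase 1: fast unsupervised pre-training, greedily, one layer at a time.
rbms, inp = [], X
for n_hidden in (500, 500):
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05, n_iter=10)
    inp = rbm.fit_transform(inp)       # train this layer, pass its codes upward
    rbms.append(rbm)

# Phase 2: use the pre-trained weights as the initial point for traditional
# back-propagation (fine-tuning) instead of a random initialization.
init_weights = [(rbm.components_.T, rbm.intercept_hidden_) for rbm in rbms]
# ... hand init_weights to an MLP implementation and fine-tune with the labels.
```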
Quick Check: MLP vs DBN on MNIST
• MLP (1 hidden layer)
  • 1 hour: 2.18%
  • 14 hours: 1.65%
• DBN
  • 1 hour: 1.65%
  • 14 hours: 1.10%
  • 21 hours: 0.97%
Hardware: Intel QuadCore 2.83 GHz, 4 GB RAM
Implementations: MLP in Python, DBN in Matlab
Intermediate Representations in the Brain
• Disentanglement of the factors of variation underlying the data
• Distributed Representations
  • Activation of each neuron is a function of multiple features of the previous layer
  • Feature combinations of different neurons are not necessarily mutually exclusive
• Sparse Representations
  • Only 1–4% of neurons are active at a time
[Figure: localized representation vs. distributed representation]
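An illustrative contrast (an assumed example, not from the slides): a localized (one-hot) code dedicates one unit per concept, while a distributed code re-uses units across concepts, so n binary units can distinguish up to 2**n patterns.

```python
# Localized (one-hot) vs. distributed binary codes over the same 4 units.
import numpy as np

n_units = 4
localized = np.eye(n_units, dtype=int)                      # 4 units -> 4 concepts
distributed = np.array([[int(b) for b in f"{i:0{n_units}b}"]
                        for i in range(2 ** n_units)])       # 4 units -> 16 patterns
print(localized.shape[0], distributed.shape[0])              # 4 16
```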
Local vs. Distributed in Input Space
• Local Methods
  • Assume a smoothness prior
    • g(x) = f(g(x1), g(x2), …, g(xk)), where {x1, x2, …, xk} are neighbours of x
  • Require a metric space
    • A notion of distance or similarity in the input space
  • Fail when the target function is highly varying
  • Examples: nearest-neighbour methods, kernel methods with a Gaussian kernel
• Distributed Methods
  • No assumption of smoothness → no need for a notion of similarity
  • Ex: Neural networks
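A minimal sketch of a local method (an assumed example): Nadaraya-Watson prediction with a Gaussian kernel. The prediction at x is a smoothness-weighted average of the targets of nearby training points, so it needs a notion of distance in input space and struggles when the target function varies much faster than the kernel bandwidth.

```python
# Gaussian-kernel (Nadaraya-Watson) regression: a purely local predictor.
import numpy as np

def gaussian_kernel_predict(x, X_train, y_train, bandwidth=0.5):
    w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * bandwidth ** 2))
    return np.dot(w, y_train) / w.sum()

X_train = np.random.uniform(-3, 3, size=(200, 1))
y_train = np.sin(5 * X_train[:, 0])             # a rapidly varying target
print(gaussian_kernel_predict(np.array([0.3]), X_train, y_train))
```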
Multi-task Learning
Source: https://en.wikipedia.org/wiki/Multi-task_learning
Desiderata for Learning AI
• Ability to learn complex, highly-varying functions
• Ability to learn multiple levels of abstraction with little human input
• Ability to learn from a very large set of examples
• Training time linear in the number of examples
• Ability to learn from mostly unlabeled data
• Unsupervised and semi-supervised
• Multi-task learning
• Sharing of representations across tasks
• Fast predictions
References
Primary
• Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009), pp. 1–127.
• Hinton, G. E., Osindero, S., and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation 18 (2006), pp. 1527–1554.
• Rumelhart, David E., Geoffrey E. Hinton, and R. J. Williams. Learning Internal Representations by Error Propagation. In David E. Rumelhart, James L. McClelland, and the PDP research group (editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, 1986.
Secondary
• Hinton, G. E. Learning Multiple Layers of Representation. Trends in Cognitive Sciences, Vol. 11 (2007), pp. 428–434.
• Hinton, G. E. Tutorial on Deep Belief Networks. Machine Learning Summer School, Cambridge, 2009.
• Andrej Karpathy, Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 2015.