Introduction to Neural Networks and Theano
Kushal Arora
Karora@cise.ufl.edu
Outline
1. Why Care?
2. Why Does It Work?
3. Logistic Regression to Neural Networks
4. Deep Networks and Issues
5. Autoencoders and Stacked Autoencoders
6. Why Deep Learning Works
7. Theano Overview
8. Code Hands On
Why Care?
● Has improved the state of the art on various tasks
– Speech Recognition
● Microsoft's Audio Video Indexing Service speech system uses deep learning
● Brought down the word error rate by 30% compared to the state-of-the-art GMM-based system
– Natural Language Understanding/Processing
● Neural net language models are the current state of the art in language modeling, sentiment analysis, paraphrase detection and many other NLP tasks
● The SENNA system uses neural embeddings for various NLP tasks such as POS tagging, NER and chunking
● SENNA is not only better than the state of the art but also much faster
– Object Recognition
● The breakthrough started in 2006 with the MNIST dataset, where deep nets are still the best (0.27% error)
● The state-of-the-art top-5 error rate on the ImageNet dataset is down to 15.3%, compared to 26.1%
● Multi-Task and Transfer Learning
– Ability to exploit commonalities between different learning tasks by transferring learned knowledge
– Example: Google Image Search (woman in a red dress in a garden)
– Another example is Google's image captioning (http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html)
Why Does It Work?
● Learning Representations
– Time to move beyond hand-crafted features
● Distributed Representation
– More robust and expressive features
● Learning Multiple Levels of Representation
– Learning multiple levels of abstraction
#1 Learning Representations
• Handcrafting features is inefficient and time-consuming
• Must be done again for each task and domain
• The features are often incomplete and highly correlated
#2 Distributed Representation
● Features can be non-mutually exclusive (e.g., in language).
● Need to move beyond one-hot representations, such as those from clustering algorithms, k-nearest neighbors, etc.
● These need O(N) parameters/examples for O(N) input regions, but we can do better: O(k) parameters/inputs for O(2^k) input regions
● Similar to multi-clustering, where multiple clustering algorithms are applied in parallel, or the same clustering algorithm is applied to different input regions
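The region-counting argument can be checked directly. The sketch below is an illustration, not from the original slides, and its function names are hypothetical. It counts the distinct codes that k units can express under a one-hot code and under a distributed binary code: k codes versus 2^k codes.

```python
from itertools import product


def one_hot_codes(k):
    """One-hot: exactly one of k units is active, giving k distinct codes."""
    return {tuple(1 if i == j else 0 for i in range(k)) for j in range(k)}


def distributed_codes(k):
    """Distributed: each of k binary features is on or off independently."""
    return set(product((0, 1), repeat=k))


for k in (2, 4, 8):
    # k units: one-hot distinguishes k regions, distributed distinguishes 2**k
    print(k, len(one_hot_codes(k)), len(distributed_codes(k)))
```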
#3 Learning Multiple Levels of Representation
● Deep architectures promote reuse of features
● Deep architectures can potentially lead to progressively more abstract features at higher layers
● Good intermediate representations can be shared across tasks
● Insufficient depth can be exponentially inefficient
 
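The Basics
● The building block of the network is a neuron.
● The basic terminology:
– Input
– Activation function
– Output
● The output of a neuron is usually of the form
$h_{w,b}(x) = f(w^T x + b)$
where f is the activation function, commonly the sigmoid
$f(z) = \frac{1}{1 + e^{-z}}$
[Figure: a single neuron, with its input, activation function and output labeled]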
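A single neuron computes h_{w,b}(x) = f(w^T x + b). Here is a minimal plain-Python sketch that assumes the sigmoid as the default activation. The helper names and the example values are illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(w, b, x, f=sigmoid):
    """Output h_{w,b}(x) = f(w^T x + b) of a single neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


print(neuron([0.5, -0.25], 0.0, [2.0, 4.0]))               # z = 0, sigmoid(0) = 0.5
print(neuron([0.5, -0.25], 0.0, [2.0, 4.0], f=math.tanh))  # tanh(0) = 0.0
```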
Logistic Regression
● Logistic regression is a probabilistic, linear classifier
● Parameterized by a weight matrix W and a bias vector b
● Each output is the probability of the input belonging to class Y_i
● The probability of x being a member of class Y_i is calculated as
$P(Y = Y_i \mid x) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i^T x + b_i}}{\sum_j e^{W_j^T x + b_j}}$
● The predicted class is
$y_{pred} = \arg\max_{Y_i} P(Y = Y_i \mid x)$
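The softmax and argmax steps can be sketched in plain Python as follows. The weights, biases and input are made-up illustrative values. Subtracting the maximum score before exponentiating is a standard numerical-stability trick, and it leaves the probabilities unchanged.

```python
import math


def softmax(z):
    """softmax_i(z) = e^{z_i} / sum_j e^{z_j}; subtracting max(z) avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Return (class probabilities, argmax class) for logistic regression.

    W is a list of per-class weight vectors W_i and b a list of biases b_i.
    """
    scores = [sum(wij * xj for wij, xj in zip(Wi, x)) + bi for Wi, bi in zip(W, b)]
    probs = softmax(scores)
    y_pred = max(range(len(probs)), key=probs.__getitem__)
    return probs, y_pred


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 classes, 2 input features
b = [0.0, 0.0, -0.5]
probs, y_pred = predict(W, b, [0.2, 0.9])  # scores: 0.2, 0.9, 0.6
print([round(p, 3) for p in probs], y_pred)
```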
Logistic Regression – Training
● The loss function for logistic regression is the negative log-likelihood over the dataset D, defined as
$\mathcal{L}(W, b; D) = -\sum_{i=1}^{|D|} \log P(Y = y^{(i)} \mid x^{(i)}, W, b)$
● The loss is minimized using stochastic gradient descent or minibatch stochastic gradient descent.
● The loss is convex, so any local minimum is a global minimum. This is not true for the other architectures we will discuss.
● Being a linear classifier, it cannot solve the famous XOR problem.
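Here is a minimal sketch of minibatch SGD on the negative log-likelihood, written in plain Python rather than Theano. The learning rate, epoch count, batch size and the AND/XOR toy datasets are assumptions chosen for illustration. The sketch uses the standard softmax-NLL gradient (p_k - 1[k = y]) x. It also shows the XOR limitation: the decision rule is linear, so no setting of W and b can classify more than 3 of the 4 XOR points correctly.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def probs(W, b, x):
    """P(Y = i | x) for each class i; W holds one weight vector per class."""
    return softmax([sum(w * xj for w, xj in zip(Wi, x)) + bi for Wi, bi in zip(W, b)])


def nll(W, b, data):
    """Negative log-likelihood: -sum over examples of log P(Y = y | x, W, b)."""
    return -sum(math.log(probs(W, b, x)[y]) for x, y in data)


def accuracy(W, b, data):
    hits = 0
    for x, y in data:
        p = probs(W, b, x)
        hits += max(range(len(p)), key=p.__getitem__) == y
    return hits / len(data)


def train_minibatch_sgd(data, n_classes, lr=0.5, epochs=500, batch_size=2, seed=0):
    """Minibatch SGD on the NLL, starting from zero weights."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W = [[0.0] * n_in for _ in range(n_classes)]
    b = [0.0] * n_classes
    data = list(data)  # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gW = [[0.0] * n_in for _ in range(n_classes)]
            gb = [0.0] * n_classes
            for x, y in batch:
                p = probs(W, b, x)
                for k in range(n_classes):
                    err = p[k] - (1.0 if k == y else 0.0)  # dNLL / dscore_k
                    gb[k] += err
                    for j in range(n_in):
                        gW[k][j] += err * x[j]
            for k in range(n_classes):  # average the gradient over the minibatch
                b[k] -= lr * gb[k] / len(batch)
                for j in range(n_in):
                    W[k][j] -= lr * gW[k][j] / len(batch)
    return W, b


AND = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
XOR = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]

W_and, b_and = train_minibatch_sgd(AND, n_classes=2)
print("AND accuracy:", accuracy(W_and, b_and, AND))
W_xor, b_xor = train_minibatch_sgd(XOR, n_classes=2)
print("XOR accuracy:", accuracy(W_xor, b_xor, XOR))  # never above 0.75
```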
Multilayer Perceptron
● An MLP can be viewed as a logistic regression classifier where the input is first transformed using a learned non-linear transformation.
● The non-linear transformation can be a sigmoid or a tanh function.
● Because of the added non-linear hidden layer, the loss function is now non-convex.
● Though there is no sure way to avoid poor local minima, empirically chosen initializations of the weight matrices help.
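One widely used empirical initialization comes from Glorot and Bengio (2010). Each weight is drawn uniformly from [-r, r], with r = sqrt(6 / (fan_in + fan_out)) for tanh units. For sigmoid units the range is commonly scaled up by 4. The function below is a hypothetical plain-Python helper that sketches this rule. It is not a library API.

```python
import math
import random


def glorot_uniform(n_in, n_out, activation="tanh", seed=0):
    """Return an n_out x n_in weight matrix (so W x maps n_in inputs to n_out units)
    drawn from U[-r, r], with r = sqrt(6 / (n_in + n_out)) for tanh, 4x that for sigmoid."""
    rng = random.Random(seed)
    r = math.sqrt(6.0 / (n_in + n_out))
    if activation == "sigmoid":
        r *= 4.0
    return [[rng.uniform(-r, r) for _ in range(n_in)] for _ in range(n_out)]


W = glorot_uniform(100, 50)
print(len(W), len(W[0]), max(abs(w) for row in W for w in row))
```

Scaling the range by fan-in and fan-out keeps the variance of activations and gradients roughly similar across layers. A fixed range such as [-1, 1] does not do this.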
Multilayer Perceptron – Training
● Let D be the size of the input vector and L the number of output labels. The output vector (of size L) is given by
$f(x) = G\left(b^{(2)} + W^{(2)} s\left(b^{(1)} + W^{(1)} x\right)\right)$
where s and G are activation functions.
● The hard part of training is calculating the gradient for stochastic gradient descent and, of course, avoiding poor local minima.
● Back-propagation computes these gradients efficiently. It applies the chain rule layer by layer and reuses intermediate results instead of recomputing them, much like dynamic programming for the chain-rule recursion.
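Here is a small plain-Python sketch of back-propagation for this two-layer network. It assumes s = tanh, G = softmax and a negative log-likelihood loss, and all helper names are hypothetical. When the backward pass computes the hidden-layer error, it reuses the output-layer error dz; this is the "dynamic programming" reuse of the chain rule. The usage example trains on XOR, which logistic regression cannot solve. The hyperparameters are arbitrary illustrative choices.

```python
import math
import random


def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases (hypothetical helper)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(n_out)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_out


def forward(params, x):
    """f(x) = G(b2 + W2 s(b1 + W1 x)) with s = tanh and G = softmax."""
    W1, b1, W2, b2 = params
    hid = [math.tanh(bh + sum(w * xj for w, xj in zip(row, x))) for row, bh in zip(W1, b1)]
    z = [bk + sum(w * h for w, h in zip(row, hid)) for row, bk in zip(W2, b2)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return hid, [v / total for v in e]


def loss(params, data):
    """Negative log-likelihood of the labels."""
    return -sum(math.log(forward(params, x)[1][y]) for x, y in data)


def backprop(params, data):
    """Gradients of loss() with respect to (W1, b1, W2, b2) via the chain rule."""
    W1, b1, W2, b2 = params
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [[0.0] * len(row) for row in W2]
    gb2 = [0.0] * len(b2)
    for x, y in data:
        hid, p = forward(params, x)
        # Output layer: dL/dz_k = p_k - [k == y] for softmax + NLL
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        for k, d in enumerate(dz):
            gb2[k] += d
            for h, hv in enumerate(hid):
                gW2[k][h] += d * hv
        # Hidden layer: reuse dz; tanh'(a) = 1 - tanh(a)^2
        for h, hv in enumerate(hid):
            da = sum(W2[k][h] * d for k, d in enumerate(dz)) * (1.0 - hv * hv)
            gb1[h] += da
            for j, xj in enumerate(x):
                gW1[h][j] += da * xj
    return gW1, gb1, gW2, gb2


def gd_step(params, grads, lr):
    """One full-batch gradient-descent update."""
    def sub_matrix(M, G):
        return [[m - lr * g for m, g in zip(row, grow)] for row, grow in zip(M, G)]

    def sub_vector(v, g):
        return [a - lr * d for a, d in zip(v, g)]

    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return sub_matrix(W1, gW1), sub_vector(b1, gb1), sub_matrix(W2, gW2), sub_vector(b2, gb2)


XOR = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
params = init_params(2, 4, 2, seed=1)
print("initial loss:", loss(params, XOR))
for _ in range(2000):
    params = gd_step(params, backprop(params, XOR), lr=0.1)
print("final loss:", loss(params, XOR))
```

A finite-difference check such as (loss(w + eps) - loss(w - eps)) / (2 eps) is a simple way to confirm that a hand-written backward pass matches the true gradient.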
Deep Neural Nets and Issues
● Generally, networks with more than 2-3 hidden layers are called deep nets.
● Before 2006, they performed worse than shallow networks.
● Though the exact cause is hard to analyze, experimental results suggested that gradient-based training of deep nets got stuck in poor local minima due to gradient diffusion.
● It was difficult to get good generalization.
● Random initialization, which worked for shallow networks, did not work for deep networks.
● The issue was solved using unsupervised pre-training of the hidden layers, followed by fine-tuning (supervised training).
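Here is a toy illustration of gradient diffusion. It is a simplification, not the analysis behind the slide. Take a chain of scalar sigmoid units with weight 1. Each layer contributes a factor σ'(z) = σ(z)(1 - σ(z)) ≤ 0.25 to the chain rule, so the gradient reaching the input shrinks at least geometrically with depth. The function name and the input value 0.5 are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, x=0.5, w=1.0):
    """d(output)/d(input) for a chain h_{l+1} = sigmoid(w * h_l) of `depth` layers."""
    h, grad = x, 1.0
    for _ in range(depth):
        out = sigmoid(w * h)
        grad *= w * out * (1.0 - out)  # chain rule: one sigmoid-derivative factor per layer
        h = out
    return grad


for depth in (1, 2, 5, 10):
    print(depth, input_gradient(depth))
```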
Auto-Encoders
● Multilayer neural nets whose target output is the input.
● Reconstruction = decoder(encoder(input)), for example with tied weights:
$a = \tanh(Wx + b), \quad x' = \tanh(W^T a + c)$
● The objective is to minimize the reconstruction error, e.g. $L = \lVert x' - x \rVert^2$.
● PCA can be seen as an auto-encoder with $a = Wx$ and $x' = W^T a$.
● So an auto-encoder can be seen as a non-linear PCA that tries to learn a latent representation of the input.
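Here is a plain-Python sketch of one tied-weight auto-encoder pass. It assumes the encoder a = tanh(Wx + b), the decoder x' = tanh(W^T a + c) and the squared reconstruction error. The weights and input are made-up values, and the helper names are hypothetical.

```python
import math


def encode(W, b, x):
    """a = tanh(W x + b); W is n_hidden x n_in."""
    return [math.tanh(bh + sum(w * xi for w, xi in zip(row, x))) for row, bh in zip(W, b)]


def decode(W, c, a):
    """x' = tanh(W^T a + c), reusing (tying) the encoder weights W."""
    n_in = len(W[0])
    return [math.tanh(c[i] + sum(W[h][i] * a[h] for h in range(len(a)))) for i in range(n_in)]


def reconstruction_error(W, b, c, x):
    """L = ||x' - x||^2."""
    x_rec = decode(W, c, encode(W, b, x))
    return sum((xr - xi) ** 2 for xr, xi in zip(x_rec, x))


W = [[0.5, -0.5, 0.25], [0.1, 0.2, -0.3]]  # 2 hidden units, 3 inputs
b, c = [0.0, 0.0], [0.0, 0.0, 0.0]
print(reconstruction_error(W, b, c, [0.2, -0.4, 0.6]))
```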
Auto-Encoders – Training
Stacked Auto-Encoders
● One way of creating deep networks.
● Another is Deep Belief Networks, which are made of stacked RBMs trained greedily.
● Training is divided into two steps
– Pre-Training
– Fine-Tuning
● In pre-training, the auto-encoders are trained one at a time, each on the output of the previously trained layer.
● In the second phase, i.e. fine-tuning, the whole network is trained to minimize the negative log-likelihood of the predictions.
● Pre-training is unsupervised, but fine-tuning is supervised.
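Greedy layer-wise pre-training can be sketched as follows. First, train a tied-weight tanh auto-encoder by full-batch gradient descent on the squared reconstruction error. Then keep its encoder, map the data through it, and repeat for the next layer. This plain-Python sketch assumes its hyperparameters (learning rate, epochs, initialization range), and its helper names are hypothetical. It omits the supervised fine-tuning phase, which would add an output layer and back-propagate the NLL through the whole stack.

```python
import math
import random


def encode(W, b, x):
    return [math.tanh(bh + sum(w * xi for w, xi in zip(row, x))) for row, bh in zip(W, b)]


def decode(W, c, a):
    return [math.tanh(ci + sum(W[h][i] * a[h] for h in range(len(a)))) for i, ci in enumerate(c)]


def recon_loss(W, b, c, data):
    total = 0.0
    for x in data:
        x_rec = decode(W, c, encode(W, b, x))
        total += sum((r - xi) ** 2 for r, xi in zip(x_rec, x))
    return total


def train_autoencoder(data, n_hidden, lr=0.05, epochs=200, seed=0):
    """Full-batch gradient descent on the tied-weight squared reconstruction error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b, c = [0.0] * n_hidden, [0.0] * n_in
    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(n_hidden)]
        gb, gc = [0.0] * n_hidden, [0.0] * n_in
        for x in data:
            a = encode(W, b, x)
            x_rec = decode(W, c, a)
            # decoder pre-activation gradient: 2 (x' - x) * tanh'
            du = [2.0 * (r - xi) * (1.0 - r * r) for r, xi in zip(x_rec, x)]
            # encoder pre-activation gradient, back through the tied W
            dv = [sum(W[h][i] * du[i] for i in range(n_in)) * (1.0 - a[h] ** 2)
                  for h in range(n_hidden)]
            for i in range(n_in):
                gc[i] += du[i]
            for h in range(n_hidden):
                gb[h] += dv[h]
                for i in range(n_in):
                    # W is used by both encoder and decoder, so both terms add up
                    gW[h][i] += du[i] * a[h] + dv[h] * x[i]
        for h in range(n_hidden):
            b[h] -= lr * gb[h]
            for i in range(n_in):
                W[h][i] -= lr * gW[h][i]
        for i in range(n_in):
            c[i] -= lr * gc[i]
    return W, b, c


def greedy_pretrain(data, layer_sizes):
    """Train one auto-encoder per layer on the previous layer's codes."""
    layers, codes = [], data
    for n_hidden in layer_sizes:
        W, b, _c = train_autoencoder(codes, n_hidden)  # decoder bias is discarded
        layers.append((W, b))
        codes = [encode(W, b, x) for x in codes]
    return layers, codes


data = [[0.9, 0.8, -0.7, -0.9], [0.8, 0.9, -0.8, -0.7],
        [-0.9, -0.8, 0.7, 0.9], [-0.7, -0.9, 0.9, 0.8]]
layers, top_codes = greedy_pretrain(data, layer_sizes=[3, 2])
print(len(layers), [len(code) for code in top_codes])
```

Only each layer's encoder (W, b) is kept. These weights would then initialize the deep network before supervised fine-tuning.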
Pre-Training
Why Pre-Training Works
● Hard to know exactly, as deep nets are hard to analyze
● Regularization Hypothesis
– Pre-training acts like adding a regularization term, leading to better generalization.
– It can be seen as: a good representation of P(X) leads to a good representation of P(Y|X).
● Optimization Hypothesis
– Pre-training leads to a weight initialization that restricts the parameters to a region near better local minima.
– These minima are not reached with random initialization.
Other Types of Neural Networks
● Recurrent Neural Networks
● Recursive Neural Networks
● Long Short-Term Memory (LSTM) Architecture
● Convolutional Neural Networks
Libraries to use
● Theano/Pylearn2 (U Montreal, Python)
● Caffe (UCB, C++, Fast, CUDA based)
● Torch7 (NYU, Lua, supported by Facebook)
Introduction to Theano
What is Theano?
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
Why should it be used?
– Tight integration with NumPy
– Transparent use of GPU
– Efficient symbolic representation
– Does lots of heavy lifting, like gradient calculation, for you
– Uses simple abstractions for tensors and matrices, so you don't have to deal with low-level details
Theano Basics (Tutorial)
● How to:
– Declare an expression
– Compute a gradient
● How an expression is evaluated (link)
● Debugging
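Theano's core idea has three steps. You declare a symbolic expression, compute its gradient symbolically, and then compile and evaluate it. The sketch below is a hypothetical pure-Python toy that mimics those steps with reverse-mode differentiation over a small expression graph. The names Node, variable, evaluate and grad are invented for illustration and are not Theano's API. In Theano itself, the corresponding steps use symbolic variables such as theano.tensor.dscalar, the function theano.tensor.grad, and theano.function for compilation.

```python
import math


class Node:
    """A node in a toy expression graph (hypothetical; not Theano's API)."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op, self.inputs, self.value, self.name = op, inputs, value, name

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))

    __radd__ = __add__
    __rmul__ = __mul__


def as_node(v):
    return v if isinstance(v, Node) else Node("const", value=float(v))


def variable(name):
    return Node("var", name=name)


def exp(x):
    return Node("exp", (as_node(x),))


def evaluate(node, env, cache=None):
    """Value of `node` given variable values in `env`; memoizes shared subexpressions."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    if node.op == "const":
        v = node.value
    elif node.op == "var":
        v = env[node.name]
    else:
        args = [evaluate(i, env, cache) for i in node.inputs]
        if node.op == "add":
            v = args[0] + args[1]
        elif node.op == "mul":
            v = args[0] * args[1]
        else:  # "exp"
            v = math.exp(args[0])
    cache[node] = v
    return v


def grad(output, wrt, env):
    """Reverse-mode gradient of `output` with respect to each variable in `wrt`."""
    values = {}
    evaluate(output, env, values)
    order, seen = [], set()

    def visit(n):  # post-order: every node appears after its inputs
        if n in seen:
            return
        seen.add(n)
        for i in n.inputs:
            visit(i)
        order.append(n)

    visit(output)
    adjoint = {n: 0.0 for n in order}
    adjoint[output] = 1.0
    for n in reversed(order):  # consumers are processed before their inputs
        g = adjoint[n]
        if n.op == "add":
            for i in n.inputs:
                adjoint[i] += g
        elif n.op == "mul":
            a, b = n.inputs
            adjoint[a] += g * values[b]
            adjoint[b] += g * values[a]
        elif n.op == "exp":
            adjoint[n.inputs[0]] += g * values[n]
    return [adjoint.get(w, 0.0) for w in wrt]


x, y = variable("x"), variable("y")
f = x * x * y + 3                                # declare the expression
print(evaluate(f, {"x": 3.0, "y": 4.0}))         # 39.0
print(grad(f, [x, y], {"x": 3.0, "y": 4.0}))     # [24.0, 9.0]
```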
Implementations
● Logistic Regression
● Multilayer Perceptron
● Auto-Encoders
● Stacked Auto-Encoders
Thank You!

Intro to Deep Learning