Deep Learning
Supervised by Asst. Prof. Dr. Mohammad Najem
By: Huda Hamdan Ali
Contents
 Introduction and overview
 Deep learning challenges
 Deep N.N
 Unsupervised Preprocessing Networks
 Deep Belief Networks
 Denoising auto encoder
 Stacked Auto Encoders
 Deep Boltzmann Machines
 CNN – Convolutional Neural Networks
 Recurrent N.N
 Long Short-Term Memory RNN (LSTM)
 Generative Adversarial Networks
 Deep Reinforcement Learning
 Applications.
Introduction
 Deep learning (also known as deep structured
learning or hierarchical learning) is part of a broader family
of machine learning methods based on learning data
representations, as opposed to task-specific algorithms.
Learning can be supervised, semi-
supervised or unsupervised.
 Deep learning uses a cascade of multiple layers of nonlinear
processing units for feature extraction and transformation.
Each successive layer uses the output from the previous layer
as input.
 Deep learning architectures such as deep neural
networks, deep belief networks and recurrent neural
networks have been applied to fields including computer
vision, speech recognition and natural language processing, among others.
Introduction cont..
 Deep learning algorithms can be applied to unsupervised learning
tasks.
 This is an important benefit because unlabeled data are more
abundant than labeled data.
Inspired by the Brain
 The first layers of neurons that receive information in the
visual cortex are sensitive to specific edges, while brain regions
further down the visual pipeline are sensitive to more complex
structures such as faces.
 Our brain has lots of neurons connected together and the
strength of the connections between neurons represents long
term knowledge.
Deep Learning Training Overview
 Train networks with many layers (multiple layers work together to build an
improved feature space)
 First layer learns 1st-order features (e.g. edges…)
 2nd layer learns higher-order features (combinations of first-layer
features, combinations of edges, etc.)
 Some models learn in an unsupervised mode and discover general
features of the input space – serving multiple tasks related to the
unsupervised instances (image recognition, etc.)
 Final layer of transformed features is fed into supervised layer(s)
 The entire network is often subsequently fine-tuned using supervised training of
the whole net, starting from the initial weightings learned in the unsupervised phase
Deep Learning Architecture
A deep neural network consists of a hierarchy of layers, whereby each
layer transforms the input data into more abstract representations (e.g.
edge -> nose -> face). The output layer combines those features to make
predictions.
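As an illustrative aside (not from the original slides), the sketch below expresses such a hierarchy as a small PyTorch network; the framework choice and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# A minimal sketch of a deep feedforward network, assuming PyTorch.
# The layer sizes (784 -> 256 -> 64 -> 10) are arbitrary illustrative choices.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # low-level features (e.g. "edges")
    nn.Linear(256, 64), nn.ReLU(),    # higher-level combinations (e.g. "parts")
    nn.Linear(64, 10),                # output layer combines features into predictions
)

x = torch.randn(32, 784)              # a batch of 32 flattened inputs
logits = model(x)                     # shape: (32, 10)
print(logits.shape)
```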
What did it learn?
No more feature engineering
Problems with Back Propagation
 The gradient becomes progressively more dilute as it propagates backward (illustrated in the sketch below)
 Below the top few layers, the correction signal is minimal
 Training gets stuck in local minima
 Especially since the weights start out far from ‘good’ regions
(i.e., random initialization)
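The sketch below is an added illustration (assuming PyTorch; depths, widths and the dummy loss are arbitrary) of how the gradient reaching the first layer shrinks as more sigmoid layers are stacked:

```python
import torch
import torch.nn as nn

# Measure how the gradient at the first layer shrinks with depth
# when sigmoid activations are used. All sizes are illustrative.
for depth in (2, 10, 30):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(64, 64), nn.Sigmoid()]
    net = nn.Sequential(*layers)

    x = torch.randn(16, 64)
    loss = net(x).pow(2).mean()       # dummy loss, just to backpropagate something
    loss.backward()

    first_layer_grad = net[0].weight.grad.norm().item()
    print(f"depth={depth:2d}  ||grad at first layer|| = {first_layer_grad:.2e}")
```

With sigmoid activations and default initialization, the printed norms drop sharply with depth, which is the dilution described above.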
DNN challenges
 As with ANNs, many issues can arise with naively trained DNNs. Two
common issues are overfitting and computation time.
 DNNs are prone to overfitting because of the added layers of
abstraction, which allow them to model rare dependencies in the
training data.
 Regularization methods such as Ivakhnenko's unit pruning,
weight decay (L2 regularization) or sparsity (L1 regularization) can be
applied during training to combat overfitting. Alternatively, dropout
regularization randomly omits units from the hidden layers during
training (a sketch follows this list).
 This helps to exclude rare dependencies.
 Finally, data can be augmented via methods such as cropping
and rotating such that smaller training sets can be increased in size
to reduce the chances of overfitting.
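As a hedged illustration of two of these regularizers, the sketch below (assuming PyTorch; sizes and rates are arbitrary) applies dropout to a hidden layer and weight decay through the optimizer:

```python
import torch
import torch.nn as nn

# Dropout (randomly omitting hidden units during training) and weight decay
# (passed to the optimizer). Sizes and rates are illustrative assumptions.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                # randomly zero 50% of hidden units while training
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()                         # dropout active
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

model.eval()                          # dropout disabled at test time
```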
Challenge cont..
 Training a DNN involves choosing many parameters, such as
the size (number of layers and number of units per layer),
the learning rate and the initial weights.
 Sweeping through the parameter space for optimal
parameters may not be feasible due to the cost in time
and computational resources.
 Various tricks such as batching (computing the gradient
on several training examples at once rather than on
individual examples) speed up computation; a sketch follows this list.
 The large processing throughput of GPUs has produced
significant speedups in training, because the matrix and
vector computations required are well-suited for GPUs.
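A minimal sketch of batching (an added illustration, assuming PyTorch; sizes are arbitrary): one forward/backward pass computes the gradient over a whole mini-batch, optionally on a GPU if one is available.

```python
import torch
import torch.nn as nn

# One gradient step over a mini-batch instead of a loop over single examples.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(100, 10).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(256, 100, device=device)          # a mini-batch of 256 examples
targets = torch.randint(0, 10, (256,), device=device)

loss = loss_fn(model(inputs), targets)                  # loss averaged over the batch
optimizer.zero_grad()
loss.backward()                                         # one gradient for all 256 examples
optimizer.step()
```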
Challenge Cont..
 Alternatively, we may look for another type of
neural network with a straightforward, convergent
training algorithm.
 CMAC (cerebellar model articulation controller) is one such
network. For example, there is no need to
adjust learning rates or randomize initial weights for
CMAC. The training process can be guaranteed to
converge in one step with a new batch of data, and the
computational complexity of the training algorithm is
linear with respect to the number of neurons involved.
Greedy Layer-Wise Training
1. Train the first layer using your data without the labels (unsupervised)
 Since there are no targets at this level, labels don't help.
2. Freeze the first layer's parameters and train the second
layer, using the output of the first layer as the unsupervised input to the
second layer
3. Repeat this for as many layers as desired
 This builds a set of robust features
4. Use the outputs of the final layer as inputs to a supervised
layer/model and train the last supervised layer(s) (leave the early
weights frozen)
5. Unfreeze all weights and fine-tune the full network by training with
a supervised approach, starting from the pre-training weight settings (a sketch follows below)
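A hedged sketch of these steps (an added illustration, assuming PyTorch and using small autoencoders for the unsupervised stage; RBMs are another common choice, and all sizes and epoch counts are arbitrary):

```python
import torch
import torch.nn as nn

# Greedy layer-wise pre-training sketch. Each layer is pre-trained to produce
# codes that a temporary decoder can reconstruct the layer's input from.
def pretrain_layer(layer, data, epochs=5):
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        recon = decoder(torch.relu(layer(data)))
        loss = nn.MSELoss()(recon, data)
        opt.zero_grad(); loss.backward(); opt.step()

data = torch.randn(512, 784)                       # unlabeled data
layer1, layer2 = nn.Linear(784, 256), nn.Linear(256, 64)

pretrain_layer(layer1, data)                       # step 1: first layer, unsupervised
with torch.no_grad():                              # freeze layer 1 and compute its outputs
    h1 = torch.relu(layer1(data))
pretrain_layer(layer2, h1)                         # step 2: second layer on layer-1 codes

# Remaining steps, sketched together: add a supervised output layer and
# fine-tune the whole network starting from the pre-trained weights.
labels = torch.randint(0, 10, (512,))
net = nn.Sequential(layer1, nn.ReLU(), layer2, nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for _ in range(5):
    loss = nn.CrossEntropyLoss()(net(data), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```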
David Corne and Nick Taylor, Heriot-Watt University – dwcorne@gmail.com
These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
Deep Belief Networks(DBNs)
 Unsupervised pre-training provides a good initialization
of the network
 Probabilistic generative model
 Deep architecture – multiple layers
 Supervised fine-tuning
 Generative: Up-down algorithm
 Discriminative: backpropagation
DBN Greedy training
 First step:
 Construct an RBM with an input layer v and a hidden
layer h
 Train the RBM
 A restricted Boltzmann machine (RBM) is
a generative stochastic artificial neural network that
can learn a probability distribution over its set of inputs (a training sketch follows).
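As an added illustration, the sketch below trains a small binary RBM with one step of contrastive divergence (CD-1); the sizes, learning rate and the random "data" are assumptions.

```python
import torch

# Binary RBM trained with one step of contrastive divergence (CD-1).
n_visible, n_hidden, lr = 784, 128, 0.01
W = torch.randn(n_visible, n_hidden) * 0.01
b_v = torch.zeros(n_visible)                     # visible bias
b_h = torch.zeros(n_hidden)                      # hidden bias

def sample(p):                                   # Bernoulli sample from probabilities
    return (torch.rand_like(p) < p).float()

v0 = (torch.rand(64, n_visible) < 0.5).float()   # a batch of binary "data"

# Positive phase: infer hidden units from the data
ph0 = torch.sigmoid(v0 @ W + b_h)
h0 = sample(ph0)

# Negative phase: reconstruct the visibles, then the hidden probabilities again
pv1 = torch.sigmoid(h0 @ W.t() + b_v)
v1 = sample(pv1)
ph1 = torch.sigmoid(v1 @ W + b_h)

# CD-1 update: difference of data-driven and reconstruction-driven statistics
W += lr * (v0.t() @ ph0 - v1.t() @ ph1) / v0.shape[0]
b_v += lr * (v0 - v1).mean(dim=0)
b_h += lr * (ph0 - ph1).mean(dim=0)
```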
Auto-Encoders
 A type of unsupervised learning,
 An autoencoder is typically a feedforward neural network which aims to
learn a compressed, distributed representation (encoding) of a dataset.
 Conceptually, the network is trained to “recreate” the input, i.e., the
input and the target data are the same. In other words, you’re trying to
output the same thing you were given as input, but compressed in some way.
 In effect, we want a few small nodes in the middle to really learn the
data at a conceptual level, producing a compact representation that in
some way captures the core features of our input.
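A minimal autoencoder sketch (an added illustration, assuming PyTorch; layer sizes and training details are arbitrary), trained to reproduce its input through a narrow middle layer:

```python
import torch
import torch.nn as nn

# The input is also the target; the 32-unit code is the compact representation.
encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())   # compress to 32 numbers
decoder = nn.Linear(32, 784)                             # expand back to the input size
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(128, 784)              # unlabeled examples
for _ in range(100):
    recon = decoder(encoder(x))
    loss = nn.MSELoss()(recon, x)     # "recreate the input"
    opt.zero_grad(); loss.backward(); opt.step()

codes = encoder(x)                    # compact representation of the data
```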
Denoising Auto-Encoder
Stacked (Denoising) Auto-Encoders
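These two slides name the denoising and stacked variants; as a hedged sketch (assuming PyTorch, with an arbitrary noise level), a denoising autoencoder corrupts its input but is still asked to reconstruct the clean original, and a stacked version would repeat the process on the learned codes:

```python
import torch
import torch.nn as nn

# Denoising autoencoder sketch: corrupt the input, reconstruct the clean version.
# A stacked variant would train the next autoencoder on the codes from this one.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x_clean = torch.rand(128, 784)
for _ in range(100):
    x_noisy = x_clean + 0.3 * torch.randn_like(x_clean)   # corrupt the input
    recon = decoder(encoder(x_noisy))
    loss = nn.MSELoss()(recon, x_clean)                    # target is the *clean* input
    opt.zero_grad(); loss.backward(); opt.step()
```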
Deep Boltzmann Machines
DBMs vs. DBNs
Convolutional Neural Nets (CNN)
Convolutional Neural Nets (CNN)
Convolution layers act as feature detectors that automatically learn to filter out
unneeded information from the input by using convolution kernels.
Pooling layers compute the max or average value of a particular feature over
a region of the input data (downsizing the input images). Pooling also helps to detect
objects in unusual places and reduces memory size.
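A minimal CNN sketch (an added illustration, assuming PyTorch and 28x28 grayscale inputs; all sizes are arbitrary) combining the two layer types described above:

```python
import torch
import torch.nn as nn

# Convolution layers learn filters; max pooling downsizes each feature map.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 16 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve height and width
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # classifier over the pooled features
)

images = torch.randn(8, 1, 28, 28)               # e.g. a batch of 28x28 grayscale images
print(cnn(images).shape)                         # torch.Size([8, 10])
```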
CNN
 High accuracy for image applications – Breaking all records and
doing it using just raw pixel features.
 Special purpose net – Just for images or problems with strong grid-like
local spatial/temporal correlation
 Once trained on one problem (e.g. vision) could use same net (often
tuned) for a new similar problem – general creator of vision features
 Unlike traditional nets, handles variable-sized inputs (see the sketch after this list)
 Same filters and weights, just convolve across different sized image and
dynamically scale size of pooling regions (not # of nodes), to normalize
 Different sized images, different length speech segments, etc.
 Lots of hand crafting and CV tuning to find the right recipe of
receptive fields, layer interconnections, etc.
 Lots more Hyperparameters than standard nets, and even than other
deep networks, since the structures of CNNs are more handcrafted
 CNNs getting wider and deeper with speed-up techniques (e.g. GPU,
ReLU, etc.) and lots of current research, excitement, and success
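As a hedged illustration of the variable-sized-input point above (assuming PyTorch; an adaptive pooling layer stands in for dynamically scaling the pooling regions):

```python
import torch
import torch.nn as nn

# The same convolution filters slide over any image size; adaptive pooling
# resizes its pooling regions so the classifier always sees a fixed-size vector.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),                # pooling regions scale with the input
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),
)

for size in (32, 64, 97):                        # different image sizes, same network
    out = net(torch.randn(1, 3, size, size))
    print(size, out.shape)                       # always torch.Size([1, 10])
```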
Recurrent Neural Nets (RNN)
Long Short-Term Memory RNN (LSTM)
Deep Reinforcement Learning
Generative Adversarial Networks
Deep learning Applications
Deep Learning in Computer Vision
Image Segmentation
Deep Learning in Computer Vision
Image Captioning
Deep Learning in Computer Vision
Image Compression
Deep Learning in Computer Vision
Image Localization
Deep Learning in Computer Vision
Image Transformation – Adding features
Deep Learning in Computer Vision
Image Colorization
Image Generation – From Descriptions
Style Transfer – morph images into paintings
Deep Learning in Audio Processing
Sound Generation
Deep Learning in NLP
Syntax Parsing
Deep Learning in NLP
Generating Text
Deep Learning in Medicine
Skin Cancer Diagnoses
Deep Learning in Medicine
Detection of diabetic eye disease
Deep Learning in Science
Saving Energy
Deep Learning in Cryptography
Learning to encrypt and decrypt communication
Autonomous drone navigation with deep learning
Finally ..
 That’s the basic idea.
 There are many types of deep learning:
 different kinds of autoencoders, variations on architectures and
training algorithms, etc.
 A very fast-growing area…
Thanks for your attention
2017 // Huda

Editor's Notes

  • #3: Pre-training
  • #16: If we do fully supervised training from the start, we may not get the benefits of building up the incrementally abstracted feature space. Steps 1–4 are called pre-training, as they get the weights close enough that standard training in step 5 can be effective. Do fine-tuning for sure if there is lots of labeled data; with little labeled data it is not as helpful.
  • #23: Though deep nets were done first, we start with autoencoders because they are simpler. Mention Zipser's autoencoder with reverse engineering, then Cottrell's compression work, which could not be reverse-engineered. If h is smaller than x, this is "undercomplete" autoencoding; otherwise a "regularized" autoencoder would be used. One can use just the new features in the new training set, or concatenate both the original and new features.
  • #32: Dynamic size – the pooling region just sums/maxes over an area with one final weight, so there are no hard changes when we adjust the pooling-region size. Simard 2003: a simple, consistent CNN structure, 5x5 convolutions with 2x2 subsampling, with 5 features in the first C-layer, 50 in the next, until the maps become too small. They don't actually use a pooling layer; instead they connect every other node, which samples rather than max/average. Each layer reduces the feature size by (n-3)/2. Just two layers for MNIST. They also use elastic distortions, a type of jitter, to get more data. 99.6% – best at the time; distortions also help a lot with a standard MLP, so this is an approach with less hyperparameter fiddling. Ciresan and Schmidhuber 2012, multi-column DNN: CNNs of depth 6-10 (deeper if the initial input image is bigger), wider receptive fields, 1-2 hidden layers in the MLP; the columns are CNNs (an ensemble with different parameters, features, etc.) whose outputs are averaged; jittered inputs, multi-day GPU training, annealed learning rate (.001 dropping to .00003), 99.76% on MNIST.