Neural Network Architectures

Martin Ockajak from Zürich
Software Engineer

Outline
●
Introduction
●
Feed-forward networks
●
Convolutional networks
●
Recurrent networks
●
Learning more

Overview
●
Class of machine learning models
●
Inspired by brain biology
●
Connectionist AI approach
●
Highly parallel computation
●
Various learning types
●
Supervised
●
Reinforcement
●
Unsupervised

Applications
●
Character recognition
●
Medical diagnostics
●
Speech recognition
●
Machine translation
●
Text generation
●
Stock price prediction
●
Optimization problems

Advantages
●
Prediction accuracy
●
Complex non-linear relationships
●
Data with non-constant variance – heteroskedasticity
●
Hard-to-understand problems
●
Many possible architectures

Disadvantages
●
Large amount of training data
●
Long time to train
●
Computationally expensive
●
Hard to interpret – black box
●
Many possible architectures

Perceptron
●
Simplified model of a neuron (1957)
●
Linear binary classifier
●
Multiple numeric inputs
●
One boolean output
●
Linearly separable classes only

Perceptron
[Figure: plot of the activation function f(t) against t]

Perceptron
●
Inputs
●
Weights
●
Bias
●
w0
●
Sum
●
Activation function
●
Unit step
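
The pieces listed above can be sketched in plain Python. This is a minimal illustration, not the author's code: the function names, the convention that the step returns 1 at t = 0, and the AND-gate weights are all assumptions made for the example.

```python
def unit_step(t):
    """Unit step activation: 1 for t >= 0, otherwise 0 (assumed convention at t = 0)."""
    return 1 if t >= 0 else 0


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (w0), passed through the unit step."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return unit_step(total)


# Hypothetical weights that make the perceptron behave like a logical AND gate.
and_weights = [1.0, 1.0]
and_bias = -1.5

and_outputs = [
    perceptron([a, b], and_weights, and_bias)
    for a in (0, 1)
    for b in (0, 1)
]
print(and_outputs)
```

AND is linearly separable, so a single perceptron can represent it. XOR is not, which is why it needs the hidden layers of a multi-layer perceptron.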

Multi-layer perceptron
●
Nonlinear classification or regression
●
Inputs
●
Features
●
Hidden layers
●
Parallel neurons feeding the next layer
●
Dot product
●
Sigmoid activation function
●
Output layer
●
Arbitrary activation function
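
A forward pass through such a network might look like the sketch below. The 2-3-1 layer sizes, the hand-picked weights and the helper names `dense` and `identity` are assumptions for illustration only. The output layer uses the identity function, as one would for regression.

```python
import math


def sigmoid(t):
    """Logistic sigmoid, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def identity(t):
    """Output activation for regression: pass the value through unchanged."""
    return t


def dense(inputs, weights, biases, activation):
    """One layer: every neuron applies the activation to a dot product plus bias."""
    return [
        activation(b + sum(w * x for w, x in zip(neuron_weights, inputs)))
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical network: 2 features -> 3 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_biases = [0.0, -1.0, 0.5]
output_weights = [[1.0, -2.0, 0.5]]
output_biases = [0.1]

features = [1.0, 2.0]
hidden = dense(features, hidden_weights, hidden_biases, sigmoid)
output = dense(hidden, output_weights, output_biases, identity)
print(hidden, output)
```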

Training
●
Calculate the output
●
Apply a loss function
●
Must be differentiable
●
Should be minimized – optimization problem
●
Gradient descent to update the weights
●
Proportional to the learning rate
●
Stochastic approximations
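
The loop above can be illustrated with stochastic gradient descent on the simplest possible model, a single linear neuron with a squared-error loss. The training data, learning rate and epoch count are assumptions chosen so that the example converges quickly.

```python
import random

random.seed(0)

# Hypothetical training data drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w, b = 0.0, 0.0
learning_rate = 0.05

for epoch in range(500):
    random.shuffle(data)  # stochastic: one sample per update, in random order
    for x, y in data:
        prediction = w * x + b        # calculate the output
        error = prediction - y        # the loss is error ** 2
        grad_w = 2.0 * error * x      # derivative of the loss with respect to w
        grad_b = 2.0 * error          # derivative of the loss with respect to b
        w -= learning_rate * grad_w   # step size is proportional to the learning rate
        b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

The weights should settle close to the generating values w = 2 and b = 1. A learning rate that is too large makes the updates diverge, one that is too small makes training slow.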

Training
●
Backpropagation (1974)
●
Derivative of the loss with respect to the weights
●
Apply to previous layers by using the chain rule
●
Regularization
●
Reduce overfitting
●
L1 or L2 norm
●
Dropout – ignore random neurons during training
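
To make the chain rule concrete, the sketch below backpropagates through a tiny network with one sigmoid hidden neuron and one linear output, adds an L2 penalty on the weights, and checks the result against a finite-difference estimate. The network shape, parameter values and function names are assumptions for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss(params, x, y, l2):
    """Squared error plus an L2 penalty on the two weights."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return (y_hat - y) ** 2 + l2 * (w1 ** 2 + w2 ** 2)


def backprop(params, x, y, l2):
    """Gradients of the loss, obtained layer by layer with the chain rule."""
    w1, b1, w2, b2 = params
    # Forward pass, keeping the intermediate values.
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    # Backward pass: start at the loss and move towards the inputs.
    d_y_hat = 2.0 * (y_hat - y)
    grad_w2 = d_y_hat * h + 2.0 * l2 * w2
    grad_b2 = d_y_hat
    d_h = d_y_hat * w2
    d_z = d_h * h * (1.0 - h)  # derivative of the sigmoid is h * (1 - h)
    grad_w1 = d_z * x + 2.0 * l2 * w1
    grad_b1 = d_z
    return [grad_w1, grad_b1, grad_w2, grad_b2]


def numerical_gradient(params, x, y, l2, eps=1e-6):
    """Central finite differences, used only to verify the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y, l2) - loss(down, x, y, l2)) / (2.0 * eps))
    return grads


params = [0.3, -0.1, 0.8, 0.2]  # hypothetical w1, b1, w2, b2
analytic = backprop(params, x=1.5, y=1.0, l2=0.01)
numeric = numerical_gradient(params, x=1.5, y=1.0, l2=0.01)
max_difference = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_difference)
```

The L2 term only adds `2 * l2 * w` to each weight gradient, which pulls the weights towards zero on every update.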

Convolutional networks
●
Image classification (1998)
●
Image analysis
●
Object detection
●
Recommender systems
●
Text classification
●
Spatial patterns

Convolutional networks
●
Convolutional layer
●
Filter that scans the image – convolution matrix
●
Receptive field – filter size
●
Depth – number of filters
●
Space invariant
●
Pooling layer
●
Combine cluster of neurons into one
●
Non-linear down-sampling
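
A convolutional layer followed by a pooling layer can be sketched with nested lists. The 5x5 image, the 2x2 edge filter and the function names are assumptions for illustration. As in most deep learning libraries, the filter is applied without flipping, which is strictly a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: each size x size cluster becomes one value."""
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical filter with a 2x2 receptive field that responds to vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4
pooled = max_pool(feature_map)           # 2x2
print(feature_map)
print(pooled)
```

The same filter weights are reused at every position, so the edge is detected wherever it appears in the image.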

Convolutional networks
●
Fully connected layer
●
Dense
●
Just like in multi-layer perceptron
●
Activation function
●
Rectifier (ReLU) – linear, but negative values are set to zero
●
Trains faster and reduces the vanishing gradient problem
●
Output activation function
●
Softmax – one class per example (mutually exclusive classes)
●
Sigmoid – several classes per example (independent labels)
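
The three activation functions named above are short enough to write out. This is a minimal sketch with assumed example scores. Softmax produces probabilities that sum to 1, so it suits mutually exclusive classes, whereas sigmoid scores every class independently.

```python
import math


def relu(t):
    """Rectifier: identity for positive values, zero otherwise."""
    return max(0.0, t)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t):
    """Independent probability for a single class."""
    return 1.0 / (1.0 + math.exp(-t))


scores = [2.0, 1.0, -1.0]  # hypothetical raw outputs for three classes

rectified = [relu(s) for s in scores]
exclusive = softmax(scores)
independent = [sigmoid(s) for s in scores]
print(rectified, exclusive, independent)
```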

Recurrent networks
●
Sequence prediction (1986)
●
Natural language processing
●
Speech recognition
●
Machine translation
●
Generative models
●
Temporal patterns

Recurrent networks
●
Multi-layer perceptron with back-connections
●
Topology is a directed graph with cycles
●
Internal state – memory
●
Variable-length sequences with internal dependencies
●
Training
●
Backpropagation through time
●
Gated state reduces the vanishing gradient problem
●
Long short-term memory (1997)
●
Gated recurrent unit (2014)
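
The back-connection and internal state can be shown with a plain (ungated) recurrent cell that has a single hidden unit. This is a simplified sketch, not an LSTM or GRU. The function name `rnn_forward` and the weights are assumptions for illustration.

```python
import math


def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent cell over a sequence of any length."""
    h = h0
    states = []
    for x in sequence:
        # The previous state h is fed back in alongside the current input.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same weights handle sequences of different lengths.
short_states = rnn_forward([1.0, 0.0], w_x=0.5, w_h=0.9, b=0.0)
long_states = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0], w_x=0.5, w_h=0.9, b=0.0)
print(short_states)
print(long_states)
```

Only the first input is non-zero, yet every later state is still positive because the cell remembers it. The memory fades at each step, which is the forward-pass counterpart of the vanishing gradient that gated cells are designed to reduce.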

Materials
●
Deep Learning @ MIT Press
●
Neural Networks and Deep Learning @ Michael Nielsen
●
Practical Deep Learning @ Coursera
●
Deep Learning Specialization @ Coursera
●
Deep Learning Courses @ edX

Libraries
●
Keras
●
TensorFlow
●
MXNet
●
Theano
●
CNTK
●
PyTorch
●
Deeplearning4j
