Deep Learning
Jin Sakuma
Neural Network
● A neural network is the model at the heart of deep learning
● It is designed to simulate the human brain
● It consists of several layers, and data is passed through them one by one
[Diagram: Data Input → Input Layer → Hidden Layer → Hidden Layer → Output Layer → Data Output]
Perceptron
● In a single perceptron, the output value is computed by
y = Σi wi xi, z = f(y)
● where xi are the inputs, wi are the weights, and f is the activation function
[Diagram: inputs x1, x2, x3 are combined into the weighted sum y, which passes through f to give the output z]
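As a minimal sketch (not from the original slides), this computation looks as follows in Python; the input and weight values are hypothetical:

    import numpy as np

    def perceptron(x, w, f):
        # Weighted sum of the inputs: y = sum_i wi * xi
        y = np.dot(w, x)
        # Apply the activation function: z = f(y)
        return f(y)

    # Toy usage with three inputs and a step activation
    x = np.array([1.0, 0.5, -0.2])
    w = np.array([0.4, -0.6, 0.9])
    print(perceptron(x, w, lambda y: 1.0 if y > 0 else 0.0))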
One Layer
● For an input vector x, the output of one layer is computed by
y = Wx + b
z = f(y)
where W is the weight matrix of the layer, b is the bias vector, and f is the activation function.
● W and b are called parameters because we modify them to optimize the model
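A minimal NumPy sketch of one layer's forward pass, directly following the formulas above; the shapes and values are illustrative:

    import numpy as np

    def layer_forward(x, W, b, f):
        # Affine transform: y = Wx + b
        y = W @ x + b
        # Elementwise activation: z = f(y)
        return f(y)

    # Illustrative shapes: 4 inputs -> 3 outputs
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4)) * 0.1  # weight matrix (parameter)
    b = np.zeros(3)                    # bias vector (parameter)
    print(layer_forward(rng.normal(size=4), W, b, np.tanh))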
Activation Function
● Without an activation function, multiple layers would be meaningless: a composition of linear maps is itself just one linear map
● A good activation function is non-linear, differentiable, and monotonically increasing
● Logistic function: 1 / (1 + e^(-x))
● Hyperbolic tangent: tanh(x)
● Rectified linear function (ReLU): max(0, x)
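For concreteness, the three functions can be defined as follows (a standard sketch, not code from the talk):

    import numpy as np

    def logistic(x):
        # Squashes any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # Squashes any real number into (-1, 1)
        return np.tanh(x)

    def relu(x):
        # Zero for negative inputs, identity for non-negative inputs
        return np.maximum(0.0, x)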
For Various Problems
● The activation function of the output layer depends on the type of problem
● For regression:
○ Activation function: Identity function
○ Length of output vector: Any
● For binary classification:
○ Activation function: Logistic function
○ Length of output vector: 1
● For multi-class classification:
○ Activation function: Softmax function
○ Length of output vector: Number of classes
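A minimal softmax sketch for the multi-class case; subtracting the maximum before exponentiating is a standard trick for numerical stability:

    import numpy as np

    def softmax(y):
        # Exponentiate (shifted for numerical stability) ...
        e = np.exp(y - np.max(y))
        # ... and normalize so the outputs sum to 1
        return e / np.sum(e)

    print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]

The outputs can then be read as the predicted probability of each class.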
Example Task
● Task setting:
○ Given pictures of hand-written digits 0-9, we want to tell which digit each picture shows
○ The training dataset consists of many pictures, each labeled with the correct answer
● Analyzing the task:
○ Problem type: Multi-class classification
○ Activation function of the output layer: Softmax function
Learn: Minimize Error
● Consider an error function E(xi, di; W, b) that represents how far off the model is from the true value for the i-th picture. Here xi is the vector representation of the picture.
● In our example we use the cross-entropy error
E(xi, di; W, b) = -Σk dik log(zik)
where zik is the model's softmax output for digit k. Here dik is 1 only if xi is actually a picture of digit k, and 0 otherwise.
● We want to modify W and b to minimize the error function.
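A sketch of this error for a single example, assuming z is the model's softmax output and d is the one-hot label:

    import numpy as np

    def cross_entropy(z, d):
        # E = -sum_k dk * log(zk); the small epsilon avoids log(0)
        return -np.sum(d * np.log(z + 1e-12))

    z = np.array([0.7, 0.2, 0.1])  # hypothetical softmax output
    d = np.array([1.0, 0.0, 0.0])  # one-hot true label
    print(cross_entropy(z, d))     # ~0.357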
Learn: Gradient Descent
We use gradient descent to modify the parameters W and b.
● Imagine you are on a mountain and want to reach the top, but you have lost your map and cannot see into the distance because of fog. How do you reach the top?
➔Move in the direction that takes you upward the fastest.
● The vector that indicates that direction is called the gradient
● Since we want to minimize the error (instead of maximizing it), we update the parameters by subtracting the gradient, scaled by a learning rate
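In code, one gradient-descent step is just a subtraction; this sketch uses a hypothetical learning rate eta, and the gradients are assumed to come from backpropagation (next slide):

    def gradient_descent_step(params, grads, eta=0.01):
        # Move each parameter a small step against its gradient
        return [p - eta * g for p, g in zip(params, grads)]

    # Toy usage with scalar parameters and gradients
    print(gradient_descent_step([0.5, -0.3], [0.2, -0.1]))  # [0.498, -0.299]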
Learn: Backpropagation
● Neural networks with multiple layers are too complex for the gradients to be computed directly
● This problem was a bottleneck in the early stages of deep learning's development
● Backpropagation computes the gradients from the output layer back to the input layer (many applications of the chain rule)
[Diagram: the same network as before, with gradients flowing backward from the Output Layer through the Hidden Layers to the Input Layer]
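A compact illustrative sketch of backpropagation for a network with one ReLU hidden layer, softmax output, and cross-entropy error (this is a reconstruction, not code from the talk); it uses the fact that for softmax plus cross-entropy, the gradient at the output is simply z - d:

    import numpy as np

    def softmax(y):
        e = np.exp(y - np.max(y))
        return e / np.sum(e)

    def forward_backward(x, d, W1, b1, W2, b2):
        # Forward pass
        h = np.maximum(0.0, W1 @ x + b1)  # hidden layer with ReLU
        z = softmax(W2 @ h + b2)          # output probabilities
        # Backward pass: chain rule applied from the output toward the input
        dy2 = z - d                       # softmax + cross-entropy gradient at the output
        dW2 = np.outer(dy2, h)
        db2 = dy2
        dh = W2.T @ dy2                   # propagate the gradient to the hidden layer
        dy1 = dh * (h > 0)                # ReLU passes gradient only where it was active
        dW1 = np.outer(dy1, x)
        db1 = dy1
        return dW1, db1, dW2, db2

    # Toy shapes: 4 inputs, 5 hidden units, 3 classes
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(5, 4)) * 0.1, np.zeros(5)
    W2, b2 = rng.normal(size=(3, 5)) * 0.1, np.zeros(3)
    grads = forward_backward(rng.normal(size=4), np.array([1.0, 0.0, 0.0]), W1, b1, W2, b2)

The returned gradients are exactly what the gradient-descent step on the previous slide subtracts from the parameters.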
Why Deep Learning?
● Deep learning has tons of parameters (things that we can change to optimize
the model)
➔Better accuracy
➔Hard to optimize
➔Needs a lot of data
● Flexible model
➔Can be applied to different types of problems
➔Easy to adapt the model to various situations
Experiment
● Run a Python script for the example task
● Each input picture is given as a vector of length 784 (28 × 28 pixels)
● Configuration:
○ 2 hidden layers
○ 1000 perceptrons in each layer
○ ReLU as the activation function of the hidden layers
○ Softmax as the activation function of the output layer
○ Batch size: 100
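A sketch of this configuration in the style of Chainer's MNIST example, which the Result slide says was used; the exact script may differ, and a reasonably recent Chainer version is assumed:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class MLP(chainer.Chain):
        # 784 -> 1000 -> 1000 -> 10, with ReLU in the hidden layers
        def __init__(self, n_units=1000, n_out=10):
            super(MLP, self).__init__()
            with self.init_scope():
                self.l1 = L.Linear(None, n_units)  # input size (784) inferred on first call
                self.l2 = L.Linear(None, n_units)
                self.l3 = L.Linear(None, n_out)

        def __call__(self, x):
            h1 = F.relu(self.l1(x))
            h2 = F.relu(self.l2(h1))
            return self.l3(h2)  # raw scores; softmax is applied inside the loss

    # L.Classifier wraps the model with softmax cross-entropy loss and accuracy
    model = L.Classifier(MLP())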
Result
● For this experiment, I used Chainer's example code
● Execution time: about 45 minutes
● Final validation loss: 0.107
● Final validation accuracy: 0.98
Use Cases of Deep Learning
● Convolutional Neural Network: deep learning specialized for image data
○ Object identification
○ Face recognition
● Recurrent Neural Network: deep learning for sequential data
○ Speech recognition
○ Text processing
● DQN: a combination of deep learning and Q-learning
○ AlphaGo, which combines deep learning with reinforcement learning and tree search, defeated a top-level Go player


Editor's Notes

  • #2: Give an overview of deep learning as a whole
  • #3: Use the example of the eye to explain that neural networks were conceived based on the human brain; briefly explain what each layer does and lead into the next slide
  • #4: Explain what happens inside a single neuron; note that the activation function will be explained on a later slide
  • #5: On the linearity of a single layer
  • #6: Discuss the conditions on the activation function properly
  • #7: This completes the overall picture of deep learning; summarize before moving on