Deep Learning Crash Course
By: Vishwas Narayan
Deep Learning is Everywhere
What you should learn
● Build intelligent systems, inspired by human intelligence, through Deep Learning
● Build neural networks with your own approach and have them make sound decisions
● Understand, and develop your own approach to, effective training of a deep learning model
How does AI differ from related fields?
For example:
1. Deep Learning vs. Machine Learning and AI
2. Machine Learning vs. AI
Whichever term applies, never forget: somehow, you train on data to get a model.
What will you learn?
● Loss Function and Optimizers
● Gradient Descent Algorithm
● Neural Network Architecture
What is Deep Learning?
Microsoft Word - Turing Test.doc (umbc.edu)
Machine learning is basically
Teaching a machine to learn patterns in data.
What is Deep Learning?
A machine learning technique that learns features and tasks directly from data.
Inputs are run through “neural networks”.
Neural networks have hidden layers.
Why Deep Learning?
● Machines never get fatigued.
● They need to be trained from human intelligence.
● They simply extract patterns.
In Deep Learning
Features can be learned directly from raw data.
What do they really mean to us?
[Diagram: a neuron as a black box - inputs X and Y go in, and some functional output, inspired by the brain, comes out.]
[Diagram: traditionally, data + algorithm -> output; in machine learning, data + output -> model, and the trained model turns data into predictions, insight, and intent.]
Why do we need it now?
● Data is prevalent
● Improved hardware architectures
● New software architectures
Neural Networks
Inspired by the neurons in the brain.
The building block of the neural network is the neuron.
Neural Networks
● Take data as input
● Train themselves to understand the patterns in the data
A simple Neural Network
Learning Process of the Neural Network
1. Forward Propagation
2. Back Propagation
Forward Propagation
Weights and Biases
● Weights - how much importance the network gives to each piece of input information
● Bias - an offset that allows the right decision threshold to be taken into consideration
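To make this concrete, here is a minimal NumPy sketch of one neuron's forward pass; the input values, weights, bias, and the sigmoid choice are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights: how much each input matters
b = 0.2                          # bias: shifts the activation threshold

z = np.dot(w, x) + b             # weighted sum plus bias
y = sigmoid(z)                   # activation produces the neuron's output
print(y)
```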
Back Propagation
A feedback loop.
In backpropagation
The loss function helps the neural network quantify the deviation from the expected output.
The parameters are initialized randomly.
Back Propagation
● Uses the loss function.
● Goes backwards and self-tunes the initial weights and biases.
● Values are adjusted to better fit the model's predictions to the data it is trained on.
Learning Algorithm for the Neural Network
● Initialize the parameters (randomly at first; later runs may start from tuned values).
● Feed input data to the network.
● Compare the predicted values with the expected values and calculate the loss.
● Perform backpropagation to propagate this loss back through the network.
● Update the parameters based on the loss.
● Iterate the previous steps until the loss is minimized (a minimal sketch of this loop follows).
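A minimal sketch of this learning loop for a one-parameter linear model y = w·x with squared-error loss; the synthetic data and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # ground truth: w = 3

w = rng.normal()          # 1. random initialization
lr = 0.1                  # learning rate
for step in range(100):
    y_pred = w * x                        # 2. forward pass
    loss = np.mean((y_pred - y) ** 2)     # 3. compute the loss
    grad = np.mean(2 * (y_pred - y) * x)  # 4. backpropagate (here: one derivative)
    w -= lr * grad                        # 5. update the parameter
print(w, loss)            # w approaches 3; loss approaches the noise floor
```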
Terms used in Neural Networks
Activation Function
● Helps decide whether a neuron should drop out or contribute to the next layer, based on the dataset it is trained on.
● Introduces non-linearity into the neural network.
Which activation function should you use?
● For binary classification, sigmoid or ReLU is used for the best results.
● In the case of classifiers, sigmoid functions and their combinations often perform better.
● Because of the vanishing-gradient issue, sigmoid and tanh functions are sometimes avoided.
● The ReLU function is a generic activation function that is employed in the majority of applications these days.
Activation function conditions
● If we have dead neurons in our network, the leaky ReLU function is the best option.
● Remember that the ReLU function should only be used in the hidden layers.
● As a general guideline, start with the ReLU function and move on to other activation functions only if ReLU does not produce the best results.
(Minimal sketches of these functions follow.)
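For reference, here are minimal NumPy sketches of the activation functions discussed above; the alpha value in leaky ReLU is an illustrative default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # A small slope for z < 0 keeps "dead" neurons trainable.
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-3, 3, 7)
print(relu(z))
print(leaky_relu(z))
```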
Data is both the king and the queen here
No matter what, you are training a model on whatever dataset is available to you; the neural network becomes only as good as that data.
Loss Function
We know the neural network starts by making decisions from random weights and biases; by comparing against the expected output, the weights and biases are corrected.
A loss function thus quantifies the deviation of the neural network's predicted output from the expected output.
The loss functions in Regression are
● Absolute Error Loss
● Huber Loss
● Squared Error Loss
The loss functions in binary classification
● Binary Cross-Entropy
● Hinge Loss
Multi-class classification loss functions
● Multi-Class Cross-Entropy Loss
● KL (Kullback–Leibler) Divergence
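Minimal NumPy sketches of a few of the losses named above; y_true and p are illustrative arrays, not data from the slides.

```python
import numpy as np

def squared_error(y_true, y_pred):             # regression
    return np.mean((y_true - y_pred) ** 2)

def absolute_error(y_true, y_pred):            # regression
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p, eps=1e-12):  # binary classification
    p = np.clip(p, eps, 1 - eps)               # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
print(squared_error(y_true, p), binary_cross_entropy(y_true, p))
```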
Optimizers
During the training process we adjust the parameters to minimize the loss function and make the model as well optimized as possible for its use.
Optimizers are basically
Functions that tie the loss function to the model parameters, updating the neural network based on the output of the loss function.
Gradient Descent
An iterative procedure that starts at a random point on the loss function and travels down its slope in steps (of a size set by the user-chosen learning rate) until it reaches the lowest point of the function.
Again, exact behaviour depends on the data, but gradient descent is:
1. The most popular optimizer
2. Fast, robust, and flexible
The algorithm in layman's terms
1. Calculate what a small change in each individual weight would do to the loss function.
2. Adjust each parameter based on its gradient (derivative).
3. Repeat steps one and two until the loss computed by the neural network stops decreasing.
(A toy sketch of these steps follows.)
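A toy sketch of these three steps on the loss L(w) = (w - 4)², whose minimum sits at w = 4; the starting point and learning rate are illustrative.

```python
def loss(w):
    return (w - 4.0) ** 2

def grad(w):                  # step 1: effect of a small change in w
    return 2.0 * (w - 4.0)

w, lr = 0.0, 0.1              # arbitrary start, user-chosen learning rate
for _ in range(50):
    w -= lr * grad(w)         # step 2: adjust w against its gradient
print(w, loss(w))             # step 3 repeated: w converges toward 4
```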
To avoid getting stuck in a local minimum, we tune the learning rate
● Usually a small number that scales the gradients, so that any changes made to the weights are quite small.
● If the learning rate makes the steps too large, the algorithm will tend to overshoot the global minimum.
● At the same time, we don't want the algorithm to take forever to train and converge to the global minimum.
Stochastic Gradient Descent (SGD) - why is it more robust?
● Like gradient descent, except it uses a subset of the training examples rather than the entire set.
● SGD is gradient descent that uses a batch at each training step.
● Momentum can be used to accumulate gradients.
● Less intensive computation, since the work is batched.
Backpropagation
In essence, an implementation of gradient descent on the neural network.
AdaGrad
● Adapts the learning rate to individual features.
● Different weights will have different learning rates.
● Ideal for sparse datasets in which many input values are missing.
● Each learning rate tends to decay accordingly over training (a sketch of the update follows).
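A minimal sketch of the AdaGrad update rule for a parameter vector; the toy gradient function and all constants are illustrative assumptions.

```python
import numpy as np

w = np.zeros(3)
cache = np.zeros(3)                 # running sum of squared gradients
lr, eps = 0.1, 1e-8

def grad_fn(w):
    # Toy gradient of L(w) = ||w - [1, 2, 3]||^2 / 2
    return w - np.array([1.0, 2.0, 3.0])

for _ in range(500):
    g = grad_fn(w)
    cache += g ** 2                        # frequently-updated weights accumulate faster
    w -= lr * g / (np.sqrt(cache) + eps)   # so their effective learning rate shrinks
print(w)                                   # approaches [1, 2, 3]
```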
Parameters and Hyperparameters
What are model parameters?
● Variables of the neural network whose values can be estimated from the data.
● Required by the model to make predictions.
● Their values define what the model has learned from the data.
● Not set manually; they are saved as the neural network is trained.
Example - weights and biases.
What are model hyperparameters?
● They are configured externally to the neural network: their values cannot be estimated until we train on the dataset.
● There is no clear way to find the best value.
● When a DL algorithm is tuned, you are really tuning the hyperparameters.
● They are tuned manually.
Example - learning rate; C and alpha in SVMs; epochs; k in k-Nearest Neighbours.
Summary
Model parameters -> estimated from the data
Model hyperparameters -> can't be estimated from the data
Hyperparameters are often simply called parameters, since they are the part of machine learning that must be set manually and tuned.
Epochs, Batches, Batch Size and Iterations
You need these when the dataset is too big to pass to the neural network at once.
Break the dataset into smaller chunks and feed those chunks to the neural network one by one.
Epochs
When the Entire dataset is passed forward to the Neural Network and only once
they get trained in the Network.
We use more than one epoch to help model generalize better and accurate.
There is no absolute count for the dataset as its different for different datasets.
Batch and Batch Size
We divide a large dataset into smaller batches and feed those batches to the neural network.
Batch size - the total number of training examples in a single batch.
Iterations
The number of batches needed to complete one epoch.
Number of batches = number of iterations in one epoch.
Let's get some more insight
Suppose we have 1 million training examples and we divide the dataset into batches of 500; completing one epoch would then take 1,000,000 / 500 = 2,000 iterations (checked in the snippet below).
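The same arithmetic as a snippet:

```python
num_examples = 1_000_000
batch_size = 500
iterations_per_epoch = num_examples // batch_size
print(iterations_per_epoch)   # 2000 iterations to complete one epoch
```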
Conclusion for the terms used in NNs
How do you design an architecture? Which activation function should you use?
The only real answer is to experiment.
Types of Learning
There are three main types:
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
Supervised Learning
● Algorithms designed to learn from examples.
● Models are trained on well-labelled data.
● Each example consists of:
○ an input object - typically a vector
○ a desired output value - the supervisory signal
During training
The model searches for patterns that correlate with the desired output.
After training
It takes unseen inputs and determines which label to classify them as.
The objective of a supervised learning model
Is to predict the correct label for unseen data.
Supervised learning is of two types
● Classification
● Regression
Classification
● Take Input data and assign it to a class/category.
● Models finds features in the data that correlates to the class and creates a
mapping function
● This mapping function will be used to classify unseen data from testing and
the validation set from the cross validation of the data
Binary and Multiclass classification
Definition and Example
Popular Classification Algorithms
● Logistic Regression.
● Naïve Bayes.
● Stochastic Gradient Descent.
● K-Nearest Neighbours.
● Decision Tree.
● Random Forest.
● Support Vector Machine
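As an illustration of one of these algorithms, here is a hedged scikit-learn sketch of logistic regression, using the library's built-in iris dataset purely as example data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on unseen data
```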
Regression
The model tries to find a relationship between the dependent and independent variables.
The goal is always to predict continuous values, such as a test score.
The learned equation is always continuous.
Simple Linear Regression
Different Regression Algorithm
● Linear Regression.
● Logistic Regression.
● Ridge Regression.
● Lasso Regression.
● Polynomial Regression.
● Bayesian Linear Regression.
Application of Supervised Learning
● Text categorization
● Face Detection
● Signature recognition
● Customer discovery
● Spam detection
● Weather forecasting
● Predicting housing prices based on the prevailing market price
● Stock price predictions, among others
Unsupervised Learning
● Used to reveal underlying patterns in data.
● Used in exploratory data analysis.
● Needs no labelled data; it works from the features of the data itself.
Unsupervised learning comes in two forms
● Clustering
● Association
Clustering - Partitional Clustering
● Each data point can belong to exactly one cluster.
Clustering - Hierarchical Clustering
● Clusters nested within clusters.
● A data point may belong to several clusters.
Association
Attempts to find relationships between different entities.
Example - market basket analysis.
Some Clustering Algorithms
1. Affinity Propagation
2. Agglomerative Clustering
3. BIRCH
4. DBSCAN
5. K-Means
6. Mini-Batch K-Means
7. Mean Shift
8. OPTICS
9. Spectral Clustering
10. Gaussian Mixture Models
(A K-Means sketch follows.)
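A quick K-Means sketch with scikit-learn on synthetic blobs; the blob parameters and cluster count are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # one centre per cluster
print(km.labels_[:10])       # each point belongs to exactly one cluster
```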
Application of the Unsupervised Learning
● Fraud detection
● Malware detection
● Identification of human errors during data entry
● Conducting accurate basket analysis, etc.
Reinforcement Learning
Enables an intelligent agent to learn in an interactive environment by trial and error (via a policy and a reward network), based on its own actions and experience.
This is a comparatively new way of getting things learnt.
If supervised training of a neural network doesn't work well for your problem, reinforcement learning is the alternative to consider.
Reward and punishment are the key here
Positive and negative signals are used as behavioural feedback to understand what has been learnt.
The goal of reinforcement learning is to
● Find a suitable policy that maximizes the total cumulative reward, producing results that support better decisions.
● Maximize the points won during training over many examples.
● Penalize the agent when it makes wrong decisions.
● Reward the agent when it makes right decisions.
It is usually modelled as a “Markov Decision Process”.
[Diagram: the reinforcement learning loop - the agent takes an action, and the environment returns the next state and a reward/penalty.]
Applications of Reinforcement Learning
● Robotics
● Business strategy
● Traffic Light Control
● Web system configuration
● NLP
○ to personalize suggestions
○ deliver more meaningful notifications to users
○ optimize video streaming quality.
● Gaming
● Bidding
Some core and canonical problems in Deep Learning
The basic situation: a model should perform well both on the training data and on new test data.
The most common problem you will face is overfitting.
Consider the data points themselves:
● Data can be skewed.
● Data can be random.
● Raw data does not care about anything or anybody; it is simply generated.
● Yet it is collected to make sense of and to build a model.
● And it is collected so the model can drive the right decisions.
[Diagrams: underfitting vs. over-fitting.]
Ways of tackling overfitting:
1. Hold-out
2. Cross-validation
3. Data augmentation
4. Feature selection
5. L1 / L2 regularization
6. Remove layers / number of units per layer
7. Dropout
8. Early stopping
Data Augmentation
Create as much extra (synthetic) training data as possible from the data itself (a minimal sketch follows).
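A minimal sketch of this idea using NumPy alone; the batch of fake "images" is generated in place purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((8, 28, 28))            # pretend batch of 8 images

flipped = images[:, :, ::-1]                # horizontal flips
noisy = np.clip(images + rng.normal(scale=0.05, size=images.shape), 0, 1)

augmented = np.concatenate([images, flipped, noisy])
print(augmented.shape)                      # (24, 28, 28): 3x the data
```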
Early Stopping
Use Early Stopping to Halt the Training of Neural Networks At the Right Time (machinelearningmastery.com)
When do you need it?
When the training error decreases steadily but the validation error starts increasing after a certain point (a hedged sketch follows).
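Following the linked article's approach, here is a hedged Keras sketch; the model, data, and patience value are illustrative assumptions, not details from the slides.

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 10)
y = (X.sum(axis=1) > 5).astype("float32")   # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watch the validation error...
    patience=5,                    # ...and halt if it stops improving
    restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[stop], verbose=0)
```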
Neural networks come in plenty of varieties
Follow the sources mentioned in this stream to explore them.
We have talked a lot about models
Now let's get to know how we can build one.
Gathering Data
Picking the right data is very important. A good way to start is to state your assumptions about the data you need.
The size of the dataset also matters
No one size fits all; a common rule of thumb is:
amount of data needed ≈ 10 times the number of model parameters
The quality of the data also matters
Data has to be accurate and reliable, with no adversarial examples.
Features should be noise-free.
Some dataset repositories
I will list them out here.
Pre-processing the dataset
Split the dataset into subsets:
● Training dataset
● Testing dataset
● Validation dataset
We can split the dataset randomly (a sketch follows).
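A sketch of a random train/validation/test split with scikit-learn's train_test_split; the 60/20/20 ratio and the toy arrays are illustrative, since the right split is use-case specific.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(200).reshape(100, 2), np.arange(100)

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)            # 60% train
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)  # 20% val, 20% test
print(len(X_train), len(X_val), len(X_test))        # 60 20 20
```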
This process depends on
● the number of samples in the data
● the model being trained
A simple rule of thumb
● Few hyperparameters: a small validation set suffices
● Many hyperparameters: use a large validation set
The ratio in which you split the dataset is specific to your use case.
[Diagram: the dataset is split into train and test portions; the training portion is further divided into folds, with one fold held out as the validation set.]
Look for missing data
● NaN or null values
● Either eliminate the affected features or values
● Or impute the missing data (a sketch follows)
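A sketch of these options with pandas; the tiny DataFrame is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31], "income": [50.0, 60.0, np.nan]})

print(df.isna().sum())           # 1. find NaN/null values
dropped = df.dropna()            # 2. eliminate rows with missing values
imputed = df.fillna(df.mean())   # 3. impute with the column mean
print(imputed)
```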
Sampling
Use a sample of the dataset.
Why do we need this?
● Faster convergence
● Reduced disk space
Preprocessing is required for feature scaling
A crucial step for model training:
● Normalization
● Standardization
(Sketches of both follow.)
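Sketches of both scaling schemes with scikit-learn; the sample matrix is illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

normalized = MinMaxScaler().fit_transform(X)      # rescale each column to [0, 1]
standardized = StandardScaler().fit_transform(X)  # zero mean, unit variance
print(normalized)
print(standardized)
```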
Then, of course, train and evaluate.
Optimization
Will be Continued ...

Editor's Notes

• What is Deep Learning?: Microsoft Word - Turing Test.doc (umbc.edu)
• Which Activation Function to use?: Activation Functions In Neural Network | by Gaurav Rajpal | Analytics Vidhya | Medium
• Gradient Descent: Calculating Gradient Descent Manually | by Chi-Feng Wang | Towards Data Science