Dr. Rahul J. Pandya,
Assistant Professor,
Electrical, Electronics, and Communication Engineering (EECE) Dept.,
Indian Institute of Technology (IIT), Dharwad
Email: rpandya@iitdh.ac.in
Introduction to Machine Learning
Course content - Syllabus
▪ Introduction to Machine Learning (ML)
▪ Types of Machine learning
▪ Supervised ML
▪ Unsupervised ML
▪ Semi-Supervised ML
▪ Reinforcement Learning (RL)
▪ Machine learning (ML) algorithms
▪ Regression – Linear Regression, Logistic Regression, Multivariate Regression
▪ Classification
▪ Clustering – Partitional clustering, Hierarchical clustering, Density-based clustering
▪ Decision trees
▪ K-Nearest Neighbours (KNN)
▪ Kernel methods: Support vector machine
▪ Reinforcement Learning (RL) algorithms
▪ Graphical models: Gaussian mixture models and hidden Markov models
▪ Introduction to the Bayesian Approach: Bayesian classification, Bayesian learning,
Bayes optimal classifier, and Naïve Bayes Classifier.
Reference books
▪ C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006
▪ K. P. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press,
2012.
Introduction to Machine Learning (ML)
Artificial Intelligence (AI)
Enables systems to perform intelligent tasks through a set of rules.
Machine Learning (ML)
A process of learning from data without complex hand-written rules; it involves training a model on datasets and predicting the outcome.
Deep Learning (DL)
ML at a large scale, equipped with artificial neural networks.
https://www.geeksforgeeks.org/difference-between-artificial-intelligence-vs-machine-learning-vs-deep-learning/
Introduction to Machine Learning (ML)
▪ Artificial Intelligence (AI):
Approaches that enable computers
to perform intelligent tasks.
▪ Machine Learning (ML): Approaches
that learn the underlying patterns in a
given set of features without being
explicitly programmed.
▪ Deep Learning (DL): Approaches
that learn the underlying
representations and patterns in a
given set of raw data without being
explicitly programmed.
Artificial Intelligence
▪ Intelligence: the ability to learn and understand from experience, and to use it to decide a
future course of action
▪ Artificial Intelligence (AI): enabling machines to do so-called intelligent tasks
▪ Problem solving
▪ Discovery
▪ Learning
▪ Dealing with uncertainties
▪ AI categories:
▪ Problem solving using search methods
▪ State space search, heuristic search, randomized search, rule-based search
▪ Symbolic manipulation is one form of AI
▪ The connectionist approach is another form of AI
Machine Learning
▪ With more and more digital data available, the task is the automatic discovery and
learning of patterns from both natural and synthetic data.
▪ Less focus on feature extraction; signal-processing knowledge is not a
prerequisite!
▪ More emphasis on discovery and learning of patterns by the machine.
▪ Ability to learn by extracting patterns from data (features).
▪ Pattern learning is treated more like learning an associated function.
▪ Output y = f(x), where y is the output and x is the input data (features).
▪ The goal of ML is to learn the f() that maps x to y.
Ref: https://www.javatpoint.com/reinforcement-learning
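As a minimal illustration of learning f() from (x, y) pairs, here is a sketch in Python (the toy line y = 3x + 1 and the NumPy least-squares fit are assumptions for the example, not from the slides):

```python
import numpy as np

# Toy data: inputs x (features) and outputs y produced by an unknown f plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

# "Learning" here = fitting y ≈ a·x + b by least squares
a, b = np.polyfit(x, y, deg=1)
print(f"learned f(x) ≈ {a:.2f}·x + {b:.2f}")  # close to the true 3x + 1
```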
Deep Learning
▪ Task of learning both features (representation) and also patterns for pattern
recognition.
▪ Trying to mimic human way of learning.
▪ Learning from experience
▪ Need not specify everything in the beginning
▪ Understand in terms of hierarchy of concepts
▪ Each concept is defined in terms of its relation to simpler concepts
▪ Learning complicated concepts out of simpler ones
Ref: https://www.javatpoint.com/reinforcement-learning
What is Machine Learning (ML)?
Introduction to Machine Learning (ML)
▪ Machine learning gives “computers the ability to learn without being explicitly programmed.”
~ Arthur Samuel
Ref: https://pub.towardsai.net/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa
https://prutor.ai/ml-what-is-machine-learning/
▪ Preparing for the exams
▪ Students feed their machine
(brain) with a good amount of
high-quality data (questions and
answers from different books,
teachers' notes, or online video
lectures).
▪ They train their brain with input as
well as output, i.e., what kind of
approach or logic they have to use to
solve different kinds of questions.
▪ Similarly, in ML we train the machine with
data (both inputs and outputs are
given to the model), and when the
time comes we test it on data (with
input only); our model achieves a
score by comparing its answers with
the actual outputs, which were not
fed during training.
How Does ML Work?
▪ Features of ML:
▪ Machine learning uses data to detect various patterns in a given dataset.
▪ It can learn from past data and improve automatically.
▪ It is a data-driven technology.
▪ Machine learning is like data mining, as it also deals with vast amounts of data.
Ref: https://www.javatpoint.com/machine-learning
▪ A machine learning system learns from historical data, builds prediction
models, and whenever it receives new data, predicts the output for it.
Why Machine Learning (ML)?
▪ Machine learning gives “computers the ability to learn without being explicitly
programmed.” ~ Arthur Samuel
▪ Why ML?
▪ Machine learning models help us in many tasks, such as:
▪ Object recognition
▪ Summarization
▪ Prediction
▪ Classification
▪ Clustering
▪ Recommender systems
▪ And others
▪ ML is a scientific branch of AI
▪ Deep learning is a subset of ML
Ref: https://pub.towardsai.net/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa
https://prutor.ai/ml-what-is-machine-learning/
Basic Difference in ML and Traditional Programming?
https://prutor.ai/ml-what-is-machine-learning/
▪ What exactly does learning mean for a computer?
▪ A computer learns from experience with respect to some class of tasks if its performance in a
given task improves with the experience.
▪ Formally: a program learns from experience E with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P, improves with experience E.
▪ Traditional programming: we feed in
DATA (input) + PROGRAM (logic), run it
on the machine, and get the output.
▪ Machine learning: we feed in
DATA (input) + OUTPUT, run it on the machine
during training, and the machine creates
its own program (logic), which can be
evaluated while testing.
Machine Learning in the Current World
https://www.javatpoint.com/applications-of-machine-learning
Traditional ML vs DL
https://www.researchgate.net/figure/Comparison-between-ML-and-Dl-algorithm_fig5_344628869
Neural Nets vs Deep Learning
[Taken from public domain. Original authors highly acknowledged.]
▪ The concept of deep learning originated from neural networks.
▪ A good example of a deep neural network is the feed-forward neural network (FFNN).
▪ Backpropagation (BP) is the workhorse algorithm for learning the parameters of an
FFNN.
▪ BP did not work well for networks having more than a small number of hidden
layers.
▪ Insufficient data, leading to overfitting, and the difficulty of training deep networks
were the main limitations.
NN vs DL
https://yangxiaozhou.github.io/data/2020/09/24/intro-to-cnn.html
AI, ML, NN and DL
https://www.researchgate.net/figure/Relationship-between-artificial-intelligence-machine-learning-deep-learning-and_fig2_351110482
Information Extraction & Modeling
[Taken from public domain. Original authors highly acknowledged.]
▪ Information: knowledge about something, e.g., a face, a speaker, a route.
▪ Extraction: extract the physical quantities that carry the information
(feature extraction or representation learning).
▪ Modeling: the invariant entity that carries the knowledge; from the features,
model these invariant entities.
▪ Process: in human-computer interaction, this refers to signal processing,
pattern recognition, machine learning, and deep learning.
When Machine Learning and when Deep Learning
[Taken from public domain. Original authors highly acknowledged.]
▪ Is the problem statement well or ill defined?
▪ Is the amount of data small or very large?
▪ Is domain knowledge high or low?
▪ Is meaningful feature extraction possible or not?
▪ The first answer in each case favours machine learning; the second favours deep learning.
Regression
[Taken from public sources. Original authors acknowledged.]
▪ Objective of regression task.
▪ Univariate vs multivariate regression.
▪ Linear vs nonlinear regression.
▪ Cost function.
▪ Gradient descent method of optimization.
▪ Normal equation approach for parameter estimation.
▪ Logistic regression
Clustering
[Taken from public sources. Original authors acknowledged.]
▪ Objective of clustering task.
▪ Partitioning approach - k-means, fuzzy-c means.
▪ Model based approach - Gaussian mixture model (GMM).
▪ Expectation-maximization (EM) algorithm.
▪ Hierarchical clustering.
▪ Hierarchical - agglomerative clustering.
▪ Hierarchical - divisive clustering.
Classification
[Taken from public sources. Original authors acknowledged.]
▪ Objective of classification task.
▪ Binary vs multiclass classification.
▪ Generative vs discriminative classification.
▪ Parametric vs nonparametric classification.
▪ Logistic regression.
▪ k-nearest neighbour classification.
▪ Support vector machine.
▪ Generative classifiers.
Dimensionality Reduction
[Taken from public sources. Original authors acknowledged.]
▪ Objective of dimensionality reduction task.
▪ Principal component analysis (PCA).
▪ Linear discriminant analysis (LDA).
▪ PCA based dimensionality reduction.
▪ PCA based classification.
▪ LDA based dimensionality reduction.
▪ LDA based classification.
Time Series Modelling
[Taken from public sources. Original authors acknowledged.]
▪ Objective of time series modelling task.
▪ Markov process and models.
▪ Observable vs hidden Markov model
▪ Hidden Markov Model (HMM).
▪ Training and testing of HMM
▪ Forward and backward variables.
▪ Viterbi algorithm for optimal state sequence.
▪ Expectation maximization (EM) approach for training.
Bayesian Approach
[Taken from public sources. Original authors acknowledged.]
▪ Objective of Bayesian approach.
▪ Probabilistic framework for classification.
▪ Bayesian classification.
▪ Bayesian learning.
▪ Maximum a posteriori (MAP) approach.
▪ Bayes optimal classifier.
▪ Gibbs sampling.
▪ Naive Bayes classifier.
▪ Bayesian network.
Types of Machine Learning (ML)
Classification of Machine Learning
▪ At a broad level, machine learning can be classified into the following types:
https://www.javatpoint.com/machine-learning
▪ Supervised ML models
▪ Unsupervised ML models
▪ Semi-supervised ML models
(a combination of supervised and
unsupervised models)
▪ Reinforcement learning models
Supervised Machine Learning (ML)
Supervised Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Supervised learning is a type of machine learning in which machines are trained
using well-"labelled" training data, and on the basis of that data, machines predict the
output.
▪ Labelled data means some input data is already tagged with the correct
output.
▪ In supervised learning, the training data provided to the machines works as the
supervisor that teaches the machines to predict the output correctly.
▪ It applies the same concept as a student learning under the supervision of a teacher.
▪ Supervised learning is a process of providing input data as well as correct output
data to the machine learning model. The aim of a supervised learning algorithm is to
find a mapping function to map the input variable (x) to the output variable (y).
How Does Supervised Learning Work?
▪ In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data. Once the training process is completed, the model is tested on held-out test
data, and then it predicts the output.
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Suppose we have a dataset of different types
of shapes, which includes squares, rectangles,
triangles, and polygons. The first step is
that we need to train the model on each
shape.
▪ If the given shape has four sides, and all the
sides are equal, then it will be labelled as a
square.
▪ If the given shape has three sides, then it will
be labelled as a triangle.
▪ If the given shape has six equal sides, then it will be labelled as a hexagon.
▪ Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
How Does Supervised Learning Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.
▪ Algorithms like Decision Tree, Random Forest, KNN, Logistic Regression, etc. fall under
supervised ML models.
Steps Involved in Supervised Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the dataset into a training set,
a test set, and a validation set.
• Determine the input features of the training
dataset, which should carry enough information
that the model can accurately predict the output.
• Determine a suitable algorithm for the model, such as support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset.
• Evaluate the accuracy of the model by providing the test set. If the model predicts the correct outputs,
the model is accurate. A minimal sketch of these steps is shown below.
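A minimal sketch of these steps with scikit-learn (the iris dataset, the decision-tree choice, and the 80/20 split are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: gather labelled data
X, y = load_iris(return_X_y=True)

# Step 3: split into training and test sets (a validation set could be split off the same way)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 4-6: choose a suitable algorithm and execute it on the training data
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 7: evaluate accuracy on the held-out test set
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```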
Types of supervised Machine Learning Algorithms
Ref: https://www.javatpoint.com/supervised-machine-learning
Regression
• Regression algorithms are used when there is a relationship between the input
variable and the output variable. They are used for the prediction of continuous
variables, such as weather forecasting, market trends, etc. Below are some
popular regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
Classification
• Classification algorithms are used when the output variable is
categorical, i.e., there are distinct classes such as Yes/No, Male/Female,
True/False, etc. An example application is spam filtering. Popular algorithms:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Advantages/Disadvantages of Supervised learning
Ref: https://www.javatpoint.com/supervised-machine-learning
Advantages of supervised learning:
• With the help of supervised learning, the model can predict the output on the basis of prior
experience.
• In supervised learning, we can have an exact idea about the classes of objects.
• Supervised learning models help us solve various real-world problems, such as fraud detection,
spam filtering, etc.
Disadvantages of supervised learning:
• Supervised learning models are not suitable for handling very complex tasks.
• Supervised learning cannot predict the correct output if the test data is different from the
training dataset.
• Training requires a lot of computation time.
• In supervised learning, we need enough knowledge about the classes of objects.
Unsupervised Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In supervised machine learning, models are trained using labeled data under supervision.
But there may be many cases in which we do not have labeled data
and need to find the hidden patterns in a given dataset. To solve such cases in
machine learning, we need unsupervised learning techniques.
▪ What is Unsupervised Learning?
▪ Unsupervised learning is a machine learning technique in which models are not supervised using a
training dataset. Instead, the models themselves find the hidden patterns and insights in the given data.
It can be compared to the learning which takes place in the human brain while learning new things. It
can be defined as:
▪ Unsupervised learning is a type of machine learning in which models are trained using an
unlabeled dataset and are allowed to act on that data without any supervision.
▪ Unsupervised learning cannot be directly applied to a regression or classification problem
because, unlike supervised learning, we have the input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of a dataset, group the data
according to similarities, and represent the dataset in a compressed format.
Unsupervised Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Suppose the unsupervised learning algorithm is
given an input dataset containing images of
different types of cats and dogs.
▪ The algorithm is never trained on the given
dataset, which means it does not have any idea
about the features of the dataset.
▪ The task of the unsupervised learning algorithm is to
identify the image features on its own.
▪ The unsupervised learning algorithm will perform this
task by clustering the image dataset into
groups according to the similarities between images.
Why use Unsupervised Learning?
Ref: https://www.javatpoint.com/supervised-machine-learning
• Unsupervised learning is helpful for finding useful insights in the
data.
• Unsupervised learning is much like how a human learns to think
from their own experience, which makes it closer to real AI.
• Unsupervised learning works on unlabelled and uncategorized data,
which makes unsupervised learning more important.
• In the real world, we do not always have input data with the
corresponding output, so to solve such cases we need unsupervised
learning.
Working of Unsupervised Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Here, we have taken unlabeled input data, which means it is not categorized and
corresponding outputs are not given.
▪ This unlabeled input data is fed to the machine learning model in order to train it. First, the model
interprets the raw data to find the hidden patterns in it, and then a suitable
algorithm is applied, such as k-means clustering or hierarchical clustering.
▪ Once it applies the suitable algorithm, the algorithm divides the data objects into groups
according to the similarities and differences between the objects. A minimal sketch follows.
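A minimal sketch of this workflow with k-means (the synthetic two-blob data and k = 2 are assumptions for the example):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two fuzzy groups of 2-D points, with no outputs given
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# The algorithm groups the objects purely by similarity
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])  # points from the two blobs land in different clusters
```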
Types of Unsupervised Learning Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
• Clustering:
• Clustering is a method of grouping objects into clusters
such that objects with the most similarities remain in a group
and have few or no similarities with the objects of another group.
• Cluster analysis finds the commonalities between the data
objects and categorizes them as per the presence and absence of
those commonalities.
• Association:
• An association rule is an unsupervised learning method which is
used for finding relationships between variables in a large
database.
• It determines the sets of items that occur together in the dataset.
• Association rules make marketing strategies more effective:
for example, people who buy item X (say, bread) also tend
to purchase item Y (butter or jam).
• A typical example of association rules is Market Basket
Analysis, sketched below.
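A tiny market-basket sketch of the association idea (the transactions and the plain co-occurrence count are assumptions; real association-rule mining, e.g., the Apriori algorithm, would also compute support and confidence):

```python
from itertools import combinations
from collections import Counter

# Each transaction is the set of items one customer bought together
transactions = [
    {"bread", "butter"}, {"bread", "jam"}, {"bread", "butter", "jam"},
    {"milk", "bread"}, {"milk", "eggs"},
]

# Count how often each pair of items occurs together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))  # ('bread','butter') and ('bread','jam') co-occur most
```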
Unsupervised Learning algorithms
Ref: https://www.javatpoint.com/supervised-machine-learning
• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Neural networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
Advantages/ Disadvantages of Unsupervised Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
• Advantages
• Unsupervised learning is used for more complex tasks than supervised
learning because, in unsupervised learning, we don't have labelled input data.
• Unsupervised learning is preferable because it is easier to get unlabelled data
than labelled data.
• Disadvantages
• Unsupervised learning is intrinsically more difficult than supervised learning, as it
does not have corresponding outputs.
• The result of an unsupervised learning algorithm might be less accurate, as the input
data is not labelled and the algorithm does not know the exact output in advance.
Difference between Supervised and Unsupervised Learning
▪ Supervised learning algorithms are trained using labeled data; unsupervised learning algorithms are trained using unlabeled data.
▪ A supervised learning model takes direct feedback to check whether it is predicting the correct output; an unsupervised learning model does not take any feedback.
▪ A supervised learning model predicts the output; an unsupervised learning model finds the hidden patterns in the data.
▪ In supervised learning, input data is provided to the model along with the output; in unsupervised learning, only input data is provided to the model.
▪ The goal of supervised learning is to train the model so that it can predict the output when given new data; the goal of unsupervised learning is to find the hidden patterns and useful insights in an unknown dataset.
▪ Supervised learning needs supervision to train the model; unsupervised learning does not need any supervision.
▪ Supervised learning can be categorized into classification and regression problems; unsupervised learning can be classified into clustering and association problems.
▪ Supervised learning can be used in cases where we know the inputs as well as the corresponding outputs; unsupervised learning can be used in cases where we have only input data and no corresponding output data.
▪ A supervised learning model produces an accurate result; an unsupervised learning model may give a less accurate result by comparison.
▪ Supervised learning is not close to true artificial intelligence, as we first train the model on each example and only then can it predict the correct output; unsupervised learning is closer to true artificial intelligence, as it learns similarly to how a child learns daily routine things from experience.
▪ Supervised learning includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, and Multi-class Classification; unsupervised learning includes algorithms such as clustering, KNN, and the Apriori algorithm.
Ref: https://www.javatpoint.com/supervised-machine-learning
Regression
Regression Analysis in Machine learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Regression analysis is a statistical method to model
the relationship between a dependent (target)
variable and one or more independent (predictor)
variables. More specifically, regression
analysis helps us to understand how the value of the
dependent variable changes with respect to an
independent variable when the other independent
variables are held fixed. It predicts continuous/real
values such as temperature, age, salary, price, etc.
▪ Example: Suppose there is a marketing company A
which runs various advertisements every year and gets
sales from them. The list shows the advertisements run
by the company in the last 5 years and the
corresponding sales:
Regression Analysis in Machine learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Now, the company wants to spend $200 on
advertisement this year and wants to
know the prediction for its sales this
year. To solve such prediction
problems in machine learning, we need
regression analysis.
Regression Analysis in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Regression is a supervised learning technique which helps in
finding the correlation between variables and enables us
to predict a continuous output variable based on one or
more predictor variables. It is mainly used for prediction,
forecasting, time-series modelling, and determining the
cause-effect relationship between variables.
▪ In regression, we plot a graph between the variables
which best fits the given datapoints; using this plot, the
machine learning model can make predictions about the
data. In simple words, "Regression shows a line or curve
that passes through all the datapoints on the target-
predictor graph in such a way that the vertical distance
between the datapoints and the regression line is
minimum." The distance between the datapoints and the line tells
whether the model has captured a strong relationship or not.
Regression Analysis in Machine learning
Ref: https://www.javatpoint.com/supervised-machine-learning
Some examples of regression:
• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving
Why do we use Regression Analysis?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Regression analysis helps in the prediction of a continuous variable. There are
various scenarios in the real world where we need future predictions, such
as weather conditions, sales, marketing trends, etc.; for such cases we
need a technique which can make predictions accurately. Regression
analysis is such a statistical method, used in
machine learning and data science.
▪ Regression estimates the relationship between the target and the independent
variables.
▪ It is used to find the trends in data.
▪ By performing regression, we can determine the most important
factor, the least important factor, and how each factor affects the other
factors.
Types of Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
Linear Regression
Linear Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Linear regression is a statistical regression
method which is used for predictive analysis.
▪ It shows the relationship between continuous
variables.
▪ It is used for solving regression problems in
machine learning.
▪ Linear regression shows the linear relationship
between the independent variable (X-axis) and
the dependent variable (Y-axis), hence the name
linear regression.
Y = aX + b
Here, Y is the dependent (target) variable,
X is the independent (predictor) variable,
and a and b are the linear coefficients.
Linear Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ If there is only one input variable (x),
such linear regression is called
simple linear regression. If
there is more than one input
variable, it is called multiple
linear regression.
▪ The relationship between variables in
the linear regression model can be
explained using the image. Here we
are predicting the salary of an
employee on the basis of years of
experience.
Linear Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Linear regression is one of the easiest and most popular
machine learning algorithms. It is a statistical method that
is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as
sales, salary, age, product price, etc.
▪ The linear regression algorithm shows a linear relationship
between a dependent variable (y) and one or more independent
variables (x), hence the name linear regression. Since linear
regression shows a linear relationship, it
finds how the value of the dependent variable changes
according to the value of the independent variable.
▪ The linear regression model provides a sloped straight line
representing the relationship between the variables.
Linear Regression in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
y = a0 + a1x + ε
Here,
▪ y = dependent variable (target variable)
▪ x = independent variable (predictor variable)
▪ a0 = intercept of the line (gives an additional degree
of freedom)
▪ a1 = linear regression coefficient (scale factor applied to
each input value)
▪ ε = random error
▪ The values of the x and y variables are the training
dataset for the linear regression model
representation. A minimal fitting sketch follows.
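A minimal sketch of estimating a0 and a1 from training pairs by ordinary least squares (the toy experience/salary numbers are assumptions for the example):

```python
import numpy as np

# Training data: x = years of experience, y = salary (arbitrary units)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# Closed-form least-squares estimates for y = a0 + a1*x
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()
print(f"y ≈ {a0:.2f} + {a1:.2f}·x")  # the best-fit line for this data
```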
Types of Linear Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.
▪ Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Multiple
Linear Regression.
Finding the best fit line
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ When working with linear regression, our
main goal is to find the best-fit line, which
means the error between the predicted values
and actual values should be minimized. The
best-fit line has the least error.
▪ Different values for the weights or the
coefficients of the line (a0, a1) give different
regression lines, so we need to calculate
the best values for a0 and a1 to find the best-fit
line; to calculate this, we use a cost
function.
y = a0 + a1x + ε
Cost function
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Different values for the weights or coefficients of
the line (a0, a1) give different regression lines,
and the cost function is used to estimate the values
of the coefficients for the best-fit line.
▪ The cost function optimizes the regression coefficients
or weights. It measures how well a linear regression
model is performing.
▪ We can use the cost function to find the accuracy of
the mapping function, which maps the input
variable to the output variable. This mapping
function is also known as the hypothesis function.
y = a0 + a1x + ε
Cost function
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ For linear regression, we use the Mean Squared Error (MSE) cost function,
which is the average of the squared errors between the predicted values and
the actual values.
▪ For the above linear equation, MSE can be calculated as:
MSE = (1/N) Σ (Yi − (a1xi + a0))²
where
N = total number of observations,
Yi = actual value, and
(a1xi + a0) = predicted value.
▪ Residuals: The distance between an actual value and the predicted value is called the residual. If the
observed points are far from the regression line, the residuals are high, and so the cost function is
high. If the scatter points are close to the regression line, the residuals are small, and hence so is
the cost function.
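A quick numerical check of this cost function (reusing the toy data from the sketch above; the two candidate lines are assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

def mse(a0, a1):
    # Mean squared error between actual y and predictions a0 + a1*x
    return np.mean((y - (a0 + a1 * x)) ** 2)

print(mse(25.0, 5.0))  # cost of one candidate line
print(mse(25.3, 4.9))  # a nearby line with a smaller MSE fits the data better
```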
Logistic Regression
Logistic Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Logistic regression is another supervised learning
algorithm, used to solve classification
problems. In classification problems, we have a
dependent variable in a binary or discrete format,
such as 0 or 1.
▪ The logistic regression algorithm works with
categorical variables such as 0 or 1, Yes or No, True
or False, Spam or Not Spam, etc.
▪ It is a predictive analysis algorithm which works on
the concept of probability.
▪ Logistic regression uses the sigmoid or logistic
function to model the data, together with a
correspondingly more complex cost function.
Logistic Regression
https://mathworld.wolfram.com/SigmoidFunction.html
f(x) = 1 / (1 + e^(−x))
▪ f(x) = output, between 0 and 1
▪ x = input to the function
▪ e = base of the natural logarithm
▪ There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
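A minimal sketch of the sigmoid and a logistic-regression-style prediction (the toy weight, bias, and 0.5 decision threshold are assumptions for the example):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# A logistic model: probability of class 1 given feature x
w, b = 2.0, -1.0                # toy weight and bias
x = np.array([-2.0, 0.0, 0.5, 3.0])
p = sigmoid(w * x + b)
print(p)                        # probabilities in (0, 1)
print((p >= 0.5).astype(int))   # threshold at 0.5 -> class labels 0/1
```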
Polynomial Regression
Polynomial Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
• Polynomial regression is a type of regression which
models a non-linear dataset using a linear model.
• It is similar to multiple linear regression, but it fits a
non-linear curve between the values of x and the
corresponding conditional values of y.
• Suppose there is a dataset whose datapoints lie in a non-
linear fashion; in such a case, linear regression will not best fit those
datapoints. To cover such datapoints, we need polynomial regression.
• In polynomial regression, the original features are transformed into polynomial
features of a given degree and then modelled using a linear model, which means
the datapoints are best fitted using a polynomial curve.
Polynomial Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
• The equation for polynomial regression is also
derived from the linear regression equation: the
linear regression equation Y = b0 + b1x is
transformed into the polynomial regression equation
Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
• Here Y is the predicted/target output, and b0, b1, ..., bn
are the regression coefficients; x is the
independent/input variable.
• The model is still linear, because it is linear in the coefficients,
even though the features (x², x³, ...) are non-linear in x.
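A minimal polynomial-regression sketch: transform x into polynomial features, then fit a linear model (the degree-2 choice and the toy curve are assumptions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy non-linear data: y ≈ 1 + 2x + 3x² plus noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2 + 0.1 * rng.standard_normal(40)

# Polynomial features of a given degree + a linear model = polynomial regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print(model.predict([[0.5]]))  # close to 1 + 2*0.5 + 3*0.25 = 2.75
```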
Support Vector Regression
Support Vector Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Support Vector Machine is a supervised learning
algorithm which can be used for regression as well as
classification problems. If we use it for regression
problems, it is termed Support Vector Regression (SVR).
▪ Support Vector Regression is a regression algorithm
which works for continuous variables. Below are some
keywords used in Support Vector Regression:
▪ Kernel: a function used to map lower-
dimensional data into higher-dimensional data.
▪ Hyperplane: in general SVM, it is a separation line
between two classes, but in SVR it is the line which helps
to predict the continuous variable and cover most of the
datapoints.
Support Vector Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Boundary lines: the two lines
on either side of the hyperplane, which create a margin for
the data points.
▪ Support vectors: the datapoints
which are nearest to the hyperplane and of the opposite
class. In SVR, we always try to determine a hyperplane
with a maximum margin, so that the maximum number of
datapoints is covered by that margin.
▪ The main goal of SVR is to include the maximum number of
data points within the boundary lines, and the
hyperplane (best-fit line) must contain a maximum
number of data points. A minimal SVR sketch follows.
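A minimal SVR sketch with scikit-learn (the RBF kernel and the C and epsilon values are assumptions for the example):

```python
import numpy as np
from sklearn.svm import SVR

# Toy continuous target: y = sin(x) plus noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 60)).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)

# epsilon sets the tube (margin) around the fit; points outside it become support vectors
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(svr.predict([[1.5]]))  # near sin(1.5) ≈ 1.0
print(len(svr.support_))     # number of support vectors used
```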
Decision Tree Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
• Decision Tree is a supervised learning algorithm which can
be used for solving both classification and regression
problems.
• It can solve problems for both categorical and numerical data.
• Decision Tree regression builds a tree-like structure in which
each internal node represents a "test" on an attribute,
each branch represents the result of the test, and each leaf
node represents the final decision or result.
• A decision tree is constructed starting from the root
node/parent node (the dataset), which splits into left and right child
nodes (subsets of the dataset). These child nodes are further
divided into their own children, themselves becoming the
parent nodes of those nodes.
Decision Tree Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ The image shows an example of Decision
Tree regression; here, the model is trying
to predict a person's choice between
a sports car and a luxury car. A minimal
regression sketch follows.
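A minimal decision-tree regression sketch (the step-shaped toy data and the max_depth value are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: the target is a step-like function of one feature
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([5.0, 5.2, 5.1, 9.8, 10.1, 10.0, 14.9, 15.2])

# Internal nodes split the feature range; each leaf stores the predicted value
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[2.5], [4.5], [7.5]]))  # roughly 5, 10, 15
```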
Random Forest
Random forest
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Random forest is one of the most
powerful supervised learning algorithms,
capable of performing
regression as well as classification
tasks.
▪ Random Forest regression is an
ensemble learning method which
combines multiple decision trees and
predicts the final output based on the
average of each tree's output. The
combined decision trees are called the
base models:
g(x) = f0(x) + f1(x) + f2(x) + ...
Random forest
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Random forest uses the Bagging or
Bootstrap Aggregation technique
of ensemble learning, in which the
aggregated decision trees run in
parallel and do not interact with
each other.
▪ With the help of Random Forest
regression, we can prevent
overfitting in the model by
creating random subsets of the
dataset. A minimal sketch follows.
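A minimal random-forest regression sketch (the tree count and the toy data are assumptions for the example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: noisy quadratic target
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (200, 1))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(200)

# Bagging: each of the 100 trees sees a bootstrap sample; the prediction is their average
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict([[1.0]]))  # close to 1.0² = 1.0
```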
Ridge Regression
Ridge Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
• Ridge regression is one of the most robust versions of linear regression,
in which a small amount of bias is introduced so that we can get
better long-term predictions.
• The amount of bias added to the model is known as the Ridge Regression
penalty. We can compute this penalty term by multiplying
lambda by the squared weight of each individual feature.
• The equation for ridge regression is:
Cost = Σ (Yi − Ŷi)² + λ·Σ bj²
Ridge Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ A general linear or polynomial regression will fail if there is high
collinearity between the independent variables; to solve such
problems, ridge regression can be used.
▪ Ridge regression is a regularization technique, which is used to
reduce the complexity of the model. It is also called L2
regularization.
▪ It helps to solve problems where we have more parameters than
samples. A minimal sketch follows.
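A minimal ridge sketch, where alpha plays the role of λ (the collinear toy data and the alpha value are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two highly collinear features: plain linear regression is unstable here
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(0, 1, 100)])
y = 3 * x1 + 0.1 * rng.normal(0, 1, 100)

print(LinearRegression().fit(X, y).coef_)  # large, unstable coefficients
print(Ridge(alpha=1.0).fit(X, y).coef_)    # the penalty shrinks them to sane values
```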
Lasso Regression
Lasso Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Lasso regression is another regularization technique to reduce the
complexity of the model.
▪ It is similar to ridge regression, except that the penalty term contains only the
absolute weights instead of the squares of the weights.
▪ Since it takes absolute values, it can shrink a slope all the way to 0, whereas ridge
regression can only shrink it close to 0.
▪ It is also called L1 regularization. The equation for lasso regression is:
Cost = Σ (Yi − Ŷi)² + λ·Σ |bj|
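A minimal lasso sketch showing the shrink-to-zero behaviour (the alpha value and the toy data are assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# The target depends only on the first of five features
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 5))
y = 4 * X[:, 0] + 0.1 * rng.normal(0, 1, 200)

# The L1 penalty drives the four irrelevant coefficients exactly to 0
print(Lasso(alpha=0.1).fit(X, y).coef_)  # roughly [3.9, 0, 0, 0, 0]
```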
Model Performance
https://byjus.com/maths/coefficient-of-determination/
https://aaweg-i.medium.com/what-precautions-we-need-to-keep-in-mind-when-using-coefficient-of-determination-98625e8bdb51
▪ The goodness of fit determines how well the regression line fits the set of observations. The
process of finding the best model out of various models is called optimization. It can be
assessed by the method below:
R-squared method:
▪ It measures the strength of the relationship between the dependent and independent variables
on a scale of 0-100%.
▪ A high value of R-squared indicates less difference between the predicted values and the
actual values, and hence represents a good model.
▪ It is also called the coefficient of determination, or the coefficient of multiple determination for
multiple regression. A minimal computation follows.
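A minimal R-squared computation using the standard formula R² = 1 − SS_res / SS_tot (the toy values are assumptions):

```python
import numpy as np

y_true = np.array([30.0, 35.0, 41.0, 44.0, 50.0])
y_pred = np.array([30.2, 35.1, 40.0, 45.0, 49.7])

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(f"R² = {1 - ss_res / ss_tot:.3f}")  # close to 1 -> good fit
```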
Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ Gradient descent is used to minimize the MSE
by calculating the gradient of the cost function.
▪ A regression model uses gradient descent to
update the coefficients of the line by reducing
the cost function.
▪ It is done by randomly selecting initial values for the
coefficients and then iteratively updating them
to reach the minimum of the cost function.
Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ Gradient descent is one of the most commonly
used optimization algorithms for training machine learning
models, by minimizing the error between actual
and expected results. Gradient descent is also
used to train neural networks.
▪ An optimization algorithm performs the task of
minimizing/maximizing an objective function f(x)
parameterized by x.
▪ Similarly, in machine learning, optimization is the task of
minimizing the cost function parameterized by the model's
parameters. The main objective of gradient descent is to
minimize a convex function by iterating on the parameter
updates.
What is Gradient Descent or Steepest Descent?
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ Gradient descent, or steepest descent, is one of the most
commonly used iterative optimization algorithms in
machine learning for training machine learning and
deep learning models. It helps in finding the local
minimum of a function.
▪ If we move towards the negative gradient, i.e., away from
the gradient of the function at the current point, we will
reach the local minimum of the function.
▪ Whenever we move towards the positive gradient, i.e., towards
the gradient of the function at the current point,
we will reach the local maximum of the function.
▪ The main objective of using a gradient descent
algorithm is to minimize the cost function by
iteration.
Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning
https://www.geeksforgeeks.org/gradient-descent-in-linear-regression/
▪ Calculate the first-order derivative of the function to
compute the gradient, or slope, of the function.
▪ Move in the direction opposite to the gradient, stepping from
the current point by alpha times the gradient, where
alpha is the learning rate: a tuning parameter
in the optimization process which helps decide the length
of the steps. A minimal sketch of this loop follows.
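A minimal gradient-descent sketch for the linear-regression MSE above (the learning rate and iteration count are assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

a0, a1, alpha = 0.0, 0.0, 0.02  # start from an arbitrary point
for _ in range(5000):
    err = (a0 + a1 * x) - y           # prediction error
    grad_a0 = 2 * err.mean()          # dMSE/da0
    grad_a1 = 2 * (err * x).mean()    # dMSE/da1
    a0 -= alpha * grad_a0             # step opposite the gradient
    a1 -= alpha * grad_a1
print(f"y ≈ {a0:.2f} + {a1:.2f}·x")   # converges to the least-squares line
```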
What is Cost-function?
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ The cost function is defined as the measurement
of the difference, or error, between actual values and
expected values.
▪ It helps to improve machine learning
efficiency by providing feedback to the model so that
it can minimize the error and find the local or global
minimum. The algorithm continuously iterates along the
direction of the negative gradient until the cost
function approaches zero.
▪ At this steepest-descent point, the model stops
learning further.
How does Gradient Descent work?
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ The starting point (shown in the figure) is used to evaluate the
performance, as it is just an arbitrary point.
At this starting point, we derive the first derivative, or
slope, and then use a tangent line to calculate the
steepness of the slope. This slope informs the
updates to the parameters (weights and bias).
▪ The slope is steeper at the starting (arbitrary)
point, but whenever new parameters are
generated, the steepness gradually reduces; at the
lowest point, the algorithm approaches the point
of convergence.
▪ The main objective of gradient descent is to minimize
the cost function, i.e., the error between expected and
actual values.
Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning
Direction & Learning Rate
▪ These two factors determine the partial-derivative calculations of future iterations and
allow the algorithm to reach the point of convergence, a local minimum, or the global minimum.
Learning Rate:
▪ It is defined as the step size taken to reach the minimum or lowest point. It is typically a small
value, and it is evaluated and updated based on the behavior of the cost function. If the learning rate
is high, it results in larger steps, but it also risks overshooting the minimum. At the same
time, a low learning rate gives small step sizes, which compromises overall efficiency but gives
the advantage of more precision.
Types of Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/
▪ Based on how much of the training data is used to compute the error for each update, the
gradient descent learning algorithm can be divided into:
▪ Batch gradient descent
▪ Mini-batch gradient descent
▪ Stochastic gradient descent
Batch Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/
▪ Batch Gradient Descent:
▪ Batch gradient descent (BGD) computes the error for each point in the
training set and updates the model only after evaluating all training examples in the
batch. One such pass is known as a training epoch.
▪ Advantages of batch gradient descent:
▪ It produces less noise than other types of gradient
descent.
▪ It produces stable gradient descent convergence.
▪ It is computationally efficient, as all resources are used
to process all training samples together.
Stochastic gradient descent
▪ Stochastic gradient descent (SGD) is a type of gradient descent that processes one
training example per iteration. In other words, it works through a training epoch
example by example and updates the parameters
one example at a time.
▪ As it requires only one training example at a time, it is easier to store in
allocated memory.
▪ However, it shows some computational-efficiency losses in
comparison to batch gradient descent, as its
frequent updates require more detail and speed.
▪ Further, due to the frequent updates, the gradient is
noisy. However, sometimes this can be helpful in
finding the global minimum and in escaping local
minima.
https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/
Stochastic gradient descent
▪ Advantages of stochastic gradient descent:
▪ In stochastic gradient descent (SGD), learning happens on every example, which
gives it a few advantages over other types of gradient descent.
▪ It is easier to fit in the available memory.
▪ It is relatively fast to compute compared with batch gradient descent.
https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/
Mini-Batch Gradient Descent
▪ Mini-batch gradient descent is a combination of batch gradient descent and
stochastic gradient descent. It divides the training dataset into small batches and
then performs an update for each of those batches.
▪ Splitting the training dataset into smaller batches strikes a balance between the
computational efficiency of batch gradient descent and the speed of stochastic gradient
descent.
▪ Hence, we achieve a special type of gradient descent with higher computational
efficiency and a less noisy gradient. A comparison sketch of the three variants follows.
▪ Advantages of mini-batch gradient descent:
▪ It is easier to fit in allocated memory.
▪ It is computationally efficient.
▪ It produces stable gradient descent convergence.
https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/
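A minimal sketch contrasting the three update schemes on the same data (the batch sizes, learning rate, and epoch count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, 200)
y = 4.0 * X + 2.0 + 0.2 * rng.standard_normal(200)

def gradient_descent(batch_size, alpha=0.01, epochs=200):
    a0 = a1 = 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(X))               # shuffle each epoch
        for start in range(0, len(X), batch_size):  # one update per batch
            b = idx[start:start + batch_size]
            err = (a0 + a1 * X[b]) - y[b]
            a0 -= alpha * 2 * err.mean()
            a1 -= alpha * 2 * (err * X[b]).mean()
    return round(a0, 2), round(a1, 2)

print(gradient_descent(batch_size=200))  # batch GD: one smooth update per epoch
print(gradient_descent(batch_size=1))    # stochastic GD: noisy, frequent updates
print(gradient_descent(batch_size=32))   # mini-batch GD: the usual compromise
# All three head toward a0 ≈ 2, a1 ≈ 4, at different speeds and noise levels.
```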
Challenges with the Gradient Descent
https://www.javatpoint.com/gradient-descent-in-machine-learning
Local Minima and Saddle Points:
▪ For convex problems, gradient descent can find the global minimum
easily, while for non-convex problems it is sometimes difficult to find
the global minimum, where the machine learning model achieves the
best results.
▪ Whenever the slope of the cost function is at or very close to
zero, the model stops learning. Apart from the global
minimum, this zero slope also occurs at saddle points and local
minima. Local minima generate a shape similar to the global minimum,
where the slope of the cost function increases on both sides of the
current point.
▪ In contrast, at a saddle point the negative gradient only occurs on one side of the point, which is a
local maximum on one side and a local minimum on the other side. The name saddle point comes from
the shape of a horse's saddle.
▪ The name local minimum is used because the value of the loss function is minimum at that point in a local region. In
contrast, the name global minimum is used because the value of the loss function is minimum there,
globally, across the entire domain of the loss function.
Vanishing and Exploding Gradient
https://www.javatpoint.com/gradient-descent-in-machine-learning
▪ Vanishing gradients:
▪ A vanishing gradient occurs when the gradient is smaller than expected. During backpropagation, the
gradient becomes smaller and smaller, causing the earlier layers of the network to learn more slowly than
the later layers. Once this happens, the weight parameters barely update until they become insignificant.
▪ Exploding gradients:
▪ An exploding gradient is just the opposite of a vanishing gradient: it occurs when the gradient is too large.
In this scenario, the model weights grow and may end up represented as NaN. This problem
can be addressed using dimensionality-reduction techniques, which help to minimize complexity within
the model.
Classification Algorithm in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
What is a Classification Algorithm?
▪ A classification algorithm is a supervised learning technique that is used
to identify the category of new observations on the basis of training data. In
classification, a program learns from the given dataset or observations and
then classifies new observations into one of a number of classes or groups, such as
Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be
called targets/labels or categories.
▪ y = f(x), where y is a categorical output.
▪ The best example of an ML classification algorithm is an email spam detector.
▪ The main goal of a classification algorithm is to identify the categories in a
given dataset, and such algorithms are mainly used to predict the output for
categorical data.
▪ Classification algorithms can be better understood using the diagram. In the
diagram, there are two classes, Class A and Class B. The classes have
features that are similar within each class and dissimilar between classes.
Classification Algorithm in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ The algorithm which implements classification on a
dataset is known as a classifier.
▪ Binary classifier: if the classification problem has only
two possible outcomes, it is called a binary
classifier.
▪ Examples: YES or NO, MALE or FEMALE, SPAM or
NOT SPAM, CAT or DOG, etc.
▪ Multi-class classifier: if a classification problem has
more than two outcomes, it is called a multi-class
classifier.
▪ Examples: classification of types of crops,
classification of types of music.
Classification
▪ A supervised learning technique that is used to identify the category of new observations on the basis of training
data [1].
▪ In classification, the output is categorical, unlike in regression, where it was based on predicting "values".
▪ Types of classification [2]:
▪ Binary classification: when we have to categorize given data into 2 distinct classes. Example: on the basis
of the given health condition of a person, we have to determine whether the person has a certain disease or not.
▪ Multiclass classification: the number of classes is more than 2. Example: on the basis of data about
different species of flowers, we have to determine which species our observation belongs to.
Ref: [1] https://www.javatpoint.com/classification-algorithm-in-machine-learning
[2] https://www.geeksforgeeks.org/getting-started-with-classification/?ref=lbp
Classification and its types
▪ General Block diagram of classification task:
Ref: https://www.geeksforgeeks.org/getting-started-with-classification/?ref=lbp
▪ There are various types of classifiers. Some of them are:
▪ Linear Classifiers: Logistic Regression
▪ Tree-Based Classifiers: Decision Tree Classifier
▪ Support Vector Machines
▪ Artificial Neural Networks
▪ Bayesian Regression
▪ Gaussian Naive Bayes Classifiers
▪ Stochastic Gradient Descent (SGD) Classifier
▪ Ensemble Methods: Random Forests, AdaBoost, Bagging Classifier, Voting Classifier, etc.
• X: Pre-classified data
• y: label/observations for X
• y’: predicted labels for X
109
Learners in Classification Problems
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In classification problems, there are two types of learners:
▪ Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions. Examples: the K-NN algorithm, case-based reasoning.
▪ Eager Learners: Eager learners develop a classification model from the training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction.
▪ Examples: Decision Trees, Naïve Bayes, ANN.
110
Types of ML Classification Algorithms
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Classification algorithms can be further divided into mainly two categories:
▪ Linear Models
▪ Logistic Regression
▪ Support Vector Machines
▪ Non-linear Models
▪ K-Nearest Neighbours
▪ Kernel SVM
▪ Naïve Bayes
▪ Decision Tree Classification
▪ Random Forest Classification
111
Evaluating a Classification model
Ref: https://www.javatpoint.com/supervised-machine-learning
Log Loss or Cross-Entropy Loss:
▪ It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
▪ For a good binary classification model, the value of log loss should be near 0.
▪ The value of log loss increases as the predicted probability deviates from the actual label, so a lower log loss represents a more accurate model.
▪ For binary classification, the cross-entropy can be calculated as:
▪ Log Loss = -(1/N) Σᵢ [ yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ) ]
▪ Here, pᵢ is the predicted probability of class 1, and (1 - pᵢ) is the probability of class 0.
▪ When an observation belongs to class 1, the first part of the formula is active and the second part vanishes, and vice versa.
112
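▪ A quick sketch of the formula above (scikit-learn assumed; the labels and probabilities are illustrative):
# binary cross-entropy computed manually and via scikit-learn's log_loss
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])            # actual class labels
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.1])   # predicted P(class = 1)

manual = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))
print(manual)                    # ~0.26, near 0, so the classifier is good
print(log_loss(y_true, p_hat))   # scikit-learn returns the same value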
Confusion Matrix
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ The confusion matrix provides a matrix/table as output that describes the performance of the model.
▪ It is also known as the error matrix.
▪ The matrix summarizes the prediction results, giving the total numbers of correct and incorrect predictions.
113
Accuracy
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
Accuracy simply measures how often the classifier makes the correct prediction. It’s the ratio
between the number of correct predictions and the total number of predictions.
114
Precision
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ Precision is a measure of the correctness of positive predictions. In simple words, it tells us how many of all the predicted positives are actually positive: Precision = TP / (TP + FP).
115
Recall
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ Recall is a measure of how many actual positive observations are predicted correctly, i.e., how many observations of the positive class are predicted as positive: Recall = TP / (TP + FN).
▪ It is also known as Sensitivity.
▪ Recall is a valid choice of evaluation metric when we want to capture as many positives as possible.
116
F-measure / F1-Score
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall: F1 = 2 · (Precision · Recall) / (Precision + Recall). We use the harmonic mean because, unlike a simple average, it is not pulled up by one extremely large value.
117
Sensitivity & Specificity
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ Sensitivity is the recall of the positive class: Sensitivity = TP / (TP + FN). Specificity is the recall of the negative class: Specificity = TN / (TN + FP).
118
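▪ All of the metrics above can be read off a single confusion matrix; a minimal sketch (scikit-learn assumed; the label vectors are illustrative):
# accuracy, precision, recall/sensitivity, specificity, and F1 from one confusion matrix
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)            # of the predicted positives, how many are real
recall = tp / (tp + fn)               # a.k.a. sensitivity
specificity = tn / (tn + fp)          # recall of the negative class
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)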
Difference between Regression and Classification
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
Regression Algorithm Classification Algorithm
▪ In Regression, the output variable must be of
continuous nature or real value.
▪ In Classification, the output variable must be a discrete
value.
▪ The task of the regression algorithm is to map the input
value (x) with the continuous output variable(y).
▪ The task of the classification algorithm is to map the
input value(x) with the discrete output variable(y).
▪ Regression Algorithms are used with continuous data. ▪ Classification Algorithms are used with discrete data.
▪ In Regression, we try to find the best fit line, which
can predict the output more accurately.
▪ In Classification, we try to find the decision boundary,
which can divide the dataset into different classes.
▪ Regression algorithms can be used to solve the
regression problems such as Weather Prediction,
House price prediction, etc.
▪ Classification Algorithms can be used to solve
classification problems such as Identification of spam
emails, Speech Recognition, Identification of
cancer cells, etc.
▪ The regression Algorithm can be further divided into
Linear and Non-linear Regression.
▪ The Classification algorithms can be divided into
Binary Classifier and Multi-class Classifier.
119
Ref: https://www.javatpoint.com/supervised-machine-learning
Linear Regression Logistic Regression
Linear regression is used to predict the continuous
dependent variable using a given set of
independent variables.
Logistic Regression is used to predict the
categorical dependent variable using a given set
of independent variables.
Linear Regression is used for solving Regression
problem.
Logistic regression is used for solving
Classification problems.
In Linear regression, we predict the value of
continuous variables.
In logistic Regression, we predict the values of
categorical variables.
In linear regression, we find the best fit line, by
which we can easily predict the output.
In Logistic Regression, we find the S-curve by
which we can classify the samples.
Least squares estimation is used to fit the linear
regression model (estimate its parameters).
Maximum likelihood estimation is used to fit the
logistic regression model.
The output for Linear Regression must be a
continuous value, such as price, age, etc.
The output of Logistic Regression must be a
Categorical value such as 0 or 1, Yes or No, etc.
In Linear regression, it is required that relationship
between dependent variable and independent
variable must be linear.
In Logistic regression, it is not required to have the
linear relationship between the dependent and
independent variable.
In linear regression, there may be collinearity
between the independent variables.
In logistic regression, there should not be
collinearity between the independent variables.
120
Linear Regression vs. Logistic Regression
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
121
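▪ A minimal side-by-side sketch of the two models (scikit-learn assumed; the one-feature data is illustrative): the same inputs are fitted with a best-fit line for a continuous target and with an S-curve for a categorical one.
# linear regression predicts a continuous value; logistic regression a class
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

x = np.arange(1, 11).reshape(-1, 1)     # a single independent variable
y_cont = 2.0 * x.ravel() + 1.0          # continuous target (regression)
y_cat = (x.ravel() > 5).astype(int)     # categorical target (classification)

lin = LinearRegression().fit(x, y_cont)     # least squares fit
log = LogisticRegression().fit(x, y_cat)    # maximum likelihood fit

print(lin.predict([[12]]))                  # a continuous value
print(log.predict([[12]]), log.predict_proba([[12]]))  # a class + probability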
Clustering in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Clustering or cluster analysis is a machine learning technique which groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has few or no similarities with another group."
▪ It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data according to the presence and absence of those patterns.
▪ It is an unsupervised learning method;
hence no supervision is provided to the
algorithm, and it deals with the unlabelled
dataset.
122
Clustering in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ After applying the clustering technique, each cluster or group is given a cluster-ID. The ML system can use this ID to simplify the processing of large and complex datasets.
▪ The clustering technique can be widely
used in various tasks.
▪ Market Segmentation
▪ Statistical data analysis
▪ Social network analysis
▪ Image segmentation
▪ Anomaly detection, etc.
123
Types of Clustering Methods
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ The clustering methods are broadly divided into hard clustering (each data point belongs to only one group) and soft clustering (a data point can belong to more than one group).
▪ Partitioning Clustering
▪ Density-Based Clustering
▪ Distribution Model-Based Clustering
▪ Hierarchical Clustering
▪ Fuzzy Clustering
124
Hierarchical Clustering in Machine Learning
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Hierarchical clustering is another unsupervised machine learning algorithm, used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis, or HCA.
▪ In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.
▪ Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work: in hierarchical clustering there is no requirement to predetermine the number of clusters, as there is in the K-means algorithm.
▪ The hierarchical clustering technique has two approaches:
▪ Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts
with taking all data points as single clusters and merging them until one cluster is left.
▪ Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it is a top-
down approach.
125
Hierarchical Clustering
▪ The clusters formed in this method form a tree-type structure called dendrogram based on the
hierarchy[1]
▪ New clusters are formed using the previously formed one
▪ It is divided into two categories:
▪ Agglomerative clustering: a bottom-up approach
▪ Divisive clustering: top-down approach
▪ Examples are CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing
Clustering and using Hierarchies), etc.
▪ Agglomerative based dendrogram[2]:
Ref: [1] https://www.geeksforgeeks.org/clustering-in-machine-learning/
[2] https://towardsdatascience.com/machine-learning-algorithms-part-12-hierarchical-agglomerative-clustering-example-in-python-1e18e0075019
126
Why hierarchical clustering?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical clustering?
▪ As we have seen, the K-means algorithm has some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of the same size.
▪ To solve these two challenges we can opt for the hierarchical clustering algorithm, since in this algorithm we don't need to know the number of clusters in advance.
127
Agglomerative Hierarchical clustering
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ The agglomerative hierarchical clustering algorithm is a popular example of HCA.
▪ To group the datasets into clusters, it follows the bottom-up approach: the algorithm considers each data point as a single cluster at the beginning, and then starts combining the closest pairs of clusters.
▪ It does this until all the
clusters are merged into a
single cluster that contains all
the datasets.
▪ This hierarchy of clusters is
represented in the form of the
dendrogram.
128
How the Agglomerative Hierarchical clustering Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Step-1: Create each data point as a single cluster.
Let's say there are N data points, so the number of
clusters will also be N.
▪ Step-2: Take two closest data points or clusters
and merge them to form one cluster. So, there will
now be N-1 clusters.
▪ Step-3: Again, take the two closest clusters and
merge them together to form one cluster. There will
be N-2 clusters.
129
How the Agglomerative Hierarchical clustering Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Step-4: Repeat Step 3 until only one cluster is left. We will then get the following clusters.
▪ Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the problem.
130
Measure for the distance between two clusters
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are various ways to calculate this distance, and the choice decides the rule for clustering.
▪ These measures are called linkage methods.
▪ Single Linkage: the shortest distance between the closest points of the two clusters.
131
Measure for the distance between two clusters
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Complete Linkage: the farthest distance between two points in two different clusters. It is one of the popular linkage methods, as it forms tighter clusters than single linkage.
▪ Average Linkage: the linkage method in which the distance between each pair of points (one from each cluster) is summed and then divided by the total number of pairs, giving the average distance between the two clusters.
▪ Centroid Linkage: the linkage method in which the distance between the centroids of the two clusters is calculated.
132
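▪ A short sketch of agglomerative clustering under the different linkage rules above (SciPy assumed; the points are illustrative):
# bottom-up clustering with different linkage rules
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)   # the merge history behind the dendrogram
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)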
Density-Based Clustering
▪ This method connects the highly-dense areas into clusters [1]
▪ These methods have good accuracy and the ability to merge two clusters [2]
▪ This type of clustering algorithm plays a crucial role in evaluating and finding non-linear shaped cluster structures based on density [3]
▪ The most popular density-based algorithm is DBSCAN, which performs spatial clustering of data with noise
▪ It makes use of two concepts – Data Reachability and Data Connectivity
Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning, [4] https://www.kdnuggets.com/2020/04/dbscan-clustering-algorithm-machine-learning.html
[2] https://www.geeksforgeeks.org/clustering-in-machine-learning/, [3] https://www.geeksforgeeks.org/clustering-in-machine-learning/
▪ Density-based spatial clustering of applications with noise (DBSCAN):
▪ Based on the idea that a cluster in data space is a contiguous region of
high point density, separated from other such clusters by contiguous
regions of low point density [4]
▪ No need to explicitly define the number of clusters (K) like in K-Means
▪ The DBSCAN algorithm uses two parameters: 1) minPts: The minimum
number of points (a threshold) clustered together for a region to be
considered dense, 2) eps (ε): A distance measure that will be used to
locate the points in the neighborhood of any point
▪ There are three types of points after DBSCAN clustering is complete: 1) core points, 2) border points, 3) noise points
133
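▪ A minimal DBSCAN sketch (scikit-learn assumed; toy data), showing the two parameters described above; the label -1 marks noise points:
# DBSCAN needs no predefined K; eps and min_samples (minPts) control density
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # dense region 1
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],   # dense region 2
              [4.0, 15.0]])                         # an isolated point

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)                 # e.g. [0 0 0 1 1 1 -1]; -1 is a noise point
print(db.core_sample_indices_)    # indices of the core points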
Distribution Model-Based Clustering
▪ Here, the data is divided based on the probability that a data point belongs to a particular distribution [1]
▪ The grouping is done by assuming some distribution, most commonly the Gaussian distribution
▪ The data observed arises from a distribution consisting of a mixture of two or more cluster
components [2]
▪ Furthermore, each component cluster has a density function having an associated
probability or weight in this mixture
▪ An example of this type is the Expectation-Maximization (EM) clustering algorithm, which uses Gaussian Mixture Models (GMM) [1]
Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning
[2] https://data-flair.training/blogs/clustering-in-machine-learning/
134
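▪ A short sketch of EM-based, distribution-model clustering (scikit-learn's GaussianMixture assumed; the data is synthetic): each point receives a soft membership probability for every Gaussian component.
# a Gaussian Mixture Model fitted by Expectation-Maximization
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # points around (0, 0)
               rng.normal(6, 1, (50, 2))])   # points around (6, 6)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)                # estimated component centres
print(gmm.predict(X[:3]))        # hard cluster assignments
print(gmm.predict_proba(X[:3]))  # soft membership probabilities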
Partition Clustering
▪ It is a type of clustering that divides the data into non-hierarchical groups [1]
▪ It is also known as the centroid-based method
▪ These methods partition the objects into k clusters and each partition forms one cluster[2]
▪ This method optimizes an objective criterion, such as a similarity function
▪ The most common example of partitioning clustering is the K-Means Clustering
algorithm [1]
Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning
[2] https://www.geeksforgeeks.org/clustering-in-machine-learning/
▪ K-Means Clustering:
▪ It groups the unlabeled dataset into K clusters
▪ Main aim of this algorithm is to minimize the sum of
distances between the data point and their corresponding
clusters
▪ The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until the best clusters are found
▪ It determines the best value for K center points or centroids
by an iterative process
135
Fuzzy Clustering
▪ Fuzzy clustering is a type of soft method in which a
data object may belong to more than one group or
cluster [1]
▪ Each data point has a set of membership coefficients, which reflect its degree of membership in each cluster
▪ Fuzzy C-means algorithm is the example of this type
of clustering; it is sometimes also known as the Fuzzy
k-means algorithm
Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning
[2] https://2-bitbio.com/post/clustering-rnaseq-data-using-fuzzy-c-means-clustering/
▪ In the adjacent image, K-means clustering
produces output based on minimum distance
calculation and is an example of hard clustering[2]
▪ Fuzzy c-means perform soft clustering by giving a
membership coefficient to the data points
▪ Fuzzy clustering is used to solve multiclass or ambiguous clustering problems.
136
Applications of Clustering
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In Identification of Cancer Cells: The clustering algorithms are widely used for
the identification of cancerous cells. It divides the cancerous and non-cancerous
data sets into different groups.
▪ In Search Engines: Search engines also work on the clustering technique. The search result appears based on the objects closest to the search query: similar data objects are grouped together, far from dissimilar objects. The accuracy of the results for a query depends on the quality of the clustering algorithm used.
▪ Customer Segmentation: It is used in market research to segment customers based on their choices and preferences.
▪ In Biology: It is used in the biology stream to classify different species of plants
and animals using the image recognition technique.
137
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters to be created in the process; e.g., if K=3 there will be three clusters, for K=4 four clusters, and so on.
▪ It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
▪ It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabelled dataset on its own, without the need for any training.
138
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ It is a centroid-based algorithm, where each cluster is associated with a centroid.
The main aim of this algorithm is to minimize the sum of distances between the
data point and their corresponding clusters.
▪ The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until the best clusters are found. The value of k must be predetermined in this algorithm.
▪ The k-means clustering algorithm mainly performs two
tasks:
▪ Determines the best value for K center points or
centroids by an iterative process.
▪ Assigns each data point to its closest k-center; the data points near a particular k-center form a cluster.
▪ Hence each cluster has data points with some commonalities and lies away from the other clusters.
139
How does the K-Means Algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Step-1: Select the number K to decide the number of clusters.
▪ Step-2: Select K random points or centroids (they need not come from the input dataset).
▪ Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
▪ Step-4: Calculate the variance and place a new centroid for each cluster.
▪ Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its cluster.
▪ Step-6: If any reassignment occurred, go to Step-4; else go to FINISH.
▪ Step-7: The model is ready.
140
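▪ The steps above map directly onto scikit-learn's KMeans; a minimal sketch (toy data assumed):
# K-Means: choose K, then iterate assign-to-closest-centroid / move-centroid
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster index of every point (Steps 3-6)
print(km.cluster_centers_)   # the final centroids after convergence
print(km.predict([[0, 0], [12, 3]]))  # assign new points to the clusters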
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Let's take number k of clusters, i.e., K=2, to identify the
dataset and to put them into different clusters. It means here
we will try to group these datasets into two different clusters.
We need to choose some random k points or centroid to
form the cluster. These points can be either the points from
the dataset or any other point. So, here we are selecting the
below two points as k points, which are not the part of our
dataset.
▪ Now we will assign each data point of the scatter plot to its closest K-point or centroid. We compute this using the distance formula between two points, and then draw a median line between the two centroids (the perpendicular bisector of the segment joining them).
141
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Points on the left side of the line are nearer to the K1 (blue) centroid, and points to the right of the line are closer to the yellow centroid.
▪ Let's color them as blue and yellow for clear
visualization.
142
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ As we need to find the closest clusters, we repeat the process by choosing new centroids. To choose the new centroids, we compute the center of gravity of each cluster, which gives the new centroids shown below.
▪ Next, we will reassign each datapoint to the new
centroid. For this, we will repeat the same process
of finding a median line.
143
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ We can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points will be reassigned to new centroids.
▪ As reassignment has taken place, so
we will again go to the step-4, which is
finding new centroids or K-points.
▪ We repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the image.
144
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ With the new centroids, we again draw the median line and reassign the data points.
▪ As the image shows, there are no dissimilar data points on either side of the line, which means the model has converged.
145
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the image below.
146
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ How to choose the value of "K number of clusters" in K-means
Clustering?
▪ The performance of the K-means clustering algorithm depends on the efficiency of the clusters it forms, but choosing the optimal number of clusters is a big task.
▪ There are some different ways to find the optimal number of clusters, but here
we are discussing the most appropriate method to find the number of clusters
or value of K.
▪ Elbow Method
147
Elbow Method
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ The Elbow method is one of the most popular ways to find the optimal number of
clusters.
▪ This method uses the concept of the WCSS value. WCSS stands for Within-Cluster Sum of Squares, and it measures the total variation within the clusters. The formula to calculate WCSS (for 3 clusters) is:
▪ WCSS = Σ_{Pi in Cluster1} distance(Pi, C1)² + Σ_{Pi in Cluster2} distance(Pi, C2)² + Σ_{Pi in Cluster3} distance(Pi, C3)²
▪ Here Σ_{Pi in Cluster1} distance(Pi, C1)² is the sum of the squared distances between each data point in Cluster 1 and its centroid C1, and likewise for the other two terms.
148
K-Means Clustering Algorithm
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ To measure the distance between data points and centroid, we can use any
method such as Euclidean distance or Manhattan distance.
▪ To find the optimal value of clusters, the elbow method follows the below steps:
▪ It executes the K-means clustering on a given dataset for different K values
(ranges from 1-10).
▪ For each value of K, calculates the WCSS value.
▪ Plots a curve between calculated WCSS values and the number of clusters K.
▪ The sharp point of the bend, where the plot looks like an arm, is considered the best value of K.
▪ Since the graph shows the sharp bend, which looks like
an elbow, hence it is known as the elbow method.
149
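▪ A sketch of the elbow method (scikit-learn assumed; a fitted KMeans exposes its WCSS as the inertia_ attribute, and the dataset here is synthetic):
# run K-Means for K = 1..10 and plot WCSS against K; look for the elbow
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)     # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS')
plt.show()                       # the sharp bend (elbow) suggests the best K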
Decision Tree Classification Algorithm
Ref: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm
▪ Decision Tree is a Supervised learning technique that
can be used for both classification and Regression
problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a
dataset, branches represent the decision rules and
each leaf node represents the outcome.
▪ In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
▪ The decisions or the test are performed on the basis of
features of the given dataset.
▪ It is a graphical representation for getting all the
possible solutions to a problem/decision based on
given conditions.
150
Decision Tree Classification Algorithm
Ref: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm
▪ It is called a decision tree because, similar to a
tree, it starts with the root node, which expands
on further branches and constructs a tree-like
structure.
▪ In order to build a tree, we use the CART
algorithm, which stands for Classification and
Regression Tree algorithm.
▪ A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
151
Why use Decision Trees?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Decision trees usually mimic the way humans think while making a decision, so they are easy to understand.
▪ The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
▪ Decision Tree Terminologies
▪ Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
▪ Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
▪ Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
▪ Branch/Sub Tree: A tree formed by splitting the tree.
▪ Pruning: Pruning is the process of removing the unwanted branches from the tree.
▪ Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called child nodes.
152
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In a decision tree, for predicting the class of the given dataset, the algorithm starts from
the root node of the tree. This algorithm compares the values of root attribute with the
record (real dataset) attribute and, based on the comparison, follows the branch and
jumps to the next node.
▪ For the next node, the algorithm again compares the attribute value with the other sub-
nodes and move further. It continues the process until it reaches the leaf node of the tree.
The complete process can be better understood using the below algorithm:
▪ Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
▪ Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
▪ Step-3: Divide S into subsets that contain the possible values of the best attribute.
▪ Step-4: Generate the decision tree node which contains the best attribute.
▪ Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are the leaf nodes.
153
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Example: Suppose there is a candidate who
has a job offer and wants to decide whether
he should accept the offer or Not.
▪ So, to solve this problem, the decision tree
starts with the root node (Salary attribute by
ASM).
▪ The root node splits further into the next
decision node (distance from the office) and
one leaf node based on the corresponding
labels.
▪ The next decision node further gets split into
one decision node (Cab facility) and one leaf
node. Finally, the decision node splits into two
leaf nodes (Accepted offers and Declined
offer).
154
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Attribute Selection Measures
▪ While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measure we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
▪ Information Gain
▪ Gini Index
155
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Information Gain:
▪ Information gain is the measurement of
changes in entropy after the segmentation of
a dataset based on an attribute.
▪ It calculates how much information a feature
provides us about a class.
▪ According to the value of information gain,
we split the node and build the decision tree.
▪ A decision tree algorithm always tries to
maximize the value of information gain, and a
node/attribute having the highest information
gain is split first. It can be calculated using
the below formula:
Information Gain= Entropy (S) - [(Weighted Avg) *Entropy(each feature)]
156
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
Entropy: Entropy is a metric to measure the
impurity in a given attribute. It specifies
randomness in data.
Entropy(S)= -P(yes)log2 P(yes)- P(no) log2 P(no)
Where,
•S= Total number of samples
•P (yes)= probability of yes
•P (no)= probability of no
157
How does the Decision Tree algorithm Work?
Ref: https://www.javatpoint.com/supervised-machine-learning
Gini Index:
•Gini index is a measure of impurity or purity used while creating a decision tree in
the CART (Classification and Regression Tree) algorithm.
•An attribute with the low Gini index should be preferred as compared to the high
Gini index.
•It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
•The Gini index can be calculated using the formula:
•Gini Index = 1 - Σⱼ (Pⱼ)², where Pⱼ is the proportion of samples of class j in the node.
158
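▪ A small sketch computing the two ASM quantities above for a toy split (NumPy assumed; the class counts are illustrative):
# entropy, information gain, and Gini index for a toy binary split
import numpy as np

def entropy(counts):
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def gini(counts):
    p = np.array(counts, dtype=float) / sum(counts)
    return 1.0 - (p ** 2).sum()

# parent node: 9 yes / 5 no; a feature splits it into (6 yes, 2 no) and (3 yes, 3 no)
parent, left, right = [9, 5], [6, 2], [3, 3]
weighted = (8 / 14) * entropy(left) + (6 / 14) * entropy(right)
print("Entropy(S)       =", entropy(parent))             # ~0.940
print("Information Gain =", entropy(parent) - weighted)  # ~0.048
print("Gini(left/right) =", gini(left), gini(right))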
Pruning: Getting an Optimal Decision tree
Ref: https://www.javatpoint.com/supervised-machine-learning, https://www.cs.cmu.edu/~bhiksha/courses/10-601/decisiontrees/
▪ Pruning is a process of deleting the unnecessary nodes from a tree in order to get
the optimal decision tree.
▪ A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset.
▪ Pruning is therefore the technique of decreasing the size of the learned tree without reducing accuracy.
▪ There are mainly two types of tree pruning technology used:
▪ Cost Complexity Pruning
▪ Reduced Error Pruning
159
Advantages/Disadvantages of the Decision Tree
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Advantages of the Decision Tree
▪ It is simple to understand, as it follows the same process a human follows while making a real-life decision.
▪ It can be very useful for solving decision-related problems.
▪ It helps to think about all the possible outcomes for a problem.
▪ It requires less data cleaning compared to other algorithms.
▪ Disadvantages of the Decision Tree
▪ The decision tree contains lots of layers, which makes it complex.
▪ It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
▪ For more class labels, the computational complexity of the decision tree may
increase.
162
Python Implementation of Decision Tree
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Data Pre-processing step
▪ Fitting a Decision-Tree algorithm to the Training set
▪ Predicting the test result
▪ Test accuracy of the result (Creation of Confusion matrix)
▪ Visualizing the test set result
163
Data Pre-Processing Step
Ref: https://www.javatpoint.com/supervised-machine-learning
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
164
Data Pre-Processing Step
Ref: https://www.javatpoint.com/supervised-machine-learning
165
Fitting a Decision-Tree algorithm to the Training set
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Now we will fit the model to the training set. For this, we will import the
DecisionTreeClassifier class from sklearn.tree library. Below is the code for
it:
#Fitting the Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
▪ In the above code, we have created a classifier object and passed two main parameters:
▪ criterion='entropy': the criterion used to measure the quality of a split, calculated by the information gain given by entropy.
▪ random_state=0: for reproducible random states.
166
Fitting a Decision-Tree algorithm to the Training set
Ref: https://www.javatpoint.com/supervised-machine-learning
Out[8]: DecisionTreeClassifier(class_weight =None, criterion='entropy',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False,
random_state=0, splitter='best')
168
Predicting the test result
Ref: https://www.javatpoint.com/supervised-machine-learning
Now we will predict the test set result by creating a new prediction vector y_pred. Below is the code for it:
#Predicting the test set result
y_pred = classifier.predict(x_test)
In the output image, the predicted output and the real test output are shown. We can clearly see that some values in the prediction vector differ from the real vector values; these are prediction errors.
169
Predicting the test result
Ref: https://www.javatpoint.com/supervised-machine-learning
170
Test accuracy of the result (Creation of Confusion matrix)
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ In the above output, we have seen that there were
some incorrect predictions, so if we want to know the
number of correct and incorrect predictions, we need
to use the confusion matrix. Below is the code for it:
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
▪ In the above output image, we can see the confusion
matrix, which has 6+3= 9 incorrect predictions and
62+29=91 correct predictions.
171
Visualizing the training set result:
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Here we will visualize the training set result by plotting a graph for the decision tree classifier. The classifier predicts Yes or No for the users who have either purchased or not purchased the SUV, as in the earlier classification examples. Below is the code for it:
▪ The above output is completely different from
the rest classification models. It has both
vertical and horizontal lines that are splitting the
dataset according to the age and estimated
salary variable.
▪ As we can see, the tree is trying to capture every data point, which is a case of overfitting.
172
Visualizing the training set result:
Ref: https://www.javatpoint.com/supervised-machine-learning
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
173
Visualizing the test set result:
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ Visualization of test set result will
be similar to the visualization of
the training set except that the
training set will be replaced with
the test set.
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
174
Visualizing the test set result:
Ref: https://www.javatpoint.com/supervised-machine-learning
▪ As we can see in the above image, there are some green data points within the purple region and vice versa.
▪ These are the incorrect predictions, which we discussed in the confusion matrix.
175
K-Nearest Neighbor (KNN) Algorithm
176
K-Nearest Neighbor (KNN) Algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the Supervised Learning technique.
▪ The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
▪ The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
▪ The K-NN algorithm can be used for regression as well as classification, but mostly it is used for classification problems.
▪ K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
▪ It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
▪ At the training phase the KNN algorithm just stores the dataset, and when it gets new data it classifies that data into a category much similar to the new data.
▪ Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new data most similar to those of the cat and dog images and, based on the most similar features, put it into either the cat or the dog category.
177
K-Nearest Neighbor (KNN) Algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
178
Why do we need a K-NN Algorithm?
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will the data point lie? To solve this type of problem, we need a K-NN algorithm.
▪ With the help of K-NN, we can easily identify the category or class of a particular data point.
179
How does K-NN work?
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Step-1: Select the number K of the
neighbors
▪ Step-2: Calculate the Euclidean distance
of K number of neighbors
▪ Step-3: Take the K nearest neighbors as
per the calculated Euclidean distance.
▪ Step-4: Among these k neighbors, count
the number of the data points in each
category.
▪ Step-5: Assign the new data points to
that category for which the number of the
neighbor is maximum.
180
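▪ The five steps above fit in a few lines of code; a minimal from-scratch sketch (NumPy assumed; the training points are illustrative):
# K-NN by hand: Euclidean distances, K closest points, majority vote
import numpy as np
from collections import Counter

X_train = np.array([[1, 1], [2, 1], [1, 2],    # category A
                    [8, 8], [9, 8], [8, 9]])   # category B
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

def knn_predict(x_new, k=5):
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Step-2
    nearest = np.argsort(dists)[:k]                        # Step-3
    votes = Counter(y_train[nearest])                      # Step-4
    return votes.most_common(1)[0][0]                      # Step-5

print(knn_predict(np.array([2, 2])))   # 'A': most of the 5 neighbours are A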
How does K-NN work?
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Suppose we have a new data point and
we need to put it in the required category.
Consider the image:
▪ Firstly, we will choose the number of
neighbors, so we will choose the k=5.
▪ Next, we will calculate the Euclidean
distance between the data points.
▪ The Euclidean distance is the distance between two points. It can be calculated as:
▪ d = √((x₂ − x₁)² + (y₂ − y₁)²)
181
How does K-NN work?
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ By calculating the Euclidean distance we
got the nearest neighbors, as three nearest
neighbors in category A and two nearest
neighbors in category B. Consider the
below image:
▪ As we can see, 3 of the 5 nearest neighbors are from category A, hence this new data point must belong to category A.
182
How to select the value of K in the K-NN Algorithm?
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Below are some points to remember while selecting the value of K in
the K-NN algorithm:
▪ There is no particular way to determine the best value for "K", so
we need to try some values to find the best out of them.
▪ The most preferred value for K is 5.
▪ A very low value for K such as K=1 or K=2, can be noisy and lead to
the effects of outliers in the model.
183
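▪ One common way to choose K in practice (a sketch, scikit-learn assumed; the dataset is synthetic) is to try several values and keep the one with the best cross-validated accuracy:
# try odd values of K and pick the one with the highest CV score
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"K={k}: mean CV accuracy = {score:.3f}")
# very small K (1 or 2) is noisy and outlier-prone; K = 5 is a common default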
Advantages / Disadvantages of KNN Algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Advantages of KNN Algorithm
▪ It is simple to implement
▪ It is robust to the noisy training data
▪ It can be more effective if the training data is large.
▪ Disadvantages of KNN Algorithm
▪ We always need to determine the value of K, which can sometimes be complex.
▪ The computation cost is high because of calculating the distance
between the data points for all the training samples.
184
Python implementation of the KNN algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Problem statement: There is a Car
manufacturer company that has
manufactured a new SUV car.
▪ The company wants to show the ads to users who are interested in buying that SUV.
▪ For this problem, we have a dataset that contains multiple users' information gathered from a social network.
▪ The dataset contains a lot of information, but we will take Estimated Salary and Age as the independent variables and the Purchased variable as the dependent variable.
185
Steps to implement the K-NN algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Data Pre-processing step
▪ Fitting the K-NN algorithm to the Training set
▪ Predicting the test result
▪ Test accuracy of the result(Creation of Confusion matrix)
▪ Visualizing the test set result.
186
Data Pre-Processing Step
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
187
Data Pre-Processing Step
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ By executing the above code, our dataset is imported into our program and pre-processed. After feature scaling, our test dataset will look like the output shown.
▪ From the output image, we can see that our data has been successfully scaled.
188
Fitting K-NN classifier to the Training data
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Now we will fit the K-NN classifier to the training data.
▪ To do this we will import the KNeighborsClassifier class of the sklearn.neighbors library.
▪ After importing the class, we will create a Classifier object of the class with these parameters:
▪ n_neighbors: the required number of neighbors for the algorithm; usually it takes 5.
▪ metric='minkowski': the default parameter; it decides the distance measure between the points.
▪ p=2: with the Minkowski metric, this is equivalent to the standard Euclidean metric.
▪ And then we will fit the classifier to the training data.
#Fitting the K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)
189
Python implementation of the KNN algorithm
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
Output: By executing the above code, we will get
the output as:
Out[10]:
KNeighborsClassifier
(algorithm='auto', leaf_size=30,
metric='minkowski',
metric_params=None,
n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
▪ Predicting the Test Result: to predict the test set result:
#Predicting the test set result
y_pred = classifier.predict(x_test)
190
Creating the Confusion Matrix
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Now we will create the Confusion Matrix for our
K-NN model to see the accuracy of the classifier.
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
▪ In the above code, we have imported the confusion_matrix function and stored its result in the variable cm.
▪ Output: By executing the above code, we will get
the matrix as shown in the image:
191
Confusion Matrix
Ref: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
▪ In the image, we can see
there are 64+29= 93 correct
predictions
▪ 3+4= 7 incorrect predictions
192
Visualizing the Training set result
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('K-NN Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
193
Visualizing the Training set result
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ The above graph shows the output for the training data set.
▪ As we can see in the graph, the predicted output is quite good: most of the red points are in the red region and most of the green points are in the green region.
▪ However, there are a few green points in the red region and a few red points in the green region. These are the incorrect observations that we saw in the confusion matrix (7 incorrect outputs).
194
Support Vector Machine
195
Support Vector Machine
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Support Vector Machine or SVM is one of
the most popular Supervised Learning
algorithms, which is used for Classification
as well as Regression problems.
▪ However, primarily, it is used for
Classification problems in Machine
Learning.
▪ The goal of the SVM algorithm is to create
the best line or decision boundary that can
segregate n-dimensional space into
classes so that we can easily put the new
data point in the correct category in the
future.
196
Support Vector Machine
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ This best decision boundary is called a
hyperplane.
▪ SVM chooses the extreme
points/vectors that help in creating the
hyperplane.
▪ These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine.
▪ Consider the diagram in which there are
two different categories that are
classified using a decision boundary or
hyperplane:
197
Support Vector Machine
Ref: https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Example: Suppose we see a strange cat that also has some features of a dog. If we want a model that can accurately identify whether the creature is a cat or a dog, such a model can be created using the SVM algorithm.
▪ We will first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature.
▪ So as support vector creates a decision
boundary between these two data (cat
and dog) and choose extreme cases
(support vectors), it will see the extreme
case of cat and dog. On the basis of the
support vectors, it will classify it as a cat.
Consider the below diagram:
198
Types of SVM
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
SVM can be of two types:
▪ Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called the
Linear SVM classifier.
▪ Non-linear SVM: Non-linear SVM is used for non-linearly separable data,
which means that if a dataset cannot be classified by using a straight line, then
such data is termed non-linear data, and the classifier used is called the Non-linear
SVM classifier.
199
Hyperplane and Support Vectors in the SVM algorithm
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Hyperplane: There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find out the best decision boundary
that helps to classify the data points. This best boundary is known as the
hyperplane of SVM.
▪ The dimension of the hyperplane depends on the number of features present in the
dataset: if there are 2 features (as shown in the image), the hyperplane will be a
straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.
▪ We always create the hyperplane that has the maximum margin, which means the
maximum distance between the hyperplane and the nearest data points.
▪ The data points or vectors that are closest to the hyperplane and which affect
the position of the hyperplane are termed support vectors. Since these vectors
support the hyperplane, they are called support vectors.
200
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Linear SVM:
▪ The working of the SVM algorithm can be understood by using an example.
▪ Suppose we have a dataset that has two tags (green and blue), and the dataset
has two features x1 and x2.
▪ We want a classifier that can classify the pair (x1,
x2) of coordinates in either green or blue. Consider
the image:
▪ Since it is a 2-D space, we can easily separate these
two classes by using a straight line. But there
can be multiple lines that can separate these
classes. Consider the below image:
201
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Hence, the SVM algorithm helps to find the best
line or decision boundary; this best boundary or
region is called a hyperplane.
▪ The SVM algorithm finds the closest points of the lines
from both classes. These points are called
support vectors.
▪ The distance between the vectors and the
hyperplane is called the margin.
▪ The goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called
the optimal hyperplane.
202
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Non-Linear SVM:
▪ If data is linearly arranged, then we can separate it
by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the
below image:
▪ So to separate these data points, we need to add
one more dimension.
▪ For linear data, we have used two dimensions x and
y, so for non-linear data, we will add a third
dimension z. It can be calculated as:
z = x² + y²
203
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ By adding the third dimension, the
sample space will become as below
image:
▪ So now, SVM will divide the datasets into
classes in the following way.
z = x² + y²
204
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Since we are in 3-D space, it looks like a
plane parallel to the x-axis.
▪ If we convert it back to 2-D space with z = 1, then
it becomes:
z = x² + y²
205
How does SVM work?
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Hence, we get a circle of radius 1 in the case of non-linear data.
z = x² + y²
206
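▪ The lifting trick above can be sketched in a few lines of NumPy. This is an illustrative example (the random data and the unit-circle threshold are assumptions, not the slide's dataset):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))       # 2-D points (x, y) around the origin
z = x[:, 0]**2 + x[:, 1]**2         # third dimension z = x² + y²
inside = z < 1.0                    # labels: inside vs. outside the unit circle
# In (x, y, z) space the two groups are separated by the plane z = 1,
# which maps back to the circle x² + y² = 1 in the original 2-D space.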
Data Pre-processing step
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
#Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set = pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
207
Data Pre-processing step
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ After executing the above code,
we will pre-process the data. The
code will give the dataset as:
208
Data Pre-processing step
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
The scaled output for the test set will be:
209
Fitting the SVM classifier to the training set
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Now the training set will be fitted to the SVM classifier.
▪ To create the SVM classifier, we will import the SVC class from the sklearn.svm library.
▪ Below is the code for it:
▪ from sklearn.svm import SVC # "Support vector classifier"
▪ classifier = SVC(kernel='linear', random_state=0)
▪ classifier.fit(x_train, y_train)
▪ In the above code, we have used kernel='linear', as here we are creating an SVM
for linearly separable data; however, we can change it for non-linear data, as
shown in the sketch below. We have then fitted the classifier to the training dataset (x_train, y_train).
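▪ For example, a minimal sketch of switching to a non-linear (RBF) kernel on the same pre-processed data:

# Assumes x_train and y_train from the pre-processing step above
classifier = SVC(kernel='rbf', gamma='scale', random_state=0)
classifier.fit(x_train, y_train)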
210
Output
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
Out[8]:
SVC (C=1.0,
cache_size=200,
class_weight=None,
coef0=0.0,
decision_function_shape='ovr',
degree=3,
gamma='auto_deprecated',
kernel='linear',
max_iter=-1,
probability=False,
random_state=0,
shrinking=True,
tol=0.001,
verbose=False)
▪ The model performance can be altered
by changing the value of
▪ C (Regularization factor),
▪ gamma,
▪ kernel.
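▪ A minimal sketch of tuning these hyperparameters with a grid search (the candidate values below are illustrative assumptions):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [0.1, 1, 10],
              'gamma': [0.01, 0.1, 1],
              'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(random_state=0), param_grid, cv=5)  # 5-fold cross-validation
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)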
211
Predicting the test set result
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Now, we will predict the output for the test set.
For this, we will create a new vector y_pred.
Below is the code for it:
▪ #Predicting the test set result
▪ y_pred = classifier.predict(x_test)
▪ After getting the y_pred vector, we can
compare the result of y_pred and y_test to
check the difference between the actual
value and predicted value.
▪ Output: Image is the output for the
prediction of the test set:
212
Creating the confusion matrix
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Now we will examine the performance of the SVM
classifier, i.e., how many incorrect predictions
there are.
▪ To create the confusion matrix, we need to import
the confusion_matrix function of the sklearn library.
▪ After importing the function, we will call it and store the result in a new
variable cm.
▪ The function mainly takes two parameters: y_true
(the actual values) and y_pred (the predicted values
returned by the classifier).
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
213
Creating the confusion matrix
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ As we can see in the output
image, there are 66+24 = 90
correct predictions and 8+2 = 10
incorrect predictions.
214
Visualizing the training set result
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
mtp.title('SVM classifier (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
215
Visualizing the training set result
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Output
▪ As we can see, the above output appears
similar to the Logistic Regression output.
▪ In the output, we got a straight line as the
hyperplane because we have used a linear
kernel in the classifier.
▪ We have also discussed above that for 2-D
space, the hyperplane in SVM is a straight
line.
216
Visualizing the test set result
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
217
Visualizing the test set result
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ As we can see in the above output
image, the SVM classifier has divided the
users into two regions (Purchased or Not
purchased).
▪ Users who purchased the SUV are in the
red region with the red scatter points.
▪ Users who did not purchase the SUV are
in the green region with green scatter
points.
▪ The hyperplane has divided the users into
the two classes, Purchased and Not
purchased.
218
Semi-Supervised Learning
219
Semi-Supervised Learning
RJEs: Remote job entry points
▪ Closely related to transductive learning
▪ Uses both labeled and unlabeled data to perform an otherwise supervised learning or unsupervised
learning task
▪ Initially motivated by its practical value in learning faster, better, and cheaper
▪ Has applications in cognitive psychology as a computational model for human learning
▪ It typically combines a small amount of labeled data with a much larger amount of unlabeled
data
▪ Some of the applications are text classification, iterative co-training-based applications such as
webpage classification, lane finding on GPS data, etc.
Ref: https://pages.cs.wisc.edu/~jerryzhu/pub/SSL_EoML.pdf
220
Algorithm Flow
RJEs: Remote job entry points
▪ Semi-Supervised learning Algorithm Flow
Ref: https://www.cs.cmu.edu/~ninamf/courses/401sp18/lectures/ssl-04-18.pdf
▪ The models based on this are semi-supervised SVM, graph-based models, generative models, etc.
221
Major Kernel Functions in Support
Vector Machine
222
Major Kernel Functions in Support Vector Machine
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
What is Kernel Method?
▪ Kernel methods are a set of techniques used in machine learning to address
classification, regression, and other prediction problems. They are built around the idea of kernels,
which are functions that gauge how similar two data points are to one another in a high-
dimensional feature space.
▪ The fundamental premise of kernel methods is to convert the input data into a high-dimensional
feature space, which makes it simpler to distinguish between classes or generate predictions.
Kernel methods employ a kernel function to implicitly map the data into the feature space, as
opposed to manually computing the feature space.
▪ The most popular kind of kernel approach is the Support Vector Machine (SVM), a binary
classifier that determines the best hyperplane that most effectively divides the two groups. In
order to efficiently locate the ideal hyperplane, SVMs map the input into a higher-dimensional
space using a kernel function.
▪ Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian
processes. Since they are strong, adaptable, and computationally efficient, kernel approaches
are frequently employed in machine learning. They are resilient to noise and outliers and can
handle sophisticated data structures like strings and graphs.
223
Kernel Method in SVMs
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Support Vector Machines (SVMs) use kernel methods to transform the input data into a higher-
dimensional feature space, which makes it simpler to distinguish between classes or generate
predictions.
▪ Kernel approaches in SVMs work on the fundamental principle of implicitly mapping input data into
a higher-dimensional feature space without directly computing the coordinates of the data points in
that space.
▪ The kernel function in SVMs is essential in determining the decision boundary that divides the
various classes.
▪ In order to calculate the degree of similarity between any two points in the feature space, the
kernel function computes their dot product.
▪ The most commonly used kernel function in SVMs is the Gaussian or radial basis function (RBF)
kernel. The RBF kernel maps the input data into an infinite-dimensional feature space using a
Gaussian function. This kernel function is popular because it can capture complex nonlinear
relationships in the data.
224
Kernel Method in SVMs
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Other types of kernel functions that can be used in SVMs include the polynomial kernel, the
sigmoid kernel, and the Laplacian kernel. The choice of kernel function depends on the specific
problem and the characteristics of the data.
▪ Basically, kernel methods in SVMs are a powerful technique for solving classification and
regression problems, and they are widely used in machine learning because they can handle
complex data structures and are robust to noise and outliers.
225
Characteristics of Kernel Function
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Mercer's condition: A kernel function must satisfy Mercer's condition to be valid.
This condition ensures that the kernel function is positive semidefinite, meaning
that the kernel (Gram) matrix it induces on any finite set of points has no negative eigenvalues.
▪ Positive definiteness: A kernel function is positive definite if the kernel matrix it
induces is strictly positive definite whenever the inputs are distinct.
▪ Non-negativity: Many commonly used kernel functions (such as the RBF kernel) are non-
negative, producing non-negative values for all inputs.
▪ Symmetry: A kernel function is symmetric, i.e., K(x, y) = K(y, x): it produces the same
value regardless of the order in which the inputs are given.
226
Characteristics of Kernel Function
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Reproducing property: A kernel function satisfies the reproducing property if it
can be used to reconstruct the input data in the feature space.
▪ Smoothness: A kernel function is said to be smooth if it produces a smooth
transformation of the input data into the feature space.
▪ Complexity: The complexity of a kernel function is an important consideration,
as more complex kernel functions may lead to overfitting and reduced
generalization performance.
227
Selecting an appropriate kernel function
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ Basically, the choice of kernel function depends on the specific problem and the characteristics of
the data, and selecting an appropriate kernel function can significantly impact the performance of
machine learning algorithms.
▪ Major Kernel Function in Support Vector Machine
▪ In Support Vector Machines (SVMs), there are several types of kernel functions that can be used
to map the input data into a higher-dimensional feature space. The choice of kernel function
depends on the specific problem and the characteristics of the data.
228
Linear Kernel
RJEs: Remote job entry points https://www.javatpoint.com/
▪ A linear kernel is a type of kernel function used in machine learning, including in SVMs (Support
Vector Machines). It is the simplest and most commonly used kernel function, and it defines the dot
product between the input vectors in the original feature space.
▪ The linear kernel can be defined as:
K(x, y) = x · y
▪ Where x and y are the input feature vectors.
▪ The dot product of the input vectors is a measure of their similarity or distance in the original feature
space.
▪ When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that separates
the different classes in the feature space.
▪ This linear boundary can be useful when the data is already separable by a linear decision boundary
or when dealing with high-dimensional data, where the use of more complex kernel functions may
lead to overfitting. 229
Polynomial Kernel
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ It is a nonlinear kernel function that employs polynomial functions to transfer the input data into a
higher-dimensional feature space.
▪ One definition of the polynomial kernel is:
K(x, y) = (x · y + c)^d
▪ Where x and y are the input feature vectors, c is a constant term, and d is the degree of the
polynomial.
▪ The constant term is added to the dot product of the input vectors, and the sum is raised to
the degree of the polynomial.
▪ The decision boundary of an SVM with a polynomial kernel might capture more intricate
correlations between the input characteristics because it is a nonlinear hyperplane.
▪ The degree of nonlinearity in the decision boundary is determined by the degree of the polynomial.
230
Polynomial Kernel
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ The polynomial kernel has the benefit of being able to detect both linear and nonlinear correlations in the data.
▪ It can be difficult to select the proper degree of the polynomial, though, as a larger degree can result in overfitting,
while a lower degree may not adequately represent the underlying relationships in the data.
▪ In general, the polynomial kernel is an effective tool for converting the input data into a higher-dimensional feature
space in order to capture nonlinear correlations between the input characteristics.
Gaussian (RBF) Kernel
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel function used in machine
learning, particularly in SVMs (Support Vector Machines). It is a nonlinear kernel function that maps the input data into a
higher-dimensional feature space using a Gaussian function.
The Gaussian kernel can be defined as:
K(x, y) = exp(-gamma * ||x - y||^2)
▪ Where x and y are the input feature vectors, gamma is a parameter that controls the width of the
Gaussian function, and ||x - y|| is the Euclidean distance between the input vectors.
231
Gaussian (RBF) Kernel
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
K(x, y) = exp(-gamma * ||x - y||^2)
▪ One advantage of the Gaussian kernel is its ability to capture complex relationships in the data
without the need for explicit feature engineering.
▪ However, the choice of the gamma parameter can be challenging, as a smaller value may result in
underfitting, while a larger value may result in overfitting.
232
Laplace Kernel
RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
▪ The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is a type of kernel function used in
machine learning, including in SVMs (Support Vector Machines). It is a non-parametric kernel that can be used to
measure the similarity or distance between two input feature vectors.
▪ The Laplacian kernel can be defined as:
K(x, y) = exp(-gamma * ||x - y||)
▪ Where x and y are the input feature vectors, gamma is a parameter that controls the width of the Laplacian function, and
||x - y|| is the L1 norm or Manhattan distance between the input vectors.
▪ When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear hyperplane that can capture complex
relationships between the input features. The width of the Laplacian function, controlled by the gamma parameter,
determines the degree of nonlinearity in the decision boundary.
▪ One advantage of the Laplacian kernel is its robustness to outliers, as it places less weight on large distances between the
input vectors than the Gaussian kernel. However, like the Gaussian kernel, choosing the correct value of the gamma
parameter can be challenging.
233
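▪ For concreteness, the four kernels above can be written directly in NumPy. This is a minimal sketch with illustrative parameter values (the choices of c, d, and gamma are assumptions):

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)                            # K(x, y) = x · y

def polynomial_kernel(x, y, c=1.0, d=3):
    return (np.dot(x, y) + c) ** d                 # K(x, y) = (x · y + c)^d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # K(x, y) = exp(-gamma ||x - y||^2)

def laplacian_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum(np.abs(x - y)))  # K(x, y) = exp(-gamma ||x - y||_1)

x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, laplacian_kernel):
    print(k.__name__, k(x, y))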
Reinforcement Learning
234
Reinforcement Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
237
▪ What is Reinforcement Learning?
▪ Terms used in Reinforcement Learning.
▪ Key features of Reinforcement Learning.
▪ Elements of Reinforcement Learning.
▪ Approaches to implementing Reinforcement Learning.
▪ How does Reinforcement Learning Work?
▪ The Bellman Equation.
▪ Types of Reinforcement Learning.
▪ Reinforcement Learning Algorithm.
▪ Markov Decision Process.
▪ What is Q-Learning?
▪ Difference between Supervised Learning and Reinforcement Learning.
▪ Applications of Reinforcement Learning.
▪ Conclusion.
Reinforcement Learning Tutorial
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
238
▪ Reinforcement Learning is a feedback-based
machine learning technique in which an agent
learns to behave in an environment by
performing actions and seeing the results
of those actions. For each good action, the agent
gets positive feedback, and for each bad
action, the agent gets negative feedback or a
penalty.
▪ In Reinforcement Learning, the agent learns
automatically using feedback, without any
labeled data, unlike supervised learning.
▪ Since there is no labeled data, the agent is
bound to learn from its experience only.
Reinforcement Learning Tutorial
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
239
▪ RL solves a specific type of problem where
decision making is sequential, and the goal is
long-term, such as game-playing, robotics,
etc.
▪ The agent interacts with the environment and
explores it by itself.
▪ The primary goal of an agent in reinforcement
learning is to improve its performance by
collecting the maximum positive rewards.
Reinforcement Learning Tutorial
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
240
▪ The agent learns through trial and error, and based on the experience, it
learns to perform the task better. Hence, we can say that "Reinforcement learning is a
type of machine learning method where an intelligent agent (computer
program) interacts with the environment and learns to act within it."
▪ It is a core part of Artificial Intelligence, and all AI
agents work on the concept of reinforcement learning.
Here we do not need to pre-program the agent, as it
learns from its own experience without any human
intervention.
▪ Example: Suppose there is an AI agent present within a
maze environment, and its goal is to find the diamond.
The agent interacts with the environment by performing
some actions, and based on those actions, the state of
the agent changes, and it also receives a reward or
penalty as feedback.
Reinforcement Learning Tutorial
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
241
▪ The agent continues doing these three
things (take action, change state/remain
in the same state, and get feedback),
and by doing these actions, it learns and
explores the environment.
▪ The agent learns which actions lead to
positive feedback or rewards and which
actions lead to negative feedback or penalty.
For a positive reward, the agent gets a
positive point, and as a penalty, it gets a
negative point.
Terms used in Reinforcement Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
242
▪ Agent: An entity that can perceive/explore the environment and act upon it.
▪ Environment: A situation in which an agent is present or by which it is surrounded. In RL, we assume a
stochastic environment, which means it is random in nature.
▪ Action: Actions are the moves taken by an agent within the environment.
▪ State: The situation returned by the environment after each action taken by the agent.
▪ Reward: Feedback returned to the agent from the environment to evaluate the action.
▪ Policy: A strategy applied by the agent to decide the next action based on the current state.
▪ Value: The expected long-term return with the discount factor, as opposed to the short-term
reward.
▪ Q-value: Mostly similar to the value, but it takes one additional parameter, the current action.
Key Features of Reinforcement Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
243
▪ In RL, the agent is not instructed about the
environment or what actions need to be taken.
▪ It is based on a trial-and-error process.
▪ The agent takes the next action and changes
states according to the feedback of the previous
action.
▪ The agent may get a delayed reward.
▪ The environment is stochastic, and the agent
needs to explore it to collect the maximum
positive rewards.
Approaches to implement Reinforcement Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
244
▪ Value-based:
The value-based approach is about finding the optimal value function, which gives the maximum value
at a state under any policy. Here, the agent expects the long-term return at any state(s) under
policy π.
▪ Policy-based:
The policy-based approach is to find the optimal policy for the maximum future rewards without using
the value function. In this approach, the agent tries to apply a policy such that the action
performed in each step helps to maximize the future reward.
▪ The policy-based approach has mainly two types of policy:
▪ Deterministic: The same action is produced by the policy (π) at any given state.
▪ Stochastic: In this policy, probability determines the produced action.
▪ Model-based: In the model-based approach, a virtual model is created for the environment, and
the agent explores that environment to learn it. There is no particular solution or algorithm for this
approach because the model representation is different for each environment.
Elements of Reinforcement Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
245
1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment
Policy
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
246
▪ A policy can be defined as the way an agent behaves at a given time.
▪ It maps the perceived states of the environment to the actions taken on those
states.
▪ A policy is the core element of the RL as it alone can define the behavior of the
agent.
▪ In some cases, it may be a simple function or a lookup table, whereas, for other
cases, it may involve general computation as a search process.
▪ It could be a deterministic or a stochastic policy:
For a deterministic policy: a = π(s)
For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
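▪ A minimal sketch of the two policy types in Python (the states, actions, and probabilities below are illustrative assumptions):

import numpy as np

# Deterministic policy: a = π(s), a fixed action for each state
def deterministic_policy(state):
    return {'s1': 'up', 's2': 'right'}[state]

# Stochastic policy: π(a | s) = P[A_t = a | S_t = s]
rng = np.random.default_rng(0)
def stochastic_policy(state):
    actions, probs = ['up', 'right'], [0.7, 0.3]
    return rng.choice(actions, p=probs)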
Reward Signal
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
247
▪ The goal of reinforcement learning is defined by the
reward signal.
▪ At each state, the environment sends an immediate
signal to the learning agent, and this signal is known as
a reward signal.
▪ These rewards are given according to the good and bad
actions taken by the agent.
▪ The agent's main objective is to maximize the total
number of rewards for good actions.
▪ The reward signal can change the policy, such as if an
action selected by the agent leads to low reward, then
the policy may change to select other actions in the
future.
Value Function
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
248
▪ The value function gives information about
how good the situation and action are and how
much reward an agent can expect.
▪ A reward indicates the immediate signal for
each good and bad action, whereas a value
function specifies the good state and action
for the future.
▪ The value function depends on the reward
as, without reward, there could be no value.
The goal of estimating values is to achieve
more rewards.
Model
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
249
▪ The model mimics the behavior of the environment.
With the help of the model, one can make
inferences about how the environment will behave.
For example, if a state and an action are given, the
model can predict the next state and reward.
▪ The model is used for planning, which means it
provides a way to decide a course of action by
considering all future situations before actually
experiencing those situations. The approaches for
solving RL problems with the help of a
model are termed model-based approaches.
Comparatively, an approach without using a
model is called a model-free approach.
How does Reinforcement Learning Work?
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
250
▪ To understand the working process
of the RL, we need to consider two
main things:
▪ Environment: It can be anything
such as a room, maze, football
ground, etc.
▪ Agent: An intelligent agent, such as an
AI robot. Let's take an example of a
maze environment that the agent
needs to explore.
How does Reinforcement Learning Work?
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
251
▪ In the above image, the agent is at the very first
block of the maze. The maze consists of an
S6 block, which is a wall, S8, a fire pit, and
S4, a diamond block.
▪ The agent cannot cross the S6 block, as it is a solid
wall. If the agent reaches the S4 block, it gets
the +1 reward; if it reaches the fire pit, it gets a -1
reward point. It can take four actions: move up,
move down, move left, and move right.
▪ The agent can take any path to reach the final
point, but it needs to do so in as few
steps as possible. Suppose the agent follows the path S9-S5-
S1-S2-S3; then it will get the +1 reward point.
▪ The agent will try to remember the preceding steps
it has taken to reach the final step. To memorize
the steps, it assigns a value of 1 to each previous step.
How does Reinforcement Learning Work?
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
252
▪ Now, the agent has successfully stored the
previous steps by assigning the value 1 to each
previous block.
▪ But what will the agent do if it starts moving
from a block that has a value-1 block on both
sides?
▪ It will be a difficult condition for the agent whether
it should go up or down, as each block has the
same value. So, the above approach is not suitable
for the agent to reach the destination. Hence, to
solve the problem, we will use the Bellman
equation, which is the main concept behind
reinforcement learning.
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
253
▪ The Bellman equation was introduced by the mathematician Richard Ernest
Bellman in the year 1953, and hence it is called the Bellman equation. It is
associated with dynamic programming and is used to calculate the value of a
decision problem at a certain point by including the values of successor states.
▪ It is a way of calculating the value functions in dynamic programming, which
leads to modern reinforcement learning.
▪ The key elements used in the Bellman equation are:
▪ The action performed by the agent is referred to as "a"
▪ The state occurring by performing the action is "s"
▪ The reward/feedback obtained for each good and bad action is "R"
▪ The discount factor is Gamma "γ"
▪ The Bellman equation can be written as: V(s) = max_a [R(s,a) + γV(s')]
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
254
▪ The Bellman equation can be written as:
V(s) = max_a [R(s,a) + γV(s')]
Where,
▪ V(s) = value calculated at a particular state.
▪ R(s,a) = reward obtained at state s by performing action a.
▪ γ = discount factor
▪ V(s') = the value of the next state.
▪ In the above equation, we take the max over the possible actions because the
agent always tries to find the optimal solution.
▪ So now, using the Bellman equation, we will find the value at each state of the given
environment. We will start from the block which is next to the target block.
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
255
▪ For the 1st block:
▪ V(s3) = max [R(s,a) + γV(s')],
▪ here V(s') = 0,
▪ because there is no further state to move to.
▪ V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1.
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
256
▪ For the 2nd block:
▪ V(s2) = max [R(s,a) + γV(s')],
▪ here γ = 0.9 (say), V(s') = 1, and R(s, a) = 0,
▪ because there is no reward at this state.
▪ V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
257
▪ For the 3rd block:
▪ V(s1) = max [R(s,a) + γV(s')],
▪ here γ = 0.9 (say),
▪ V(s') = 0.9, and R(s, a) = 0, because there is no
reward at this state either.
▪ V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
258
▪ For the 4th block:
▪ V(s5) = max [R(s,a) + γV(s')],
▪ here γ = 0.9 (say),
▪ V(s') = 0.81, and
▪ R(s, a) = 0, because there is no reward at this state either.
▪ V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
259
▪ For the 5th block:
▪ V(s9) = max [R(s,a) + γV(s')],
▪ here γ = 0.9 (say),
▪ V(s') = 0.73, and R(s, a) = 0,
▪ because there is no reward at this state either.
▪ V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66
The Bellman Equation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
260
▪ Now, the agent has three options for moving:
▪ if it moves to the blue box, it will feel a
bump; if it moves to the fire pit, it will
get the -1 reward.
▪ But here we are considering only positive rewards,
so it will move upwards only.
▪ The complete block values will be calculated
using this formula.
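▪ The hand computation above can be reproduced with a short value-iteration sketch. This assumes a single path S9-S5-S1-S2-S3 with γ = 0.9 and a +1 reward on entering the goal; it is illustrative, not the full maze:

gamma = 0.9
next_state = {'s9': 's5', 's5': 's1', 's1': 's2', 's2': 's3', 's3': None}
V = {s: 0.0 for s in next_state}

for _ in range(10):                        # iterate until the values settle
    for s in V:
        if next_state[s] is None:          # s3 is next to the goal: R = 1
            V[s] = 1.0
        else:                              # no reward, only the discounted next value
            V[s] = gamma * V[next_state[s]]

print(V)   # {'s9': 0.6561, 's5': 0.729, 's1': 0.81, 's2': 0.9, 's3': 1.0}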
Types of Reinforcement learning
261
Types of Reinforcement learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
262
▪ There are mainly two types of reinforcement learning, which are:
▪ Positive Reinforcement
▪ Negative Reinforcement
Positive Reinforcement
RJEs: Remote job entry points
https://www.javatpoint.com/reinforcement-learning, https://www.verywellmind.com/what-is-positive-reinforcement-2795412
263
▪ Positive reinforcement learning means adding something to increase the
tendency that the expected behavior will occur again. It impacts the
behavior of the agent positively and increases the strength of the behavior.
▪ This type of reinforcement can sustain the changes for a long time, but too much
positive reinforcement may lead to an overload of states, which can diminish the
results.
Negative Reinforcement
RJEs: Remote job entry points
264
▪ Negative reinforcement
learning is the opposite of
positive reinforcement, as it
increases the tendency that the
specific behavior will occur again
by avoiding the negative
condition.
▪ It can be more effective than
positive reinforcement, depending
on the situation and behavior, but it
provides reinforcement only to
meet the minimum required behavior.
https://www.javatpoint.com/reinforcement-learning, https://www.parentingforbrain.com/negative-reinforcement/
Markov Decision Process (MDP)
265
How to represent the agent state?
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
266
▪ We can represent the agent state using the Markov state, which contains all the
required information from the history.
▪ The state St is a Markov state if it satisfies the condition:
P[St+1 | St] = P[St+1 | S1, ..., St]
▪ The Markov state follows the Markov property, which says that the future is
independent of the past and can be defined using only the present.
▪ RL works on fully observable environments, where the agent can observe
the environment and act to reach the new state. The complete process is known as the
Markov Decision Process.
Markov Decision Process
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
267
▪ Markov Decision Process, or MDP, is used
to formalize reinforcement learning
problems. If the environment is completely
observable, then its dynamics can be
modeled as a Markov Process.
▪ In MDP, the agent constantly interacts
with the environment and performs
actions; at each action, the environment
responds and generates a new state.
▪ MDP is used to describe the
environment for the RL, and almost all the
RL problem can be formalized using MDP.
Markov Decision Process
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
268
▪ MDP contains a tuple of four elements
(S, A, Pa, Ra):
▪ A set of finite states S
▪ A set of finite actions A
▪ Ra(s, s') - the reward received after transitioning
from state s to state s' due to action a
▪ Pa(s, s') - the probability that action a in state s
leads to state s'
▪ MDP uses the Markov property
Markov Property
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
269
▪ It says that "if the agent is present in the current
state s1, performs an action a1, and moves to the
state s2, then the state transition from s1 to s2
depends only on the current state; the future actions
and states do not depend on past actions,
rewards, or states."
▪ Or, in other words, as per the Markov property, the
current state transition does not depend on any
past action or state.
▪ Hence, an MDP is an RL problem that satisfies the
Markov property. For example, in a chess game, the
players only focus on the current state and do
not need to remember past actions or states.
Finite MDP
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
270
▪ A finite MDP is when there are finite states, finite rewards, and finite actions.
▪ In RL, we consider only the finite MDP.
Markov Process
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
271
▪ A Markov process is a memoryless process with a
sequence of random states S1, S2, ..., St that
satisfies the Markov property.
▪ A Markov process is also known as a Markov chain,
which is a tuple (S, P) of a state set S and a transition
function P.
▪ These two components (S and P) can define the
dynamics of the system.
Q-Learning
272
Q-Learning:
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
273
▪ Q-learning is an off-policy RL algorithm,
which is used for temporal difference
learning.
▪ Temporal difference learning methods are
ways of comparing temporally successive
predictions.
▪ It learns the value function Q(s, a), which
tells how good it is to take action "a" at a
particular state "s".
▪ The below flowchart explains the working of Q-
learning.
State Action Reward State Action (SARSA)
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
274
▪ SARSA stands for State Action Reward State Action, which is an on-policy temporal difference
learning method. The on-policy control method selects the action for each state while learning
using a specific policy.
▪ The goal of SARSA is to calculate Qπ(s, a) for the selected current policy π and all
pairs of (s, a).
▪ The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, the
maximum reward for the next state is not required for updating the Q-value in the table.
▪ In SARSA, the new action and reward are selected using the same policy that
determined the original action.
▪ SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed while following the states
s' and a': new state-action pair
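▪ The contrast between the two update rules can be written in a few lines. A toy sketch with hypothetical values (α, γ, the table size, and the transition are assumptions):

import numpy as np

alpha, gamma = 0.1, 0.9
Q = np.zeros((5, 4))                 # Q-table: 5 states x 4 actions, initialized to zero
s, a, r, s_next, a_next = 0, 1, 1.0, 2, 3

# Q-learning (off-policy): bootstrap with the best action in the next state
Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

# SARSA (on-policy): bootstrap with the action a' actually chosen by the current policy
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])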
Deep Q Neural Network (DQN)
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
275
▪ As the name suggests, DQN is Q-learning using neural networks.
▪ For a big state-space environment, it is a challenging and complex task to
define and update a Q-table.
▪ To solve such an issue, we can use a DQN algorithm, where, instead of defining a
Q-table, a neural network approximates the Q-values for each action and state.
Q-Learning Explanation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
276
▪ Q-learning is a popular model-free reinforcement learning algorithm based on the
Bellman equation.
▪ The main objective of Q-learning is to learn the policy which can inform the
agent what actions should be taken to maximize the reward under which
circumstances.
▪ It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
▪ The goal of the agent in Q-learning is to maximize the value of Q.
▪ The value of Q-learning can be derived from the Bellman equation. Consider the
Bellman equation given below:
Q-Learning Explanation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
277
▪ In the equation, we have various components, including the reward, the discount factor (γ),
probability, and the end state s'.
▪ But no Q-value is given yet.
▪ In the image, we can see there is an agent who has
three value options, V(s1), V(s2), V(s3). As this is an
MDP, the agent only cares about the current state and
the future states. The agent can go in any direction
(up, left, or right), so it needs to decide where to
go for the optimal path. Here the agent will take a move
on a probability basis, changing the state. But
if we want some exact moves, then for this we need
to make some changes in terms of Q-values.
Q-Learning Explanation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
278
▪ Q represents the quality of the actions at each state.
▪ So instead of using a value at each state, we will use a pair of
state and action, i.e., Q(s, a).
▪ The Q-value specifies which action is better than the others, and
according to the best Q-value, the agent takes its next move.
The Bellman equation can be used for deriving the Q-value.
▪ To perform any action, the agent will get a reward R(s, a), and
it will also end up in a certain state, so the Q-value equation
will be:
▪ Hence, we can say that V(s) = max_a [Q(s, a)]
Q-Learning Explanation
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
279
▪ The Q stands for quality in Q-learning, which means it specifies the quality of
an action taken by the agent.
Q-table
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
280
▪ A Q-table or matrix is created while performing the Q-learning.
▪ The table follows the state and action pair, i.e., [s, a], and initializes the values
to zero.
▪ After each action, the table is updated, and the q-values are stored within the
table.
▪ The RL agent uses this Q-table as a reference table to select the best action
based on the q-values.
Difference Between Reinforcement Learning and Supervised Learning
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
281
Reinforcement Learning vs. Supervised Learning
▪ RL works by interacting with the environment; supervised learning works on an
existing dataset.
▪ The RL algorithm works like the human brain when making decisions; supervised
learning works like a human learning things under the supervision of a guide.
▪ In RL, no labeled dataset is present; in supervised learning, a labeled dataset is
present.
▪ In RL, no previous training is provided to the learning agent; in supervised learning,
training is provided to the algorithm so that it can predict the output.
▪ RL helps to take decisions sequentially; in supervised learning, a decision is made
when the input is given.
Reinforcement Learning
RJEs: Remote job entry points
▪ There are various applications based on the concept of RL.
Ref: [1]https://www.javatpoint.com/reinforcement-learning;
[2]https://medium.com/@yuxili/rl-applications-73ef685c07eb
[1] [2]
282
Gaussian Mixture Model (GMM)
283
Gaussian Mixture Model (GMM)
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
284
▪ k-means exploits only the mean of the cluster or distribution as the
representation for class-specific information.
▪ A second-order moment like the variance also contains class-specific
information.
▪ A Gaussian distribution can exploit both the mean and the variance.
▪ In the case of a scalar it is a univariate Gaussian distribution, and in the case of a
vector it is a multivariate Gaussian distribution.
Univariate vs Multivariate Gaussian Distribution
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
285
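▪ For reference, the two densities being compared are the standard univariate and multivariate Gaussian PDFs (notation as in Bishop):

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)

where μ is the mean, σ² the variance, and Σ the d × d covariance matrix.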
Univariate vs Multivariate Gaussian Distribution
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
286
Univariate vs Multivariate Gaussian Distribution
RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning
287
Clustering using Multivariate Gaussian Distribution
RJEs: Remote job entry points
288
Gaussian Mixture Model (GMM)
RJEs: Remote job entry points
289
Expectation-Maximization (EM) Algorithm
RJEs: Remote job entry points
290
Implementation of EM Algorithm
RJEs: Remote job entry points
291
Re-estimation in EM Algorithm
RJEs: Remote job entry points
292
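▪ The standard GMM re-estimation equations (following, e.g., Bishop, Ch. 9) consist of the E-step responsibilities and the M-step parameter updates:

\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}

N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \quad \boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, \mathbf{x}_n, \quad \boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\top}, \quad \pi_k = \frac{N_k}{N}

The E-step and M-step are alternated until the log-likelihood converges.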
Clustering using GMM
RJEs: Remote job entry points
293
What is Gaussian Mixture Model (GMMs)?
RJEs: Remote job entry points
294
A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a
mixture of a finite number of Gaussian distributions with unknown parameters. One can think of a mixture
model as a generalization of a k-means clustering algorithm, as it can be used for density estimation and
classification.
In a Gaussian mixture model, each cluster is associated with a multivariate Gaussian distribution, and the
mixture model is a weighted sum of these distributions. The weights indicate the probability that a data point
belongs to a particular cluster, and the Gaussian distributions describe the distribution of the data within each
cluster.
The parameters of a Gaussian mixture model can be estimated using the expectation-maximization (EM)
algorithm. This involves alternating between estimating the parameters of the Gaussian distributions and the
weights of the mixture model until convergence is reached.
Univariate vs Multivariate Gaussian Distribution
RJEs: Remote job entry points
295
https://www.shiksha.com/online-courses/articles/understanding-gaussian-mixture-models/
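A minimal sketch of the example code the description below refers to, reconstructed from that description (the exact means and random seed are assumptions):

import numpy as np
from sklearn.mixture import GaussianMixture

# 200 samples drawn from two 2-D Gaussian distributions with different means
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], size=(100, 2)),
               rng.normal(loc=[4, 4], size=(100, 2))])

# Fit a two-component GMM with full covariance matrices
gmm = GaussianMixture(n_components=2, covariance_type='full')
gmm.fit(X)
predictions = gmm.predict(X)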
The above example code generates a dataset X which
contains 200 samples drawn from two 2D Gaussian
distributions which have different means. The Gaussian
mixture model is then fit to the data, with n_components=2
indicating that there are two mixture components (i.e., two
clusters). The covariance_type parameter specifies the
type of covariance matrix to use for the Gaussian
distributions. In the above example, the covariance_type
value is ‘full’.
Once the model is fit, the predict method can be used
to predict the cluster labels for the data points in X. The
resulting cluster labels are stored in the predictions array.
Univariate vs Multivariate Gaussian Distribution
RJEs: Remote job entry points
296
To plot the data and the predicted cluster labels,
matplotlib is used, as follows:
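A minimal plotting sketch, assuming the X and predictions arrays from the code above:

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=predictions)   # colour points by predicted cluster
plt.title('GMM cluster assignments')
plt.show()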
The output is a scatter plot of the data, with the
points coloured according to their predicted cluster
labels.
Real-Life Examples of Gaussian mixture models
RJEs: Remote job entry points
297
▪ Gaussian mixture models (GMMs), as already stated above, are statistical models that can be
used to represent the probability distribution of a multi-dimensional continuous variable as a
weighted sum of multiple multivariate normal distributions. GMMs are often used in a variety of
applications, including clustering, density estimation, and anomaly detection. Here are a few
examples of how GMMs could be used in real life:
▪ Clustering: GMMs can be used to identify patterns and group similar observations together. For
example, a GMM could be used to cluster customers into different segments based on their
purchase history and demographic data.
▪ Density estimation: GMMs can be used to estimate the probability density function (PDF) of a
given dataset. This can be useful for tasks such as density-based anomaly detection, where
GMMs can be used to identify observations that are significantly different from the rest of the
data.
▪ Anomaly detection: GMMs can be used to detect anomalous observations in a dataset. For
example, a GMM could be trained on normal network traffic data, and then used to identify
unusual traffic patterns that may indicate an intrusion attempt.
▪ Computer vision: GMMs can be used in computer vision applications to model the appearance
of objects in an image. For example, a GMM could be used to model the appearance of different
types of vehicles in a traffic surveillance system.
Advantages of Gaussian Mixture Models
RJEs: Remote job entry points
298
▪ Flexibility - Gaussian mixture models can model a wide range of probability
distributions, as they can approximate any distribution that can be represented as a weighted sum
of multiple normal distributions. Hence, they are very flexible in nature.
▪ Robustness - Gaussian mixture models are relatively robust to outliers present in the
data, as they can accommodate the presence of multiple modes, called "peaks", in the distribution.
▪ Speed - Gaussian mixture models are relatively fast to fit to a dataset, especially when using an
efficient optimization algorithm such as the expectation-maximization (EM) algorithm.
▪ Handling missing data - Gaussian mixture models have the ability to handle missing data by
marginalizing over the missing variables, which can be useful in situations where some observations are
incomplete.
▪ Interpretability - The parameters of a Gaussian mixture model (i.e., the weights, means, and
covariances of the components) have a clear interpretation, which can be useful for understanding
the underlying structure of the data.
Disadvantages of Gaussian Mixture Models
RJEs: Remote job entry points
299
•Sensitivity to initialization - Gaussian mixture models can be sensitive to the initial values of the model
parameters, especially when there are too many components in the mixture. This can sometimes lead to poor
convergence to the true maximum-likelihood solution.
•Assumption of normality - Gaussian mixture models assume that the data are generated from a mixture of
normal distributions, which may not always be the case in practice. If the data deviate significantly from
normality, GMMs may not be the most appropriate model.
•Number of components - Choosing the appropriate number of components in a Gaussian mixture model
can be challenging, as adding too many components may overfit the data, while using too few components
may underfit the data. Both extremes make model selection a difficult task.
•High-dimensional data - Gaussian mixture models can be computationally expensive to fit when working
with high-dimensional data, as the number of model parameters increases quadratically with the number of
dimensions.
•Limited expressive power - Gaussian mixture models can only represent distributions that can be
expressed as a weighted sum of normal distributions. This means that they may not be suitable for modelling
more complex distributions.
Hidden Markov Model in Machine Learning
Hidden Markov Model in Machine Learning
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ Hidden Markov Models (HMMs) are a type of probabilistic model that
are commonly used in machine learning for tasks such as
▪ Speech recognition
▪ Natural language processing
▪ Bioinformatics
▪ They are a popular choice for modelling sequences of data because
they can effectively capture the underlying structure of the data,
even when the data is noisy or incomplete.
What are Hidden Markov Models?
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ A Hidden Markov Model (HMM) is a probabilistic model that consists of a sequence of
hidden states, each of which generates an observation. The hidden states are usually not
directly observable, and the goal of HMM is to estimate the sequence of hidden states based on a
sequence of observations. An HMM is defined by the following components:
▪ A set of N hidden states, S = {s1, s2, ..., sN}.
▪ A set of M observations, O = {o1, o2, ..., oM}.
▪ An initial state probability distribution, π = {π1, π2, ..., πN}, which specifies the probability of
starting in each hidden state.
▪ A transition probability matrix, A = [aij], which defines the probability of moving from one hidden state
to another.
▪ An emission probability matrix, B = [bjk], which defines the probability of emitting an observation
from a given hidden state.
▪ The basic idea behind an HMM is that the hidden states generate the observations, and the
observed data is used to estimate the hidden state sequence; the forward-backward algorithm is
commonly used for this estimation.
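▪ A toy sketch of these components as NumPy arrays (the states, observation symbols, and probabilities below are illustrative assumptions):

import numpy as np

states = ['Rainy', 'Sunny']                  # hidden states S (N = 2)
observations = ['walk', 'shop', 'clean']     # observation symbols O (M = 3)
pi = np.array([0.6, 0.4])                    # initial state distribution π
A = np.array([[0.7, 0.3],                    # transition matrix A = [aij]
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],               # emission matrix B = [bjk]
              [0.6, 0.3, 0.1]])
# e.g., probability of starting Rainy and emitting 'shop': pi[0] * B[0, 1] = 0.24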
Applications of Hidden Markov Models
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ Speech Recognition
One of the most well-known applications of HMMs is speech recognition. In this field, HMMs are used to
model the different sounds and phones that make up speech. The hidden states correspond to the different
sounds or phones, and the observations are the acoustic signals generated by the speech. The goal is to
estimate the hidden state sequence, which corresponds to the transcription of the speech, based on the
observed acoustic signals. HMMs are particularly well-suited for speech recognition because they can
effectively capture the underlying structure of speech even when the data is noisy or incomplete. In speech
recognition systems, the HMMs are usually trained on large datasets of speech signals, and the estimated
parameters are used to transcribe speech in real time (see the Viterbi sketch below).
▪ Natural Language Processing
Another important application of HMMs is natural language processing. In this field, HMMs are used for tasks
such as part-of-speech tagging, named entity recognition, and text classification. In these applications,
the hidden states are typically associated with the underlying grammar or structure of the text, while the
observations are the words themselves. The goal is to estimate the hidden state sequence, which corresponds
to the structure or meaning of the text, based on the observed words. HMMs are useful in natural language
processing because they can effectively capture the underlying structure of the text even when the data is
noisy or ambiguous. In NLP systems, the HMMs are usually trained on large text corpora, and the estimated
parameters are used to perform these tagging and classification tasks.
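Decoding the most likely hidden state sequence, as in the speech transcription and part-of-speech tagging tasks above, is typically done with the Viterbi algorithm. Below is a minimal sketch in log space, reusing the toy π, A, and B values from the forward-algorithm example.

```python
# A minimal Viterbi sketch: most likely hidden state sequence.
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 2, 1]

# delta[j]: best log-probability of any state path ending in state j at time t
# psi[t][j]: the predecessor state on that best path (for backtracking)
logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
delta = logpi + logB[:, obs[0]]
psi = []
for o in obs[1:]:
    scores = delta[:, None] + logA          # scores[i, j]: path ending i -> j
    psi.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) + logB[:, o]

# Backtrack from the best final state to recover the full state sequence.
path = [int(delta.argmax())]
for back in reversed(psi):
    path.append(int(back[path[-1]]))
path.reverse()
print("Most likely state sequence:", path)
```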
Applications of Hidden Markov Models
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ Bioinformatics
HMMs are also widely used in bioinformatics, where they are used to model sequences of DNA, RNA, and
proteins. The hidden states, in this case, correspond to the different types of residues, while the observations
are the sequences of residues. The goal is to estimate the hidden state sequence, which corresponds to the
underlying structure of the molecule, based on the observed sequences of residues. HMMs are useful in
bioinformatics because they can effectively capture the underlying structure of the molecule, even when the
data is noisy or incomplete. In bioinformatics systems, the HMMs are usually trained on large datasets of
molecular sequences, and the estimated parameters of the HMMs are used to predict the structure or function
of new molecular sequences.
▪ Finance
Finally, HMMs have also been used in finance to model stock prices, interest rates, and
currency exchange rates. In these applications, the hidden states correspond to different economic states, such
as bull and bear markets, while the observations are the stock prices, interest rates, or exchange rates. The
goal is to estimate the hidden state sequence, which corresponds to the underlying economic state, based on
the observed prices, rates, or exchange rates. HMMs are useful in finance because they can effectively capture
the underlying economic state, even when the data is noisy or incomplete. In finance systems, the HMMs are
usually trained on large datasets of financial data, and the estimated parameters of the HMMs are used to
make predictions about future market trends or to develop investment strategies.
Limitations of Hidden Markov Models
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ Limited Modeling Capabilities
One of the key limitations of HMMs is that they are relatively limited in their modelling
capabilities. HMMs are designed to model sequences of data, where the underlying
structure of the data is represented by a set of hidden states. However, the structure of
the data can be quite complex, and the simple structure of HMMs may not be enough to
accurately capture all the details. For example, in speech recognition, the complex
relationship between the speech sounds and the corresponding acoustic signals may
not be fully captured by the simple structure of an HMM.
▪ Overfitting
Another limitation of HMMs is that they can be prone to overfitting, especially when the
number of hidden states is large or the amount of training data is limited. Overfitting
occurs when the model fits the training data too well and is unable to generalize to new
data. This can lead to poor performance when the model is applied to real-world data
and can result in high error rates. To avoid overfitting, it is important to carefully choose
the number of hidden states and to use appropriate regularization techniques.
Limitations of Hidden Markov Models
RJEs: Remote job entry points https://www.javatpoint.com/hidden-markov-model-in-machine-learning
▪ Lack of Robustness
HMMs are also limited in their robustness to noise and variability in the data. For example, in
speech recognition, the acoustic signals generated by speech can be subject to a variety of
distortions and noise, which can make it difficult for the HMM to accurately estimate the
underlying structure of the data. In some cases, these distortions and noise can cause the HMM
to make incorrect decisions, which can result in poor performance. To address these limitations,
it is often necessary to use additional processing and filtering techniques, such as noise
reduction and normalization, to pre-process the data before it is fed into the HMM.
▪ Computational Complexity
Finally, HMMs can also be limited by their computational complexity, especially when dealing
with large amounts of data or when using complex models. The computational complexity of
HMMs stems from the need to estimate the parameters of the model and to compute the likelihood
of the data given the model. This can be time-consuming and computationally expensive,
especially for large models or for data that is sampled at a high frequency. To address this
limitation, it is often necessary to use parallel computing techniques or to use approximations
that reduce the computational complexity of the model.
Naïve Bayes Classifier
RJEs: Remote job entry points
▪ Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem[1]
▪ It is mainly used in text classification that includes a high-dimensional training dataset[2]
▪ It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to a class
▪ Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the
occurrence of other features. Another strong assumption is that all features are equally important, i.e., they are
given the same weight.
▪ Bayes: It is based on Bayes’ Theorem so is called Bayes. Bayes’ Theorem finds the probability of an event
occurring given the probability of another event that has already occurred. It is mathematically given as:
P(A|B) = P(B|A) · P(A) / P(B)
where P(A|B) is the Posterior Probability, P(A) is the Prior Probability,
P(B|A) is the Likelihood Probability, and P(B) is the Marginal Probability
Ref: [1] https://www.geeksforgeeks.org/naive-bayes-classifiers/?ref=leftbar-rightbar
[2] https://www.javatpoint.com/machine-learning-naive-bayes-classifier
307
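As a quick numerical illustration of the theorem, the snippet below computes a posterior from an assumed prior, likelihood, and marginal; all three probability values are made up for illustration only.

```python
# A toy numerical check of Bayes' Theorem: suppose a disease has prior
# P(A) = 0.01, a test detects it with likelihood P(B|A) = 0.95, and the
# test comes back positive overall with P(B) = 0.06 (all values assumed).
p_A, p_B_given_A, p_B = 0.01, 0.95, 0.06

# Posterior P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.4f}")  # ~0.1583: positive test, still unlikely
```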
Naïve Bayes Classifier
RJEs: Remote job entry points
Ref: [1] https://www.geeksforgeeks.org/naive-bayes-classifiers/?ref=leftbar-rightbar
[2] https://www.tutorialspoint.com/machine_learning_with_python/classification_algorithms_naive_bayes.htm, [3] https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
▪ There are primarily three types of Naïve Bayes Classifiers:
▪ Gaussian Naïve Bayes – In Gaussian Naive Bayes, continuous values associated with each feature are
assumed to be distributed according to a Gaussian distribution. The likelihood of the features is assumed to
be Gaussian[1]
▪ Multinomial Naïve Bayes – Here, the features are assumed to be drawn from a multinomial distribution.
This kind of Naïve Bayes is most appropriate for features that represent discrete counts[2]
▪ Bernoulli Naïve Bayes – Here the features are assumed to be binary (0s and 1s). Text classification with ‘bag
of words’ model can be an application of Bernoulli Naïve Bayes
▪ The adjacent figure shows an example of a Naïve Bayes classifier estimating the probability of
play or no play based on likelihood estimation [3]
308
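Below is a minimal sketch of the Gaussian variant on a play/no-play style problem, assuming scikit-learn is available; the features (temperature, humidity) and labels are invented for illustration.

```python
# A minimal Gaussian Naive Bayes sketch on toy weather data.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[30.0, 85.0], [27.0, 90.0], [21.0, 68.0],
              [18.0, 65.0], [24.0, 70.0], [29.0, 92.0]])
y = np.array([0, 0, 1, 1, 1, 0])      # 1 = play, 0 = no play

model = GaussianNB().fit(X, y)        # fits one Gaussian per feature per class
print(model.predict([[22.0, 72.0]]))          # predicted class
print(model.predict_proba([[22.0, 72.0]]))    # posterior probabilities
```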
Naïve Bayes Classifier
RJEs: Remote job entry points
▪ Advantages of Naïve Bayes Classifier
▪ A fast and easy ML algorithm for predicting the class of a dataset
▪ It can be used for Binary as well as Multi-class Classifications
▪ It is the most popular choice for text classification problems
▪ Disadvantage of Naïve Bayes Classifier
▪ Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features
▪ Applications of Naïve Bayes Classifier
▪ It is used for Credit Scoring
▪ It is used in medical data classification
▪ It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner
▪ It is used in Text classification such as Spam filtering and Sentiment analysis
Ref: https://www.javatpoint.com/machine-learning-naive-bayes-classifier
309
Ensemble Classifiers
RJEs: Remote job entry points
▪ Ensemble learning helps improve machine learning results by combining several models[1]
▪ Better predictive performance compared to a single model
▪ Ensemble overcomes three problems:
▪ Statistical Problems: when the hypothesis space is too large for the amount of available data
▪ Computational Problems: when the learning algorithm cannot guarantee finding the best hypothesis
▪ Representational Problems: when the hypothesis space does not contain any good approximation of the target class(es)
▪ The main challenge with ensemble methods is to obtain base models which make different kinds of errors
▪ The three main classes of ensemble learning methods are bagging, stacking, and boosting[2]
▪ Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions
▪ Stacking involves fitting many different model types on the same data and using another model to learn how to best
combine the predictions
▪ Boosting involves adding models sequentially, where each new model focuses on correcting the errors made by the
previous ones, and combining their weighted predictions
Ref: [1] https://www.geeksforgeeks.org/ensemble-classifier-data-mining/
[2] https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
310
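To make the bagging idea concrete, here is a minimal sketch, assuming scikit-learn (version ≥ 1.2, where the base model is passed via the estimator parameter); the dataset is synthetic.

```python
# A minimal bagging sketch: many decision trees are fit on bootstrap
# samples of the same dataset and their predictions are combined.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 50 trees, each trained on a different bootstrap sample of the training set
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("Bagging test accuracy:", bag.score(X_te, y_te))
```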

More Related Content

PPTX
Introduction to Machine Learning and Deep Learning
PPTX
TensorFlow Event presentation08-12-2024.pptx
PPTX
L15.pptx
PPTX
马赛PPT - DL & ML.pptx
PDF
General introduction to AI ML DL DS
PPTX
ML-Chapter_one.pptx
PPTX
JavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJS
PPTX
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Introduction to Machine Learning and Deep Learning
TensorFlow Event presentation08-12-2024.pptx
L15.pptx
马赛PPT - DL & ML.pptx
General introduction to AI ML DL DS
ML-Chapter_one.pptx
JavaScript and Artificial Intelligence by Aatman & Sagar - AhmedabadJS
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...

Similar to Introduction to Machine Learning (ML) Final - Copy.pdf (20)

PPTX
Machine learning
PDF
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
PDF
Machine Learning for Dummies (without mathematics)
PDF
Machine learning-for-dummies-andrews-sobral-activeeon
PDF
Python Machine Learning - Getting Started
PPTX
Session_2_Introduction_to_Deep_Learning.pptx
PDF
Automating Reverse Engineering: Function Classification and Matching
PDF
How to use Artificial Intelligence with Python? Edureka
PDF
Introduction to machine learning
PPTX
Deep learning with tensorflow
PDF
Dato Keynote
PPTX
Machine Learning Basics
PDF
Summary machine learning and model deployment
PPTX
Reinforcement Learning, Application and Q-Learning
PPTX
Is Spark the right choice for data analysis ?
PPT
Machine learning
PDF
Understanding and Protecting Artificial Intelligence Technology (Machine Lear...
PDF
machine learning basic unit1 for third year cse studnets
PDF
Persian MNIST in 5 Minutes
PPTX
Introduction to Machine Learning_ UNIT 1
Machine learning
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
Machine Learning for Dummies (without mathematics)
Machine learning-for-dummies-andrews-sobral-activeeon
Python Machine Learning - Getting Started
Session_2_Introduction_to_Deep_Learning.pptx
Automating Reverse Engineering: Function Classification and Matching
How to use Artificial Intelligence with Python? Edureka
Introduction to machine learning
Deep learning with tensorflow
Dato Keynote
Machine Learning Basics
Summary machine learning and model deployment
Reinforcement Learning, Application and Q-Learning
Is Spark the right choice for data analysis ?
Machine learning
Understanding and Protecting Artificial Intelligence Technology (Machine Lear...
machine learning basic unit1 for third year cse studnets
Persian MNIST in 5 Minutes
Introduction to Machine Learning_ UNIT 1
Ad

More from Dr. Rahul Pandya (20)

PDF
Quantitative, Qualitative, and Mixed Method - E1.pdf
PDF
Types of Licenses in Publication and Literature.pdf
PDF
Quantitative, Qualitative, and Mixed Methods for Research.pdf
PDF
Publication Performance Metrics: Journal Indexing, Quartiles, and Altrimatrix
PDF
Data Analysis Methods and Techniques with Comprehensive Details
PPTX
Digital Communication and Coding Theory.pptx
PPTX
Writing Research Grant Proposals : Project Proposals
PDF
Writing Review Articles? | Prof. Rahul Pandya (IIT Dharwad)
PDF
Dr. Rahul Pandya ECE Gate Course Communications Original.pdf
PPTX
Everything on Plagiarism | What is Plagiarism?
PPTX
Stochastic Process and its Applications.
PPTX
Computer Networks | Communication Networks
PPTX
Dr Rahul Pandya 6G Vision, Potential technologies, and Challenges - Animated ...
PDF
Introduction to Probability Theory
PDF
How to Cite Sources in PPT.pdf
PDF
Verbatim Plagiarism | Direct Plagiarism | Direct Copy Paste | Types of Plagia...
PDF
Paraphrasing without citing the souces.pdf
PDF
Avoid Plagiarism - Dr. Rahul Pandya.pdf
PDF
Journal Papers vs. Conference Papers - Dr. Rahul Pandya
PDF
Research Paper Writing - Dr. Rahul Pandya
Quantitative, Qualitative, and Mixed Method - E1.pdf
Types of Licenses in Publication and Literature.pdf
Quantitative, Qualitative, and Mixed Methods for Research.pdf
Publication Performance Metrics: Journal Indexing, Quartiles, and Altrimatrix
Data Analysis Methods and Techniques with Comprehensive Details
Digital Communication and Coding Theory.pptx
Writing Research Grant Proposals : Project Proposals
Writing Review Articles? | Prof. Rahul Pandya (IIT Dharwad)
Dr. Rahul Pandya ECE Gate Course Communications Original.pdf
Everything on Plagiarism | What is Plagiarism?
Stochastic Process and its Applications.
Computer Networks | Communication Networks
Dr Rahul Pandya 6G Vision, Potential technologies, and Challenges - Animated ...
Introduction to Probability Theory
How to Cite Sources in PPT.pdf
Verbatim Plagiarism | Direct Plagiarism | Direct Copy Paste | Types of Plagia...
Paraphrasing without citing the souces.pdf
Avoid Plagiarism - Dr. Rahul Pandya.pdf
Journal Papers vs. Conference Papers - Dr. Rahul Pandya
Research Paper Writing - Dr. Rahul Pandya
Ad

Recently uploaded (20)

PPTX
Fundamentals of Mechanical Engineering.pptx
PPTX
Artificial Intelligence
PPTX
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
PDF
Design Guidelines and solutions for Plastics parts
PDF
Abrasive, erosive and cavitation wear.pdf
PPTX
communication and presentation skills 01
PPTX
Fundamentals of safety and accident prevention -final (1).pptx
PDF
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
PDF
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
PPTX
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
PPTX
Nature of X-rays, X- Ray Equipment, Fluoroscopy
PPTX
Module 8- Technological and Communication Skills.pptx
PPT
Occupational Health and Safety Management System
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PPTX
Feature types and data preprocessing steps
PDF
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
PDF
III.4.1.2_The_Space_Environment.p pdffdf
PPTX
Management Information system : MIS-e-Business Systems.pptx
PPTX
"Array and Linked List in Data Structures with Types, Operations, Implementat...
Fundamentals of Mechanical Engineering.pptx
Artificial Intelligence
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
Design Guidelines and solutions for Plastics parts
Abrasive, erosive and cavitation wear.pdf
communication and presentation skills 01
Fundamentals of safety and accident prevention -final (1).pptx
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
Nature of X-rays, X- Ray Equipment, Fluoroscopy
Module 8- Technological and Communication Skills.pptx
Occupational Health and Safety Management System
distributed database system" (DDBS) is often used to refer to both the distri...
Feature types and data preprocessing steps
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
III.4.1.2_The_Space_Environment.p pdffdf
Management Information system : MIS-e-Business Systems.pptx
"Array and Linked List in Data Structures with Types, Operations, Implementat...

Introduction to Machine Learning (ML) Final - Copy.pdf

  • 1. Dr. Rahul J. Pandya, Assistant Professor, Electrical, Electronics, and Communication Engineering (EECE) Dept., Indian Institute of Technology (IIT), Dharwad Email: rpandya@iitdh.ac.in 1 Introduction to Machine Learning
  • 2. Course content - Syllabus RJEs: Remote job entry points ▪ Introduction to Machine Learning (ML) ▪ Types of Machine learning ▪ Supervised ML ▪ Unsupervised ML ▪ Semi-Supervised ML ▪ Reinforcement Learning (RL) ▪ Machine learning (ML) algorithms ▪ Regression- Linear Regression, Logistic Regression, Multivariate Regression ▪ Classification ▪ Clustering – Partitional clustering, Hierarchical clustering, Density based clustering ▪ Decision trees ▪ K-Nearest Neighbours (KNN) ▪ Kernel methods: Support vector machine ▪ Reinforcement Learning (RL) algorithms ▪ Graphical models: Gaussian mixture models and hidden Markov models ▪ Introduction to Bayesian Approach: Bayesian classification, Bayesian learning, Bayes optimal classifier, and Naïve Bayes Classifier. 2
  • 3. Reference books RJEs: Remote job entry points ▪ C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006 ▪ K. P. Murphy, “Machine Learning: A Probability Perspective”, MIT Press, 2012. 3
  • 5. *` Artificial Intelligence (AI) Enables systems to perform intelligent tasks through a set of rules https://www.geeksforgeeks.org/difference-between-artificial-intelligence- vs-machine-learning-vs-deep-learning/ 5
  • 6. *` Artificial Intelligence (AI) Enables systems to perform intelligent tasks through a set of rules Machine Learning (ML) It is a process of learning from the data without using complex rules. It involves training a model from datasets and predicting the outcome. https://www.geeksforgeeks.org/difference-between-artificial-intelligence- vs-machine-learning-vs-deep-learning/ 6
  • 7. *` Artificial Intelligence (AI) Enables systems to perform intelligent tasks through a set of rules Machine Learning (ML) It is a process of learning from the data without using complex rules. It involves training a model from datasets and predicting the outcome. Deep Learning (DL) ML at a large-scale, Equipped with artificial neural networks https://www.geeksforgeeks.org/difference-between-artificial-intelligence- vs-machine-learning-vs-deep-learning/ 7
  • 8. Introduction to Machine Learning (ML) RJEs: Remote job entry points ▪ Artificial Intelligence (AI): Approaches that enable computers to perform intelligent tasks. ▪ Machine Learning (ML): Approaches that learn the underlying pattern in given set of features without being explicitly programmed. ▪ Deep Learning (DL): Approaches that learn the underlying representations and patterns in given set of raw data without being explicitly programmed. 8
  • 9. Artificial Intelligence RJEs: Remote job entry points ▪ Intelligence: Experiencing (ability to learn & understand) and use it for deciding future course ▪ Artificial Intelligence (AI): Enabling machines to do so called intelligent tasks ▪ Problem solving ▪ Discovery ▪ Learning ▪ Dealing with uncertainties ▪ AI Categories: ▪ Problem solving using search methods ▪ State space search, heuristic search, randomized search, rule based, ▪ Symbolic manipulation is one form of AI ▪ Connectionist approach is another form of AI ▪ S R Mahadeva Prasanna PRML August 9 9
  • 10. Machine Learning RJEs: Remote job entry points ▪ With more and more digital data available, task of automatic discovery and learning of patterns, both natural and synthetic data. ▪ Not much focus on feature extraction, signal processing knowledge not pre-requisite ! ▪ More emphasis on discovery and learning of patterns by machine. ▪ Ability to learn by extracting patterns from data (features) ▪ Treated pattern learning more like associated function learning. ▪ Output y = f (x), where y is output and x is input data (features). ▪ Goal of ML is to learn f () that maps x to y. Ref: https://www.javatpoint.com/reinforcement-learning 10
  • 11. Deep Learning RJEs: Remote job entry points ▪ Task of learning both features (representation) and also patterns for pattern recognition. ▪ Trying to mimic human way of learning. ▪ Learning from experience ▪ Need not specify everything in the beginning ▪ Understand in terms of hierarchy of concepts ▪ Each concept defined in terms its relation to simpler concepts ▪ Learning complicated concepts out of simpler ones ▪ S Ref: https://www.javatpoint.com/reinforcement-learning 11
  • 13. Introduction to Machine Learning (ML) RJEs: Remote job entry points ▪ Machin learning gives “ computers the ability to learn without being explicitly programmed.” ~ Arthur Samuel Ref: https://pub.towardsai.net/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa https://www.google.com/imgres?imgurl=https%3A%2F%2Fprutor.ai%2Fwp-content%2Fuploads%2FML-vs-Programming.png&tbnid= -https://prutor.ai/ml-what-is-machine-learning/ ▪ Preparing for the exams ▪ Students feed their machine (brain) with a good amount of high-quality data (questions and answers from different books or teachers notes or online video lectures). ▪ Training their brain with input as well as output i.e. what kind of approach or logic do they have to solve a different kind of questions. 13
  • 14. Introduction to Machine Learning (ML) RJEs: Remote job entry points ▪ Machin learning gives “ computers the ability to learn without being explicitly programmed.” ~ Arthur Samuel Ref: https://pub.towardsai.net/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa https://www.google.com/imgres?imgurl=https%3A%2F%2Fprutor.ai%2Fwp-content%2Fuploads%2FML-vs-Programming.png&tbnid= -https://prutor.ai/ml-what-is-machine-learning/ ▪ Preparing for the exams ▪ Similarly, in ML train machine with data (both inputs and outputs are given to model) and when the time comes test on data (with input only) and achieves our model scores by comparing its answer with the actual output which has not been fed while training. 14
  • 15. How ML works? RJEs: Remote job entry points ▪ Features of ML: ▪ Machine learning uses data to detect various patterns in a given dataset. ▪ It can learn from past data and improve automatically. ▪ It is a datas-driven technology. ▪ Machine learning is like data mining as it also deals with vast data. Ref: https://www.javatpoint.com/machine-learning ▪ Machine Learning system learns from historical data, builds the prediction models, and whenever it receives new data, predicts the output for it 15
  • 16. Why Machine Learning (ML)? RJEs: Remote job entry points ▪ Machin learning gives “ computers the ability to learn without being explicitly programmed.” ~ Arthur Samuel ▪ Why ML? ▪ Machine learning models help us in many tasks, such as: ▪ Object Recognition ▪ Summarization ▪ Prediction ▪ Classification ▪ Clustering ▪ Recommender systems ▪ And others ▪ ML refers to the scientific branch of AI ▪ Deep learning is a subset of ML Ref: https://pub.towardsai.net/machine-learning-algorithms-for-beginners-with-python-code-examples-ml-19c6afd60daa https://www.google.com/imgres?imgurl=https%3A%2F%2Fprutor.ai%2Fwp-content%2Fuploads%2FML-vs-Programming.png&tbnid= - ETheD8sGlw9TM&vet=12ahUKEwj4k9OO7NOAAxXc5TgGHQy1CN8QMygHegUIARDTAQ..i&imgrefurl=https%3A%2F%2Fprutor.ai%2Fml-what-is-machine-learning%2F&docid=-yk7- zimRN69qM&w=571&h=223&q=What%20is%20Machine%20Learning%20(ML)%3F&ved=2ahUKEwj4k9OO7NOAAxXc5TgGHQy1CN8QMygHegUIARDTAQ 16
  • 17. Basic Difference in ML and Traditional Programming? RJEs: Remote job entry points https://prutor.ai/ml-what-is-machine-learning/ ▪ What does exactly learning means for a computer? ▪ Learning from Experiences with respect to some class of Tasks, if its performance in a given Task improves with the Experience. ▪ Learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E ▪ Traditional Programming: We feed in DATA (Input) + PROGRAM (logic), run it on machine and get output. ▪ Machine Learning: We feed in DATA(Input) + Output, run it on machine during training and the machine creates its own program(logic), which can be evaluated while testing. 17
  • 18. Machine Learning in Current world RJEs: Remote job entry points https://www.javatpoint.com/applications-of-machine-learning 18
  • 19. Traditional ML vs DL RJEs: Remote job entry points https://www.researchgate.net/figure/Comparison-between-ML-and-Dl-algorithm_fig5_344628869 19
  • 20. Neural Nets vs Deep Learning RJEs: Remote job entry points [Taken from public domain. Original authors highly acknowledged.] ▪ The concept of deep learning first originated from neural network. ▪ A good example of deep neural network is a feed forward neural network (FFNN). ▪ Backpropagation (BP) is the workhorse algorithm for learning the parameters of FFNN. ▪ BP did not work well for networks having more than a small number of hidden layers. ▪ Insufficient data leading to overfitting and difficulty in training of the deep networks was the main limitation. 20
  • 21. NN vs DL RJEs: Remote job entry points https://yangxiaozhou.github.io/data/2020/09/24/intro-to-cnn.html 21
  • 22. AI, ML, NN and DL RJEs: Remote job entry points https://www.researchgate.net/figure/Relationship-between-artificial-intelligence-machine-learning-deep-learning-and_fig2_351110482 22
  • 23. AI, ML, NN and DL RJEs: Remote job entry points https://www.researchgate.net/figure/Relationship-between-artificial-intelligence-machine-learning-deep-learning-and_fig2_351110482 23
  • 24. Information Extraction & Modeling RJEs: Remote job entry points [Taken from public domain. Original authors highly acknowledged.] ▪ Information : Knowledge about something. Face, speaker, route. ▪ Extraction : Extract physical quantities that carry information. ▪ Feature extraction or representation learning ▪ Modeling : Invariant entity that carries the knowledge. From features model these invariant entities. ▪ Process : In human computer interaction it refers to signal processing, pattern recognition, machine learning and deep learning. 24
  • 25. When Machine Learning and when Deep Learning RJEs: Remote job entry points [Taken from public domain. Original authors highly acknowledged.] ▪ Problem statement well / ill defined ▪ Amount of data less / too much ▪ Domain knowledge is high / low ▪ Well meaning feature extraction possible / not possible ▪ Machine learning / deep learning 25
  • 26. Classification of Machine Learning RJEs: Remote job entry points ▪ At a broad level, machine learning can be classified into three types: https://www.javatpoint.com/machine-learning ▪ Supervised ML models ▪ Unsupervised ML models ▪ Semi-supervised ML models (combination of Supervised and Unsupervised models) ▪ Reinforcement learning models 26
  • 27. Regression RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of regression task. ▪ Univariate vs multivariate regression. ▪ Linear vs nonlinear regression. ▪ Cost function. ▪ Gradient descent method of optimization. ▪ Normal equation approach for parameter estimation. ▪ Logistic regression 28
  • 28. Clustering RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of clustering task. ▪ Partitioning approach - k-means, fuzzy-c means. ▪ Model based approach - Gaussian mixture model (GMM). ▪ Expectation-maximization (EM) algorithm. ▪ Hierarchical clustering. ▪ Hierarchical - agglomerative clustering. ▪ Hierarchical - divisive clustering.S 29
  • 29. Classification RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of classification task. ▪ Binary vs multiclass classification. ▪ Generative vs discriminative classification. ▪ Parametric vs nonparametric classification. ▪ Logistic regression. ▪ k-nearest neighbour classification. ▪ Support vector machine. ▪ Generative classifiers. 30
  • 30. Dimensionality Reduction RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of dimensionality reduction task. ▪ Principal component analysis (PCA). ▪ Linear discriminant analysis (LDA). ▪ PCA based dimensionality reduction. ▪ PCA based classification. ▪ LDA based dimensionality reduction. ▪ LDA based classification. 31
  • 31. Time Series Modelling RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of time series modelling task. ▪ Markov process and models. ▪ Observable vs hidden Markov model ▪ Hidden Markov Model (HMM). ▪ Training and testing of HMM ▪ Forward and backward variables. ▪ Viterbi algorithm for optimal state sequence. ▪ Expectation maximization (EM) approach for training. 32
  • 32. Bayesian Approach RJEs: Remote job entry points [Taken from public sources. Original authors acknowledged.] ▪ Objective of Bayesian approach. ▪ Probabilistic framework for classification. ▪ Bayesian classification. ▪ Bayesian learning. ▪ Maximum a posteriori (MAP) approach. ▪ Bayes optimal classifier. ▪ Gibbs sampling. ▪ Naive Bayes classifier. ▪ Bayesian network. 33
  • 34. Classification of Machine Learning RJEs: Remote job entry points ▪ At a broad level, machine learning can be classified into three types: https://www.javatpoint.com/machine-learning ▪ Supervised ML models ▪ Unsupervised ML models ▪ Semi-supervised ML models (combination of Supervised and Unsupervised models) ▪ Reinforcement learning models 35
  • 36. Supervised Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Supervised learning is a type of machine learning in which machines are trained using well "labelled" training data, and on basis of that data, machines predict the output. ▪ The labelled data means some input data is already tagged with the correct output. ▪ In supervised learning, the training data provided to the machines work as the supervisor that teaches the machines to predict the output correctly. ▪ It applies the same concept as a student learns in the supervision of the teacher. ▪ Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input variable(x) with the output variable(y). 37
  • 37. How Supervised Learning Works? RJEs: Remote job entry points ▪ In supervised learning, models are trained using labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on the basis of test data (a subset of the training set), and then it predicts the output. Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Suppose we have a dataset of different types of shapes which includes square, rectangle, triangle, and Polygon. Now the first step is that we need to train the model for each shape. ▪ If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square. ▪ If the given shape has three sides, then it will be labelled as a triangle. ▪ If the given shape has six equal sides then it will be labelled as hexagon. ▪ Now, after training, we test our model using the test set, and the task of the model is to identify the shape. 38
  • 38. How Supervised Learning Works? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ In the real-world, supervised learning can be used for Risk Assessment, Image classification, Fraud Detection, spam filtering, etc. ▪ Algorithms like Decision tree, Random Forest, KNN, Logistic Regression, etc. fall under supervised ML models 39
  • 39. Steps Involved in Supervised Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •First Determine the type of training dataset •Collect/Gather the labelled training data. •Split the training dataset into training dataset, test dataset, and validation dataset. •Determine the input features of the training dataset, which should have enough knowledge so that the model can accurately predict the output. •Determine the suitable algorithm for the model, such as support vector machine, decision tree, etc. •Execute the algorithm on the training dataset. •Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output, which means our model is accurate. 40
  • 40. Types of supervised Machine Learning Algorithms RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Regression • Regression algorithms are used if there is a relationship between the input variable and the output variable. It is used for the prediction of continuous variables, such as Weather forecasting, Market Trends, etc. Below are some popular Regression algorithms which come under supervised learning: •Linear Regression •Regression Trees •Non-Linear Regression •Bayesian Linear Regression •Polynomial Regression Classification • Classification algorithms are used when the output variable is categorical, which means there are two classes such as Yes-No, Male- Female, True-false, etc. • Spam Filtering, • Random Forest • Decision Trees • Logistic Regression • Support Vector Machines 41
  • 41. Advantages/Disadvantages of Supervised learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Advantages of supervised learning: •With the help of supervised learning, the model can predict the output on the basis of prior experiences. •In supervised learning, we can have an exact idea about the classes of objects. •Supervised learning model helps us to solve various real-world problems such as fraud detection, spam filtering, etc. Disadvantages of supervised learning: •Supervised learning models are not suitable for handling the complex tasks. •Supervised learning cannot predict the correct output if the test data is different from the training dataset. •Training required lots of computation times. •In supervised learning, we need enough knowledge about the classes of object. 42
  • 42. Unsupervised Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Supervised machine learning in which models are trained using labeled data under the supervision of training data. But there may be many cases in which we do not have labeled data and need to find the hidden patterns from the given dataset. So, to solve such types of cases in machine learning, we need unsupervised learning techniques. ▪ What is Unsupervised Learning? ▪ Unsupervised learning is a machine learning technique in which models are not supervised using training dataset. Instead, models itself find the hidden patterns and insights from the given data. It can be compared to learning which takes place in the human brain while learning new things. It can be defined as: ▪ Unsupervised learning is a type of machine learning in which models are trained using unlabeled dataset and are allowed to act on that data without any supervision. ▪ Unsupervised learning cannot be directly applied to a regression or classification problem because unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of dataset, group that data according to similarities, and represent that dataset in a compressed format. 43
  • 43. Unsupervised Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. ▪ The algorithm is never trained upon the given dataset, which means it does not have any idea about the features of the dataset. ▪ The task of the unsupervised learning algorithm is to identify the image features on their own. ▪ Unsupervised learning algorithm will perform this task by clustering the image dataset into the groups according to similarities between images. 44
  • 44. Why use Unsupervised Learning? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Unsupervised learning is helpful for finding useful insights from the data. •Unsupervised learning is much similar as a human learns to think by their own experiences, which makes it closer to the real AI. •Unsupervised learning works on unlabelled and uncategorized data which make unsupervised learning more important. •In real-world, we do not always have input data with the corresponding output so to solve such cases, we need unsupervised learning. 45
  • 45. Working of Unsupervised Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Here, we have taken an unlabeled input data, which means it is not categorized and corresponding outputs are also not given. ▪ Now, this unlabeled input data is fed to the machine learning model in order to train it. Firstly, it will interpret the raw data to find the hidden patterns from the data and then will apply suitable algorithms such as k-means clustering, Decision tree, etc. ▪ Once it applies the suitable algorithm, the algorithm divides the data objects into groups according to the similarities and difference between the objects. 46
  • 46. Types of Unsupervised Learning Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Clustering: •Clustering is a method of grouping the objects into clusters such that objects with most similarities remains into a group and has less or no similarities with the objects of another group. •Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities. •Association: •An association rule is an unsupervised learning method which is used for finding the relationships between variables in the large database. •It determines the set of items that occurs together in the dataset. • Association rule makes marketing strategy more effective. •Such as people who buy X item (suppose a bread) are also tend to purchase Y (Butter/Jam) item. •A typical example of Association rule is Market Basket Analysis. 47
  • 47. Unsupervised Learning algorithms RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •K-means clustering •KNN (k-nearest neighbors) •Hierarchal clustering •Anomaly detection •Neural Networks •Principle Component Analysis •Independent Component Analysis •Apriori algorithm •Singular value decomposition 48
  • 48. Advantages/ Disadvantages of Unsupervised Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Advantages •Unsupervised learning is used for more complex tasks as compared to supervised learning because, in unsupervised learning, we don't have labelled input data. •Unsupervised learning is preferable as it is easy to get unlabelled data in comparison to labelled data. •Disadvantages •Unsupervised learning is intrinsically more difficult than supervised learning as it does not have corresponding output. •The result of the unsupervised learning algorithm might be less accurate as input data is not labelled, and algorithms do not know the exact output in advance 49
  • 49. Difference between Supervised and Unsupervised Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning 50
  • 50. Difference between Supervised and Unsupervised Learning RJEs: Remote job entry points Supervised Learning Unsupervised Learning ▪ Supervised learning algorithms are trained using labeled data. ▪ Unsupervised learning algorithms are trained using unlabeled data. ▪ Supervised learning model takes direct feedback to check if it is predicting correct output or not. ▪ Unsupervised learning model does not take any feedback. ▪ Supervised learning model predicts the output. ▪ Unsupervised learning model finds the hidden patterns in data. ▪ In supervised learning, input data is provided to the model along with the output. ▪ In unsupervised learning, only input data is provided to the model. ▪ The goal of supervised learning is to train the model so that it can predict the output when it is given new data. ▪ The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset. ▪ Supervised learning needs supervision to train the model. ▪ Unsupervised learning does not need any supervision to train the model. ▪ Supervised learning can be categorized in Classification and Regression problems. ▪ Unsupervised Learning can be classified in Clustering and Associations problems. ▪ Supervised learning can be used for those cases where we know the input as well as corresponding outputs. ▪ Unsupervised learning can be used for those cases where we have only input data and no corresponding output data. ▪ Supervised learning model produces an accurate result. ▪ Unsupervised learning model may give less accurate result as compared to supervised learning. ▪ Supervised learning is not close to true Artificial intelligence as in this, we first train the model for each data, and then only it can predict the correct output. ▪ Unsupervised learning is more close to the true Artificial Intelligence as it learns similarly as a child learns daily routine things by his experiences. ▪ It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class ▪ It includes various algorithms such as Clustering, KNN, and Apriori algorithm. Ref: https://www.javatpoint.com/supervised-machine-learning 51
  • 52. Regression Analysis in Machine learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Regression analysis is a statistical method to model the relationship between a dependent (target) and independent (predictor) variables with one or more independent variables. More specifically, Regression analysis helps us to understand how the value of the dependent variable is changing corresponding to an independent variable when other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc. ▪ Example: Suppose there is a marketing company A, who does various advertisement every year and get sales on that. The list shows the advertisement made by the company in the last 5 years and the corresponding sales: 53
  • 53. Regression Analysis in Machine learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Now, the company wants to do the advertisement of $200 in the year and wants to know the prediction about the sales for this year. So to solve such type of prediction problems in machine learning, we need regression analysis. 54
  • 54. Regression Analysis in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on the one or more predictor variables. It is mainly used for prediction, forecasting, time series modelling, and determining the causal-effect relationship between variables. ▪ In Regression, we plot a graph between the variables which best fits the given datapoints, using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the datapoints on target- predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between datapoints and line tells whether a model has captured a strong relationship or not. 55
  • 55. Regression Analysis in Machine learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Some examples of regression can be as: •Prediction of rain using temperature and other factors •Determining Market trends •Prediction of road accidents due to rash driving. 56
  • 56. Why do we use Regression Analysis? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need some future predictions such as weather condition, sales prediction, marketing trends, etc., for such case we need some technology which can make predictions more accurately. So for such case we need Regression analysis which is a statistical method and used in machine learning and data science. ▪ Regression estimates the relationship between the target and the independent variable. ▪ It is used to find the trends in data. ▪ By performing the regression, we can confidently determine the most important factor, the least important factor, and how each factor is affecting the other factors. 59
  • 57. Types of Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning 60
  • 59. Linear Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Linear regression is a statistical regression method which is used for predictive analysis. ▪ Shows the relationship between the continuous variables. ▪ It is used for solving the regression problem in machine learning. ▪ Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence called linear regression. Y= aX+b Here, Y = dependent variables (target variables), X= Independent variables (predictor variables), a and b are the linear coefficients 62
  • 60. Linear Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than one input variables, then such linear regression is called multiple linear regression. ▪ The relationship between variables in the linear regression model can be explained using the image. Here we are predicting the salary of an employee on the basis of the year of experience. Here, Y = dependent variables (target variables), X= Independent variables (predictor variables), a and b are the linear coefficients Y= aX+b 63
  • 61. Linear Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc. ▪ Linear regression algorithm shows a linear relationship between a dependent (y) and one or more independent (y) variables, hence called as linear regression. Since linear regression shows the linear relationship, which means it finds how the value of the dependent variable is changing according to the value of the independent variable. ▪ The linear regression model provides a sloped straight line representing the relationship between the variables. 64
  • 62. Linear Regression in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning y= a0+a1x+ ε Here, ▪ Y= Dependent Variable (Target Variable) ▪ X= Independent Variable (Predictor Variable) ▪ a0= Intercept of the line (Gives an additional degree of freedom) ▪ a1 = Linear regression coefficient (scale factor to each input value). ▪ ε = random error ▪ The values for x and y variables are training datasets for Linear Regression model representation. 65
  • 63. Types of Linear Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression. ▪ Multiple Linear regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression. 66
  • 64. Finding the best fit line RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ When working with linear regression, our main goal is to find the best fit line that means the error between predicted values and actual values should be minimized. The best fit line will have the least error. ▪ The different values for weights or the coefficient of lines (a0, a1) gives a different line of regression, so we need to calculate the best values for a0 and a1 to find the best fit line, so to calculate this we use cost function. y= a0+a1x+ ε 67
  • 65. Cost function RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ The different values for weights or coefficients of lines (a0, a1) give the different line of regression, and the cost function is used to estimate the values of the coefficients for the best fit line. ▪ Cost function optimizes the regression coefficients or weights. It measures how a linear regression model is performing. ▪ We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as Hypothesis function. y= a0+a1x+ ε 68
  • 66. Cost function RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Where, N=Total number of observations Yi = Actual value (a1xi+a0)= Predicted value Residuals: The distance between the actual value and predicted values is called residual. If the observed points are far from the regression line, then the residual will be high, and so cost function will be high. If the scatter points are close to the regression line, then the residual will be small and hence the cost function. ▪ For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of squared error occurred between the predicted values and actual values. ▪ For the above linear equation, MSE can be calculated as: 69
  • 68. Logistic Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Logistic regression is another supervised learning algorithm which is used to solve the classification problems. In classification problems, we have dependent variables in a binary or discrete format such as 0 or 1. ▪ Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or False, Spam or not spam, etc. ▪ It is a predictive analysis algorithm which works on the concept of probability. ▪ Logistic regression uses sigmoid function or logistic function which is a complex cost function. This sigmoid function is used to model the data in logistic regression. 71
  • 69. Logistic Regression RJEs: Remote job entry points https://mathworld.wolfram.com/SigmoidFunction.html ▪ f(x) = Output between the 0 and 1 value ▪ x = input to the function ▪ e = base of natural logarithm ▪ There are three types of logistic regression: •Binary (0/1, pass/fail) •Multi (cats, dogs, lions) •Ordinal (low, medium, high) 73
  • 71. Polynomial Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Polynomial Regression is a type of regression which models the non-linear dataset using a linear model. •It is similar to multiple linear regression, but it fits a non-linear curve between the value of x and corresponding conditional values of y. •Suppose there is a dataset which consists of datapoints which are present in a non- linear fashion, so for such case, linear regression will not best fit to those datapoints. To cover such datapoints, we need Polynomial regression. •In Polynomial regression, the original features are transformed into polynomial features of given degree and then modelled using a linear model. Which means the data-points are best fitted using a polynomial line. 75
  • 72. Polynomial Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •The equation for polynomial regression also derived from linear regression equation that means Linear regression equation Y=b0+b1x, is transformed into Polynomial regression equation Y= b0+b1x+ b2x2+ b3x3+.....+ bnxn •Here Y is the predicted/target output, b0, b1,... bn are the regression coefficients. x is our independent/input variable. •The model is still linear as the coefficients are still linear with quadratic 76
  • 74. Support Vector Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. So if we use it for regression problems, then it is termed as Support Vector Regression. ▪ Support Vector Regression is a regression algorithm which works for continuous variables. Below are some keywords which are used in Support Vector Regression: ▪ Kernel: It is a function used to map a lower- dimensional data into higher dimensional data. ▪ Hyperplane: In general SVM, it is a separation line between two classes, but in SVR, it is a line which helps to predict the continuous variables and cover most of the datapoints. 78
  • 75. Support Vector Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Boundary line: Boundary lines are the two lines apart from hyperplane, which creates a margin for data points. ▪ Support vectors: Support vectors are the datapoints which are nearest to the hyperplane and opposite class. In SVR, we always try to determine a hyperplane with a maximum margin, so that maximum number of datapoints are covered in that margin. ▪ The main goal of SVR is to consider the maximum data points within the boundary lines and the hyperplane (best-fit line) must contain a maximum number of data points. 79
• 76. Decision Tree Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Decision Tree is a supervised learning algorithm which can be used for solving both classification and regression problems. •It can solve problems for both categorical and numerical data. •Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result. •A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children, thereby becoming the parent nodes of those nodes. 80
• 78. Decision Tree Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Image showing an example of Decision Tree regression; here, the model is trying to predict the choice of a person between a sports car and a luxury car. 82
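A minimal sketch of decision tree regression with scikit-learn on a hypothetical one-feature dataset:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one feature -> continuous target
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0])

# max_depth limits how many times internal nodes may split,
# which controls overfitting
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict([[4.5]]))  # prediction = mean of the training targets in the reached leaf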
• 80. Random forest RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks. ▪ Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models: g(x) = f0(x) + f1(x) + f2(x) + ... 84
• 81. Random forest RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Random forest uses the Bagging or Bootstrap Aggregation technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other. ▪ With the help of Random Forest regression, we can prevent overfitting in the model by creating random subsets of the dataset. 85
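A minimal sketch of random forest regression with scikit-learn on the same kind of hypothetical data; n_estimators bagged trees are trained and their predictions averaged:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0])

# n_estimators base trees are trained on bootstrap samples (bagging)
# and their outputs are averaged to give the ensemble prediction g(x)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[4.5]]))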
• 83. Ridge Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning •Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we get better long-term predictions. •The amount of bias added to the model is known as the Ridge Regression penalty. We compute this penalty term by multiplying lambda by the squared weight of each individual feature. •The cost function for ridge regression is then: Cost = Σ(yᵢ − ŷᵢ)² + λ Σ bⱼ² 87
• 84. Ridge Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used. ▪ Ridge regression is a regularization technique, used to reduce the complexity of the model. It is also called L2 regularization. ▪ It helps to solve problems where we have more parameters than samples. 88
• 86. Lasso Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Lasso regression is another regularization technique to reduce the complexity of the model. ▪ It is similar to Ridge Regression except that the penalty term contains the absolute values of the weights instead of their squares. ▪ Since it takes absolute values, it can shrink a slope all the way to 0, whereas Ridge Regression can only shrink it close to 0. ▪ It is also called L1 regularization. The cost function for Lasso regression is: Cost = Σ(yᵢ − ŷᵢ)² + λ Σ |bⱼ| 90
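A minimal sketch contrasting the two penalties with scikit-learn, on hypothetical data where two true coefficients are zero; the alpha argument plays the role of lambda:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.1, 50)

# Larger alpha -> stronger penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks weights toward 0
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: can shrink weights exactly to 0

print(ridge.coef_)  # small but non-zero coefficients
print(lasso.coef_)  # some coefficients are exactly 0 (implicit feature selection)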
• 87. Model Performance RJEs: Remote job entry points https://byjus.com/maths/coefficient-of-determination/ https://aaweg-i.medium.com/what-precautions-we-need-to-keep-in-mind-when-using-coefficient-of-determination-98625e8bdb51 ▪ The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization. It can be assessed by the method below: R-squared method: ▪ It measures the strength of the relationship between the dependent and independent variables on a scale of 0-100%. ▪ A high R-squared value indicates a small difference between the predicted and actual values and hence represents a good model. ▪ It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression. 91
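A quick check of this metric, a minimal sketch using scikit-learn's r2_score on hypothetical predictions:

from sklearn.metrics import r2_score

y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.8, 5.1, 7.2, 8.7]

# R^2 = 1 - SS_res / SS_tot; closer to 1 means the predictions track the actual values
print(r2_score(y_actual, y_predicted))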
• 88. Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ Gradient descent is used to minimize the MSE by calculating the gradient of the cost function. ▪ A regression model uses gradient descent to update the coefficients of the line by reducing the cost function. ▪ It starts from randomly selected coefficient values and then iteratively updates them to reach the minimum of the cost function. 92
  • 89. Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ Gradient Descent is known as one of the most commonly used optimization algorithms to train machine learning models by means of minimizing errors between actual and expected results. Further, gradient descent is also used to train Neural Networks. ▪ Optimization algorithm refers to the task of minimizing/maximizing an objective function f(x) parameterized by x. ▪ Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize the convex function using iteration of parameter updates. 93
  • 90. What is Gradient Descent or Steepest Descent? RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ Gradient Descent is defined as one of the most commonly used iterative optimization algorithms of machine learning to train the machine learning and deep learning models. It helps in finding the local minimum of a function. ▪ If we move towards a negative gradient or away from the gradient of the function at the current point, it will give the local minimum of that function. ▪ Whenever we move towards a positive gradient or towards the gradient of the function at the current point, we will get the local maximum of that function. ▪ The main objective of using a gradient descent algorithm is to minimize the cost function using iteration. 94
• 91. Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning https://www.geeksforgeeks.org/gradient-descent-in-linear-regression/ ▪ Calculates the first-order derivative of the function to compute the gradient or slope of that function. ▪ Moves away from the direction of the gradient, i.e., steps from the current point in the direction of steepest descent by alpha times the gradient, where alpha is the learning rate. The learning rate is a tuning parameter in the optimization process which helps to decide the length of the steps. 95
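A minimal NumPy sketch of this update rule for simple linear regression (MSE cost) on hypothetical data:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

b0, b1 = 0.0, 0.0   # initial coefficients
alpha = 0.01        # learning rate

for _ in range(1000):
    y_pred = b0 + b1 * X
    error = y_pred - y
    # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. b0 and b1
    grad_b0 = 2 * error.mean()
    grad_b1 = 2 * (error * X).mean()
    # Step opposite to the gradient, scaled by the learning rate
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # approaches the least-squares line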
• 92. What is Cost-function? RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ The cost function is defined as the measurement of the difference or error between actual values and expected values. ▪ It helps to improve machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum. The algorithm continuously iterates along the direction of the negative gradient until the cost function approaches its minimum. ▪ At this point of convergence, the model stops learning further. 96
• 93. How does Gradient Descent work? RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ The starting point (shown in the figure) is just an arbitrary point used to evaluate the initial performance. At this starting point, we take the first derivative, using a tangent line to measure the steepness of the slope. This slope then informs the updates to the parameters (weights and bias). ▪ The slope is steep at the starting point, but as new parameters are generated, the steepness gradually reduces until the algorithm reaches the lowest point, which is called the point of convergence. ▪ The main objective of gradient descent is to minimize the cost function, i.e., the error between expected and actual values. 97
• 94. Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning Direction & Learning Rate ▪ These two factors determine the partial derivative calculations of future iterations and drive the algorithm toward the point of convergence, a local or global minimum. Learning Rate: ▪ It is defined as the step size taken to reach the minimum or lowest point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum; a low learning rate gives small step sizes, which compromises overall efficiency but gives the advantage of more precision. 98
  • 95. Types of Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/ ▪ Based on the error in various training models, the Gradient Descent learning algorithm can be divided into ▪ Batch gradient descent ▪ Mini-batch gradient descent ▪ Stochastic gradient descent 99
• 96. Batch Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/ ▪ Batch Gradient Descent: ▪ Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after evaluating all training examples in the batch. One such pass is known as a training epoch. ▪ Advantages of Batch gradient descent: ▪ It produces less noise in comparison to other types of gradient descent. ▪ It produces stable gradient descent convergence. ▪ It is computationally efficient, as all resources are used to process all training samples together. 100
• 97. Stochastic gradient descent RJEs: Remote job entry points ▪ Stochastic gradient descent (SGD) is a type of gradient descent that runs one training example per iteration. In other words, it processes each example within the dataset individually and updates the parameters one training example at a time. ▪ As it requires only one training example at a time, it is easier to store in the allocated memory. ▪ However, it loses some computational efficiency in comparison to batch gradient descent because of the frequent parameter updates. ▪ Further, due to the frequent updates, the gradient is noisy. However, this noise can sometimes be helpful in escaping local minima and finding the global minimum. https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/ 101
• 98. Stochastic gradient descent RJEs: Remote job entry points ▪ Advantages of Stochastic gradient descent: ▪ In Stochastic gradient descent (SGD), learning happens on every example, which gives it a few advantages over other types of gradient descent. ▪ It is easier to fit in the allocated memory. ▪ It is relatively fast to compute compared to batch gradient descent. https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/ 102
• 99. MiniBatch Gradient Descent: RJEs: Remote job entry points ▪ Mini-batch gradient descent is the combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and then performs the updates on those batches separately. ▪ Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. ▪ Hence, we achieve a special type of gradient descent with higher computational efficiency and a less noisy gradient. ▪ Advantages of Mini-batch gradient descent: ▪ It is easier to fit in allocated memory. ▪ It is computationally efficient. ▪ It produces stable gradient descent convergence. https://www.javatpoint.com/gradient-descent-in-machine-learning, https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/ 103
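A minimal NumPy sketch contrasting the three update granularities on hypothetical data (the batch and SGD variants are shown as comments; the mini-batch loop runs):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 0.1, 100)

def gradient(w, Xb, yb):
    # Gradient of MSE for a linear model y = w * x on the batch (Xb, yb)
    return 2 * ((w * Xb.ravel() - yb) * Xb.ravel()).mean()

w, alpha = 0.0, 0.05
for epoch in range(20):
    # Batch GD: one update per epoch using all samples
    # w -= alpha * gradient(w, X, y)

    # SGD: one update per sample
    # for i in range(len(X)):
    #     w -= alpha * gradient(w, X[i:i+1], y[i:i+1])

    # Mini-batch GD: one update per small batch (here, size 10)
    for start in range(0, len(X), 10):
        Xb, yb = X[start:start+10], y[start:start+10]
        w -= alpha * gradient(w, Xb, yb)

print(w)  # close to the true slope 3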
• 100. Challenges with the Gradient Descent RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning Local Minima and Saddle Point: ▪ For convex problems, gradient descent can find the global minimum easily, while for non-convex problems it is sometimes difficult to find the global minimum, where the machine learning model achieves the best results. ▪ Whenever the slope of the cost function is at or close to zero, the model stops learning further. Apart from the global minimum, two other scenarios can produce this slope: saddle points and local minima. A local minimum has a shape similar to the global minimum, where the slope of the cost function increases on both sides of the current point. ▪ In contrast, at a saddle point the negative gradient occurs only on one side of the point: the point is a local maximum along one direction and a local minimum along another. A saddle point takes its name from a horse's saddle. ▪ A local minimum is so named because the value of the loss function is minimum at that point within a local region. In contrast, a global minimum is so named because the value of the loss function is minimum there globally, across the entire domain of the loss function. 104
• 101. Vanishing and Exploding Gradient RJEs: Remote job entry points https://www.javatpoint.com/gradient-descent-in-machine-learning ▪ Vanishing Gradients: ▪ A vanishing gradient occurs when the gradient is smaller than expected. During backpropagation, the gradient becomes progressively smaller, causing the earlier layers of the network to learn more slowly than the later layers. When this happens, the weight updates become insignificant and the parameters effectively stop changing. ▪ Exploding Gradient: ▪ An exploding gradient is the opposite of a vanishing gradient: it occurs when the gradient is too large. In this scenario, the model weights grow so large that they end up represented as NaN. This problem can be mitigated by reducing the complexity of the model, for example with dimensionality reduction techniques. 105
• 102. Classification Algorithm in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning What is the Classification Algorithm? ▪ The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories. ▪ y = f(x), where y = categorical output ▪ The best example of an ML classification algorithm is an email spam detector. ▪ The main goal of the classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for categorical data. ▪ Classification algorithms can be better understood using the diagram. In the diagram, there are two classes, Class A and Class B. Within each class, the samples have features similar to each other and dissimilar to those of the other class. 106
• 103. Classification Algorithm in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ The algorithm which implements classification on a dataset is known as a classifier. ▪ Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary Classifier. ▪ Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc. ▪ Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class Classifier. ▪ Examples: classification of types of crops, classification of types of music. 107
• 104. Classification RJEs: Remote job entry points ▪ A Supervised Learning technique that is used to identify the category of new observations on the basis of training data[1] ▪ In classification, the output is categorical, unlike in regression where it is based on predicting 'values' ▪ Types of classification[2]- ▪ Binary classification: When we have to categorize the given data into 2 distinct classes. Example – on the basis of the given health conditions of a person, determine whether the person has a certain disease or not ▪ Multiclass classification: The number of classes is more than 2. Example – on the basis of data about different species of flowers, determine which species our observation belongs to Ref: [1] https://www.javatpoint.com/classification-algorithm-in-machine-learning [2] https://www.geeksforgeeks.org/getting-started-with-classification/?ref=lbp 108
  • 105. Classification and its types RJEs: Remote job entry points ▪ General Block diagram of classification task: Ref: https://www.geeksforgeeks.org/getting-started-with-classification/?ref=lbp ▪ There are various types of classifiers. Some of them are: ▪ Linear Classifiers: Logistic Regression ▪ Tree-Based Classifiers: Decision Tree Classifier ▪ Support Vector Machines ▪ Artificial Neural Networks ▪ Bayesian Regression ▪ Gaussian Naive Bayes Classifiers ▪ Stochastic Gradient Descent (SGD) Classifier ▪ Ensemble Methods: Random Forests, AdaBoost, Bagging Classifier, Voting Classifier, etc. • X: Pre-classified data • y: label/observations for X • y’: predicted labels for X 109
• 106. Learners in Classification Problems RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ In classification problems, there are two types of learners: ▪ Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions. Example: K-NN algorithm, case-based reasoning ▪ Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction. ▪ Example: Decision Trees, Naïve Bayes, ANN. 110
• 107. Types of ML Classification Algorithms RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Classification algorithms can be further divided into mainly two categories: ▪ Linear Models ▪ Logistic Regression ▪ Support Vector Machines ▪ Non-linear Models ▪ K-Nearest Neighbours ▪ Kernel SVM ▪ Naïve Bayes ▪ Decision Tree Classification ▪ Random Forest Classification 111
• 108. Evaluating a Classification model RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Log Loss or Cross-Entropy Loss: ▪ It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1. ▪ For a good binary classification model, the value of log loss should be near 0. ▪ The value of log loss increases if the predicted value deviates from the actual value. ▪ A lower log loss represents higher accuracy of the model. ▪ For binary classification, cross-entropy can be calculated as: Log Loss = −(1/N) Σᵢ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)] ▪ Here, pᵢ is the predicted probability of class 1, and (1 − pᵢ) is the probability of class 0. ▪ When the observation belongs to class 1, the first part of the formula becomes active and the second part vanishes, and vice versa. 112
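A minimal sketch computing binary cross-entropy both by hand and with scikit-learn's log_loss, on hypothetical probabilities:

import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.6, 0.3]   # predicted P(class 1)

# Manual binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))
y, p = np.array(y_true), np.array(p_pred)
manual = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(manual, log_loss(y_true, p_pred))  # both near 0 for a good model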
• 109. Confusion Matrix RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 ▪ The confusion matrix provides a matrix/table as output and describes the performance of the model. ▪ It is also known as the error matrix. ▪ The matrix summarizes the prediction results, giving the total numbers of correct and incorrect predictions broken down by class. 113
• 110. Accuracy RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 Accuracy simply measures how often the classifier makes the correct prediction. It is the ratio between the number of correct predictions and the total number of predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN) 114
• 111. Precision RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 ▪ It is a measure of the correctness achieved in positive predictions. In simple words, it tells us how many of all the observations predicted as positive are actually positive: Precision = TP / (TP + FP) 115
• 112. Recall RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 ▪ It is a measure of the actual positive observations which are predicted correctly, i.e., how many observations of the positive class are actually predicted as positive: Recall = TP / (TP + FN) ▪ It is also known as Sensitivity. ▪ Recall is a valid choice of evaluation metric when we want to capture as many positives as possible. 116
• 113. F-measure / F1-Score RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 ▪ The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall: F1 = 2 · (Precision · Recall) / (Precision + Recall) ▪ We use the harmonic mean because, unlike a simple average, it is not dominated by extremely large values: a high F1 requires both precision and recall to be high. 117
• 114. Sensitivity & Specificity RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 ▪ Sensitivity (true positive rate) is the proportion of actual positives that are correctly identified: Sensitivity = TP / (TP + FN) ▪ Specificity (true negative rate) is the proportion of actual negatives that are correctly identified: Specificity = TN / (TN + FP) 118
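A minimal sketch deriving all of these metrics from a confusion matrix with scikit-learn, on hypothetical labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # sensitivity / true positive rate
specificity = tn / (tn + fp)          # true negative rate
f1          = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, specificity, f1)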
• 115. Difference between Regression and Classification RJEs: Remote job entry points https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 Regression Algorithm vs. Classification Algorithm: ▪ In Regression, the output variable must be of continuous nature or real value; in Classification, the output variable must be a discrete value. ▪ The task of the regression algorithm is to map the input value (x) to a continuous output variable (y); the task of the classification algorithm is to map the input value (x) to a discrete output variable (y). ▪ Regression algorithms are used with continuous data; classification algorithms are used with discrete data. ▪ In Regression, we try to find the best-fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes. ▪ Regression algorithms can be used to solve problems such as weather prediction and house price prediction; classification algorithms can be used to solve problems such as identifying spam emails, speech recognition, and identifying cancer cells. ▪ Regression algorithms can be further divided into Linear and Non-linear Regression; classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers. 119
• 116. RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Linear Regression vs. Logistic Regression: ▪ Linear regression is used to predict a continuous dependent variable using a given set of independent variables; logistic regression is used to predict a categorical dependent variable using a given set of independent variables. ▪ Linear regression is used for solving regression problems; logistic regression is used for solving classification problems. ▪ In linear regression, we predict the values of continuous variables; in logistic regression, we predict the values of categorical variables. ▪ In linear regression, we find the best-fit line, by which we can easily predict the output; in logistic regression, we find the S-curve by which we can classify the samples. ▪ In linear regression, the least squares method is used to estimate the coefficients; in logistic regression, the maximum likelihood estimation method is used. ▪ The output of linear regression must be a continuous value, such as price or age; the output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc. ▪ In linear regression, the relationship between the dependent variable and the independent variables must be linear; in logistic regression, a linear relationship between the dependent and independent variables is not required. ▪ In linear regression, there may be collinearity between the independent variables; in logistic regression, there should not be collinearity between the independent variables. 120
• 117. Linear Regression vs. Logistic Regression RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems. 121
• 118. Clustering in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Clustering or cluster analysis is a machine learning technique which groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group." ▪ It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, or behavior, and divides the data as per the presence and absence of those patterns. ▪ It is an unsupervised learning method; hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset. 122
• 119. Clustering in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ After applying this clustering technique, each cluster or group is provided with a cluster ID. The ML system can use this ID to simplify the processing of large and complex datasets. ▪ The clustering technique can be widely used in various tasks: ▪ Market Segmentation ▪ Statistical data analysis ▪ Social network analysis ▪ Image segmentation ▪ Anomaly detection, etc. 123
  • 120. Types of Clustering Methods RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ The clustering methods are broadly divided into Hard clustering (datapoint belongs to only one group) and Soft Clustering (data points can belong to another group also). ▪ Partitioning Clustering ▪ Density-Based Clustering ▪ Distribution Model-Based Clustering ▪ Hierarchical Clustering ▪ Fuzzy Clustering 124
• 121. Hierarchical Clustering in Machine Learning RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Hierarchical clustering is another unsupervised machine learning algorithm used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis or HCA. ▪ In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram. ▪ Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work; for one, there is no requirement to predetermine the number of clusters as there is in the K-means algorithm. ▪ The hierarchical clustering technique has two approaches: ▪ Agglomerative: a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left. ▪ Divisive: the reverse of the agglomerative algorithm, i.e., a top-down approach. 125
• 122. Hierarchical Clustering RJEs: Remote job entry points ▪ The clusters formed in this method form a tree-type structure called a dendrogram, based on the hierarchy[1] ▪ New clusters are formed using the previously formed ones ▪ It is divided into two categories: ▪ Agglomerative clustering: a bottom-up approach ▪ Divisive clustering: a top-down approach ▪ Examples are CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), etc. ▪ Agglomerative-based dendrogram[2]: Ref: [1] https://www.geeksforgeeks.org/clustering-in-machine-learning/ [2] https://towardsdatascience.com/machine-learning-algorithms-part-12-hierarchical-agglomerative-clustering-example-in-python-1e18e0075019 126
• 123. Why hierarchical clustering? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ As we already have other clustering algorithms such as K-means clustering, why do we need hierarchical clustering? ▪ As we have seen, K-means clustering comes with some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of the same size. ▪ To solve these two challenges, we can opt for the hierarchical clustering algorithm. ▪ In this algorithm, we don't need to know the number of clusters in advance. 127
• 124. Agglomerative Hierarchical clustering RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ The agglomerative hierarchical clustering algorithm is a popular example of HCA. ▪ To group the datasets into clusters, it follows the bottom-up approach. This means the algorithm considers each data point as a single cluster at the beginning and then starts combining the closest pairs of clusters. ▪ It does this until all the clusters are merged into a single cluster that contains all the data. ▪ This hierarchy of clusters is represented in the form of the dendrogram. 128
• 125. How Does Agglomerative Hierarchical Clustering Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Step-1: Create each data point as a single cluster. Let's say there are N data points, so the number of clusters will also be N. ▪ Step-2: Take the two closest data points or clusters and merge them to form one cluster. There will then be N-1 clusters. ▪ Step-3: Again, take the two closest clusters and merge them to form one cluster. There will be N-2 clusters. 129
• 126. How Does Agglomerative Hierarchical Clustering Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Step-4: Repeat Step 3 until only one cluster is left, giving the following clusters. ▪ Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram and cut it to divide the clusters as the problem requires. 130
• 127. Measure for the distance between two clusters RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering. ▪ These measures are called linkage methods. ▪ Single Linkage: the shortest distance between the closest points of the two clusters. 131
  • 128. Measure for the distance between two clusters RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Complete Linkage: It is the farthest distance between the two points of two different clusters. It is one of the popular linkage methods as it forms tighter clusters than single-linkage. ▪ Average Linkage: It is the linkage method in which the distance between each pair of datasets is added up and then divided by the total number of datasets to calculate the average distance between two clusters. ▪ Centroid Linkage: It is the linkage method in which the distance between the centroid of the clusters is calculated. 132
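A minimal sketch comparing linkage choices with scikit-learn's AgglomerativeClustering on a small hypothetical dataset:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# linkage can be "single", "complete", "average", or "ward";
# it decides how the distance between two clusters is measured
for linkage in ["single", "complete", "average"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(linkage, labels)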
• 129. Density-Based Clustering RJEs: Remote job entry points ▪ This method connects highly dense areas into clusters [1] ▪ These methods have good accuracy and the ability to merge two clusters [2] ▪ This type of clustering algorithm plays a crucial role in evaluating and finding non-linear cluster shapes based on density [3] ▪ The most popular density-based algorithm is DBSCAN, which allows spatial clustering of data with noise ▪ It makes use of two concepts – data reachability and data connectivity Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning, [4] https://www.kdnuggets.com/2020/04/dbscan-clustering-algorithm-machine-learning.html [2] https://www.geeksforgeeks.org/clustering-in-machine-learning/, [3] https://www.geeksforgeeks.org/clustering-in-machine-learning/ ▪ Density-based spatial clustering of applications with noise (DBSCAN): ▪ Based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density [4] ▪ No need to explicitly define the number of clusters (K) as in K-means ▪ The DBSCAN algorithm uses two parameters: 1) minPts: the minimum number of points (a threshold) clustered together for a region to be considered dense, 2) eps (ε): a distance measure that is used to locate the points in the neighborhood of any point ▪ There are three types of points after the DBSCAN clustering is complete: 1) core points, 2) border points, 3) noise points 133
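A minimal DBSCAN sketch with scikit-learn on hypothetical points containing one outlier:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],
              [4.0, 50.0]])  # an outlier

# eps = neighborhood radius, min_samples = the minPts density threshold
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)  # cluster ids; -1 marks noise points (the outlier)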
• 130. Distribution Model-Based Clustering RJEs: Remote job entry points ▪ Here, the data is divided based on the probability that a data point belongs to a particular distribution [1] ▪ The grouping is done by assuming some distributions, most commonly the Gaussian distribution ▪ The observed data arises from a distribution consisting of a mixture of two or more cluster components [2] ▪ Furthermore, each component cluster has a density function with an associated probability or weight in this mixture ▪ An example of this type is the Expectation-Maximization (EM) clustering algorithm, which uses Gaussian Mixture Models (GMM) [1]. Two different examples of EM clustering are represented below: Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning [2] https://data-flair.training/blogs/clustering-in-machine-learning/ 134
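A minimal sketch of EM-based soft clustering with scikit-learn's GaussianMixture on hypothetical two-component data:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data drawn from two Gaussian components
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# GaussianMixture fits the component means, covariances, and weights with EM
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)                # estimated component centers
print(gmm.predict_proba(X[:3]))  # soft membership probabilities per point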
• 131. Partition Clustering RJEs: Remote job entry points ▪ It is a type of clustering that divides the data into non-hierarchical groups [1] ▪ It is also known as the centroid-based method ▪ These methods partition the objects into k clusters, and each partition forms one cluster[2] ▪ This method optimizes an objective criterion, such as a similarity function ▪ The most common example of partitioning clustering is the K-Means Clustering algorithm [1] Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning [2] https://www.geeksforgeeks.org/clustering-in-machine-learning/ ▪ K-Means Clustering: ▪ It groups the unlabeled dataset into K clusters ▪ The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding clusters ▪ The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters ▪ It determines the best values for the K center points or centroids by an iterative process 135
• 132. Fuzzy Clustering RJEs: Remote job entry points ▪ Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster [1] ▪ Each data point has a set of membership coefficients, which reflect its degree of membership in each cluster ▪ The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm Ref: [1] https://www.javatpoint.com/clustering-in-machine-learning [2] https://2-bitbio.com/post/clustering-rnaseq-data-using-fuzzy-c-means-clustering/ ▪ In the adjacent image, K-means clustering produces output based on a minimum-distance calculation and is an example of hard clustering[2] ▪ Fuzzy C-means performs soft clustering by giving a membership coefficient to each data point ▪ Fuzzy clustering is used to solve multiclass or ambiguous clustering problems. 136
  • 133. Applications of Clustering RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ In Identification of Cancer Cells: The clustering algorithms are widely used for the identification of cancerous cells. It divides the cancerous and non-cancerous data sets into different groups. ▪ In Search Engines: Search engines also work on the clustering technique. The search result appears based on the closest object to the search query. It does it by grouping similar data objects in one group that is far from the other dissimilar objects. The accurate result of a query depends on the quality of the clustering algorithm used. ▪ Customer Segmentation: It is used in market research to segment the customers based on their choice and preferences. ▪ In Biology: It is used in the biology stream to classify different species of plants and animals using the image recognition technique. 137
• 134. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ K-Means Clustering is an Unsupervised Learning algorithm which groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=3, there will be three clusters; for K=4, there will be four clusters; and so on. ▪ It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group, whose members have similar properties. ▪ It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabelled dataset on its own, without the need for any training. 138
• 135. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding clusters. ▪ The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm. ▪ The k-means clustering algorithm mainly performs two tasks: ▪ Determines the best values for the K center points or centroids by an iterative process. ▪ Assigns each data point to its closest k-center. The data points near a particular k-center form a cluster. ▪ Hence each cluster has data points with some commonalities, and it is away from the other clusters. 139
• 136. How does the K-Means Algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Step-1: Select the number K to decide the number of clusters. ▪ Step-2: Select K random points or centroids. (They need not come from the input dataset.) ▪ Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters. ▪ Step-4: Calculate the variance and place a new centroid for each cluster. ▪ Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster. ▪ Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH. ▪ Step-7: The model is ready. A minimal scikit-learn version of this loop is sketched below. 140
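A minimal sketch, assuming scikit-learn's KMeans (which runs the assignment/update loop of the steps above internally):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1.5, 1.8], [5, 8],
              [8, 8], [1, 0.6], [9, 11]])

# n_clusters = K; the algorithm alternates the assignment and centroid-update
# steps until the assignments stop changing
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # final centroids
print(kmeans.labels_)           # cluster id of each point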
• 137. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Let's take the number k of clusters, i.e., K=2, to identify the dataset and put the points into different clusters. This means we will try to group these data points into two different clusters. We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. So here we select the below two points as k points, which are not part of our dataset. ▪ Now we will assign each data point of the scatter plot to its closest K-point or centroid. We compute this by calculating the distance between the points and the centroids; conveniently, we can draw a median line between both centroids. 141
• 138. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Points on the left side of the line are nearer to the K1 (blue) centroid, and points to the right of the line are closer to the yellow centroid. ▪ Let's color them blue and yellow for clear visualization. 142
• 139. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ As we need to find the closest clusters, we repeat the process by choosing new centroids. To choose the new centroids, we compute the center of gravity of the points in each cluster and place the new centroids there, as shown below. ▪ Next, we reassign each data point to the new centroids. For this, we repeat the same process of finding a median line. 143
• 140. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ We can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points are reassigned to the new centroids. ▪ As reassignment has taken place, we again go to Step-4, which is finding new centroids or K-points. ▪ We repeat the process of finding the center of gravity of each cluster, so the new centroids will be as shown in the image. 144
• 141. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ With the new centroids, we again draw the median line and reassign the data points. ▪ We can see in the image that no data points fall on the wrong side of the line, which means the assignments have stabilized and our model has converged. 145
• 142. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ As our model is ready, we can now remove the assumed centroids, and the two final clusters are as shown in the image below. 146
• 143. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ How to choose the value of "K number of clusters" in K-means Clustering? ▪ The performance of the K-means clustering algorithm depends on the quality of the clusters it forms. But choosing the optimal number of clusters is a big task. ▪ There are several ways to find the optimal number of clusters; here we discuss the most appropriate method to find the number of clusters, i.e., the value of K: ▪ The Elbow Method 147
• 144. Elbow Method RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ The Elbow method is one of the most popular ways to find the optimal number of clusters. ▪ This method uses the concept of the WCSS value. WCSS stands for Within-Cluster Sum of Squares, which measures the total variation within the clusters. The formula for WCSS (for 3 clusters) is: WCSS = Σ_{Pi in Cluster1} distance(Pi, C1)² + Σ_{Pi in Cluster2} distance(Pi, C2)² + Σ_{Pi in Cluster3} distance(Pi, C3)² ▪ Each term is the sum of the squared distances between each data point and the centroid of its cluster. 148
• 145. K-Means Clustering Algorithm RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ To measure the distance between data points and a centroid, we can use any metric such as Euclidean distance or Manhattan distance. ▪ To find the optimal number of clusters, the elbow method follows the steps below: ▪ It executes K-means clustering on a given dataset for different K values (typically ranging from 1 to 10). ▪ For each value of K, it calculates the WCSS value. ▪ It plots a curve between the calculated WCSS values and the number of clusters K. ▪ The sharp point of the bend, where the plot looks like an arm, is considered the best value of K. ▪ Since the graph shows a sharp bend that looks like an elbow, the approach is known as the elbow method. 149
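A minimal elbow-method sketch with scikit-learn on hypothetical three-cluster data; KMeans's inertia_ attribute is exactly the WCSS:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in [(0, 0), (5, 5), (0, 5)]])

# inertia_ = sum of squared distances of points to their assigned centroid
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)

print(wcss)  # plotting k vs. WCSS shows the "elbow" (here near k=3), the chosen K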
  • 146. Decision Tree Classification Algorithm RJEs: Remote job entry points https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm ▪ Decision Tree is a Supervised learning technique that can be used for both classification and Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. ▪ In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not contain any further branches. ▪ The decisions or the test are performed on the basis of features of the given dataset. ▪ It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions. 150
• 147. Decision Tree Classification Algorithm RJEs: Remote job entry points https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm ▪ It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure. ▪ In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm. ▪ A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees. 151
  • 148. Why use Decision Trees? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Decision Trees usually mimic human thinking ability while making a decision, so it is easy to understand. ▪ The logic behind the decision tree can be easily understood because it shows a tree-like structure. ▪ Decision Tree Terminologies ▪ Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets. ▪ Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting a leaf node. ▪ Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions. ▪ Branch/Sub Tree: A tree formed by splitting the tree. ▪ Pruning: Pruning is the process of removing the unwanted branches from the tree. ▪ Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the child nodes. 152
• 149. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ In a decision tree, to predict the class of a given dataset, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the record's (real dataset's) attribute and, based on the comparison, follows the branch and jumps to the next node. ▪ For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm: ▪ Step-1: Begin the tree with the root node, say S, which contains the complete dataset. ▪ Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM). ▪ Step-3: Divide S into subsets that contain possible values for the best attribute. ▪ Step-4: Generate the decision tree node which contains the best attribute. ▪ Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot classify the nodes further; these final nodes are the leaf nodes. 153
• 150. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. ▪ To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). ▪ The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. ▪ The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). 154
• 151. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Attribute Selection Measures ▪ While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM: ▪ Information Gain ▪ Gini Index 155
  • 152. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Information Gain: ▪ Information gain is the measurement of changes in entropy after the segmentation of a dataset based on an attribute. ▪ It calculates how much information a feature provides us about a class. ▪ According to the value of information gain, we split the node and build the decision tree. ▪ A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute having the highest information gain is split first. It can be calculated using the below formula: Information Gain= Entropy (S) - [(Weighted Avg) *Entropy(each feature)] 156
• 153. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Entropy: Entropy is a metric to measure the impurity in a given attribute; it specifies the randomness in the data. Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no) Where, •S = the set of samples •P(yes) = probability of yes •P(no) = probability of no 157
• 154. How does the Decision Tree algorithm Work? RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Gini Index: •The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm. •An attribute with a low Gini index should be preferred over one with a high Gini index. •It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits. •The Gini index can be calculated using the formula: Gini Index = 1 − Σⱼ Pⱼ², where Pⱼ is the proportion of samples belonging to class j. 158
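A minimal NumPy sketch computing entropy, information gain, and the Gini index on a hypothetical yes/no split:

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(S) = 1 - sum(p^2) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

# Hypothetical parent node and a candidate split into two children
parent = np.array(["yes"] * 9 + ["no"] * 5)
left   = np.array(["yes"] * 6 + ["no"] * 1)
right  = np.array(["yes"] * 3 + ["no"] * 4)

# Information Gain = Entropy(S) - weighted average entropy of the children
weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print("information gain:", entropy(parent) - weighted)
print("gini of parent:", gini(parent))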
  • 155. Pruning: Getting an Optimal Decision tree RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning, https://www.cs.cmu.edu/~bhiksha/courses/10-601/decisiontrees/ ▪ Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal decision tree. ▪ A too-large tree increases the risk of overfitting, and a small tree may not capture all the important features of the dataset. ▪ Therefore, a technique that decreases the size of the learning tree without reducing accuracy is known as Pruning. ▪ There are mainly two types of tree pruning technology used: ▪ Cost Complexity Pruning ▪ Reduced Error Pruning 159
• 158. Advantages/Disadvantages of the Decision Tree RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Advantages of the Decision Tree ▪ It is simple to understand, as it follows the same process a human follows while making a decision in real life. ▪ It can be very useful for solving decision-related problems. ▪ It helps to think about all the possible outcomes for a problem. ▪ There is less requirement for data cleaning compared to other algorithms. ▪ Disadvantages of the Decision Tree ▪ The decision tree contains lots of layers, which makes it complex. ▪ It may have an overfitting issue, which can be resolved using the Random Forest algorithm. ▪ For more class labels, the computational complexity of the decision tree may increase. 162
  • 159. Python Implementation of Decision Tree RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Data Pre-processing step ▪ Fitting a Decision-Tree algorithm to the Training set ▪ Predicting the test result ▪ Test accuracy of the result (Creation of Confusion matrix) ▪ Visualizing the test set result 163
• 160. Data Pre-Processing Step RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
164
• 162. Fitting a Decision-Tree algorithm to the Training set RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Now we will fit the model to the training set. For this, we will import the DecisionTreeClassifier class from the sklearn.tree library. Below is the code for it:
#Fitting Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
▪ In the above code, we have created a classifier object, to which we have passed two main parameters: ▪ criterion='entropy': used to measure the quality of a split, calculated by the information gain given by entropy. ▪ random_state=0: for generating reproducible random states. 166
• 164. Fitting a Decision-Tree algorithm to the Training set RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning
Out[8]: DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
        max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
        min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,
        min_weight_fraction_leaf=0.0, presort=False, random_state=0, splitter='best')
168
• 165. Predicting the test result RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning Now we will predict the test set result by creating a new prediction vector y_pred. Below is the code for it:
# Predicting the test set result
y_pred = classifier.predict(x_test)
In the output image below, the predicted output and the real test output are shown. Some values in the prediction vector differ from the real vector values; these are prediction errors. 169
• 167. Test accuracy of the result (Creation of Confusion matrix) RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ In the above output, we saw that there were some incorrect predictions. To count the correct and incorrect predictions, we use the confusion matrix. Below is the code for it:
# Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
▪ In the output image, the confusion matrix shows 6+3 = 9 incorrect predictions and 62+29 = 91 correct predictions. 171
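▪ As a small follow-up sketch, the accuracy can be read directly off the confusion matrix computed above (cm is a 2x2 array for this binary problem), or obtained with scikit-learn's helper:
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()  # correct predictions / total
print(accuracy)  # 91/100 = 0.91 for the counts reported above

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # same value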
• 168. Visualizing the training set result: RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Here we will visualize the training set result by plotting the decision regions of the decision tree classifier, which predicts Yes or No for the users who have either purchased or not purchased the SUV. Below is the code for it: ▪ The output is quite different from the previous classification models: it has both vertical and horizontal lines splitting the dataset according to the age and estimated salary variables. ▪ As we can see, the tree tries to capture every data point, which is a sign of overfitting. 172
• 169. Visualizing the training set result: RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning
# Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
173
• 170. Visualizing the test set result: RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ Visualization of the test set result is similar to the visualization of the training set, except that the training set is replaced with the test set.
# Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
174
• 171. Visualizing the test set result: RJEs: Remote job entry points Ref: https://www.javatpoint.com/supervised-machine-learning ▪ As we can see in the above image, there are some green data points within the purple region and vice versa. ▪ These are the incorrect predictions that we discussed in the confusion matrix. 175
  • 172. K-Nearest Neighbor (KNN) Algorithm 176
• 173. K-Nearest Neighbor (KNN) Algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. ▪ The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories. ▪ The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm. ▪ K-NN can be used for Regression as well as Classification, but it is mostly used for Classification problems. ▪ K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. ▪ It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at classification time. ▪ At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data. ▪ Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new data most similar to the cat and dog images and, based on the most similar features, put it in either the cat or the dog category. 177
• 175. Why do we need a K-NN Algorithm? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. ▪ With the help of K-NN, we can easily identify the category or class of a new data point. 179
• 176. How does K-NN work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Step-1: Select the number K of neighbors. ▪ Step-2: Calculate the Euclidean distance from the new point to the training points. ▪ Step-3: Take the K nearest neighbors as per the calculated Euclidean distances. ▪ Step-4: Among these K neighbors, count the number of data points in each category. ▪ Step-5: Assign the new data point to the category for which the number of neighbors is maximum. A from-scratch sketch of these steps is shown below. 180
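▪ A minimal NumPy sketch of the five steps above (illustrative only; knn_predict is a hypothetical helper, not a library API):
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the categories among the neighbours and return the majority
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]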
• 177. How does K-NN work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Suppose we have a new data point and we need to put it in the required category. Consider the image: ▪ Firstly, we choose the number of neighbors; here we choose k = 5. ▪ Next, we calculate the Euclidean distance between the data points. ▪ The Euclidean distance is the distance between two points; for points (x1, y1) and (x2, y2) it is d = sqrt((x2 - x1)^2 + (y2 - y1)^2). 181
• 178. How does K-NN work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ By calculating the Euclidean distances, we obtain the nearest neighbors: three nearest neighbors in category A and two in category B. Consider the image below: ▪ Since the majority of the 5 nearest neighbors are from category A, the new data point must belong to category A. 182
• 179. How to select the value of K in the K-NN Algorithm? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Below are some points to remember while selecting the value of K in the K-NN algorithm: ▪ There is no particular way to determine the best value for "K", so we need to try several values to find the best among them; a cross-validation sketch follows below. ▪ A commonly used default value for K is 5. ▪ A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers. 183
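▪ Since there is no fixed rule for K, one common approach (a sketch, assuming the scaled x_train and y_train prepared in the pre-processing step shown below) is to try several values with cross-validation and keep the best:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, x_train, y_train, cv=5).mean()  # 5-fold CV accuracy
best_k = max(scores, key=scores.get)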
• 180. Advantages / Disadvantages of KNN Algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Advantages of the KNN Algorithm ▪ It is simple to implement. ▪ It is robust to noisy training data. ▪ It can be more effective if the training data is large. ▪ Disadvantages of the KNN Algorithm ▪ It always needs a value of K to be determined, which can be complex at times. ▪ The computation cost is high because the distance to all training samples must be calculated. 184
• 181. Python implementation of the KNN algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Problem statement: A car manufacturer has produced a new SUV. ▪ The company wants to show ads to the users who are interested in buying that SUV. ▪ For this problem, we have a dataset containing information about users of a social network. ▪ The dataset contains a lot of information, but we will use Estimated Salary and Age as the independent variables and Purchased as the dependent variable. 185
• 182. Steps to implement the K-NN algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Data Pre-processing step ▪ Fitting the K-NN algorithm to the Training set ▪ Predicting the test result ▪ Test accuracy of the result (Creation of Confusion matrix) ▪ Visualizing the test set result 186
• 183. Data Pre-Processing Step RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
# importing datasets
data_set = pd.read_csv('user_data.csv')
# Extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# Feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
187
• 184. Data Pre-Processing Step RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ By executing the above code, our dataset is imported into our program and pre-processed. After feature scaling, the test dataset will look like the image below. ▪ From the output image, we can see that our data has been successfully scaled. 188
• 185. Fitting K-NN classifier to the Training data RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Now we will fit the K-NN classifier to the training data. ▪ To do this, we import the KNeighborsClassifier class from the sklearn.neighbors library. ▪ After importing the class, we create the classifier object. ▪ The parameters of this class are: n_neighbors, the required number of neighbors for the algorithm (usually 5); ▪ metric='minkowski', the default parameter, which decides the distance between the points; ▪ p=2, which makes the Minkowski metric equivalent to the standard Euclidean metric. ▪ Then we fit the classifier to the training data.
# Fitting K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)
189
• 186. Python implementation of the KNN algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning Output: By executing the above code, we will get the output as:
Out[10]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
▪ Predicting the Test Result: To predict the test set result:
# Predicting the test set result
y_pred = classifier.predict(x_test)
190
• 187. Creating the Confusion Matrix RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Now we will create the confusion matrix for our K-NN model to see the accuracy of the classifier.
# Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
▪ In the above code, we imported the confusion_matrix function and stored its result in the variable cm. ▪ Output: By executing the above code, we get the matrix shown in the image: 191
• 188. Confusion Matrix RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ In the image, we can see that there are 64+29 = 93 correct predictions and 3+4 = 7 incorrect predictions. 192
• 189. Visualizing the Training set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
# Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('K-NN Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
193
• 190. Visualizing the Training set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ The above graph shows the output for the training data set. ▪ As we can see in the graph, the predicted output is quite good, as most of the red points are in the red region and most of the green points are in the green region. ▪ However, there are a few green points in the red region and a few red points in the green region; these are the incorrect observations that we saw in the confusion matrix (7 incorrect outputs). 194
• 192. Support Vector Machine RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Support Vector Machine (SVM) is one of the most popular Supervised Learning algorithms, used for Classification as well as Regression problems. ▪ Primarily, however, it is used for Classification problems in Machine Learning. ▪ The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points in the correct category in the future. 196
• 193. Support Vector Machine RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ This best decision boundary is called a hyperplane. ▪ SVM chooses the extreme points/vectors that help in creating the hyperplane. ▪ These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. ▪ Consider the diagram in which two different categories are classified using a decision boundary or hyperplane: 197
• 194. Support Vector Machine RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Example: Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. ▪ We first train our model with many images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. ▪ The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of cats and dogs. On the basis of the support vectors, it classifies the creature as a cat. Consider the diagram below: 198
• 195. Types of SVM RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning SVM can be of two types: ▪ Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier. ▪ Non-linear SVM: Non-linear SVM is used for non-linearly separable data; if a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier. 199
• 196. Hyperplane and Support Vectors in the SVM algorithm RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of SVM. ▪ The dimension of the hyperplane depends on the number of features in the dataset: with 2 features (as shown in the image), the hyperplane is a straight line; with 3 features, it is a 2-dimensional plane. ▪ We always create a hyperplane with maximum margin, i.e., the maximum distance to the nearest data points. ▪ The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors. 200
• 197. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Linear SVM: ▪ The working of the SVM algorithm can be understood through an example. ▪ Suppose we have a dataset with two tags (green and blue), and the dataset has two features, x1 and x2. ▪ We want a classifier that can classify the pair of coordinates (x1, x2) as either green or blue. Consider the image: ▪ Since this is a 2-d space, we can separate these two classes with a straight line. However, there can be multiple lines that separate these classes. Consider the image below: 201
• 198. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. ▪ The SVM algorithm finds the points of both classes closest to the line. These points are called support vectors. ▪ The distance between the support vectors and the hyperplane is called the margin. ▪ The goal of SVM is to maximize this margin; the hyperplane with maximum margin is called the optimal hyperplane. This objective can be written formally, as shown below. 202
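▪ The margin maximization above has a standard formulation (not shown on the slide). For the hard-margin linear SVM with hyperplane w · x + b = 0, the optimization problem is:

\[
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n
\]

▪ The constraints keep every training point on the correct side of the boundary, and since the margin equals 2/||w||, minimizing ||w|| maximizes the margin.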
• 199. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Non-Linear SVM: ▪ If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. Consider the image below: ▪ To separate these data points, we need to add one more dimension. ▪ For linear data we used two dimensions, x and y, so for non-linear data we add a third dimension z, calculated as z = x^2 + y^2. 203
• 200. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ By adding the third dimension z = x^2 + y^2, the sample space becomes as in the image below: ▪ SVM will now divide the dataset into classes in the following way. 204
• 201. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Since we are in 3-d space, the decision boundary looks like a plane parallel to the x-axis. ▪ If we convert it back to 2-d space with z = 1, it becomes a circle: 205
• 202. How does SVM work? RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Hence, for this non-linear data we obtain a decision boundary that is a circle of radius 1 (since z = x^2 + y^2 = 1). 206
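▪ A tiny NumPy sketch of the z = x^2 + y^2 mapping above: points on two concentric circles are not linearly separable in (x, y), but after adding the third dimension they are separated by the plane z = 1:
import numpy as np

theta = np.linspace(0, 2 * np.pi, 50)
inner = np.c_[0.5 * np.cos(theta), 0.5 * np.sin(theta)]  # class 0, radius 0.5
outer = np.c_[1.5 * np.cos(theta), 1.5 * np.sin(theta)]  # class 1, radius 1.5

def add_z(points):
    x, y = points[:, 0], points[:, 1]
    return np.c_[x, y, x ** 2 + y ** 2]  # the mapping z = x^2 + y^2

print(add_z(inner)[:, 2].max())  # 0.25 -> below the plane z = 1
print(add_z(outer)[:, 2].min())  # 2.25 -> above the plane z = 1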
• 203. Data Pre-processing step RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
# Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
# importing datasets
data_set = pd.read_csv('user_data.csv')
# Extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# Feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
207
• 204. Data Pre-processing step RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ After executing the above code, the data is pre-processed. The code gives the dataset as: 208
  • 205. Data Pre-processing step RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning The scaled output for the test set will be: 209
• 206. Fitting the SVM classifier to the training set RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Now the training set will be fitted to the SVM classifier. ▪ To create the SVM classifier, we import the SVC class from the sklearn.svm library. ▪ Below is the code for it:
from sklearn.svm import SVC  # "Support vector classifier"
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)
▪ In the above code we used kernel='linear', since here we are creating an SVM for linearly separable data; it can be changed for non-linear data. We then fitted the classifier to the training dataset (x_train, y_train). 210
• 207. Output RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
Out[8]: SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
        kernel='linear', max_iter=-1, probability=False, random_state=0,
        shrinking=True, tol=0.001, verbose=False)
▪ The model performance can be altered by changing the values of ▪ C (regularization factor), ▪ gamma, ▪ kernel. A grid-search sketch for such tuning follows below. 211
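▪ A minimal sketch of such tuning using grid search over C, gamma, and the kernel (reusing the x_train and y_train from above):
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10],            # regularization factor
    'gamma': ['scale', 0.1, 1],   # kernel coefficient (ignored by the linear kernel)
    'kernel': ['linear', 'rbf'],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(x_train, y_train)
print(search.best_params_, search.best_score_)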
• 208. Predicting the test set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Now we will predict the output for the test set. For this, we create a new vector y_pred. Below is the code for it:
# Predicting the test set result
y_pred = classifier.predict(x_test)
▪ After getting the y_pred vector, we can compare y_pred and y_test to check the difference between the actual and predicted values. ▪ Output: The image shows the output for the prediction of the test set: 212
• 209. Creating the confusion matrix RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Now we will check the performance of the SVM classifier, i.e., how many incorrect predictions there are. ▪ To create the confusion matrix, we need to import the confusion_matrix function from the sklearn library. ▪ After importing the function, we call it and store the result in a new variable cm. ▪ The function takes two main parameters: y_true (the actual values) and y_pred (the values returned by the classifier).
# Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
213
• 210. Creating the confusion matrix RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ As we can see in the output image, there are 66+24 = 90 correct predictions and 8+2 = 10 incorrect predictions. 214
• 211. Visualizing the training set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
215
• 212. Visualizing the training set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Output ▪ As we can see, the above output appears similar to the Logistic Regression output. ▪ In the output, we got a straight line as the hyperplane because we used a linear kernel in the classifier. ▪ We also discussed above that for 2-d space, the hyperplane in SVM is a straight line. 216
• 213. Visualizing the test set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
# Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
217
• 214. Visualizing the test set result RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ As we can see in the above output image, the SVM classifier has divided the users into two regions (Purchased or Not purchased). ▪ Users who purchased the SUV are in the red region with red scatter points. ▪ Users who did not purchase the SUV are in the green region with green scatter points. ▪ The hyperplane has divided the two classes into the Purchased and Not purchased categories. 218
• 216. Semi-Supervised Learning RJEs: Remote job entry points ▪ Closely related to transductive learning ▪ Uses both labeled and unlabeled data to perform an otherwise supervised or unsupervised learning task ▪ Initially motivated by its practical value in learning faster, better, and cheaper ▪ Has applications in cognitive psychology as a computational model for human learning ▪ It typically combines a smaller labeled component with a much larger unlabeled component ▪ Some of the applications are text classification, iterative co-training-based applications such as webpage classification, lane finding on GPS data, etc. Ref: https://pages.cs.wisc.edu/~jerryzhu/pub/SSL_EoML.pdf 220
• 217. Algorithm Flow RJEs: Remote job entry points ▪ Semi-supervised learning algorithm flow Ref: https://www.cs.cmu.edu/~ninamf/courses/401sp18/lectures/ssl-04-18.pdf ▪ Models based on this include semi-supervised SVMs, graph-based models, generative models, etc. A minimal self-training sketch follows below. 221
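▪ As a minimal sketch of this flow, scikit-learn ships a self-training wrapper in which unlabeled samples are marked with the label -1 (the data below is randomly generated purely for illustration, not a real dataset):
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.random.rand(100, 2)             # hypothetical feature matrix
y = np.full(100, -1)                   # -1 marks the (larger) unlabeled part
y[:20] = np.tile([0, 1], 10)           # a small labeled subset with both classes

base = SVC(kernel='linear', probability=True)  # self-training needs predict_proba
model = SelfTrainingClassifier(base).fit(X, y)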
  • 218. Major Kernel Functions in Support Vector Machine 222
• 219. Major Kernel Functions in Support Vector Machine RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning What is a Kernel Method? ▪ Kernel methods are a set of techniques used in machine learning to address classification, regression, and other prediction problems. They are built around kernels, which are functions that gauge how similar two data points are to one another in a high-dimensional feature space. ▪ The fundamental idea of kernel methods is to convert the input data into a high-dimensional feature space, which makes it simpler to distinguish between classes or generate predictions. Instead of manually computing the feature space, kernel methods employ a kernel function to implicitly map the data into that space. ▪ The most popular kind of kernel method is the Support Vector Machine (SVM), a binary classifier that determines the hyperplane that most effectively divides the two groups. To efficiently locate the ideal hyperplane, SVMs map the input into a higher-dimensional space using a kernel function. ▪ Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian processes. Since they are powerful, adaptable, and computationally efficient, kernel methods are frequently employed in machine learning. They are resilient to noise and outliers and can handle sophisticated data structures such as strings and graphs. 223
  • 220. Kernel Method in SVMs RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Support Vector Machines (SVMs) use kernel methods to transform the input data into a higher- dimensional feature space, which makes it simpler to distinguish between classes or generate predictions. ▪ Kernel approaches in SVMs work on the fundamental principle of implicitly mapping input data into a higher-dimensional feature space without directly computing the coordinates of the data points in that space. ▪ The kernel function in SVMs is essential in determining the decision boundary that divides the various classes. ▪ In order to calculate the degree of similarity between any two points in the feature space, the kernel function computes their dot product. ▪ The most commonly used kernel function in SVMs is the Gaussian or radial basis function (RBF) kernel. The RBF kernel maps the input data into an infinite-dimensional feature space using a Gaussian function. This kernel function is popular because it can capture complex nonlinear relationships in the data. 224
  • 221. Kernel Method in SVMs RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Other types of kernel functions that can be used in SVMs include the polynomial kernel, the sigmoid kernel, and the Laplacian kernel. The choice of kernel function depends on the specific problem and the characteristics of the data. ▪ Basically, kernel methods in SVMs are a powerful technique for solving classification and regression problems, and they are widely used in machine learning because they can handle complex data structures and are robust to noise and outliers. 225
• 222. Characteristics of Kernel Function RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Mercer's condition: A kernel function must satisfy Mercer's condition to be valid. This condition ensures that the kernel (Gram) matrix built from any set of inputs is positive semi-definite. ▪ Positive definiteness: A kernel function is positive definite if it is strictly greater than zero whenever the inputs differ from each other. ▪ Non-negativity: A kernel function is non-negative if it produces non-negative values for all inputs. ▪ Symmetry: A kernel function is symmetric, meaning that it produces the same value regardless of the order in which the inputs are given. 226
• 223. Characteristics of Kernel Function RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Reproducing property: A kernel function satisfies the reproducing property if it can be used to reconstruct the input data in the feature space. ▪ Smoothness: A kernel function is said to be smooth if it produces a smooth transformation of the input data into the feature space. ▪ Complexity: The complexity of a kernel function is an important consideration, as more complex kernel functions may lead to overfitting and reduced generalization performance. 227
• 224. Selecting an appropriate kernel function RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ Basically, the choice of kernel function depends on the specific problem and the characteristics of the data, and selecting an appropriate kernel function can significantly impact the performance of machine learning algorithms. ▪ Major Kernel Functions in Support Vector Machine ▪ In Support Vector Machines (SVMs), there are several types of kernel functions that can be used to map the input data into a higher-dimensional feature space. The choice of kernel function depends on the specific problem and the characteristics of the data. 228
• 225. Linear Kernel RJEs: Remote job entry points https://www.javatpoint.com/ ▪ A linear kernel is a type of kernel function used in machine learning, including in SVMs (Support Vector Machines). It is the simplest and most commonly used kernel function, defined as the dot product of the input vectors in the original feature space. ▪ The linear kernel can be defined as: K(x, y) = x · y ▪ where x and y are the input feature vectors. ▪ The dot product of the input vectors is a measure of their similarity in the original feature space. ▪ When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that separates the different classes in the feature space. ▪ This linear boundary can be useful when the data is already separable by a linear decision boundary, or when dealing with high-dimensional data, where more complex kernel functions may lead to overfitting. 229
• 226. Polynomial Kernel RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ It is a nonlinear kernel function that employs polynomial functions to map the input data into a higher-dimensional feature space. ▪ The polynomial kernel can be defined as: K(x, y) = (x · y + c)^d ▪ where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial. ▪ The constant term is added to the dot product of the input vectors, and the result is raised to the degree of the polynomial. ▪ The decision boundary of an SVM with a polynomial kernel is a nonlinear hyperplane that can capture more intricate correlations between the input features. ▪ The degree of the polynomial determines the degree of nonlinearity in the decision boundary. 230
• 227. Polynomial Kernel RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ The polynomial kernel has the benefit of being able to detect both linear and nonlinear correlations in the data. ▪ Selecting the proper degree of the polynomial can be difficult, though: a larger degree can result in overfitting, while a lower degree may not adequately represent the underlying relationships in the data. ▪ In general, the polynomial kernel is an effective tool for converting the input data into a higher-dimensional feature space in order to capture nonlinear correlations between the input features. Gaussian (RBF) Kernel: The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel function used in machine learning, particularly in SVMs (Support Vector Machines). It is a nonlinear kernel function that maps the input data into a higher-dimensional feature space using a Gaussian function. The Gaussian kernel can be defined as: K(x, y) = exp(-gamma * ||x - y||^2) 231
• 228. Gaussian (RBF) Kernel RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning K(x, y) = exp(-gamma * ||x - y||^2), where gamma controls the width of the Gaussian. ▪ One advantage of the Gaussian kernel is its ability to capture complex relationships in the data without the need for explicit feature engineering. ▪ However, the choice of the gamma parameter can be challenging, as a smaller value may result in underfitting, while a larger value may result in overfitting. 232
  • 229. Laplace Kernel RJEs: Remote job entry points https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning ▪ The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is a type of kernel function used in machine learning, including in SVMs (Support Vector Machines). It is a non-parametric kernel that can be used to measure the similarity or distance between two input feature vectors. ▪ The Laplacian kernel can be defined as: K(x, y) = exp(-gamma * ||x - y||) ▪ Where x and y are the input feature vectors, gamma is a parameter that controls the width of the Laplacian function, and ||x - y|| is the L1 norm or Manhattan distance between the input vectors. ▪ When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear hyperplane that can capture complex relationships between the input features. The width of the Laplacian function, controlled by the gamma parameter, determines the degree of nonlinearity in the decision boundary. ▪ One advantage of the Laplacian kernel is its robustness to outliers, as it places less weight on large distances between the input vectors than the Gaussian kernel. However, like the Gaussian kernel, choosing the correct value of the gamma parameter can be challenging. 233
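▪ Minimal NumPy sketches of the four kernel functions described above, for two feature vectors x and y (gamma, c, and d are user-chosen parameters):
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)                            # K(x, y) = x . y

def polynomial_kernel(x, y, c=1.0, d=3):
    return (np.dot(x, y) + c) ** d                 # K(x, y) = (x . y + c)^d

def rbf_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # squared L2 distance

def laplacian_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum(np.abs(x - y)))  # L1 (Manhattan) distance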
  • 231. Reinforcement Learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 237 ▪ What is Reinforcement Learning? ▪ Terms used in Reinforcement Learning. ▪ Key features of Reinforcement Learning. ▪ Elements of Reinforcement Learning. ▪ Approaches to implementing Reinforcement Learning. ▪ How does Reinforcement Learning Work? ▪ The Bellman Equation. ▪ Types of Reinforcement Learning. ▪ Reinforcement Learning Algorithm. ▪ Markov Decision Process. ▪ What is Q-Learning? ▪ Difference between Supervised Learning and Reinforcement Learning. ▪ Applications of Reinforcement Learning. ▪ Conclusion.
• 232. Reinforcement Learning Tutorial RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 238 ▪ Reinforcement Learning is a feedback-based Machine Learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty. ▪ In Reinforcement Learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning. ▪ Since there is no labeled data, the agent is bound to learn from its experience only.
• 233. Reinforcement Learning Tutorial RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 239 ▪ RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game-playing, robotics, etc. ▪ The agent interacts with the environment and explores it by itself. ▪ The primary goal of an agent in reinforcement learning is to improve its performance by obtaining the maximum positive rewards.
• 234. Reinforcement Learning Tutorial RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 240 ▪ The agent learns through a process of trial and error, and based on experience it learns to perform the task. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." ▪ It is a core part of Artificial Intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention. ▪ Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions; based on those actions the state of the agent changes, and it also receives a reward or penalty as feedback.
• 235. Reinforcement Learning Tutorial RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 241 ▪ The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so it learns and explores the environment. ▪ The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward the agent gets a positive point, and as a penalty it gets a negative point.
• 236. Terms used in Reinforcement Learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 242 ▪ Agent: An entity that can perceive/explore the environment and act upon it. ▪ Environment: The situation in which an agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature. ▪ Action: Actions are the moves taken by an agent within the environment. ▪ State: A situation returned by the environment after each action taken by the agent. ▪ Reward: Feedback returned to the agent from the environment to evaluate the action. ▪ Policy: A strategy applied by the agent for the next action based on the current state. ▪ Value: The expected long-term return with the discount factor, as opposed to the short-term reward. ▪ Q-value: Mostly similar to the value, but it takes one additional parameter, the current action.
• 237. Key Features of Reinforcement Learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 243 ▪ In RL, the agent is not instructed about the environment or what actions need to be taken. ▪ It is based on a trial-and-error process. ▪ The agent takes the next action and changes state according to the feedback from the previous action. ▪ The agent may get a delayed reward. ▪ The environment is stochastic, and the agent needs to explore it to obtain the maximum positive rewards.
• 238. Approaches to implement Reinforcement Learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 244 ▪ Value-based: The value-based approach aims to find the optimal value function, i.e., the maximum value at a state under any policy. The agent then expects the long-term return at any state(s) under policy π. ▪ Policy-based: The policy-based approach finds the optimal policy for the maximum future rewards without using the value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward. ▪ The policy-based approach has mainly two types of policy: ▪ Deterministic: The same action is produced by the policy (π) at any given state. ▪ Stochastic: Probability determines the action produced by the policy. ▪ Model-based: In the model-based approach, a virtual model of the environment is created, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation differs for each environment.
  • 239. Elements of Reinforcement Learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 245 1. Policy 2. Reward Signal 3. Value Function 4. Model of the environment
• 240. Policy RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 246 ▪ A policy defines the way an agent behaves at a given time. ▪ It maps the perceived states of the environment to the actions taken in those states. ▪ The policy is the core element of RL, as it alone can define the behavior of the agent. ▪ In some cases it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. ▪ It can be deterministic or stochastic: For a deterministic policy: a = π(s). For a stochastic policy: π(a | s) = P[A_t = a | S_t = s].
  • 241. Reward Signal RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 247 ▪ The goal of reinforcement learning is defined by the reward signal. ▪ At each state, the environment sends an immediate signal to the learning agent, and this signal is known as a reward signal. ▪ These rewards are given according to the good and bad actions taken by the agent. ▪ The agent's main objective is to maximize the total number of rewards for good actions. ▪ The reward signal can change the policy, such as if an action selected by the agent leads to low reward, then the policy may change to select other actions in the future.
  • 242. Value Function RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 248 ▪ The value function gives information about how good the situation and action are and how much reward an agent can expect. ▪ A reward indicates the immediate signal for each good and bad action, whereas a value function specifies the good state and action for the future. ▪ The value function depends on the reward as, without reward, there could be no value. The goal of estimating values is to achieve more rewards.
• 243. Model RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 249 ▪ The model mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave; for example, given a state and an action, the model can predict the next state and reward. ▪ The model is used for planning, which means it provides a way to choose a course of action by considering all future situations before actually experiencing them. Approaches that solve RL problems with the help of a model are termed model-based approaches; an approach without a model is called a model-free approach.
• 244. How does Reinforcement Learning Work? RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 250 ▪ To understand the working process of RL, we need to consider two main things: ▪ Environment: It can be anything such as a room, a maze, a football ground, etc. ▪ Agent: An intelligent agent such as an AI robot. Let's take an example of a maze environment that the agent needs to explore.
• 245. How does Reinforcement Learning Work? RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 251 ▪ In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a wall, S8, a fire pit, and S4, a diamond block. ▪ The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward. It can take four actions: move up, move down, move left, and move right. ▪ The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent takes the path S9-S5-S1-S2-S3; then it gets the +1 reward. ▪ The agent will try to remember the preceding steps it took to reach the final step. To memorize the steps, it assigns a value of 1 to each previous step.
• 246. How does Reinforcement Learning Work? RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 252 ▪ Now the agent has successfully stored the previous steps by assigning the value 1 to each previous block. ▪ But what will the agent do if it starts from a block that has blocks of value 1 on both sides? ▪ It would be a difficult decision for the agent whether to go up or down, as each block has the same value. This approach is therefore not suitable for the agent to reach the destination. To solve the problem, we use the Bellman equation, which is the main concept behind reinforcement learning.
• 247. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 253 ▪ The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the values of a decision problem at a certain point by including the values of previous states. ▪ It is a way of calculating value functions in dynamic programming, and it leads to modern reinforcement learning. ▪ The key elements used in the Bellman equation are: ▪ the action performed by the agent, "a"; ▪ the state occurring by performing the action, "s"; ▪ the reward/feedback obtained for each good and bad action, "R"; ▪ the discount factor, Gamma "γ". ▪ The Bellman equation can be written as: V(s) = max_a [R(s, a) + γV(s')]
• 248. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 254 ▪ The Bellman equation can be written as: V(s) = max_a [R(s, a) + γV(s')] where ▪ V(s) = the value calculated at a particular point, ▪ R(s, a) = the reward at state s for performing action a, ▪ γ = the discount factor, ▪ V(s') = the value of the next state. ▪ In the above equation, we take the maximum over actions because the agent always tries to find the optimal solution. ▪ Now, using the Bellman equation, we will find the value at each state of the given environment, starting from the block next to the target block.
• 249. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 255 ▪ For the 1st block: V(s3) = max[R(s, a) + γV(s')], with V(s') = 0 because there is no further state to move to. ▪ V(s3) = max[R(s, a)] => V(s3) = max[1] => V(s3) = 1.
• 250. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 256 ▪ For the 2nd block: V(s2) = max[R(s, a) + γV(s')], with γ = 0.9 (say), V(s') = 1, and R(s, a) = 0 because there is no reward at this state. ▪ V(s2) = max[0.9 × 1] => V(s2) = max[0.9] => V(s2) = 0.9.
• 251. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 257 ▪ For the 3rd block: V(s1) = max[R(s, a) + γV(s')], with γ = 0.9 (say), V(s') = 0.9, and R(s, a) = 0, because there is no reward at this state either. ▪ V(s1) = max[0.9 × 0.9] => V(s1) = max[0.81] => V(s1) = 0.81.
• 252. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 258 ▪ For the 4th block: V(s5) = max[R(s, a) + γV(s')], with γ = 0.9 (say), V(s') = 0.81, and R(s, a) = 0, because there is no reward at this state either. ▪ V(s5) = max[0.9 × 0.81] => V(s5) = max[0.729] => V(s5) ≈ 0.73.
• 253. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 259 ▪ For the 5th block: V(s9) = max[R(s, a) + γV(s')], with γ = 0.9 (say), V(s') = 0.73, and R(s, a) = 0, because there is no reward at this state either. ▪ V(s9) = max[0.9 × 0.73] => V(s9) = max[0.657] => V(s9) ≈ 0.66.
• 254. The Bellman Equation RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 260 ▪ Now the agent has three options to move: if it moves to the blue box it will feel a bump, and if it moves to the fire pit it will get a -1 reward. ▪ Since here we consider only positive rewards, the agent will move upwards only. ▪ The complete block values are calculated using this formula.
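▪ The backward calculation above can be reproduced with a few lines of Python (a sketch of value iteration on the single path S9-S5-S1-S2-S3 used on the slides; the walls, the fire pit, and the max over actions are omitted since each state here has only one successor):
gamma = 0.9
states = ['s9', 's5', 's1', 's2', 's3']      # path used on the slides
next_state = dict(zip(states, states[1:]))   # each state leads to the next
reward = {s: 0.0 for s in states}
reward['s3'] = 1.0                           # +1 for reaching the goal from s3

V = {s: 0.0 for s in states}
for _ in range(10):                          # sweep until the values settle
    for s in states:
        ns = next_state.get(s)
        V[s] = reward[s] + gamma * (V[ns] if ns else 0.0)

print(V)  # approximately {'s9': 0.66, 's5': 0.73, 's1': 0.81, 's2': 0.9, 's3': 1.0}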
  • 255. Types of Reinforcement learning 261
  • 256. Types of Reinforcement learning RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning 262 ▪ There are mainly two types of reinforcement learning, which are: ▪ Positive Reinforcement ▪ Negative Reinforcement
• 257. Positive Reinforcement RJEs: Remote job entry points https://www.javatpoint.com/reinforcement-learning, https://www.verywellmind.com/what-is-positive-reinforcement-2795412 263 ▪ Positive reinforcement learning means adding something to increase the likelihood that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of the behavior. ▪ This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.
• 258. Negative Reinforcement ▪ Negative reinforcement is the opposite of positive reinforcement: it increases the tendency that a specific behavior will occur again by avoiding a negative condition. ▪ It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides only enough reinforcement to meet the minimum required behavior. https://www.javatpoint.com/reinforcement-learning, https://www.parentingforbrain.com/negative-reinforcement/
• 260. How to represent the agent state? https://www.javatpoint.com/reinforcement-learning ▪ We can represent the agent state using the Markov state, which contains all the required information from the history. ▪ A state St is a Markov state if it satisfies the condition: P[St+1 | St] = P[St+1 | S1, ..., St] ▪ The Markov state follows the Markov property, which says that the future is independent of the past given the present. ▪ RL works in fully observable environments, where the agent can observe the environment and act in the new state. The complete process is known as a Markov Decision Process (MDP).
• 261. Markov Decision Process https://www.javatpoint.com/reinforcement-learning ▪ The Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. ▪ In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state. ▪ MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.
• 262. Markov Decision Process https://www.javatpoint.com/reinforcement-learning ▪ An MDP is a tuple of four elements (S, A, Pa, Ra): ▪ a finite set of states S ▪ a finite set of actions A ▪ a transition probability Pa(s, s'): the probability of moving from state s to state s' due to action a ▪ a reward Ra(s, s'): the reward received after transitioning from state s to state s' due to action a ▪ An MDP uses the Markov property. A sketch of this structure in code is given below.
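As a sketch, the four-tuple can be written directly in code (a hedged illustration; the toy states, actions, and values are invented for this example):

from typing import Dict, NamedTuple, Set, Tuple

class MDP(NamedTuple):
    states: Set[str]                               # finite set S
    actions: Set[str]                              # finite set A
    transition: Dict[Tuple[str, str, str], float]  # Pa: P(s' | s, a), keyed by (s, a, s')
    reward: Dict[Tuple[str, str, str], float]      # Ra: reward for the transition (s, a, s')

# Toy two-state example.
mdp = MDP(
    states={"s1", "s2"},
    actions={"stay", "move"},
    transition={("s1", "move", "s2"): 1.0, ("s1", "stay", "s1"): 1.0},
    reward={("s1", "move", "s2"): 1.0, ("s1", "stay", "s1"): 0.0},
)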
• 263. Markov Property https://www.javatpoint.com/reinforcement-learning ▪ It says: "If the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state; future actions and states do not depend on past actions, rewards, or states." ▪ In other words, as per the Markov property, the current state transition does not depend on any past action or state. ▪ Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a chess game, the players only focus on the current state and do not need to remember past actions or states.
• 264. Finite MDP https://www.javatpoint.com/reinforcement-learning ▪ A finite MDP is one with finite states, finite rewards, and finite actions. ▪ In RL, we consider only finite MDPs.
• 265. Markov Process https://www.javatpoint.com/reinforcement-learning ▪ A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. ▪ A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. ▪ These two components (S and P) define the dynamics of the system.
• 267. Q-Learning https://www.javatpoint.com/reinforcement-learning ▪ Q-learning is an off-policy RL algorithm used for temporal difference learning. ▪ Temporal difference learning methods compare temporally successive predictions. ▪ It learns the value function Q(s, a), which measures how good it is to take action "a" at a particular state "s". ▪ A flowchart on the original slide illustrates the working of Q-learning.
• 268. State Action Reward State Action (SARSA) https://www.javatpoint.com/reinforcement-learning ▪ SARSA stands for State Action Reward State Action; it is an on-policy temporal difference learning method. An on-policy control method selects the action for each state while learning, using a specific policy. ▪ The goal of SARSA is to calculate Qπ(s, a) for the selected current policy π and all pairs (s, a). ▪ The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not need the maximum reward of the next state to update the Q-value in the table. ▪ In SARSA, the new action and reward are selected using the same policy that determined the original action. ▪ SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where s: original state, a: original action, r: observed reward, s' and a': the new state–action pair. The contrast with Q-learning is sketched below.
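The contrast is clearest in the two update rules. A hedged sketch (α is the learning rate; Q is a nested dictionary; all names are illustrative):

from collections import defaultdict

alpha, gamma = 0.1, 0.9
Q = defaultdict(lambda: defaultdict(float))   # Q[s][a], initialized to zero

def q_learning_update(s, a, r, s_next, actions):
    # Off-policy: bootstrap with the greedy (max) Q-value in the next state.
    best_next = max(Q[s_next][a2] for a2 in actions)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap with the action a_next actually chosen by the
    # same policy in the next state, i.e. the quintuple (s, a, r, s', a').
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])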
• 269. Deep Q Neural Network (DQN) https://www.javatpoint.com/reinforcement-learning ▪ As the name suggests, DQN is Q-learning using neural networks. ▪ For an environment with a big state space, defining and updating a Q-table is a challenging and complex task. ▪ To solve this issue, we can use a DQN algorithm, where, instead of defining a Q-table, a neural network approximates the Q-values for each action and state, as the sketch below suggests.
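A minimal PyTorch sketch of the idea (not from the slides; layer sizes and dimensions are illustrative assumptions):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a state vector to one Q-value per action, replacing the Q-table.
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork(state_dim=4, n_actions=2)
q_values = q_net(torch.zeros(1, 4))      # Q-values for a dummy state
action = int(q_values.argmax(dim=1))     # greedy action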
• 270. Q-Learning Explanation https://www.javatpoint.com/reinforcement-learning ▪ Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation. ▪ The main objective of Q-learning is to learn a policy that informs the agent what actions should be taken to maximize the reward under what circumstances. ▪ It is an off-policy RL algorithm that attempts to find the best action to take in the current state. ▪ The goal of the agent in Q-learning is to maximize the value of Q. ▪ The value of Q-learning can be derived from the Bellman equation. Consider the Bellman equation (reconstructed from the slide's description): V(s) = max_a [R(s,a) + γ Σs' P(s'|s,a) V(s')]
• 271. Q-Learning Explanation https://www.javatpoint.com/reinforcement-learning ▪ In the equation, we have various components: the reward, the discount factor (γ), the transition probability, and the next state s'. But no Q-value is given yet. ▪ Consider an agent that has three value options, V(s1), V(s2), V(s3). Since this is an MDP, the agent only cares about the current state and the future state. The agent can go in any direction (up, left, or right), so it needs to decide where to go for the optimal path. The agent will move on a probability basis and change its state. But if we want exact moves, we need to make some changes in terms of the Q-value.
• 272. Q-Learning Explanation https://www.javatpoint.com/reinforcement-learning ▪ Q represents the quality of the actions at each state. ▪ So instead of using a value at each state, we use a pair of state and action, i.e., Q(s, a). ▪ The Q-value specifies which action is better than others, and the agent takes its next move according to the best Q-value. The Bellman equation can be used to derive the Q-value. ▪ To perform any action, the agent will get a reward R(s, a) and will end up in a certain state, so the Q-value equation is: Q(s, a) = R(s,a) + γ Σs' P(s'|s,a) V(s') ▪ Hence, we can say that V(s) = max_a [Q(s, a)]
• 273. Q-Learning Explanation https://www.javatpoint.com/reinforcement-learning ▪ The Q in Q-learning stands for quality: it specifies the quality of an action taken by the agent.
• 274. Q-table https://www.javatpoint.com/reinforcement-learning ▪ A Q-table or matrix is created while performing Q-learning. ▪ The table is indexed by state–action pairs [s, a], with all values initialized to zero. ▪ After each action, the table is updated and the Q-values are stored in it. ▪ The RL agent uses this Q-table as a reference to select the best action based on the Q-values, as the sketch below shows.
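A minimal sketch of such a Q-table and its update, assuming small discrete state and action sets (sizes and names are illustrative):

import numpy as np

n_states, n_actions = 6, 4
alpha, gamma = 0.1, 0.9

Q = np.zeros((n_states, n_actions))   # the Q-table, initialized to zero

def update(s, a, r, s_next):
    # Tabular Q-learning update after observing (s, a, r, s').
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def best_action(s):
    # The agent consults the table and picks the action with the highest Q-value.
    return int(np.argmax(Q[s]))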
• 275. Difference Between Reinforcement Learning and Supervised Learning https://www.javatpoint.com/reinforcement-learning ▪ RL works by interacting with the environment; supervised learning works on an existing dataset. ▪ An RL algorithm works the way the human brain works when making decisions; supervised learning works the way a human learns under the supervision of a guide. ▪ In RL, no labeled dataset is present; in supervised learning, a labeled dataset is present. ▪ In RL, no previous training is provided to the learning agent; in supervised learning, training is provided to the algorithm so that it can predict the output. ▪ RL helps to take decisions sequentially; in supervised learning, a decision is made when an input is given.
• 276. Reinforcement Learning ▪ There are various applications based on the concept of RL. Ref: [1] https://www.javatpoint.com/reinforcement-learning; [2] https://medium.com/@yuxili/rl-applications-73ef685c07eb
• 278. Gaussian Mixture Model (GMM) ▪ k-means exploits only the mean of a cluster or distribution as the representation of class-specific information. ▪ Second-order moments such as the variance also contain class-specific information. ▪ A Gaussian distribution can exploit both the mean and the variance. ▪ For scalar data it is a univariate Gaussian distribution; for vector data it is a multivariate Gaussian distribution.
• 279–281. Univariate vs Multivariate Gaussian Distribution (figures on the original slides)
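The figures on those slides are not reproduced in this transcript; for reference, the standard densities they contrast can be written as:

% Univariate Gaussian: scalar x, mean \mu, variance \sigma^2
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% Multivariate Gaussian: d-dimensional x, mean vector \mu, covariance matrix \Sigma
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)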
• 282. Clustering using Multivariate Gaussian Distribution
• 283. Gaussian Mixture Model (GMM)
• 284. Expectation-Maximization (EM) Algorithm
• 285. Implementation of EM Algorithm
• 286. Re-estimation in EM Algorithm
• 287. Clustering using GMM
• 288. What is a Gaussian Mixture Model (GMM)? A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of a mixture model as a generalization of the k-means clustering algorithm, as it can also be used for density estimation and classification. In a Gaussian mixture model, each cluster is associated with a multivariate Gaussian distribution, and the mixture model is a weighted sum of these distributions. The weights indicate the probability that a data point belongs to a particular cluster, and the Gaussian distributions describe the distribution of the data within each cluster. The parameters of a Gaussian mixture model can be estimated using the expectation-maximization (EM) algorithm, which alternates between estimating the parameters of the Gaussian distributions and the weights of the mixture model until convergence is reached.
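The example code described on the next slide is not reproduced in this transcript; a minimal scikit-learn sketch consistent with that description (200 samples from two 2D Gaussians with different means; n_components=2; covariance_type='full') might look like:

import numpy as np
from sklearn.mixture import GaussianMixture

# 200 samples drawn from two 2D Gaussian distributions with different means.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2)),
])

# Fit a two-component GMM with full covariance matrices (fit runs EM internally).
gmm = GaussianMixture(n_components=2, covariance_type='full').fit(X)

# Predict the cluster label of each data point.
predictions = gmm.predict(X)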
• 289. Fitting a Gaussian Mixture Model https://www.shiksha.com/online-courses/articles/understanding-gaussian-mixture-models/ The example code above generates a dataset X containing 200 samples drawn from two 2D Gaussian distributions with different means. The Gaussian mixture model is then fit to the data, with n_components=2 indicating that there are two mixture components (i.e., two clusters). The covariance_type parameter specifies the type of covariance matrix to use for the Gaussian distributions; in this example, the covariance_type value is 'full'. Once the model is fit, the predict method can be used to predict the cluster labels for the data points in X. The resulting cluster labels are stored in the predictions array.
• 290. Plotting the GMM Clusters To plot the data and the predicted cluster labels, matplotlib is used; a sketch of such plotting code follows. The output is a scatter plot of the data, with points coloured according to their predicted cluster label.
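Continuing the sketch above (X and predictions as defined there), the plotting code might be:

import matplotlib.pyplot as plt

# Scatter plot of the data, coloured by predicted cluster label.
plt.scatter(X[:, 0], X[:, 1], c=predictions)
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.title('GMM cluster assignments')
plt.show()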
• 291. Real-Life Examples of Gaussian Mixture Models ▪ Gaussian mixture models (GMMs), as stated above, are statistical models that represent the probability distribution of a multi-dimensional continuous variable as a weighted sum of multiple multivariate normal distributions. GMMs are used in a variety of applications, including clustering, density estimation, and anomaly detection. Here are a few examples of how GMMs can be used in real life: ▪ Clustering: GMMs can be used to identify patterns and group similar observations together. For example, a GMM could be used to cluster customers into different segments based on their purchase history and demographic data. ▪ Density estimation: GMMs can be used to estimate the probability density function (PDF) of a given dataset. This can be useful for tasks such as density-based anomaly detection, where GMMs can be used to identify observations that are significantly different from the rest of the data. ▪ Anomaly detection: GMMs can be used to detect anomalous observations in a dataset. For example, a GMM could be trained on normal network traffic data and then used to identify unusual traffic patterns that may indicate an intrusion attempt. ▪ Computer vision: GMMs can be used in computer vision applications to model the appearance of objects in an image. For example, a GMM could be used to model the appearance of different types of vehicles in a traffic surveillance system.
• 292. Advantages of Gaussian Mixture Models ▪ Flexibility: Gaussian mixture models can model a wide range of probability distributions, as they can approximate any distribution that can be represented as a weighted sum of multiple normal distributions. ▪ Robustness: Gaussian mixture models are relatively robust to outliers in the data, as they can accommodate the presence of multiple modes ("peaks") in the distribution. ▪ Speed: Gaussian mixture models are relatively fast to fit to a dataset, especially when using an efficient optimization algorithm such as the expectation-maximization (EM) algorithm. ▪ Handling missing data: Gaussian mixture models can handle missing data by marginalizing out the missing variables, which can be useful in situations where some observations are incomplete. ▪ Interpretability: The parameters of a Gaussian mixture model (i.e., the weights, means, and covariances of the components) have a clear interpretation, which can be useful for understanding the underlying structure of the data.
• 293. Disadvantages of Gaussian Mixture Models ▪ Sensitivity to initialization: Gaussian mixture models can be sensitive to the initial values of the model parameters, especially when there are many components in the mixture. This can sometimes lead to poor convergence to the true maximum likelihood solution. ▪ Assumption of normality: Gaussian mixture models assume that the data are generated from a mixture of normal distributions, which may not always be the case in practice. If the data deviate significantly from normality, GMMs may not be the most appropriate model. ▪ Number of components: Choosing the appropriate number of components in a Gaussian mixture model can be challenging: too many components may overfit the data, while too few may underfit it. ▪ High-dimensional data: Gaussian mixture models can be computationally expensive to fit when working with high-dimensional data, as the number of model parameters grows quadratically with the number of dimensions (for full covariance matrices). ▪ Limited expressive power: Gaussian mixture models can only represent distributions that can be expressed as a weighted sum of normal distributions, so they may not be suitable for modelling more complex distributions.
  • 294. Hidden Markov Model in Machine Learning
• 295. Hidden Markov Model in Machine Learning https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ Hidden Markov Models (HMMs) are a type of probabilistic model commonly used in machine learning for tasks such as ▪ Speech recognition ▪ Natural language processing ▪ Bioinformatics ▪ They are a popular choice for modelling sequences of data because they can effectively capture the underlying structure of the data, even when the data is noisy or incomplete.
• 296. What are Hidden Markov Models? https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ A Hidden Markov Model (HMM) is a probabilistic model that consists of a sequence of hidden states, each of which generates an observation. The hidden states are usually not directly observable, and the goal of the HMM is to estimate the sequence of hidden states based on a sequence of observations. An HMM is defined by the following components: ▪ A set of N hidden states, S = {s1, s2, ..., sN}. ▪ A set of M observations, O = {o1, o2, ..., oM}. ▪ An initial state probability distribution, π = {π1, π2, ..., πN}, which specifies the probability of starting in each hidden state. ▪ A transition probability matrix, A = [aij], which defines the probability of moving from one hidden state to another. ▪ An emission probability matrix, B = [bjk], which defines the probability of emitting an observation from a given hidden state. ▪ The basic idea behind an HMM is that the hidden states generate the observations, and the observed data is used to estimate the hidden state sequence, for example with the forward-backward algorithm. A sketch of the forward pass is given below.
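A hedged NumPy sketch of the forward pass, which computes the likelihood of an observation sequence under an HMM with the components (π, A, B) defined above (the 2-state, 2-symbol values are invented for illustration):

import numpy as np

pi = np.array([0.6, 0.4])      # initial state distribution
A = np.array([[0.7, 0.3],      # A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[j, k] = P(observation k | state j)
              [0.2, 0.8]])

def forward(obs):
    # alpha[j] = P(o_1, ..., o_t, state_t = j), updated left to right.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()         # likelihood of the whole sequence

print(forward([0, 1, 0]))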
• 297. Applications of Hidden Markov Models https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ Speech Recognition One of the most well-known applications of HMMs is speech recognition. In this field, HMMs are used to model the different sounds and phonemes that make up speech. The hidden states, in this case, correspond to the different sounds or phonemes, and the observations are the acoustic signals generated by the speech. The goal is to estimate the hidden state sequence, which corresponds to the transcription of the speech, based on the observed acoustic signals. HMMs are particularly well-suited for speech recognition because they can effectively capture the underlying structure of the speech, even when the data is noisy or incomplete. In speech recognition systems, the HMMs are usually trained on large datasets of speech signals, and the estimated parameters of the HMMs are used to transcribe speech in real time. ▪ Natural Language Processing Another important application of HMMs is natural language processing. In this field, HMMs are used for tasks such as part-of-speech tagging, named entity recognition, and text classification. In these applications, the hidden states are typically associated with the underlying grammar or structure of the text, while the observations are the words in the text. The goal is to estimate the hidden state sequence, which corresponds to the structure or meaning of the text, based on the observed words. HMMs are useful in natural language processing because they can effectively capture the underlying structure of the text, even when the data is noisy or ambiguous. In natural language processing systems, the HMMs are usually trained on large datasets of text, and the estimated parameters of the HMMs are used to perform various NLP tasks, such as text classification, part-of-speech tagging, and named entity recognition.
• 298. Applications of Hidden Markov Models https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ Bioinformatics HMMs are also widely used in bioinformatics, where they model sequences of DNA, RNA, and proteins. The hidden states, in this case, correspond to the different types of residues, while the observations are the sequences of residues. The goal is to estimate the hidden state sequence, which corresponds to the underlying structure of the molecule, based on the observed sequences of residues. HMMs are useful in bioinformatics because they can effectively capture the underlying structure of the molecule, even when the data is noisy or incomplete. In bioinformatics systems, the HMMs are usually trained on large datasets of molecular sequences, and the estimated parameters of the HMMs are used to predict the structure or function of new molecular sequences. ▪ Finance Finally, HMMs have also been used in finance, where they model stock prices, interest rates, and currency exchange rates. In these applications, the hidden states correspond to different economic states, such as bull and bear markets, while the observations are the stock prices, interest rates, or exchange rates. The goal is to estimate the hidden state sequence, which corresponds to the underlying economic state, based on the observed prices, rates, or exchange rates. HMMs are useful in finance because they can effectively capture the underlying economic state, even when the data is noisy or incomplete. In finance systems, the HMMs are usually trained on large datasets of financial data, and the estimated parameters of the HMMs are used to make predictions about future market trends or to develop investment strategies.
• 299. Limitations of Hidden Markov Models https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ Limited Modelling Capabilities One of the key limitations of HMMs is that they are relatively limited in their modelling capabilities. HMMs are designed to model sequences of data, where the underlying structure of the data is represented by a set of hidden states. However, the structure of the data can be quite complex, and the simple structure of HMMs may not be enough to capture all the details accurately. For example, in speech recognition, the complex relationship between the speech sounds and the corresponding acoustic signals may not be fully captured by the simple structure of an HMM. ▪ Overfitting Another limitation of HMMs is that they can be prone to overfitting, especially when the number of hidden states is large or the amount of training data is limited. Overfitting occurs when the model fits the training data too well and is unable to generalize to new data. This can lead to poor performance when the model is applied to real-world data and can result in high error rates. To avoid overfitting, it is important to choose the number of hidden states carefully and to use appropriate regularization techniques.
• 300. Limitations of Hidden Markov Models https://www.javatpoint.com/hidden-markov-model-in-machine-learning ▪ Lack of Robustness HMMs are also limited in their robustness to noise and variability in the data. For example, in speech recognition, the acoustic signals generated by speech can be subject to a variety of distortions and noise, which can make it difficult for the HMM to estimate the underlying structure of the data accurately. In some cases, these distortions and noise can cause the HMM to make incorrect decisions, which results in poor performance. To address these limitations, it is often necessary to use additional processing and filtering techniques, such as noise reduction and normalization, to pre-process the data before it is fed into the HMM. ▪ Computational Complexity Finally, HMMs can be limited by their computational complexity, especially when dealing with large amounts of data or when using complex models. The computational complexity of HMMs is due to the need to estimate the parameters of the model and to compute the likelihood of the data given the model. This can be time-consuming and computationally expensive, especially for large models or for data sampled at a high frequency. To address this limitation, it is often necessary to use parallel computing techniques or approximations that reduce the computational complexity of the model.
• 301. Naïve Bayes Classifier ▪ Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem [1] ▪ They are mainly used in text classification, which involves a high-dimensional training dataset [2] ▪ It is a probabilistic classifier, which means it predicts on the basis of the probability of an object ▪ Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. Another strong assumption it makes is that all features are equal, i.e., given the same weight/importance. ▪ Bayes: It is based on Bayes' Theorem. Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Mathematically: P(A|B) = P(B|A) · P(A) / P(B) where P(A|B) is the posterior probability, P(A) is the prior probability, P(B|A) is the likelihood, and P(B) is the marginal probability. Ref: [1] https://www.geeksforgeeks.org/naive-bayes-classifiers/?ref=leftbar-rightbar [2] https://www.javatpoint.com/machine-learning-naive-bayes-classifier
• 302. Naïve Bayes Classifier Ref: [1] https://www.geeksforgeeks.org/naive-bayes-classifiers/?ref=leftbar-rightbar [2] https://www.tutorialspoint.com/machine_learning_with_python/classification_algorithms_naive_bayes.htm, [3] https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/ ▪ There are primarily three types of Naïve Bayes classifiers: ▪ Gaussian Naïve Bayes – continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution; the likelihood of the features is assumed to be Gaussian [1] ▪ Multinomial Naïve Bayes – features are assumed to be drawn from a simple multinomial distribution; this kind of Naïve Bayes is most appropriate for features that represent discrete counts [2] ▪ Bernoulli Naïve Bayes – features are assumed to be binary (0s and 1s); text classification with the 'bag of words' model can be an application of Bernoulli Naïve Bayes ▪ The figure on the original slide shows an example of a Naïve Bayes classifier estimating the probability of play or no play based on likelihood estimation [3]. A minimal sketch of the Gaussian variant is given below.
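A minimal scikit-learn sketch of Gaussian Naïve Bayes (the toy data and values are invented for illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features with two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [4.8, 5.1], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1, 2.0], [5.0, 5.0]]))   # predicted classes
print(clf.predict_proba([[1.1, 2.0]]))         # posterior P(class | x)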
• 303. Naïve Bayes Classifier ▪ Advantages of the Naïve Bayes classifier ▪ A fast and easy ML algorithm to predict the class of a dataset ▪ It can be used for binary as well as multi-class classification ▪ It is the most popular choice for text classification problems ▪ Disadvantage of the Naïve Bayes classifier ▪ Naive Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features ▪ Applications of the Naïve Bayes classifier ▪ It is used for credit scoring ▪ It is used in medical data classification ▪ It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner ▪ It is used in text classification such as spam filtering and sentiment analysis Ref: https://www.javatpoint.com/machine-learning-naive-bayes-classifier
• 304. Ensemble Classifiers ▪ Ensemble learning helps improve machine learning results by combining several models [1] ▪ It gives better predictive performance compared to a single model ▪ Ensembles overcome three problems: ▪ Statistical problems: when the hypothesis space is too large for the amount of available data ▪ Computational problems: when the learning algorithm cannot guarantee finding the best hypothesis ▪ Representational problems: when the hypothesis space does not contain any good approximation of the target class(es) ▪ The main challenge with ensemble methods is to obtain base models that make different kinds of errors ▪ The three main classes of ensemble learning methods are bagging, stacking, and boosting [2] ▪ Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions ▪ Stacking involves fitting many different model types on the same data and using another model to learn how best to combine the predictions; a sketch of both is given below Ref: [1] https://www.geeksforgeeks.org/ensemble-classifier-data-mining/ [2] https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
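A hedged scikit-learn sketch of bagging and stacking (estimator choices and parameters are illustrative; parameter names follow recent scikit-learn versions):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Bagging: many decision trees fit on bootstrap samples; predictions are averaged.
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
bagging.fit(X, y)

# Stacking: different model types combined by a meta-learner.
stacking = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier()), ('svm', SVC())],
    final_estimator=LogisticRegression(),
)
stacking.fit(X, y)
print(bagging.score(X, y), stacking.score(X, y))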