1. NADAR SARASWATHI COLLEGE OF ARTS AND SCIENCE
SUBJECT: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
TOPIC: SUPPORT VECTOR MACHINES (SVM),
NAÏVE BAYES CLASSIFICATION
R. Girisakthi
II M.Sc. Computer Science
2. SUPPORT VECTOR MACHINE (SVM)
ALGORITHM
Support Vector Machine (SVM) is a supervised
machine learning algorithm used for
classification and regression tasks. While it can
handle regression problems, SVM is
particularly well suited for classification.
SVM aims to find the optimal hyperplane in
an N-dimensional space that separates the data
points into different classes. The algorithm
maximizes the margin between the closest points
of different classes.
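As a quick illustration of this in practice, here is a minimal sketch using scikit-learn's SVC; the toy dataset and parameter choices are assumptions for illustration, not from the slides:

```python
# Minimal sketch: linear SVM classification with scikit-learn.
# Dataset and parameters are illustrative assumptions.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two-class subset of the iris dataset (drop class 2)
X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = SVC(kernel="linear")   # optimal separating hyperplane w^T x + b = 0
clf.fit(X_train, y_train)

print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))
```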
3. SUPPORT VECTOR MACHINE (SVM)
TERMINOLOGY
Hyperplane: A decision boundary separating different
classes in feature space, represented by the
equation wᵀx + b = 0 in linear classification.
Support Vectors: The closest data points to the
hyperplane, crucial for determining the hyperplane and
margin in SVM.
Margin: The distance between the hyperplane and the
support vectors. SVM aims to maximize this margin
for better classification performance.
Kernel: A function that maps data to a higher-
dimensional space, enabling SVM to handle non-
linearly separable data.
4. Hard Margin: A maximum-margin hyperplane that
perfectly separates the data without misclassifications.
Soft Margin: Allows some misclassifications by
introducing slack variables, balancing margin
maximization and misclassification penalties when data
is not perfectly separable.
C: A regularization term balancing margin
maximization and misclassification penalties. A higher
C value enforces a stricter penalty for
misclassifications.
Hinge Loss: A loss function penalizing misclassified
points or margin violations, combined with
regularization in SVM.
Dual Problem: Involves solving for Lagrange
multipliers associated with support vectors, facilitating
the kernel trick and efficient computation.
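As a hedged sketch of how these terms surface in code (scikit-learn's SVC plus a small hand-rolled hinge-loss helper; all values here are illustrative assumptions):

```python
# Sketch: the kernel and C map directly onto SVC parameters, and
# hinge loss can be computed by hand. Values are illustrative.
import numpy as np
from sklearn.svm import SVC

# An RBF kernel handles non-linearly separable data. A larger C
# enforces a stricter misclassification penalty (closer to a hard
# margin); a smaller C allows more slack (a softer margin).
soft_margin = SVC(kernel="rbf", C=0.1)
hard_margin = SVC(kernel="rbf", C=100.0)

def hinge_loss(y, f):
    """Hinge loss for labels y in {-1, +1} and raw scores f = w^T x + b."""
    return np.maximum(0.0, 1.0 - y * f)

print(hinge_loss(np.array([1, -1]), np.array([2.0, 0.5])))  # [0.  1.5]
```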
5. HOW THE SUPPORT VECTOR MACHINE (SVM)
ALGORITHM WORKS
The key idea behind the SVM algorithm is to find the
hyperplane that best separates two classes by
maximizing the margin between them. This margin is
the distance from the hyperplane to the nearest data
points (support vectors) on each side.
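Written out, with the hyperplane scaled so that the nearest points satisfy |wᵀx + b| = 1, the margin equals 2/‖w‖, and maximizing it gives the standard hard-margin optimization:

```latex
\max_{w,b} \frac{2}{\lVert w \rVert}
\;\Longleftrightarrow\;
\min_{w,b} \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad y_i\,(w^{\top} x_i + b) \ge 1 \;\; \forall i
```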
6. The best hyperplane, often called the "hard
margin" hyperplane, is the one that maximizes the
distance between the hyperplane and the nearest data
points from both classes. This ensures a clear
separation between the classes. So, of the candidate
hyperplanes in the slide's figure, we choose L2 as the
hard margin.
7. MATHEMATICAL
COMPUTATION (SVM)
Consider a binary classification problem with two classes,
labeled as +1 and -1. We have a training dataset consisting of
input feature vectors X and their corresponding class labels Y.
The equation for the linear hyperplane can be written as:

wᵀx + b = 0

Where:
w is the normal vector to the hyperplane (the direction
perpendicular to it).
b is the offset or bias term, representing the distance of the
hyperplane from the origin along the normal vector w.
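To make the decision rule concrete, here is a small numerical sketch; the values of w, b, and x below are made up for illustration:

```python
# Sketch: classifying a point against a learned hyperplane w^T x + b = 0.
# The values of w, b, and x are made up for illustration.
import numpy as np

w = np.array([2.0, -1.0])   # normal vector to the hyperplane
b = -0.5                    # bias / offset term
x = np.array([1.0, 0.5])    # point to classify

score = w @ x + b           # signed value; its sign decides the class
label = +1 if score >= 0 else -1
print(score, label)         # 1.0 +1 -> point lies on the +1 side
```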
8. NAÏVE BAYES CLASSIFICATION
Naive Bayes classifiers are supervised machine
learning algorithms used for classification tasks, based
on Bayes’ Theorem to find probabilities. This section
gives an overview of Naive Bayes and its use and
implementation in machine learning.
Key Features of Naive Bayes Classifiers
The main idea behind the Naive Bayes classifier is to
use Bayes’ Theorem to classify data based on the
probabilities of different classes given the features of
the data. It is mostly used in high-dimensional text
classification.
9. The Naive Bayes Classifier is a simple probabilistic
classifier with very few parameters, so its models can
be built and can make predictions faster than many
other classification algorithms.
It is called "naive" because it assumes that each
feature in the model is independent of the existence of
every other feature. In other words, each feature
contributes to the prediction with no relation to the
others.
The Naïve Bayes algorithm is used in spam filtering,
sentiment analysis, article classification, and many
other applications.
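Here is a hedged sketch of the spam-filtering use case (scikit-learn's CountVectorizer and MultinomialNB; the tiny corpus is an illustrative assumption, not real data):

```python
# Sketch: Naive Bayes for spam filtering with bag-of-words features.
# The corpus below is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ["win cash now", "meeting at noon",
          "free prize win", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)   # high-dimensional word-count features

clf = MultinomialNB()          # assumes word independence given the class
clf.fit(X, labels)

print(clf.predict(vec.transform(["free cash prize"])))  # likely ['spam']
```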
10. UNDERSTANDING BAYES’ THEOREM
FOR NAÏVE BAYES
Bayes’ Theorem finds the probability of an event
occurring given the probability of another event that has
already occurred.
Bayes’ theorem is stated mathematically as the following
equation:
P(y|X) = P(X|y) · P(y) / P(X)

where y is the class hypothesis, X = (x1, x2, ..., xn) is the
vector of observed features, and P(X) ≠ 0.
Where,
P(y|X) is the posterior probability: the probability of
hypothesis y given the observed evidence X.
P(X|y) is the likelihood: the probability of the evidence X
given that hypothesis y is true.
P(y) is the prior probability of y, and P(X) is the marginal
probability of the evidence.
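Under the naive conditional-independence assumption, the likelihood P(X|y) factorizes over the individual features, which yields the standard Naive Bayes decision rule:

```latex
P(y \mid X) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```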
12. ADVANTAGES OF NAÏVE BAYES CLASSIFIER:
Easy to implement and computationally efficient.
Effective in cases with a large number of features.
Performs well even with limited training data.
It performs well in the presence of categorical features.
For numerical features, the data is assumed to come from
a normal (Gaussian) distribution.
DISADVANTAGES OF NAÏVE BAYES CLASSIFIER:
Assumes that features are independent, which may not
always hold in real-world data.
Can be influenced by irrelevant attributes.
May assign zero probability to unseen events, leading
to poor generalization.
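The zero-probability issue above is conventionally handled with Laplace (additive) smoothing; in scikit-learn this is the alpha parameter (a minimal sketch, values illustrative):

```python
# Sketch: Laplace (additive) smoothing avoids zero probabilities for
# feature values never seen with a class during training.
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB(alpha=1.0)   # alpha=1.0 is classic Laplace smoothing;
                                 # alpha -> 0 reintroduces the zero-probability problem
```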