Decision Tree, Naive Bayes,
Association rule Mining, Support
Vector Machine, KNN, Kmeans
Clustering, Random Forest
Presented to
Prof. Vibhakar Mansotra
Dean of Mathematical Science, University of Jammu
Presented by
Akanksha Bali
Research Scholar, Batch 2019, University of Jammu
Contents
 Decision Tree
 Naive Bayes Classifier
 Support Vector Machine
 Association Rule Mining
 Apriori Algorithm
 K Nearest Neighbour
 K means Clustering
 Random forest
Decision Trees
 A decision tree is a flowchart-like tree structure in which the data is continuously
split according to a certain parameter.
 Each internal node (decision node) denotes a test on an attribute.
 Each branch represents an outcome of the test.
 There are two main types of decision trees:
Classification trees (yes/no types)
Here the decision variable is categorical; for example, the outcome may be a
variable like 'fit' or 'unfit'.
Regression trees (continuous data types)
Here the decision or outcome variable is continuous, e.g. a number
like 12
Entropy
Entropy, also called Shannon entropy and denoted H(S) for a finite set S, is a
measure of the amount of uncertainty or randomness in the data.
H(S) = - ∑ p(x) log2 p(x)
Information gain
Information gain, also referred to as Kullback–Leibler divergence and denoted
IG(S, A) for a set S, is the effective change in entropy after deciding on a
particular attribute A. It measures the relative change in entropy with respect
to the independent variables.
IG(S, A) = H(S) - H(S, A)
IG(S, A) = H(S) - ∑ P(x) * H(x)
where IG(S, A) is the information gain from applying feature A, H(S) is the
entropy of the entire set, and the second term calculates the entropy after
applying feature A, with P(x) the probability of each value x of A.
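The two formulas above translate directly into code. The following minimal Python sketch (the helper names entropy and information_gain are ours, not from the slides) reproduces H(S) ≈ 0.940 for the 9-positive / 5-negative PlayTennis set used in the next slides.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p(x) * log2 p(x)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """IG(S, A) = H(S) - sum over values x of A of P(x) * H(S_x)."""
    total = len(labels)
    by_value = {}
    for example, label in zip(examples, labels):
        by_value.setdefault(example[attribute], []).append(label)
    remainder = sum((len(subset) / total) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

# Entropy of the 14-example PlayTennis set, [9+, 5-]:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # 0.94
```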
Top-Down Induction of Decision Trees: ID3
The ID3 algorithm performs the following tasks recursively:
1. Create a root node for the tree.
2. If all examples are positive, return the leaf node 'positive'.
3. Else if all examples are negative, return the leaf node 'negative'.
4. Calculate the entropy of the current state, H(S).
5. For each attribute, calculate the entropy with respect to the attribute 'x', denoted H(S, x).
6. Calculate the information gain IG(S, x) = H(S) - H(S, x).
7. Select the attribute which has the maximum value of IG(S, x).
8. Remove the attribute that offers the highest IG from the set of attributes.
9. Repeat until we run out of attributes, or the decision tree has only leaf nodes.
A minimal recursive sketch of these steps is given below.
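The sketch below is ours, not from the slides; it reuses the entropy and information_gain helpers from the previous snippet and deliberately omits practical concerns such as missing values or attribute values unseen at a node.

```python
from collections import Counter

def id3(examples, labels, attributes):
    # examples: list of dicts mapping attribute name -> value
    # Steps 2-3: if the node is pure, return a leaf with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case for step 9: no attributes left, return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 4-7: choose the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    # Step 8: remove the chosen attribute before recursing.
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {example[best] for example in examples}:
        subset = [(e, l) for e, l in zip(examples, labels) if e[best] == value]
        sub_examples, sub_labels = zip(*subset)
        tree[best][value] = id3(list(sub_examples), list(sub_labels), remaining)
    return tree
```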
Training Example

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
Selecting the Next Attribute

Humidity: S = [9+, 5-], E = 0.940
  High:   [3+, 4-], E = 0.985
  Normal: [6+, 1-], E = 0.592
  Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Wind: S = [9+, 5-], E = 0.940
  Weak:   [6+, 2-], E = 0.811
  Strong: [3+, 3-], E = 1.0
  Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

Humidity provides greater information gain than Wind, w.r.t. the target classification.
Selecting the Next Attribute

Outlook: S = [9+, 5-], E = 0.940
  Sunny:    [2+, 3-], E = 0.971
  Overcast: [4+, 0-], E = 0.0
  Rain:     [3+, 2-], E = 0.971
  Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Selecting the Next Attribute
The information gain values for the 4 attributes
are:
• Gain(S,Outlook) =0.247
• Gain(S,Humidity) =0.151
• Gain(S,Wind) =0.048
• Gain(S,Temperature) =0.029
where S denotes the collection of training
examples
ID3 Algorithm

Outlook is selected as the root, splitting [D1, D2, ..., D14] [9+, 5-] into:
  Sunny:    Ssunny = [D1, D2, D8, D9, D11] [2+, 3-]  ->  ?
  Overcast: [D3, D7, D12, D13] [4+, 0-]              ->  Yes
  Rain:     [D4, D5, D6, D10, D14] [3+, 2-]          ->  ?

Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(Ssunny, Temp.)    = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(Ssunny, Wind)     = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
ID3 Algorithm

The completed tree:
Outlook
  Sunny -> Humidity
    High:   No  [D1, D2, D8]
    Normal: Yes [D9, D11]
  Overcast -> Yes [D3, D7, D12, D13]
  Rain -> Wind
    Strong: No  [D6, D14]
    Weak:   Yes [D4, D5, D10]
Converting a Tree to Rules
Each root-to-leaf path of the tree above becomes one rule:
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
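The five rules map one-to-one onto a simple predictor. A minimal sketch (the function name play_tennis is ours):

```python
def play_tennis(outlook, humidity, wind):
    """Apply rules R1-R5 from the slide, in order."""
    if outlook == "Sunny" and humidity == "High":
        return "No"       # R1
    if outlook == "Sunny" and humidity == "Normal":
        return "Yes"      # R2
    if outlook == "Overcast":
        return "Yes"      # R3
    if outlook == "Rain" and wind == "Strong":
        return "No"       # R4
    if outlook == "Rain" and wind == "Weak":
        return "Yes"      # R5

print(play_tennis("Rain", "High", "Weak"))   # Yes, by R5
```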
Overfitting
 One of the biggest problems with decision trees is overfitting.

Avoiding Overfitting
 Stop growing the tree when a split is not statistically significant.
 Grow the full tree, then post-prune it.
NAÏVE BAYES ALGORITHM
 Bayesian classification is a supervised learning method as well as a statistical method for classification.
 It can solve diagnostic and predictive problems.
 It is named after Thomas Bayes (c. 1701–1761).
 It works on the principle of conditional probability as given by Bayes' theorem.
Derivation
D: a set of tuples.
Each tuple is an n-dimensional attribute vector X = (x1, x2, x3, ..., xn).
Let there be m classes: C1, C2, C3, ..., Cm.
Maximum a posteriori hypothesis:
P(Ci | X) = P(X | Ci) P(Ci) / P(X)   (Bayes' theorem)
Problem Statement
 Consider the given data set, apply the naive Bayes algorithm, and predict which type of fruit it is if the fruit has the following properties:
Fruit = {yellow, sweet, long}

Fruit    Yellow  Sweet  Long  Total
Orange   350     450    0     650
Banana   400     300    350   400
Others   50      100    50    150
Total    800     850    400   1200
Problem
 Step 1: Compute the prior probabilities for each of the class of fruits:
 P(C=orange) = 650/1200 = 0.54
 P(C=banana) = 400/1200 = 0.33
 P(C=others) = 150/1200 = 0.125
 Step 2: Compute the probability of evidence
 P(X1=long) = 400/1200=0.33
 P(X2=sweet) = 850/1200 = 0.708
 P(X3=yellow) = 800/1200 = 0.66
 Step 3: Compute the conditional probability of each class given each piece of evidence
 P(C=orange|X1=long) = 0/400 = 0
 P(C=orange|X2=sweet) = 450/850 = 0.52
 P(C=orange|X3=yellow) = 350/800 = 0.43
 P(C=Banana|X1=long) = 350/400 = 0.875
 P(C=Banana|X2=sweet) = 300/850 = 0.35
 P(C=Banana|X3=yellow) = 400/800 = 0.5
 P(C=others|X1=long) = 50/400 = 0.125
 P(C=others|X2=sweet) = 100/850 = 0.117
 P(C=others|X3=yellow) = 50/800 = 0.0625
Problem
 Step 4: Calculate the posterior probabilities using Bayes' theorem
 P(Yellow|Orange) = P(Orange|Yellow) * P(Yellow) / P(Orange) = (0.43 * 0.66) / 0.54 ≈ 0.53
 P(Sweet|Orange) ≈ 0.69
 P(Long|Orange) = 0
Step 5: P(fruit|Orange) = 0.53 * 0.69 * 0 = 0
In a similar way, P(fruit|Banana) = 1 * 0.75 * 0.875 ≈ 0.65
P(fruit|Others) = 0.33 * 0.66 * 0.33 ≈ 0.072
Step 6: Prediction: the type of fruit is Banana
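The whole calculation can be written compactly by working directly with the likelihoods P(feature | class) taken from the count table. The sketch below is ours (the dictionary layout and names are assumptions), and unlike the slide it also multiplies in the class prior, as standard naive Bayes does; the prediction is the same, Banana.

```python
# Counts from the fruit table: per class, how many fruits are yellow / sweet /
# long, and how many fruits of that class there are in total.
counts = {
    "Orange": {"yellow": 350, "sweet": 450, "long": 0,   "total": 650},
    "Banana": {"yellow": 400, "sweet": 300, "long": 350, "total": 400},
    "Others": {"yellow": 50,  "sweet": 100, "long": 50,  "total": 150},
}
grand_total = 1200

def score(fruit_class, features=("yellow", "sweet", "long")):
    """Prior * product of per-feature likelihoods (proportional to the posterior)."""
    c = counts[fruit_class]
    result = c["total"] / grand_total          # P(class)
    for f in features:
        result *= c[f] / c["total"]            # P(feature | class)
    return result

for fruit_class in counts:
    print(fruit_class, round(score(fruit_class), 4))
# Orange scores 0 (no orange is long); Banana gets the highest score,
# so the predicted fruit is Banana.
```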
Association rule mining
 Association rule learning is a rule-based machine learning method for discovering interesting relations between variables.
 Using association rule learning, a supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Market basket analysis
Association rule mining
Important concepts of association rule mining:
The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread, butter} has a support of 1/5 = 0.2, since it occurs in 20% of all transactions (1 out of 5 transactions).
The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X).
For example, the rule {butter, bread} => {milk} has a confidence of supp({butter, bread, milk}) / supp({butter, bread}) = 0.2 / 0.2 = 1 in the database, which means that the rule is correct for 100% of the transactions containing butter and bread (100% of the time a customer buys butter and bread, milk is bought as well).
APRIORI ALGORITHM
 The name of the algorithm is based on the fact that the
algorithm uses prior knowledge of frequent itemset properties.
 Apriori employs an iterative approach known as a level-wise
search, where k-itemsets are used to explore (k+1)-itemsets.
 First, the set of frequent 1-itemsets is found by scanning the
database to accumulate the count for each item, and collecting
those items that satisfy minimum support.
 The resulting set is denoted L1.
 Next, L1 is used to find L2, the set of frequent 2-itemsets, which
is used to find L3, and so on, until no more frequent k-itemsets
can be found.
 The finding of each Lk requires one full scan of the database.
Problem Statement
For the following transaction dataset, generate rules using the Apriori algorithm. Take minimum support = 50% and minimum confidence = 50%.
Transaction ID   Items Purchased
I1               A, B, C
I2               A, C
I3               A, D
I4               B, E, F
Problem Statement
 Step 1: Create the table of 1-item frequent itemsets and calculate their support

Itemset  Frequency  Support
{A}      3          3/4 = 75%
{B}      2          2/4 = 50%
{C}      2          2/4 = 50%
{D}      1          1/4 = 25%
{E}      1          1/4 = 25%
{F}      1          1/4 = 25%
Problem Statement
 Step 2: Keep the rows whose support is equal to or greater than 50%

Itemset  Frequency  Support
{A}      3          3/4 = 75%
{B}      2          2/4 = 50%
{C}      2          2/4 = 50%
Problem Statement
 Step 3: Create the table of 2-item frequent itemsets and calculate their frequency and support

Itemset  Frequency  Support
{A,B}    1          1/4 = 25%
{A,C}    2          2/4 = 50%
{B,C}    1          1/4 = 25%
Problem Statement
 Step 4: Keep the rows whose support is equal to or greater than 50%
 Formulate the final rules and calculate their confidence

Itemset  Frequency  Support
{A,C}    2          2/4 = 50%

Association rule  Support count  Confidence   Conf %
A -> C            2              2/3 = 0.66   66%
C -> A            2              2/2 = 1      100%
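A minimal sketch of the same calculation in Python (variable names and the simplified two-level candidate generation are ours; a full Apriori implementation would also prune candidates whose (k-1)-subsets are infrequent):

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
n = len(transactions)
min_support = 0.5

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Level-wise search: frequent 1-itemsets, then frequent 2-itemsets built from them.
items = sorted({i for t in transactions for i in t})
L1 = [frozenset([i]) for i in items if support({i}) >= min_support]
L2 = [frozenset(c) for c in combinations(sorted({i for s in L1 for i in s}), 2)
      if support(set(c)) >= min_support]
print(L1)   # frequent 1-itemsets: {A}, {B}, {C}
print(L2)   # frequent 2-itemsets: {A, C}

# Rules from the frequent 2-itemset {A, C}.
for x, y in [("A", "C"), ("C", "A")]:
    conf = support({x, y}) / support({x})
    print(f"{x} -> {y}: confidence = {conf:.2f}")
# A -> C: 0.67, C -> A: 1.00
```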
SUPPORT VECTOR MACHINE
 Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges.
 It is mostly used in classification problems.
 We perform classification by finding the hyper-plane that differentiates the two classes as well as possible.
Identify the right hyper-plane
Scenarios 1–4 (figures illustrating how the separating hyper-plane is chosen among the candidates).
Support vector machine
 Pros:
 It works really well with a clear margin of separation.
 It is effective in high-dimensional spaces.
 It is effective in cases where the number of dimensions is greater than the number of samples.
 It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
 Cons:
 It doesn't perform well when we have a large data set, because the required training time is higher.
 It also doesn't perform very well when the data set has more noise, i.e. the target classes are overlapping.
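As a usage illustration, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the toy points are invented for the example):

```python
from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3],      # class 0
     [6, 5], [7, 8], [8, 6]]      # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0)  # C trades margin width against violations
clf.fit(X, y)

print(clf.support_vectors_)        # the training points that define the margin
print(clf.predict([[4, 4], [7, 7]]))
```

For large data sets it is this training step that becomes slow, which is the first drawback listed above.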
K Nearest Neighbour
 Contents
 Introduction
 Closeness
 Algorithm
 Example
K Nearest Neighbour
• K-Nearest Neighbours is one of the most basic yet essential classification algorithms in machine learning. It belongs to the supervised learning domain and finds intense application in pattern recognition, data mining and intrusion detection.
• It was first described in the early 1950s.
• It gained popularity when increased computing power became available.
• It is used widely in the areas of pattern recognition and statistical estimation.
Closeness
 The Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is
 dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )
KNN Classifier Algorithm (figure)
Example
• We have data from a questionnaire survey and objective testing, with two attributes (acid durability and strength), to classify whether a special paper tissue is good or not. Here are four training samples:

X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Y = Classification
7                               7                                Bad
7                               4                                Bad
3                               4                                Good
1                               4                                Good

Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Guess the classification of this new tissue.
 Step 1: Initialize and define k.
Let's say k = 3.
(Always choose k as an odd number when the number of classes is even, to avoid a tie in the class prediction.)
 Step 2: Compute the distance between the input sample and the training samples.
- The coordinates of the input sample are (3, 7).
- Instead of calculating the Euclidean distance, we calculate the squared Euclidean distance.

X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Squared Euclidean distance
7                               7                                (7-3)^2 + (7-7)^2 = 16
7                               4                                (7-3)^2 + (4-7)^2 = 25
3                               4                                (3-3)^2 + (4-7)^2 = 9
1                               4                                (1-3)^2 + (4-7)^2 = 13
Example
 Step 3: Sort the distances and determine the nearest neighbours based on the k-th minimum distance:

X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Squared Euclidean distance  Rank  In 3-NN?
7                               7                                16                          3     Yes
7                               4                                25                          4     No
3                               4                                9                           1     Yes
1                               4                                13                          2     Yes
Example
Step 4: Take the 3 nearest neighbours and gather their category Y:

X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Squared Euclidean distance  Rank  In 3-NN?  Y = Category
7                               7                                16                          3     Yes       Bad
7                               4                                25                          4     No        -
3                               4                                9                           1     Yes       Good
1                               4                                13                          2     Yes       Good
Example
Step 5: Apply a simple majority vote.
Use the simple majority of the categories of the nearest neighbours as the prediction for the query instance.
We have 2 "Good" and 1 "Bad". Thus we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 belongs to the "Good" category.
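The five steps condense into a few lines of Python; a minimal sketch with our own helper names, reproducing the "Good" prediction:

```python
from collections import Counter

train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)
k = 3

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Rank training samples by squared Euclidean distance to the query.
ranked = sorted(train, key=lambda sample: squared_distance(sample[0], query))
neighbours = [label for _, label in ranked[:k]]
print(neighbours)                                # ['Good', 'Good', 'Bad']
print(Counter(neighbours).most_common(1)[0][0])  # Good
```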
K – Means Clustering
 Contents
 Introduction
 Algorithm
 Example
 Application
K-Means Clustering: Introduction
 Clustering: the process of grouping a set of objects into classes of similar objects.
 Documents within a cluster should be similar.
 Documents from different clusters should be dissimilar.
 Clustering is the commonest form of unsupervised learning.
 Unsupervised learning = learning from raw data, as opposed to supervised learning, where a classification of the examples is given.
 In principle, the optimal partition is achieved by minimising the sum of squared distances from each object to the "representative object" of its cluster.
e.g., using the Euclidean distance, the quantity minimised for cluster k is
d_k = ∑_{n=1}^{N} (x_n - m_k)^2,
the sum of squared distances of the cluster's objects x_n to its representative (mean) m_k.
K-means Algorithm (figure)
A simple example showing the implementation of the k-means algorithm (using K = 2)

Step 1:
Initialization: we randomly choose the following two centroids (k = 2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

Step 2:
 Thus, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}.
 Their new centroids are m1 ≈ (1.83, 2.33) and m2 ≈ (4.13, 5.38).

Step 3:
 Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table.
 Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
 The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).

Step 4:
 The clusters obtained are {1, 2} and {3, 4, 5, 6, 7}.
 Therefore, there is no change in the clusters.
 Thus, the algorithm comes to a halt here, and the final result consists of the 2 clusters {1, 2} and {3, 4, 5, 6, 7}.
Example
Consider the following data set, consisting of the scores of two variables on each of seven individuals:

Subject  A    B
1        1.0  1.0
2        1.5  2.0
3        3.0  4.0
4        5.0  7.0
5        3.5  5.0
6        4.5  5.0
7        3.5  4.5
Example
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the A and B values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:

          Individual  Mean Vector (centroid)
Group 1   1           (1.0, 1.0)
Group 2   4           (5.0, 7.0)
Example
 The remaining individuals are now examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a new member is added.

        Cluster 1                       Cluster 2
Step    Individuals  Mean (centroid)    Individuals  Mean (centroid)
1       1            (1.0, 1.0)         4            (5.0, 7.0)
2       1, 2         (1.2, 1.5)         4            (5.0, 7.0)
3       1, 2, 3      (1.8, 2.3)         4            (5.0, 7.0)
4       1, 2, 3      (1.8, 2.3)         4, 5         (4.2, 6.0)
5       1, 2, 3      (1.8, 2.3)         4, 5, 6      (4.3, 5.7)
6       1, 2, 3      (1.8, 2.3)         4, 5, 6, 7   (4.1, 5.4)
Example
 Now the initial partition has changed, and the two clusters at this stage have the following characteristics:

           Individuals   Mean Vector (centroid)
Cluster 1  1, 2, 3       (1.8, 2.3)
Cluster 2  4, 5, 6, 7    (4.1, 5.4)
Example
But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each individual's distance to its own cluster mean and to that of the opposite cluster:

Individual  Distance to mean (centroid) of Cluster 1  Distance to mean (centroid) of Cluster 2
1           1.5                                       5.4
2           0.4                                       4.3
3           2.1                                       1.8
4           5.7                                       1.8
5           3.2                                       0.7
6           3.8                                       0.6
7           2.8                                       1.1
Example
 Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than to its own, so it is relocated, giving the partition below. The iterative relocation would now continue from this new partition until no more relocations occur. However, in this example each individual is now nearer its own cluster mean than that of the other cluster, and the iteration stops, choosing the latest partitioning as the final cluster solution.

           Individuals      Mean Vector (centroid)
Cluster 1  1, 2             (1.3, 1.5)
Cluster 2  3, 4, 5, 6, 7    (3.9, 5.1)
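The worked example can be reproduced with a short sketch of Lloyd's (batch) k-means; the code and names below are ours, and it assumes no cluster ever becomes empty, which holds for this data. The slides recalculate the means incrementally, while this sketch uses the standard batch update, yet both arrive at the same final partition {1, 2} and {3, 4, 5, 6, 7}.

```python
import math  # math.dist requires Python 3.8+

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids = [(1.0, 1.0), (5.0, 7.0)]   # same initial centroids as the slides

for _ in range(10):  # a handful of iterations is enough to converge here
    # Assignment step: each subject (1-indexed) goes to its nearest centroid.
    clusters = [[] for _ in centroids]
    for idx, p in enumerate(points, start=1):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(idx)
    # Update step: recompute each centroid as the mean of its cluster.
    centroids = [
        (sum(points[i - 1][0] for i in c) / len(c),
         sum(points[i - 1][1] for i in c) / len(c))
        for c in clusters
    ]

print(clusters)    # [[1, 2], [3, 4, 5, 6, 7]]
print(centroids)   # [(1.25, 1.5), (3.9, 5.1)], i.e. roughly (1.3, 1.5) and (3.9, 5.1)
```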
Applications
 Clustering helps marketers improve their customer base and work on the target areas. It helps group people (according to different criteria such as willingness, purchasing power, etc.) based on their similarity in many ways related to the product under consideration.
 Clustering helps in identifying groups of houses on the basis of their value, type and geographical location.
 Clustering is used to study earthquakes. Based on the areas hit by an earthquake in a region, clustering can help analyse the next probable location where an earthquake can occur.
Random Forest
 Contents
 Random Forest Introduction
 Pseudocode
 Prediction Pseudocode
 Example
 Random Forest vs Decision Tree
 Advantages
 Disadvantages
 Application
Random Forest
 The random forest algorithm is a supervised classification and regression algorithm.
 It randomly creates a forest of several decision trees.
Random Forest pseudocode
 Randomly select "k" features from the total "m" features, where k << m.
 Among the "k" features, calculate the node "d" using the best split point.
 Split the node into daughter nodes using the best split.
 Repeat steps 1 to 3 until "l" nodes have been reached.
 Build the forest by repeating steps 1 to 4 "n" times to create "n" trees.
Prediction pseudocode
To perform prediction, the trained random forest algorithm uses the pseudocode below.
 Take the test features and use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome (target).
 Calculate the votes for each predicted target.
 Consider the highest-voted predicted target as the final prediction from the random forest algorithm.
A minimal library-based sketch of training and prediction follows below.
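The two pseudocode blocks map directly onto an off-the-shelf implementation. Below is a minimal sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is available); the integer encoding of the categorical attributes is our own choice for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# The 14-day Play dataset from the next slide, encoded as small integers:
# Outlook: Sunny=0, Overcast=1, Rain=2; Humidity: High=0, Normal=1;
# Wind: Weak=0, Strong=1; Play: No=0, Yes=1.
X = [
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [2, 0, 0], [2, 1, 0], [2, 1, 1], [1, 1, 1],
    [0, 0, 0], [0, 1, 0], [2, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0], [2, 0, 1],
]
y = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]  # Play for D1..D14

# n_estimators = number of trees ("n" in the pseudocode); max_features controls
# how many of the "m" features each split considers (the random "k").
forest = RandomForestClassifier(n_estimators=50, max_features="sqrt",
                                random_state=0)
forest.fit(X, y)

# Query from the example: Outlook=Rain, Humidity=High, Wind=Weak.
print(forest.predict([[2, 0, 0]]))   # expected to print [1], i.e. Play = Yes
```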
Example
Day Outlook Humidity Wind Play
D1 Sunny High Weak Yes
D2 Sunny High Strong No
D3 Overcast High Weak Yes
D4 Rain High Weak Yes
D5 Rain Normal Weak Yes
D6 Rain Normal Strong No
D7 Overcast Normal Strong Yes
D8 Sunny High Weak No
D9 Sunny Normal Weak Yes
D10 Rain Normal Weak Yes
D11 Sunny Normal Strong Yes
D12 Overcast High Strong Yes
D13 Overcast Normal Weak Yes
D14 Rain High Strong No
Example
 Will the game happen if the weather conditions are
Outlook = Rain, Humidity = High, Wind = Weak?
Play = ?
 Step 1: Divide the data into smaller subsets.
 Step 2: The subsets need not be distinct; some subsets may overlap.
(Figure) Three decision trees are built on the (overlapping) subsets {D1, D2, D3}, {D3, D4, D5, D6} and {D7, D8, D9}, splitting on Wind and Humidity. For the query (Outlook = Rain, Humidity = High, Wind = Weak) each tree casts a vote, and the majority vote is Play.
Advantages
 Random forest is considered a highly accurate and robust method.
 It is far less prone to the overfitting problem than a single decision tree.
 The algorithm can be used in both classification and regression problems.
 Random forests can also handle missing values.
 You can get the relative feature importance, which helps in selecting the most contributing features for the classifier.
Disadvantages
 It can take longer than expected to compute a large number of trees.
 The model is difficult to interpret compared to a decision tree.
Random Forest vs Decision Trees
 A random forest is a set of multiple decision trees.
 Deep decision trees may suffer from overfitting, but a random forest prevents overfitting by creating trees on random subsets.
 Decision trees are computationally faster.
 A random forest is difficult to interpret, while a decision tree is easily interpretable and can be converted into rules.
Applications
 Banking
 Medicine
 Stock Market
 E-Commerce