Machine Learning with Python
Machine Learning Algorithms - RANDOM FOREST
Prof. Shibdas Dutta,
Associate Professor,
DCG DATACORE SYSTEMS INDIA PVT LTD
Kolkata
Company Confidential: Data-Core Systems, Inc. | datacoresystems.com
Machine Learning Algorithms - Classification Algorithm - RANDOM FOREST
Introduction - RANDOM FOREST
As the name suggests, a random forest is a “forest” of trees, i.e. decision trees.
A random forest is a tree-based machine learning algorithm that builds multiple decision trees,
each from a random sample of the rows and with a random subset of the features considered at each split.
The random forest then combines the outputs of the individual decision trees to generate
the final output.
Each decision tree greedily selects the best split point from the data available to it at
each step.
We can use random forest for classification as well as regression problems.
If the total number of columns in the training dataset is denoted by p, a common rule of thumb for the number of columns considered at each split is:
For classification, we take sqrt(p) columns.
For regression, we take p/3 columns.
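The two rules of thumb above can be written out as a small helper. This is only a sketch: the function name features_per_split is hypothetical, and rounding down while keeping at least one column is an assumption made here. Libraries differ in how they round and in which default they actually use.

```python
import math

def features_per_split(p: int, task: str) -> int:
    """Rule-of-thumb number of candidate columns examined at each split.

    p is the total number of feature columns. Rounding down and keeping
    at least one column are assumptions of this sketch.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if task == "classification":
        return max(1, math.isqrt(p))   # floor(sqrt(p))
    if task == "regression":
        return max(1, p // 3)          # floor(p / 3)
    raise ValueError("task must be 'classification' or 'regression'")

print(features_per_split(16, "classification"))  # 4
print(features_per_split(6, "regression"))       # 2
```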
WHEN TO USE RANDOM FOREST?
When we care more about accuracy than about interpretability
When we want better accuracy on unseen validation data
HOW TO USE RANDOM FOREST?
Select random samples (drawn with replacement) from the given dataset
Construct a decision tree from each sample and obtain its prediction
Perform a vote over the predicted results
Select the most-voted prediction as the final result (for regression, average the predictions instead)
The following diagram illustrates how it works:
[Diagram: Random Forest. The Training Set is divided into Training Sample 1, Training Sample 2, ..., Training Sample n; the outputs obtained from the samples for the Test Set are combined by Voting to give the final Prediction.]
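The sampling-and-voting steps described above can be sketched in plain Python. To keep the example short, each "tree" here is a one-threshold stump rather than a full decision tree. That substitution, the toy data, and all function names are assumptions made for illustration, not the method a library uses.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one 'training sample')."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(sample):
    """Stand-in for a decision tree: the threshold on x that misclassifies
    the fewest rows of the sample (the stump predicts 1 when x > t)."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def majority_vote(predictions):
    """Return the most common prediction among the trees."""
    return Counter(predictions).most_common(1)[0][0]

def forest_predict(thresholds, x):
    """Ask every stump for a prediction, then take the majority vote."""
    return majority_vote([int(x > t) for t in thresholds])

# Toy data: (feature, label); the label is 1 for the larger feature values.
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
rng = random.Random(0)
thresholds = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(25)]

print(forest_predict(thresholds, 0.5), forest_predict(thresholds, 9.5))  # 0 1
```

For a regression forest, the majority vote would be replaced by the mean of the trees' predictions.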
STOCK PREDICTION USING RANDOM FOREST-EXAMPLE
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Import the model we are using
from sklearn.ensemble import RandomForestRegressor
data = pd.read_csv('data.csv')
data.head()
Here, we will be using the dataset (available below), which contains seven columns: date, open, high, low, close,
volume, and the name of the company.
In this case, Google is the only company in the data.
Open is the price at which the stock first traded when the exchange opened on that date.
Low is the lowest price the stock reached on that date.
High is the highest price the stock reached on that date.
Close is the last price at which the stock traded before the exchange closed on that date.
Volume is the number of shares traded on that date.
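# Turn dates such as '2017-01-03' into integers such as 20170103 so the model can use them.
# Assumes the dates are purely numeric with '-' separators, as the original split/join implies.
data['Date'] = data['Date'].str.replace('-', '', regex=False).astype(int)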
Using the above dataset, we now try to predict the ‘Close’ value from the other attributes. Let’s split the data into
training and test sets.
# Features: every column except the target. The company-name column is text, which the
# regressor cannot fit, so it is dropped too (assumption: that column is called 'Name').
X_1 = data.drop(columns=['Close', 'Name'])
# Labels: the closing price we want to predict.
Y_1 = data['Close']
# Using scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split
X_train_1, X_test_1, y_train_1, y_test_1 = train_test_split(X_1, Y_1, test_size=0.33, random_state=42)
Now, let’s instantiate the model and train it on the training dataset:
rfg = RandomForestRegressor(n_estimators=10, random_state=42)
rfg.fit(X_train_1, y_train_1)
# Show the predicted values (first column) next to the actual values (second column)
pd.concat([pd.Series(rfg.predict(X_test_1)), y_test_1.reset_index(drop=True)], axis=1)
Let’s rank the features by calculating their numerical importances:
# Saving feature names for later use
feature_list = list(X_1.columns)
print(feature_list)
# Get numerical feature importances
importances = list(rfg.feature_importances_)
# List of tuples with variable and importance
feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(feature_list, importances)]
# Sort the feature importances by most important first
feature_importances = sorted(feature_importances, key = lambda x: x[1], reverse = True)
# Print out the feature and importances
for pair in feature_importances:
    print('Variable: {:20} Importance: {}'.format(*pair))
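rfg.score(X_test_1, y_test_1)
We get a score of about 0.99. For a regressor, score() returns the coefficient of determination (R²), not a classification accuracy, so the model explains roughly 99% of the variance of ‘Close’ on the test set. We then display the predicted and the original values side by side.
pd.concat([pd.Series(rfg.predict(X_test_1)), y_test_1.reset_index(drop=True)], axis=1)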
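For a regressor, scikit-learn's score method returns the coefficient of determination R², defined as 1 - SS_res / SS_tot. The sketch below computes it by hand so the number is easier to read. The function name r2_score_manual and the prices are hypothetical and serve only as illustration.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical closing prices and predictions, for illustration only.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 102.0, 103.0, 106.0]

print(r2_score_manual(actual, predicted))    # 0.9
print(r2_score_manual(actual, [103.0] * 4))  # 0.0 (always predicting the mean)
```

A value near 1 means the predictions track the actual values closely; a model that always predicts the mean scores 0.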
ADVANTAGES OF RANDOM FOREST
It reduces overfitting because the prediction is a majority vote (or an average, for regression) over many trees.
Random forest can be used for classification as well as regression.
It works well on a wide range of datasets.
Random forest often gives better accuracy on unseen data than a single decision tree, and some implementations can tolerate missing values.
Feature scaling isn’t required because each split only compares a feature against a threshold.
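The point about normalization can be checked with a tiny greedy split search: because a split only asks whether a feature is below a threshold, rescaling the feature changes the threshold but not which rows end up on each side. This is a sketch with made-up numbers and a hypothetical function name, using squared error as the split criterion.

```python
def best_threshold_partition(xs, ys):
    """Greedy one-feature split for regression: try each observed value
    as a threshold and keep the one with the lowest squared error.
    Returns the row indices that fall in the left branch (x <= threshold)."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best_left, best_cost = None, None
    for t in sorted(set(xs)):
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        if not right:
            continue  # a split must leave rows on both sides
        cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if best_cost is None or cost < best_cost:
            best_left, best_cost = left, cost
    return best_left

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
rescaled = [1000.0 * x + 7.0 for x in xs]   # arbitrary change of units

print(best_threshold_partition(xs, ys))        # [0, 1, 2]
print(best_threshold_partition(rescaled, ys))  # [0, 1, 2]
```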
DISADVANTAGES
Random forest requires much more computational power and memory than a single decision tree, because it builds many trees.
Because it is an ensemble of decision trees, it is harder to interpret than a single tree: feature importances rank the
variables but do not show how each one affects a prediction.
Random forests become less intuitive as the number of decision trees grows.
Bagging alone tends to produce trees that are correlated with each other. Because the same greedy algorithm builds every
decision tree, the trees are likely to choose the same or very similar split points and so make similar predictions,
which limits the variance reduction that bagging was meant to deliver. Considering only a random subset of features at each split is how random forest reduces this correlation.
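The effect of correlated trees can be made concrete with a standard result: if each of B predictions has variance sigma^2 and every pair has correlation rho, their average has variance rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below evaluates that formula. The function name is hypothetical, and real trees only approximately satisfy the equal-variance, equal-correlation assumption.

```python
def ensemble_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the mean of n_trees predictions that each have
    variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees

# Compare uncorrelated, moderately correlated and highly correlated trees.
for rho in (0.0, 0.5, 0.9):
    variances = [round(ensemble_variance(1.0, rho, b), 3) for b in (1, 10, 100)]
    print(rho, variances)
```

With uncorrelated trees the variance keeps shrinking as trees are added, but with rho = 0.9 it cannot fall below 0.9 * sigma^2 however many trees are used.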