A Statistical Model to Predict NBA
Play-off Results
Rishikesh Ravi
Under the guidance of Divya Ma’am
Contents
No Title
1 Introduction
2 Specifications for Hardware and Software
3 Data Flow Diagram
4 Content
5 Validation
6 Implementation
7 Output
8 Conclusion
9 Bibliography
10 Questions
11 Thank You
Introduction
This research presentation is on a topic that interests me: sports analytics. This algorithm, or a
modified version of it, could be used to predict the results of various sports events and to inform
the roster construction of NBA teams.
For this project, data was collected from various sources. Programming and statistical techniques
were used to determine the relevance and correlation of the different variables in the data set,
and several approaches to feature selection were applied. Logistic regression was used for the
prediction model.
The final objective was to predict the success of an NBA team. The model was trained on a
training set (14 seasons between 2000 and 2020), and its predictive capability was tested on a
test set (3 seasons between 2021 and 2023).
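The season-based split described above can be sketched as follows. This is a minimal illustration, not the project's own code: the row layout, the field names (`season`, `team`, `TOP4`) and the helper `split_by_season` are assumptions made for the example, and the rows are made up.

```python
# Hypothetical rows, one per team-season. Field names and values are
# assumptions for illustration, not the project's real data.
rows = [
    {"season": 2019, "team": "A", "TOP4": 1},
    {"season": 2020, "team": "B", "TOP4": 0},
    {"season": 2021, "team": "C", "TOP4": 0},
    {"season": 2023, "team": "D", "TOP4": 1},
]

def split_by_season(rows, first_test_season=2021):
    """Seasons before first_test_season go to training, the rest to testing."""
    train = [r for r in rows if r["season"] < first_test_season]
    test = [r for r in rows if r["season"] >= first_test_season]
    return train, test

train_rows, test_rows = split_by_season(rows)
print(len(train_rows), len(test_rows))
```

Splitting by season rather than at random keeps every test season strictly later than the training seasons, which matches how the model would be used on a future season.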
Specifications (For the hardware and software)
Hardware Specifications
Processor- Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz
System Type- 64-bit operating system, x64-based processor
Installed RAM- 8 GB
Software Specifications
Operating System- Windows 11, version 22H2
Programming Language- Python 3.10.9
Developer platform- conda 23.3.1
Data Flow Diagram
Data set of NBA data (non-cleaned data) -> Data cleaning -> Feature selection -> Train the model
-> Test the model on a new data set (using the trained model and test data)
-> Assess accuracy of the output
About NBA
• NBA stands for the National Basketball Association
• It is a US-based basketball league in which the best players
from all over the world play
• There are 2 conferences within the NBA - the East and the West
• Each conference has 15 teams
About NBA Play-offs
• Each NBA team's success in the regular season is tested in the play-offs
• After the regular season, only 16 teams qualify for the play-offs, based on performance
• A qualified team can play a maximum of 4 series in the play-offs- the 1st round, the conference
semi-finals, the conference finals and the finals.
• Each series consists of a maximum of 7 games, and the first team to win 4 games wins the series
• Using this program, I have tried to determine which regular season stats are predictors of
post-season success and to predict the top 4 teams in the play-offs (the teams in the
conference finals).
About the Project
• The data set is built from NBA data
• I have built a logistic regression based model to predict success in the play-offs
from regular season statistics
• Success metric- the team should finish in the top 4 of the play-offs, i.e. reach the conference finals.
• This model will help us better understand the factors that drive success for
a team in the NBA
• NBA teams can use this for their team building and recruitment
Key Statistics that were considered
• 3 groups of statistics were used- per game statistics, advanced statistics and shooting statistics
• The per game statistics include- minutes played by the team’s players per game, field
goals attempted, field goals made, field goal percentage, 2-point attempts, 2-pointers made,
2-point percentage, 3-point attempts, 3-pointers made, 3-point percentage, free throws
attempted, free throws made, free throw percentage, offensive rebounds per game, defensive
rebounds per game, total rebounds per game, blocks per game, steals per game, assists per
game, points per game and personal fouls per game
• The advanced statistics include- average age of players, offensive rating, net rating,
defensive rating, pace of play, free throw attempt rate, 3-point attempt rate, true shooting
percentage, effective field goal percentage of the team and their opponents, turnover
percentage of the team and their opponents, and offensive and defensive rebound percentage of
the team and their opponents.
Key Statistics that were considered (Page 2)
• The shooting statistics include- field goal percentage, shooting percentages and
attempts from 0-3 feet, 3-10 feet, 10-16 feet and from 16 feet to the 3-point line,
number of half-court shots made and attempted, number of dunks made and
attempted, number of layups made and attempted, number of corner 3-point shots made
and attempted, and average shot distance
Exploratory Data Analysis (Correlation Graph)
• A graph can be plotted in Python to analyze the relation
of each statistic with every other statistic in the data.
This is the correlation graph.
• On the right, fewer features have been used to show how
the graph works. In this graph, all the statistics have a
positive correlation with each other (correlation varies
from 0 to 1). The lightest color represents the highest
correlation.
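The numbers behind such a correlation graph can be sketched in plain Python. This is only an illustration: the `pearson` helper, the three chosen statistics and all values are assumptions made for the example, and the plotting step itself is left out.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up values for three statistics over five team-seasons (not real NBA data).
stats = {
    "PTS": [100.0, 104.0, 108.0, 111.0, 115.0],
    "AST": [21.0, 22.5, 23.0, 25.0, 26.5],
    "FGA": [84.0, 85.0, 87.0, 88.0, 90.0],
}

# Every statistic against every other statistic: the matrix the graph colors in.
names = list(stats)
corr = {a: {b: pearson(stats[a], stats[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```

Each statistic correlates perfectly with itself, so the diagonal of the matrix is 1, and the matrix is symmetric.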
Exploratory Data Analysis (Correlation Graph)
The correlation plot for the entire data set
• In this graph, the darkest and the
lightest boxes represent the strongest
correlations between the variables
• Here the correlations are both negative
and positive
Feature Selection
• Feature selection is the method of
separating the most important
data features from the features that
hold less importance.
• In this program, we have used
feature selection to find the features
which have the highest correlation
with the target variable, ‘TOP4’
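One simple way to do this kind of correlation-based selection is to rank the features by the absolute value of their correlation with `TOP4`. The sketch below assumes that approach. The helper names (`pearson`, `rank_features`) and all values are made up for illustration and are not the project's own code or data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up feature columns for six team-seasons and their TOP4 labels.
features = {
    "NRtg": [7.5, 5.1, -2.0, -6.3, 1.2, 8.8],
    "TS%": [0.585, 0.570, 0.548, 0.535, 0.560, 0.590],
    "PF": [19.0, 21.0, 20.0, 19.5, 21.5, 20.5],
}
top4 = [1, 1, 0, 0, 0, 1]

def rank_features(features, target):
    """Sort features by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

ranking = rank_features(features, top4)
for name, r in ranking:
    print(name, round(r, 2))
```

Using the absolute value keeps features with a strong negative correlation, which can be just as informative as positive ones.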
Most Important features after feature selection- NRtg
Net rating (NRtg) is the difference
between the points a team scores
and the points it allows per 100
possessions.
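As a worked example of this definition, a minimal sketch follows. It assumes, as a simplification, that the team and its opponents have the same number of possessions. The function name and the season totals are hypothetical.

```python
def net_rating(points_scored, points_allowed, possessions):
    """Points scored minus points allowed, each expressed per 100 possessions."""
    offensive_rating = 100.0 * points_scored / possessions
    defensive_rating = 100.0 * points_allowed / possessions
    return offensive_rating - defensive_rating

# Hypothetical season totals: 9200 points scored, 8800 allowed, 8000 possessions.
print(net_rating(9200, 8800, 8000))  # 115.0 - 110.0 = 5.0
```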
Most Important features after feature selection- TS%
True shooting percentage (TS%) is a
measure of a team's scoring efficiency that
accounts for 2-pointers, 3-pointers and free throws.
TS% = PTS / (2 × (FGA + 0.44 × FTA))
PTS- Points per game
FGA- Field goals attempted per game
FTA- Free throws attempted per game
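A short sketch of the calculation, assuming the commonly used definition TS% = PTS / (2 × (FGA + 0.44 × FTA)). The function name and the per-game inputs are made up for illustration.

```python
def true_shooting_pct(pts, fga, fta):
    """True shooting percentage from points, field goal and free throw attempts."""
    return pts / (2 * (fga + 0.44 * fta))

# Hypothetical per-game averages for one team.
ts = true_shooting_pct(pts=112.0, fga=88.0, fta=22.0)
print(round(ts, 3))  # 0.573
```

The factor 0.44 is the usual approximation for how many free throw attempts use up a possession.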
Most Important features after feature selection- AST
and PTS
Assists per game (AST) and points
per game (PTS) are 2 interrelated
stats.
Teams that record more assists tend to score more points.
Most Important features after feature selection- eFG%
Effective field goal percentage (eFG%)
is the percentage of shots made,
adjusted to give extra weight to the 3-pointers
made because they are worth more than 2-pointers.
eFG% = (FGM + 0.5 × 3PM) / FGA
FGM- Field goals made
3PM- 3-pointers made
FGA- Field goals attempted
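A short sketch of the calculation, assuming the commonly used definition eFG% = (FGM + 0.5 × 3PM) / FGA. The function name and inputs are made up for illustration.

```python
def effective_fg_pct(fgm, three_pm, fga):
    """Field goal percentage with each made 3-pointer counted as 1.5 makes."""
    return (fgm + 0.5 * three_pm) / fga

# Hypothetical per-game averages: 41 makes (12 of them 3-pointers) on 88 attempts.
efg = effective_fg_pct(fgm=41.0, three_pm=12.0, fga=88.0)
plain_fg = 41.0 / 88.0
print(round(plain_fg, 3), round(efg, 3))  # 0.466 0.534
```

The gap between the plain field goal percentage and eFG% shows how much of the team's shooting value comes from 3-pointers.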
Methodology- Logistic Regression
• Logistic regression is a technique used for solving
classification problems (where the target variable is either 1 or
0, true or false). The model estimates the probability that the
target is 1. To build and check the model, the data set is divided
into a training set (the data the model learns from)
and a test set (the data on which the target variable is predicted
using the trained model)
• Threshold- The threshold in logistic regression can be
explained with an example- In a school exam, if the passing
mark is 40%, then 0.4 is the threshold. If a student is predicted
to score above 0.4, then 1 will be returned, otherwise 0. In the
same way, the model returns 1 when its predicted probability is
above the threshold, otherwise 0.
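The threshold step can be sketched as below. The weight, bias and input values are made-up numbers rather than fitted parameters from the project, and the default threshold of 0.5 is an assumption for the example.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (label, probability) for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = 1 if probability > threshold else 0
    return label, probability

# One made-up feature (a net rating) with a hypothetical weight and bias.
weights, bias = [0.4], -1.0
print(predict([6.0], weights, bias))                 # probability about 0.80, label 1
print(predict([-2.0], weights, bias))                # probability about 0.14, label 0
print(predict([6.0], weights, bias, threshold=0.9))  # same probability, label 0
```

Raising the threshold makes the model more cautious about returning 1, which typically trades recall for precision.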
Sample Python Program
Code used for prediction on the data set
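The program itself is not reproduced in this text. As a stand-in, here is a minimal sketch of how a logistic regression can be trained and used for prediction, written in plain Python with batch gradient descent. It is not the project's code: the function names, the learning rate, the number of epochs and the single toy feature are all assumptions for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Return a 0/1 label for every row of X."""
    return [
        1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) > threshold else 0
        for xi in X
    ]

# Toy training data: one made-up feature (a net rating) and a TOP4-style label.
X_train = [[-6.0], [-3.5], [-1.0], [0.5], [2.0], [4.5], [6.0], [8.0]]
y_train = [0, 0, 0, 0, 0, 1, 1, 1]
w, b = train_logistic(X_train, y_train)

X_test = [[-4.0], [7.0]]
print(predict(X_test, w, b))
```

In practice a library implementation would normally be used instead of hand-written gradient descent; the loop is spelled out here only to show what training does.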
Implementation
Step 1- Define the objective and the success criteria
Step 2- Understand and identify the data
Step 3- Collect and prepare the data set
Step 4- Determine the model’s features (feature selection)
Step 5- Train the model on the data set
Step 6- Evaluate the model’s performance
Step 7- Put the model in operation
Step 8- Check the results and adjust the model accordingly
Validation
The program reports an error when the input contains null values (inputs with no value entered) and
cannot run on them.
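One way to handle this is to check for null values before the data reaches the model. The sketch below assumes rows stored as dictionaries. The helper `find_missing`, the field names and the values are hypothetical, and dropping the affected rows is only one possible policy.

```python
def find_missing(rows, required):
    """Return (row_index, field) pairs where a required value is None or absent."""
    problems = []
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                problems.append((i, field))
    return problems

# Made-up rows: team B has a null NRtg, team C has no TS% entry at all.
rows = [
    {"team": "A", "NRtg": 4.2, "TS%": 0.571},
    {"team": "B", "NRtg": None, "TS%": 0.553},
    {"team": "C", "NRtg": -1.3},
]

missing = find_missing(rows, required=["NRtg", "TS%"])
bad_rows = {index for index, _ in missing}
clean_rows = [row for i, row in enumerate(rows) if i not in bad_rows]

print(missing)
print([row["team"] for row in clean_rows])
```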
Output results
• The outputs on the right side of the page show the
accuracy of the prediction. The 2 most important terms in
it are PRECISION and RECALL.
• If there are 40 teams which are actually ‘1’, and only 20 of
them have been predicted to be ‘1’, then the recall is 20/40.
• Separately, if out of 20 teams predicted to be ‘1’, only 16
of them are actually ‘1’, then the precision is 16/20.
• The teams predicted as 1 that are actually 1 are true
positives. The teams predicted as 0 that are actually 0
are true negatives. The teams predicted as 1 that are
actually 0 are false positives, and the teams predicted as 0
that are actually 1 are false negatives.
• In the predicted data set, there were 78 true
negatives, 6 false negatives, 6 true positives, and
0 false positives.
Output of the train set
Prediction output for test set
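Precision, recall and accuracy follow directly from the four counts reported above (78 true negatives, 6 false negatives, 6 true positives, 0 false positives). The function below is a small sketch written for this example, and its name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

precision, recall, accuracy = classification_metrics(tp=6, fp=0, fn=6, tn=78)
print(precision, recall, round(accuracy, 3))  # 1.0 0.5 0.933
```

With these counts, every team the model flagged did reach the top 4 (precision 6/6), but it found only half of the teams that actually did (recall 6/12).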
Conclusion
• A logistic regression model was built for this project.
• Success in the model was defined as a team reaching the conference finals (Top 4).
• All 6 teams that the model predicted to reach the conference finals did reach the top four, for
example the 2022 Golden State Warriors, the 2022 Boston Celtics and the 2023 Boston Celtics.
• On the other hand, teams like the 2023 Miami Heat and the 2023 Los Angeles Lakers made the
conference finals but were not predicted by the model.
• To improve the model, new statistics such as the number of All-NBA players from the previous season or
the number of MVP-level players should be taken into consideration.
• Other techniques like Random Forest could also be tried to improve the model
• This type of model could be used for other sports like cricket, football, etc.
Bibliography
• YUBI internship training for data science
• Basketball-reference.com
• www.nbastuffer.com
• Python for machine learning and Data science bootcamp – A Udemy Course
Any Questions?
Thank You
