NBA playoff prediction Model.pptx

A Statistical Model to Predict NBA Play-off Results
Rishikesh Ravi
Under the guidance of Divya Ma’am

Contents
1 Introduction
2 Specifications for Hardware and Software
3 Data Flow Diagram
4 Content
5 Validation
6 Implementation
7 Output
8 Conclusion
9 Bibliography
10 Questions
11 Thank You

Introduction
This research presentation is on a topic that I am interested in: sports analytics. The project could
be useful because this algorithm, or a modified version of it, could be used to predict the outcomes of
various sports events and to guide the roster construction of NBA teams.
In this project, data was collected from several sources, and a range of programming and statistical
techniques was used to determine the relevance and correlation of the different variables in the data
set. Several approaches to feature selection were also applied, and logistic regression was used for
the prediction model.
The final objective was to predict the success of an NBA team. The model was trained on a
training set (14 seasons between 2000 and 2020), and its predictive capability was
tested on a test set (3 seasons between 2021 and 2023).
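Because the test seasons come after the training seasons, the split is made by season rather than by shuffling rows at random. The sketch below shows one way to do that with plain Python; the row layout (dicts with a hypothetical "Season" key) and the season numbers are placeholders, since the source does not list which 14 seasons were used for training.

```python
from typing import Iterable


def split_by_season(rows: list[dict], train_seasons: Iterable[int],
                    test_seasons: Iterable[int]) -> tuple[list[dict], list[dict]]:
    """Split team-season rows into training and test sets by season.

    Assumes each row is a dict with a hypothetical "Season" key.
    """
    train_set, test_set = set(train_seasons), set(test_seasons)
    overlap = train_set & test_set
    if overlap:
        # A season in both sets would leak test information into training.
        raise ValueError(f"seasons in both splits: {sorted(overlap)}")
    train = [row for row in rows if row["Season"] in train_set]
    test = [row for row in rows if row["Season"] in test_set]
    return train, test


# Hypothetical rows; team names and labels are illustrative only.
rows = [
    {"Season": 2019, "Team": "Team A", "TOP4": 1},
    {"Season": 2020, "Team": "Team B", "TOP4": 0},
    {"Season": 2021, "Team": "Team C", "TOP4": 1},
    {"Season": 2023, "Team": "Team D", "TOP4": 0},
]
train, test = split_by_season(rows, train_seasons=[2019, 2020],
                              test_seasons=range(2021, 2024))
print([r["Team"] for r in train], [r["Team"] for r in test])
```

Splitting by season keeps the evaluation honest: the model is scored only on seasons it has never seen, which mirrors how it would be used to predict a new season.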

Specifications (Hardware and Software)
Hardware Specifications
Processor: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz
System type: 64-bit operating system, x64-based processor
Installed RAM: 8 GB
Software Specifications
Operating system: Windows 11, version 22H2
Programming language: Python 3.10.9
Developer platform: conda 23.3.1

Data Flow Diagram
NBA data set (non-cleaned data)
→ Data cleaning
→ Feature selection
→ Train the model on the data
→ Test the model on a new data set, using the trained model and the test data
→ Assess the accuracy of the output

About NBA
• NBA stands for the National Basketball Association
• It is a US-based professional basketball league in which the best players
from all over the world play
• There are 2 conferences within the NBA: the East and the West
• Each conference has 15 teams

About NBA Play-offs
• Each NBA team's regular-season success is tested in the playoffs
• After the regular season, only 16 teams qualify for the playoffs, based on their performance
• A qualified team can play a maximum of 4 series in the playoffs: the first round, the conference
semi-finals, the conference finals and the finals
• Each series is a best-of-seven: the first team to win 4 games wins the series
• Using this program, I have tried to determine which regular-season statistics predict
post-season success, and to predict the top 4 teams in the playoffs (the teams that reach the
conference finals).
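The series format above also fixes what the "top 4" target means: a team reaches the conference finals by winning the first round and the conference semi-finals, i.e. at least 2 series. The sketch below illustrates both rules; the function names and the `series_won` input are hypothetical, since the source does not describe how the TOP4 label was built.

```python
from collections import Counter
from typing import Optional


def series_winner(game_winners: list[str]) -> Optional[str]:
    """Return the first team to reach 4 wins in a best-of-seven series.

    Returns None if no team has 4 wins yet (series still undecided).
    """
    wins = Counter()
    for team in game_winners:
        wins[team] += 1
        if wins[team] == 4:
            return team
    return None


def top4_label(series_won: int) -> int:
    """Hypothetical TOP4 label: 1 if the team won at least 2 playoff series
    (i.e. reached the conference finals), else 0. Teams that missed the
    playoffs have series_won = 0."""
    if not 0 <= series_won <= 4:
        raise ValueError("a team can win between 0 and 4 playoff series")
    return 1 if series_won >= 2 else 0


print(series_winner(["A", "B", "A", "A", "B", "A"]))  # "A" wins in 6 games
print([top4_label(n) for n in range(5)])  # [0, 0, 1, 1, 1]
```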

About the Project
• The dataset is built from NBA data
• I have built a logistic regression model to predict success in the playoffs
based on regular-season statistics
• Success metric: the team reaches the conference finals, i.e. finishes in the top 4 of the playoffs
• This model can help us better understand which factors drive a team's success
in the NBA
• NBA teams could use it for team building and recruitment

Key Statistics that were considered
• 3 groups of statistics were used: per-game statistics, advanced statistics and shooting statistics
• The per-game statistics include: minutes played by the team’s players per game, field
goals attempted, field goals made, field goal percentage, 2-point attempts, 2-pointers made,
2-point percentage, 3-point attempts, 3-pointers made, 3-point percentage, free throws
attempted, free throws made, free throw percentage, offensive rebounds per game, defensive
rebounds per game, total rebounds per game, blocks per game, steals per game, assists per
game, points per game and personal fouls per game
• The advanced statistics include: average age of players, offensive rating, net rating,
defensive rating, pace of play, free throw attempt rate, 3-point attempt rate, true shooting
percentage, effective field goal percentage of the team and its opponents, turnover
percentage of the team and its opponents, and offensive and defensive rebound percentage of
the team and its opponents.

Key Statistics that were considered (Page 2)
• The shooting statistics include: field goal percentage; shooting percentages and
attempts from 0-3 feet, 3-10 feet, 10-16 feet, and 16 feet to the 3-point line; the
number of half-court shots made and attempted; the number of dunks made and
attempted; the number of layups made and attempted; the number of corner 3-point shots made
and attempted; and the average distance of each shot

Exploratory Data Analysis (Correlation Graph)
• A correlation graph can be plotted in Python to show how
each statistic in the data relates to every other statistic.
• On the right, fewer features have been used to show how
the graph works. In this graph, all the statistics are
positively correlated with each other, so the correlation
values lie between 0 and 1 (in general, a correlation can
range from -1 to 1). The lightest color represents the highest
correlation.

Exploratory Data Analysis (Correlation Graph)
The correlation plot for the entire data
• In this graph, the boxes at the two extremes of the color
scale (the darkest and the lightest) mark the strongest
correlations in the data, one end negative and the other positive
• Here the correlations are both negative
and positive

Feature Selection
• Feature selection is the process of
separating the most important features
in the data from those that
hold less importance.
• In this program, feature selection was used
to find the features
with the highest correlation
with the target variable, ‘TOP4’

Most Important features after feature selection- NRtg
Net rating (NRtg) is the difference
between the points a team scores and
the points it allows per 100
possessions (offensive rating minus defensive rating).
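As a worked example, net rating is offensive rating (points scored per 100 possessions) minus defensive rating (points allowed per 100 possessions). The sketch assumes the possession count is already known and is the same for the team and its opponents; statistics sites estimate possessions from box-score data, and that estimate is not reproduced here. The season totals are hypothetical.

```python
def per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions."""
    return 100 * points / possessions


def net_rating(points_scored: float, points_allowed: float, possessions: float) -> float:
    """NRtg = ORtg - DRtg, assuming equal possessions for team and opponents."""
    offensive_rating = per_100(points_scored, possessions)
    defensive_rating = per_100(points_allowed, possessions)
    return offensive_rating - defensive_rating


# Hypothetical season totals: ORtg = 115.0, DRtg = 111.25
print(net_rating(points_scored=9200, points_allowed=8900, possessions=8000))  # 3.75
```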

Most Important features after feature selection- TS%
True shooting percentage (TS%) measures a team's shooting
efficiency, taking 2-pointers, 3-pointers and free throws
into account:
$$\mathrm{TS\%} = \frac{\mathrm{PTS}}{2\,(\mathrm{FGA} + 0.44 \times \mathrm{FTA})}$$
PTS- Points per game
FGA- Field goals attempted per game
FTA- Free throws attempted per game
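True shooting percentage is commonly computed as PTS / (2 × (FGA + 0.44 × FTA)). The 0.44 factor approximates the share of free-throw attempts that use up a possession (and-ones and three-shot fouls mean not every free throw does). A minimal sketch with hypothetical per-game numbers:

```python
def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    shooting_attempts = fga + 0.44 * fta
    return pts / (2 * shooting_attempts)


# Hypothetical team averages: 110 points, 88 FGA, 22 FTA per game
print(round(true_shooting_pct(110, 88, 22), 3))  # about 0.563
```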

Most Important features after feature selection- AST and PTS
Assists per game (AST) and points
per game (PTS) are 2 interrelated
stats.
More assists lead to more points, since
an assist is credited only for a pass that
directly sets up a made basket.
Figure: Assists per game (AST) and points per game (PTS)

Most Important features after feature selection- eFG%
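Effective field goal percentage (eFG%)
is the percentage of field goals made,
adjusted to give extra weight to 3-pointers,
which are worth 1.5 times as much as 2-pointers:
$$\mathrm{eFG\%} = \frac{\mathrm{FGM} + 0.5 \times \mathrm{3PM}}{\mathrm{FGA}}$$
FGM- Field goals made
3PM- 3-pointers made
FGA- Field goals attempted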
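Effective field goal percentage is commonly computed as (FGM + 0.5 × 3PM) / FGA: each made 3-pointer counts as 1.5 made field goals. A minimal sketch with hypothetical per-game numbers:

```python
def effective_fg_pct(fgm: float, three_pm: float, fga: float) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * three_pm) / fga


# Hypothetical team averages: 41 FGM (12 of them 3-pointers) on 88 FGA
print(round(effective_fg_pct(41, 12, 88), 3))  # about 0.534
```

With no 3-pointers made, eFG% reduces to ordinary field goal percentage.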

Methodology- Logistic Regression
• Logistic regression is a technique for solving classification
problems (where the target variable is either 1 or 0, true or false).
It estimates the probability that an example belongs to class 1.
Before the model is fitted, the data set is divided into a training set
(the data the model learns from) and a test set (the data on which the
target variable is predicted using the trained model).
• Threshold- The threshold is the cut-off that turns the predicted
probability into a 1 or a 0. It can be explained with an example: in a
school exam, if the passing mark is 40%, then 0.4 is the threshold. If a
student is predicted to score above 0.4, 1 (pass) is returned; otherwise, 0 (fail).
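To make the method concrete, the sketch below fits a logistic regression from scratch with batch gradient descent on the average log-loss, then applies a threshold to the predicted probabilities. It is not the project's code (which is not reproduced in the slides) and uses only the standard library; the learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return ez / (1.0 + ez)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one example is (p - y) * x.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X: list[list[float]], w: list[float], b: float) -> list[float]:
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def predict(X: list[list[float]], w: list[float], b: float,
            threshold: float = 0.5) -> list[int]:
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in predict_proba(X, w, b)]


# Toy data: one hypothetical feature (e.g. a standardized net rating)
# and a TOP4-style 0/1 label.
X_train = [[-2.0], [-1.0], [1.0], [2.0]]
y_train = [0, 0, 1, 1]
w, b = fit_logistic(X_train, y_train)
print(predict([[-1.5], [1.5]], w, b))  # [0, 1]
```

Raising or lowering `threshold` trades precision against recall: a higher cut-off labels fewer teams as TOP4 but is more often right when it does.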

Sample Python Program
Code used for making predictions on the data set

Implementation
Step 1- Define the objective and the success criteria
Step 2- Understand and identify the data
Step 3- Collect and prepare the data set
Step 4- Determine the model’s features (feature selection)
Step 5- Train the model on the training set
Step 6- Evaluate the model’s performance
Step 7- Put the model into operation
Step 8- Check the results and adjust the model accordingly

Validation
The main error the program raises concerns null values (inputs with no value entered): it flags
them and cannot process them, so missing values have to be handled during data cleaning.
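One simple way to handle null values before training is to drop any row that contains one. The sketch below does this in plain Python and reports how many rows were removed; the field names are hypothetical, and the source does not say whether the project dropped or filled missing values (filling them, e.g. with a league average, is the usual alternative). With pandas, `DataFrame.dropna()` performs the same dropping step.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and float NaN as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def drop_incomplete_rows(rows: list[dict]) -> tuple[list[dict], int]:
    """Keep only rows with no missing values; also return how many were dropped."""
    complete = [row for row in rows if not any(is_missing(v) for v in row.values())]
    return complete, len(rows) - len(complete)


# Hypothetical team rows with a few gaps
rows = [
    {"Team": "Team A", "NRtg": 5.1, "TS%": 0.58},
    {"Team": "Team B", "NRtg": None, "TS%": 0.55},
    {"Team": "Team C", "NRtg": -2.3, "TS%": float("nan")},
    {"Team": "Team D", "NRtg": 1.0, "TS%": ""},
]
clean, dropped = drop_incomplete_rows(rows)
print([r["Team"] for r in clean], dropped)  # ['Team A'] 3
```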

Output results
• The outputs on the right side of the page show the
accuracy of the prediction. The 2 most important metrics in
them are PRECISION and RECALL.
• Recall is the share of actual ‘1’ teams that the model labels ‘1’:
if 40 teams are actually ‘1’ and the model correctly predicts only
20 of them to be ‘1’, the recall is 20/40.
• Precision is the share of predicted ‘1’ teams that are actually ‘1’:
if 20 teams are predicted to be ‘1’ and only 16 of them are
actually ‘1’, the precision is 16/20.
• Teams predicted as 1 that are actually 1 are true
positives; teams predicted as 0 that are actually 0
are true negatives; teams predicted as 1 that are
actually 0 are false positives; and teams predicted as 0
that are actually 1 are false negatives.
• On the test set, there were 78 true
negatives, 6 false negatives, 6 true positives and
0 false positives.
Figure: Output for the training set
Figure: Prediction output for the test set
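The reported test-set counts (6 true positives, 78 true negatives, 0 false positives, 6 false negatives, i.e. 90 team-seasons) can be turned into the summary metrics directly. The sketch below does that arithmetic; the function name is illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted 1s that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual 1s that were found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


m = classification_metrics(tp=6, tn=78, fp=0, fn=6)
print(m)  # precision = 1.0, recall = 0.5, accuracy = 84/90 (about 0.933)
```

The pattern matches the conclusion: every team the model picked did reach the conference finals (precision 1.0), but it found only half of the actual conference finalists (recall 0.5).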

Conclusion
• A logistic regression model was built for this project.
• In the model, success was defined as a team reaching the conference finals (Top 4).
• All 6 teams that the model predicted to reach the conference finals did reach the top four, for
example, the 2022 Golden State Warriors, the 2022 Boston Celtics and the 2023 Boston Celtics.
• On the other hand, teams like the 2023 Miami Heat and the 2023 Los Angeles Lakers made the
conference finals but were not predicted by the model.
• To improve the model, new statistics, such as the number of All-NBA players from the previous season or
the number of MVP-level players, should be taken into consideration.
• Other techniques, such as random forests, could also be tried to improve the model.
• This type of model could be used for other sports, such as cricket and football.

Bibliography
• YUBI internship training for data science
• Basketball-Reference.com
• NBAstuffer (www.nbastuffer.com)
• Python for Machine Learning and Data Science Bootcamp (Udemy course)
