Class 20 - MODEL EVALUATION METRICS
Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)
Today's discussion focuses on classification models.
Accuracy:
It applies only to classification models, not regression models.
Accuracy Example:
Consider a training set with 98% samples of class A and 2% samples of class B. A model can easily reach 98% training accuracy by simply predicting class A for every training sample.
When the same model is tested on a test set with 60% samples of class A and 40% samples of class B, the test accuracy drops to 60%.
Classification accuracy is easy to report, but on data like this it gives a false sense of high performance.
The real problem arises when the cost of misclassifying minority-class samples is very high.
Imbalanced means: the classes occur with very different frequencies (categorical data).
Skewed means: the data distribution is asymmetric (numerical data).
Balanced data means both classes are roughly 50%/50%; a 60%/40% split may still be acceptable depending on the situation.
If the data is imbalanced, don't rely on accuracy.
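A minimal sketch (using synthetic labels, not the course data) that reproduces the trap described above: a classifier that always predicts the majority class scores 98% on the training set yet has learned nothing.
{# Synthetic labels: 98% class A (0), 2% class B (1)
import numpy as np
from sklearn.metrics import accuracy_score

y_train = np.array([0] * 98 + [1] * 2)
y_pred = np.zeros_like(y_train)          # always predict class A
print(accuracy_score(y_train, y_pred))   # 0.98

# Test set with 60% / 40% split: the same "model" drops to 60%
y_test = np.array([0] * 60 + [1] * 40)
print(accuracy_score(y_test, np.zeros_like(y_test)))  # 0.60}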
Now, let's talk about the evaluation of binary classification.
e.g., smoker vs. non-smoker, COVID +ve vs. -ve, etc.
The data contains exactly two categories.
4 important things in Binary Classification:
TP (True Positive)
The cases in which we predicted YES and the actual output was also YES.
TN (True Negative)
The cases in which we predicted NO and the actual output was also NO.
FP (False Positive)
The cases in which we predicted YES and the actual output was NO.
FN (False Negative)
The cases in which we predicted NO and the actual output was YES.
Together, these four counts form the CONFUSION MATRIX.
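A toy example (assumed labels, not the course data) showing how scikit-learn lays out the four counts: rows are actual classes, columns are predicted classes.
{# Toy labels to illustrate the four counts of the confusion matrix
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# For binary labels [0, 1], ravel() returns TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TP={tp}, TN={tn}, FP={fp}, FN={fn}')  # TP=3, TN=3, FP=1, FN=1}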
Confusion Matrix - Precision:
It is the number of correct positive predictions divided by the total number of positive predictions made by the classifier: Precision = TP / (TP + FP).
Precision emphasizes minimizing false positives.
Confusion Matrix - Recall:
The ratio of correctly predicted positive instances to the total actual positive instances: Recall = TP / (TP + FN).
Recall focuses on minimizing false negatives.
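Reusing the assumed toy counts from the confusion-matrix example above (TP=3, FP=1, FN=1), both metrics can be computed by hand:
{# Precision and recall by hand, from the toy counts above
tp, fp, fn = 3, 1, 1
precision = tp / (tp + fp)  # of all predicted YES, how many were right
recall = tp / (tp + fn)     # of all actual YES, how many we caught
print(f'Precision: {precision:.2f}, Recall: {recall:.2f}')  # 0.75, 0.75}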
{import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
import seaborn as sns
# Ignore warnings (including FutureWarning) to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action = "ignore", category = FutureWarning)
plt.rcParams["figure.figsize"] = [10, 5]}
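The cells below use a DataFrame named full_data that was loaded earlier in the course notebook. A minimal sketch, assuming the Titanic training set with categorical columns already encoded to numbers (the file name is hypothetical; adjust it to your own path):
{# Assumption: full_data is the preprocessed Titanic training set.
# 'titanic_train.csv' is a hypothetical file name, not from the course.
full_data = pd.read_csv('titanic_train.csv')}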
{# Data shape
print('train data:',full_data.shape)}
{# View first few rows
full_data.head(5)}
{# Data Info
full_data.info()}
{# Split data to be used in the models
# Create matrix of features
x = full_data.drop('Survived', axis = 1) # grabs everything else but 'Survived'
# Create target variable
y = full_data['Survived'] # y is the column we're trying to predict
}
{# Use x and y variables to split the training data into train and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .20, random_state = 101)}
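Given the earlier point about imbalance, one optional variant (not what the notebook runs) is a stratified split, which keeps the class ratio identical in the train and test sets:
{# Optional variant: stratify=y preserves the Survived class ratio
# in both splits, useful when classes are imbalanced.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size = .20, random_state = 101, stratify = y)}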
{print("Shape of x_train: ",x_train.shape)
print("Shape of y_train: ",y_train.shape)
print("---"*10)
print("Shape of x_test: ",x_test.shape)
print("Shape of y_test: ",y_test.shape)}
{# Import model
from sklearn.linear_model import LogisticRegression
print('Logistic Regression')
# Create instance of model
log_reg = LogisticRegression()
# Pass training data into model
log_reg.fit(x_train, y_train)}
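One practical note: with default settings, LogisticRegression can hit its iteration limit on unscaled features, and the resulting ConvergenceWarning is hidden by the warning filter above. If that happens, raising max_iter is the usual fix (an optional variant, not what the notebook runs):
{# Optional: raise the iteration limit if the solver fails to converge
log_reg = LogisticRegression(max_iter = 1000)
log_reg.fit(x_train, y_train)}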
{from sklearn.metrics import accuracy_score
# prediction from the model
y_pred_log_reg = log_reg.predict(x_test)
# Score It
print('Logistic Regression')
# Accuracy
print('--'*30)
log_reg_accuracy = round(accuracy_score(y_test, y_pred_log_reg) * 100,2)
print('Accuracy', log_reg_accuracy,'%')}
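To connect this back to the accuracy discussion above, a quick sketch compares the model against a majority-class baseline; the trained model is only useful to the extent that it beats this score:
{# Sketch: majority-class baseline for comparison with the model above
from sklearn.dummy import DummyClassifier
baseline = DummyClassifier(strategy = 'most_frequent').fit(x_train, y_train)
print('Baseline accuracy:', round(baseline.score(x_test, y_test) * 100, 2), '%')}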
{from sklearn.metrics import precision_score, recall_score, confusion_matrix
# Calculate precision and recall
precision = precision_score(y_test, y_pred_log_reg)
recall = recall_score(y_test, y_pred_log_reg)
# Print the results
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print("--"*30)
# Calculate confusion matrix
confusion = confusion_matrix(y_test, y_pred_log_reg)
print(confusion)
sns.heatmap(confusion, annot=True, fmt="d")}
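For confusion_matrix output, rows are actual classes and columns are predicted classes; labeling the heatmap axes makes that explicit. An optional touch, shown once here (the same idea applies to the model cells below):
{# Optional: redraw the heatmap above with labeled axes
sns.heatmap(confusion, annot=True, fmt="d")
plt.xlabel('Predicted')   # columns = predicted class
plt.ylabel('Actual')      # rows = actual class
plt.show()}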
{from sklearn.tree import DecisionTreeClassifier
print('Decision Tree Classifier')
# Create instance of model
Dtree = DecisionTreeClassifier()
# Pass training data into model
Dtree.fit(x_train, y_train)
}
{from sklearn.metrics import accuracy_score
# prediction from the model
y_pred_Dtree = Dtree.predict(x_test)
# Score It
print('Decision Tree Classifier')
# Accuracy
print('--'*30)
Dtree_accuracy = round(accuracy_score(y_test, y_pred_Dtree) * 100,2)
print('Accuracy', Dtree_accuracy,'%')}
{from sklearn.metrics import precision_score, recall_score, confusion_matrix
# Calculate precision and recall
precision = precision_score(y_test, y_pred_Dtree)
recall = recall_score(y_test, y_pred_Dtree)
# Print the results
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print("--"*30)
# Calculate confusion matrix
confusion = confusion_matrix(y_test, y_pred_Dtree)
sns.heatmap(confusion, annot=True, fmt="d")}
{from sklearn.ensemble import RandomForestClassifier
print('Random Forest Classifier')
# Create instance of model
rfc = RandomForestClassifier()
# Pass training data into model
rfc.fit(x_train, y_train)
}
{from sklearn.metrics import accuracy_score
# prediction from the model
y_pred_rfc = rfc.predict(x_test)
# Score It
print('Random Forest Classifier')
# Accuracy
print('--'*30)
rfc_accuracy = round(accuracy_score(y_test, y_pred_rfc) * 100,2)
print('Accuracy', rfc_accuracy,'%')}
{from sklearn.metrics import precision_score, recall_score, confusion_matrix
# Calculate precision and recall
precision = precision_score(y_test, y_pred_rfc)
recall = recall_score(y_test, y_pred_rfc)
# Print the results
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print("--"*30)
# Calculate confusion matrix
confusion = confusion_matrix(y_test, y_pred_rfc)
sns.heatmap(confusion, annot=True, fmt="d")}
{from sklearn.ensemble import GradientBoostingClassifier
print('Gradient Boosting Classifier')
# Create instance of model
gbc = GradientBoostingClassifier()
# Pass training data into model
gbc.fit(x_train, y_train)
}
{from sklearn.metrics import accuracy_score
# prediction from the model
y_pred_gbc = gbc.predict(x_test)
# Score It
print('Gradient Boosting Classifier')
# Accuracy
print('--'*30)
gbc_accuracy = round(accuracy_score(y_test, y_pred_gbc) * 100,2)
print('Accuracy', gbc_accuracy,'%')}
{from sklearn.metrics import precision_score, recall_score, confusion_matrix
# Calculate precision and recall
precision = precision_score(y_test, y_pred_gbc)
recall = recall_score(y_test, y_pred_gbc)
# Print the results
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print("--"*30)
# Calculate confusion matrix
confusion = confusion_matrix(y_test, y_pred_gbc)
sns.heatmap(confusion, annot=True, fmt="d")}
{sns.countplot(x="Survived", data=full_data, palette="Blues");
plt.show()}
{# Sample model scores (replace these with your actual model scores)
model_scores = {
    "Logistic Regression": 82.02,
    "Decision Tree Classifier": 80.34,
    "Random Forest Classifier": 82.58,
    "Gradient Boosting Classifier": 84.27
}
# Sort the model scores in descending order (higher accuracy first)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1], reverse=True)
# Display the ranking of the models
print("Model Rankings (greater values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")}
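Since the notebook already stores each model's accuracy in a variable, the dictionary can also be filled from those values instead of the hard-coded samples:
{# Variant: build the ranking dictionary from the accuracies
# computed in the cells above instead of the sample values.
model_scores = {
    "Logistic Regression": log_reg_accuracy,
    "Decision Tree Classifier": Dtree_accuracy,
    "Random Forest Classifier": rfc_accuracy,
    "Gradient Boosting Classifier": gbc_accuracy,
}}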
#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem