Decision Tree
Modelling With
Orange
Identify Rules that Predict
Patient’s Heart Disease
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
Characteristics of Orange
Visual programming makes data
mining accessible to a broader
audience
Provides comprehensive data
preprocessing tools
A vast collection of machine learning
algorithms is available
Excels in interactive data visualisation
Scalable, and integrates with external
software packages
An open-source project with a vibrant
community
Project’s Context, Objective & Strategies
Make Insight-informed Decisions
Clinic collected data on heart
disease diagnosis and other
patient information, and wants to
use the data to make insight-
informed decisions
Predict Patient’s Well-being
To identify the rules that will
predict whether a patient will have
heart disease in the future, based
on the data collected on him/her
Deploy Decision Tree Model
Create a Decision Tree Model, with
rules, to predict whether a patient
will have heart disease in the future,
based on the collected data
To train and evaluate the model
Boost the model’s performance
Conduct predictions
Exploratory Data Analysis (EDA)
Findings
Target = Heart Disease
This is a categorical variable
with a limited number of
possible values, so predicting it
is a classification task, unlike
predicting a continuous variable
such as blood pressure or
cholesterol level
Feature Columns = 9
Row Instances = 918
Blanks & Outliers = None
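The counts above (rows, feature columns, blanks) can be checked outside Orange with a few lines of Python. This is a minimal sketch, not the author's procedure: the three-row sample and its values are hypothetical stand-ins for Medical.csv, and `summarise` is an illustrative helper name. Only the column names come from the slides.

```python
import csv
import io

# Hypothetical three-row sample; the real Medical.csv is not reproduced here.
SAMPLE = """Gender,Cholesterol,MaxHR,HeartDisease
M,289,172,0
F,180,156,1
M,,98,1
"""

def summarise(csv_text, target="HeartDisease"):
    """Count rows, feature columns, and blank cells in CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    features = [c for c in columns if c != target]
    blanks = sum(
        1
        for row in rows
        for value in row.values()
        if value is None or value.strip() == ""
    )
    return {"rows": len(rows), "features": len(features), "blanks": blanks}

summary = summarise(SAMPLE)
print(summary)
```

On the real file, the same check would be expected to report 918 rows, 9 feature columns, and 0 blanks if the findings above hold.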
Decision Tree Workflow in Orange
Loading File, Selecting Columns & Splitting Data
Loading File
The Medical.csv file was loaded into the workflow in the
‘File’ widget, with ‘Gender’, ‘FastingBS’ & ‘Exercise’
classified as ‘categorical’ data & given the ‘feature’ role,
and ‘HeartDisease’ classified as ‘categorical’ data
& given the ‘target’ role
Selecting Columns
In the ‘Select Columns’ widget,
all feature columns were placed
in the ‘Features’ box.
‘HeartDisease’, the target, was
moved into the ‘Target’ box in this widget
Splitting Data
Dataset divided into 70% for
training the model while
keeping the remaining
30% for testing the model
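As a rough illustration of the 70/30 split, the sketch below shuffles row indices and cuts them at 70%. It assumes a simple random split with a fixed seed; the sampling method and rounding used in the Orange workflow may differ, and `split_rows` is a hypothetical helper.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into training and testing lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(918))  # stand-ins for the 918 patient records
train, test = split_rows(rows)
print(len(train), len(test))  # 642 276
```

Fixing the seed keeps the split reproducible, so accuracy figures can be compared across tuning runs on the same training and testing rows.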
Initial Evaluation of Decision Tree Model
Evaluation of Model (30%)
Classification Accuracy for this
model, evaluated on the 30% testing
split of the dataset, is 76.4%
Tree Depth Limited to 10
For initial assessment of the
performance of the Decision Tree
Model, in the Tree Widget, the
maximal tree depth was limited to 10
Evaluation of Model (70%)
Classification Accuracy for this
model, evaluated on the 70% training
split of the dataset, is 97.1%
Findings
At the Tree Depth of 10, the model
displayed a gap of 20.7 percentage points
between its accuracy on the training (97.1%)
and testing (76.4%) datasets
Conclusion
Suggests that the Decision
Tree Model has been
overfitted to the training data
Follow-up
To tune the hyperparameters of
the model so that it
generalises better and performs well
on the testing data
Tuning the Model to Improve Generalisation
Evaluation of Model (30%)
Classification Accuracy for this
model, evaluated on the 30% testing
split of the dataset, is 80.7%
Tree Depth Now Limited to 3
To tune the model, the maximal tree
depth was adjusted several times.
The depth of 3 was
chosen as Classification
Accuracy scores on training
and testing data are high (about 80%)
while the difference between scores
is negligible (at 1.6%)
Evaluation of Model (70%)
Classification Accuracy for this
model, evaluated on the 70% training
split of the dataset, is 82.3%
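The overfitting check behind the choice of depth can be written down as simple arithmetic on the accuracies reported for each maximal tree depth. The sketch below only restates those reported figures; `accuracy_gap` is an illustrative helper, not part of Orange.

```python
def accuracy_gap(train_acc, test_acc):
    """Return training minus testing accuracy, in percentage points."""
    return round(train_acc - test_acc, 1)

# Accuracies (in %) as reported on the slides for each maximal tree depth.
reported = {
    10: {"train": 97.1, "test": 76.4},
    3: {"train": 82.3, "test": 80.7},
}

gaps = {
    depth: accuracy_gap(acc["train"], acc["test"])
    for depth, acc in reported.items()
}
print(gaps)  # {10: 20.7, 3: 1.6}
```

A large gap means the tree has memorised the training rows; the small gap at depth 3 is what suggests the shallower tree generalises better.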
Confusion Table: False Positives/Negatives
Tree Depth at 10
False Negative = 17.8%
False Positive = 27.4%
Tree Depth at 3
False Negative = 19.1%
False Positive = 19.4%
Patients’ conditions may become untreatable if they go undetected (False Negatives), or patients
may have to pay for unnecessary treatment and bear its side-effects (False Positives).
So, reducing the number of False Negatives and False Positives in the model is beneficial
While False Negatives have increased by 1.3 percentage points, False
Positives have dropped by 8 percentage points, and the model’s overall
Classification Accuracy has improved by 4.3 percentage points
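For readers who want to reproduce this kind of comparison, the sketch below derives False Negative and False Positive rates from confusion-matrix counts and then computes the change between depths from the reported percentages. The counts are hypothetical, and the denominators (actual positives and actual negatives) are an assumption: the slides do not say which proportion the Confusion Matrix widget was set to show.

```python
def error_rates(tp, fp, fn, tn):
    """Return (false-negative rate, false-positive rate) as percentages.

    Assumes rates are taken over actual positives and actual negatives.
    """
    fn_rate = 100 * fn / (fn + tp)
    fp_rate = 100 * fp / (fp + tn)
    return round(fn_rate, 1), round(fp_rate, 1)

# Hypothetical counts for illustration only; not the Clinic's results.
fn_rate, fp_rate = error_rates(tp=80, fp=20, fn=20, tn=80)
print(fn_rate, fp_rate)  # 20.0 20.0

# Changes from depth 10 to depth 3, using the percentages reported above.
fn_change = round(19.1 - 17.8, 1)
fp_change = round(19.4 - 27.4, 1)
print(fn_change, fp_change)  # 1.3 -8.0
```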
Rules Predicting Patient’s Heart Disease*
The sequence of splitting criteria suggests that Exercise is the top-priority
rule, with Cholesterol and MaxHR as the two other influencers of the
likelihood of Heart Disease in patients
* More details are found in the project report, which is not released at the request of the Clinic
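To show why the first split in a tree can be read as the top-priority rule, the sketch below scores candidate features by how much they reduce Gini impurity and picks the best one for the root. Everything here is illustrative: the six toy records and their values are invented, and Gini is an assumed criterion, since the slides do not state which splitting measure the Tree widget applied.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_gain(rows, feature, target="HeartDisease"):
    """Impurity reduction from splitting rows on a categorical feature."""
    parent = gini([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted

# Hypothetical toy records; values are invented for illustration.
rows = [
    {"Exercise": "Y", "FastingBS": "1", "HeartDisease": 1},
    {"Exercise": "Y", "FastingBS": "0", "HeartDisease": 1},
    {"Exercise": "Y", "FastingBS": "0", "HeartDisease": 1},
    {"Exercise": "N", "FastingBS": "1", "HeartDisease": 0},
    {"Exercise": "N", "FastingBS": "0", "HeartDisease": 0},
    {"Exercise": "N", "FastingBS": "1", "HeartDisease": 1},
]

gains = {f: split_gain(rows, f) for f in ("Exercise", "FastingBS")}
best = max(gains, key=gains.get)
print(best)  # Exercise
```

In this toy data, splitting on Exercise separates the classes better than FastingBS, so it would become the root. Continuous features such as Cholesterol and MaxHR would additionally need a threshold search, which is omitted here.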
