International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 685
Prediction of Dengue, Diabetes and Swine Flu Using Random Forest
Classification Algorithm
Amit Tate1, Ujwala Gavhane2, Jayanand Pawar3, Bajrang Rajpurohit4 , Gopal B. Deshmukh5
1,2,3,4UG Student, M.E.S. College of Engineering, Pune, SPPU
5Department of Computer Engineering, M.E.S. College of Engineering, Pune, SPPU
Abstract: - In this article we propose a disease prediction system using the Random Forest Algorithm (RFA). A training dataset is used to predict a particular disease from symptoms entered by the patient or user. The main aim of this article is to predict the disease from these input symptoms and, if the result is positive, to recommend a specialized doctor for that disease. Our algorithms can be extended to mobile/online solutions that support patients as well as medical diagnostics. As a first step, we also developed web-based interfaces that help patients calculate the risk level for each medical case.
Keywords: - Random Forest Algorithm (RFA), Machine
Learning, Out-Of-Bag (OOB).
1. INTRODUCTION
Supervised learning is the part of machine learning that uses a labelled training dataset. In the proposed system, supervised learning is used to predict the class label from the user's input. Disease prediction has become important in a variety of applications such as health insurance, tailored health communication and public health. Disease prediction is usually performed using publicly available datasets.
The diseases covered by the prediction system are as follows:
A. Swine Flu
B. Diabetes
C. Dengue
Swine flu is a respiratory disease caused by influenza viruses
that infect the respiratory tract of pigs and result in a
barking cough, decreased appetite, nasal secretions, and
listless behavior; the virus can be transmitted to humans.
Diabetes refers to a group of diseases that involve problems with the hormone insulin. Normally, the pancreas (an organ behind the stomach) releases insulin to help your body store and use the sugar and fat from the food you eat.
Dengue fever is a painful, debilitating mosquito-borne
disease caused by any one of four closely related dengue
viruses. These viruses are related to the viruses that cause
West Nile infection and yellow fever.
Disease prediction using the Random Forest algorithm is proposed for dengue, diabetes and swine flu. A training dataset is used to predict each particular disease; the symptoms used for each disease are described in Section 5.
Fig -1: Proposed disease prediction system using Random Forest Algorithm.
Fig 1 gives a detailed description of the proposed disease prediction system. First, the user logs in to the system; users who do not have an account must register first. After logging in, the user/patient can check disease details using the prediction system. The user/patient enters or selects the symptoms for a particular disease (diabetes, swine flu or dengue). Once the user has selected the disease and the symptoms, the prediction system predicts the disease using the training dataset. The result is shown as either Positive or Negative. If the result is Positive, the system shows a recommended doctor for that disease and sends the user's details to the doctor who specializes in it. Patients can make an appointment directly through the recommendation system, and the doctor can then respond to the patient quickly. If the result is Negative, the system shows precautions for that disease.
2. RELATED WORK
The authors of [1] introduced the classification algorithms used for disease prediction and compared them in detail, with graph analysis of prediction time, recall, precision, TP (true positive) rate and FP (false positive) rate. Our system uses the Random Forest algorithm because [1] found it to be the best algorithm for disease prediction.
In [2], the authors describe four case studies from the Lebanese health domain: Acute Appendicitis (AP), Premature Birth (PB), Coronary Heart Disease (CHD) and Osteoporosis Disease (OD). For these applications, prediction systems were developed for decision support using data mining techniques, and the Random Forest approach offered the highest accuracy rate, reaching 99.9% with 9 measured factors after the reduction step. [2]
In [3], the authors proposed a model that uses four years of historic medical examination data to predict the likely change in the coming year's FBG (fasting blood glucose). The prediction model is built with traditional data mining techniques, using the Random Forest algorithm and SVM (Support Vector Machine). [3]
3. PROPOSED SYSTEM
The proposed system is a disease prediction system that predicts disease with the help of a training dataset and user input.
The proposed system includes patient healthcare, disease prediction (dengue, diabetes and swine flu), doctor recommendation for a particular disease, precautions for the disease, doctor details for when patients need to contact a doctor directly, and appointment booking with a specialized doctor.
In disease prediction, the user answers yes or no for each symptom of a particular disease, and the system works on these answers. Random Forest processes the given symptoms together with the training dataset and predicts whether the disease is present.
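Before yes/no answers can reach a classifier, they have to become a fixed-order numeric feature vector. The sketch below assumes a simple 0/1 encoding (the article does not specify one). The symptom list follows Table 1; `encode_answers` and `SWINE_FLU_SYMPTOMS` are hypothetical names, and treating an unanswered symptom as "no" is an assumption of this sketch.

```python
# Symptom order taken from Table 1 (swine flu); the encoding itself is an assumption.
SWINE_FLU_SYMPTOMS = [
    "chills", "fever", "coughing", "sore throat", "fatigue", "nausea",
    "vomiting", "diarrhea", "runny nose", "stuffy nose", "body aches",
]


def encode_answers(answers: dict[str, str], symptoms: list[str]) -> list[int]:
    """Turn yes/no answers into a 0/1 feature vector in a fixed symptom order.

    Symptoms the user did not answer are treated as "no" (assumption).
    """
    vector = []
    for symptom in symptoms:
        answer = answers.get(symptom, "no").strip().lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"answer for {symptom!r} must be 'yes' or 'no', got {answer!r}")
        vector.append(1 if answer == "yes" else 0)
    return vector


if __name__ == "__main__":
    answers = {"fever": "yes", "coughing": "Yes", "sore throat": "no"}
    print(encode_answers(answers, SWINE_FLU_SYMPTOMS))
    # [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Keeping the order fixed matters because every tree in the forest refers to features by position, so training rows and user input must use the same layout.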
4. TECHNOLOGY OVERVIEW
The Random Forest algorithm is an ensemble supervised learning method used as a predictor for both classification and regression. For classification, the algorithm builds a number of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. (Random Forests, an ensemble of decision trees, were introduced by Leo Breiman and Adele Cutler.) [5]
The Random Forest algorithm is a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The basic principle is that a group of “weak learners” can come together to form a “strong learner”. Random Forests are a useful tool for making predictions because adding more trees does not make them overfit, and introducing the right kind of randomness makes them accurate classifiers and regressors. [5]
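The weak-to-strong-learner principle can be made concrete with a small calculation. It rests on an idealized assumption: each tree is correct with probability p, independently of the others, on a two-class problem. Trees in a real forest are correlated, so the actual gain is smaller; the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p: float) -> float:
    """Probability that a strict majority of n_trees independent voters is correct.

    Assumes each voter is correct with probability p, independently (idealized).
    n_trees must be odd so that a two-class vote cannot tie.
    """
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    needed = n_trees // 2 + 1  # smallest strict majority
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, a single tree is right 60% of the time, while five independent trees voting are right about 68% of the time (0.68256), and the figure keeps rising as trees are added. This is the reason Random Forests inject randomness: it keeps the trees from all making the same mistakes.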
Single decision trees often have high variance or high bias. Random Forests address this by averaging many trees, which mainly reduces variance while keeping the low bias of fully grown trees, giving a natural balance between the two extremes. Because Random Forests have few parameters to tune and can be used with default parameter settings, they are a simple tool for producing a reasonable model quickly and efficiently, even without a prior model.
Random Forests are easy to learn and use for both professionals and lay people: they require little research and programming and can be used by people without a strong statistical background. Simply put, they make it easier to obtain accurate predictions while avoiding many of the basic mistakes common to other methods.
A Random Forest produces one classification from each tree in the forest. Each tree is grown as follows:
1. If the number of cases in the training dataset is D, sample D cases at random, but with replacement, from the original dataset. This sample will be the training set for growing the tree.
2. If there are M input variables in the training dataset, a number m ≪ M is specified such that at each node of the tree, m variables are selected at random out of the M, and the best split on these m variables is used to split the node. The value of m is held constant while the forest is grown.
3. Each tree is grown to the largest extent possible. There is no pruning.
The random forest algorithm is an ensemble classifier based on the decision tree model. It generates k different training subsets from an original dataset using bootstrap sampling, and then builds k decision trees by training on these subsets. A random forest is finally constructed from these decision trees. Each sample of the testing dataset is predicted by all the decision trees, and the final classification result is returned according to the votes of these trees. [11]
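The procedure above (bootstrap samples of D rows, m random features per node, unpruned trees, majority vote) can be sketched from scratch for yes/no symptom features encoded as 0/1. This is an illustration, not the authors' implementation: all names are hypothetical, splits use information gain rather than C4.5's gain ratio or CART's Gini index, and the toy data in the usage part is invented.

```python
import random
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, m, rng):
    """Grow an unpruned tree on 0/1 features.

    A leaf is a class label; an internal node is (feature, left, right), where
    left handles feature == 0 and right handles feature == 1. Only m randomly
    chosen features are considered at each node.
    """
    if len(set(labels)) == 1:
        return labels[0]
    best_gain, best_f = None, None
    for f in rng.sample(range(len(rows[0])), m):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # feature f does not separate the rows at this node
        # Information gain (ID3-style simplification).
        gain = entropy(labels) - (
            len(left) * entropy(left) + len(right) * entropy(right)
        ) / len(labels)
        if best_gain is None or gain > best_gain:
            best_gain, best_f = gain, f
    if best_f is None:
        return Counter(labels).most_common(1)[0][0]  # no usable split: majority leaf
    left_idx = [i for i, x in enumerate(rows) if x[best_f] == 0]
    right_idx = [i for i, x in enumerate(rows) if x[best_f] == 1]
    return (
        best_f,
        build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx], m, rng),
        build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx], m, rng),
    )


def predict_tree(tree, x):
    """Follow the branches for feature vector x until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] == 1 else left
    return tree


def train_forest(rows, labels, k, m, seed=0):
    """Train k trees, each on a bootstrap sample of D = len(rows) rows."""
    rng = random.Random(seed)
    d = len(rows)
    trees = []
    for _ in range(k):
        idx = [rng.randrange(d) for _ in range(d)]  # D draws with replacement
        trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng))
    return trees


def predict_forest(trees, x):
    """Return the class predicted by the most trees (majority vote)."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: 3 yes/no symptoms, labelled Positive when at least
    # two are "yes". This labelling rule is invented for illustration only.
    rows = [list(r) for r in product([0, 1], repeat=3)]
    labels = ["Positive" if sum(r) >= 2 else "Negative" for r in rows]
    forest = train_forest(rows, labels, k=25, m=2, seed=42)
    print(predict_forest(forest, [1, 1, 0]))
```

An odd number of trees (here k = 25) avoids ties in a two-class vote; in practice a tested library implementation would be preferable to this sketch.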
Fig 2. Process of the construction of the RF algorithm
The original training dataset is formalized as
$$S = \{(a_i, b_j),\ i = 1, 2, \dots, D;\ j = 1, 2, \dots, M\},$$
where $a_i$ is a sample and $b_j$ is a feature variable of $S$. Namely, the original training dataset contains $D$ samples, and there are $M$ feature variables in each sample.
The main process of the construction of the RF algorithm is presented in Fig. 2 [1].
The steps of the construction of the random forest algorithm are as follows:
Step 1. Sampling k training subsets.
In the first step, k training subsets are sampled from the original training dataset $S$ by bootstrap selection. Namely, $D$ records are selected from $S$ by random sampling with replacement at each sampling time. After this step, k training subsets are constructed as a collection of training subsets $S_{\text{Train}}$:
$$S_{\text{Train}} = \{S_1, S_2, \dots, S_k\}.$$
At the same time, the records that are not selected in each sampling period form an Out-Of-Bag (OOB) dataset. In this way, k OOB sets are constructed as a collection $S_{\text{OOB}}$:
$$S_{\text{OOB}} = \{OOB_1, OOB_2, \dots, OOB_k\},$$
where $k \ll D$, $S_i \cap OOB_i = \emptyset$ and $S_i \cup OOB_i = S$.
To obtain the classification accuracy of each tree model, these OOB sets are used as testing sets after the training process. [1]
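Step 1 can be sketched directly with record indices standing in for records. For each of the k subsets, D indices are drawn with replacement, and the indices never drawn form that subset's OOB set. As a known property of bootstrap sampling, the expected OOB fraction is $(1 - 1/D)^D$, which approaches $e^{-1} \approx 0.368$ for large D. The function name and parameters are hypothetical.

```python
import random


def bootstrap_with_oob(d, k, seed=0):
    """Draw k bootstrap index samples of size d and their out-of-bag sets.

    Records are identified by index 0..d-1. Each S_i has d draws with
    replacement; OOB_i holds the indices never drawn for S_i.
    """
    rng = random.Random(seed)
    subsets, oob_sets = [], []
    for _ in range(k):
        s_i = [rng.randrange(d) for _ in range(d)]
        subsets.append(s_i)
        oob_sets.append(set(range(d)) - set(s_i))
    return subsets, oob_sets


D = 1000
subsets, oob_sets = bootstrap_with_oob(D, k=5, seed=7)
for s_i, oob_i in zip(subsets, oob_sets):
    assert set(s_i).isdisjoint(oob_i)          # S_i ∩ OOB_i = ∅
    assert set(s_i) | oob_i == set(range(D))   # S_i ∪ OOB_i = S (as sets of indices)
    print(f"OOB fraction: {len(oob_i) / D:.3f}")
```

Because each OOB set was never seen by its tree, predicting its records with that tree gives an accuracy estimate without a separate test split.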
Step 2. Constructing each decision tree model.
In an RF model, each meta decision tree is created by the C4.5 or CART algorithm from a training subset $S_i$. During the growth of each tree, m feature variables of dataset $S_i$ are randomly selected from the M variables. In each tree node's splitting process, the gain ratio of each of these feature variables is computed, and the best one is chosen as the splitting node. This splitting process is repeated until a leaf node is generated. Finally, k decision trees are trained from the k training subsets in the same way. [1]
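The split criterion named in Step 2 can be sketched as follows. The gain ratio (as used by C4.5) is the information gain of a split divided by its split information, i.e. the entropy of how the rows are divided among the branches. The function names are hypothetical, and breaking ties by taking the first candidate (Python's `max`) is an assumption.

```python
import random
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum(c / total * log2(c / total) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Gain ratio of splitting `labels` by the values in `column` (C4.5 criterion)."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes; 0 means no real split.
    split_info = entropy(column)
    return info_gain / split_info if split_info > 0 else 0.0


def choose_split(rows, labels, m, rng):
    """Pick m of the M features at random; return the one with the highest gain ratio."""
    candidates = rng.sample(range(len(rows[0])), m)
    return max(candidates, key=lambda f: gain_ratio([row[f] for row in rows], labels))


rows = [[1, 1], [1, 0], [0, 1], [0, 0]]  # hypothetical yes(1)/no(0) answers
labels = ["Positive", "Positive", "Negative", "Negative"]
print(gain_ratio([r[0] for r in rows], labels))  # 1.0: feature 0 separates the classes
print(choose_split(rows, labels, m=2, rng=random.Random(0)))  # 0
```

Dividing by the split information penalizes features that fragment the data into many small branches, which plain information gain tends to favour.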
Step 3. Collecting k trees into an RF model.
The k trained trees are collected into an RF model, which is
defined in Eq. (1):
where $h_i(\mathbf{x}, \Theta_j)$ is a meta decision tree classifier, $X$ are the input feature vectors of the training dataset, and $\Theta_j$ is an independent and identically distributed random vector that determines the growth process of the tree. [1]
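For reference, the aggregation of the k trees is commonly written as a majority vote. The form below is that standard formulation, given as an assumption rather than as the exact Eq. (1) of this article; $I(\cdot)$ is the indicator function, and each tree $i$ is given its own random vector $\Theta_i$.

```latex
H(\mathbf{x}) = \arg\max_{c \in \{\text{Positive},\, \text{Negative}\}}
  \sum_{i=1}^{k} I\big(h_i(\mathbf{x}, \Theta_i) = c\big)
```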
5. HOW IT WORKS
The proposed disease prediction system is fully based on the training dataset; if the dataset is incorrect, the output may be wrong. Table 1 lists the symptoms of swine flu, Table 3 the symptoms of diabetes, and Table 5 the symptoms of dengue.
For example, in Table 2 the user's inputs are either Yes for all symptoms or No for all symptoms. If the user selects Yes for all symptoms, the result should be Positive, and if the user selects No for all symptoms, the result should be Negative. In general, user input can take many combinations, with some answers Yes and some No. The Random Forest algorithm then processes the user input and, using the training dataset, predicts the output as Positive or Negative.
If the output is Positive, a specialized doctor is displayed on the same page, and the patient can make an appointment directly with the doctor who specializes in that disease. The patient's details are then sent to that doctor.
If the output is Negative, precautions for that disease are displayed on the same page. Patients can also use the online healthcare system for other diseases; the healthcare system covers the major diseases that spread quickly through the population.
Table -1: Symptoms of swine flu disease
Symptoms for Swine Flu Disease
1. Chills
2. Fever
3. Coughing
4. Sore Throat
5. Fatigue
6. Nausea
7. Vomiting
8. Diarrhea
9. Runny Nose
10. Stuffy Nose
11. Body Aches
Table -2: Default user input for positive or negative output for swine flu
Input from user                                          Result
yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes    Positive
no, no, no, no, no, no, no, no, no, no, no               Negative
In Table 2, the user input can be Yes for every symptom, No for every symptom, or a mixture of Yes and No, depending on what the patient enters. Only two test cases are shown here: the user selects Yes for all symptoms, or the user selects No for all symptoms. Accurate prediction requires a larger training dataset; that dataset is not listed in this article but is used in the proposed system.
Table -3: Symptoms of diabetes disease
Symptoms for Diabetes Disease
1. Heavy thirst
2. Increased hunger
3. Dry mouth
4. Pain in belly
5. Fatigue
6. Nausea
7. Vomiting
8. Frequent urination
9. Unexplained weight loss
10. Blurred vision
11. Heavy laboured breathing
Table -4: Default user input for positive or negative output for diabetes disease
Input from user                                          Result
yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes    Positive
no, no, no, no, no, no, no, no, no, no, no               Negative
Table -5: Symptoms of dengue disease
Symptoms for Dengue
1. Sudden high fever
2. Severe headaches
3. Severe joint and muscle pain
4. Pain behind the eye
5. Fatigue
6. Nausea
7. Vomiting
8. Skin rash that appears two to five days after the onset of fever
9. Mild bleeding
Table -6: Default user input for positive or negative output for dengue
Input from user                                  Result
yes, yes, yes, yes, yes, yes, yes, yes, yes      Positive
no, no, no, no, no, no, no, no, no               Negative
The system architecture of the disease prediction system using the Random Forest Algorithm is described in Fig 1, and the classification process of disease prediction is described below. In this process, the user input is compared against the training dataset.
As shown in Fig 1, the user input is processed by the Random Forest algorithm. RFA builds a number of decision trees from the given training dataset, and the output takes the form Yes (Positive prediction) or No (Negative prediction). Each tree produces a single output: in Fig 3, tree 1, tree 2, ..., tree N each give a separate result. Result 1, Result 2, ..., Result N are the trees' Yes/No answers for the given symptoms, computed from the training dataset. Finally, the results are combined into two categories, Yes and No, and the category with the most votes gives the final output: if most trees say Yes, the result is Positive, and if most trees say No, the result is Negative.
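The final combination step described above is a plain majority vote over the per-tree Yes/No results. The sketch below assumes the trees' outputs are already available as strings; the function name is hypothetical, and resolving a tie as Negative is an assumption (the article does not say how ties are handled; an odd number of trees avoids them).

```python
from collections import Counter


def combine_tree_results(results):
    """Combine per-tree "Yes"/"No" outputs (Result 1..N) into the final prediction.

    Returns "Positive" if more trees voted Yes than No, otherwise "Negative".
    Treating a tie as Negative is an assumption of this sketch.
    """
    votes = Counter(results)
    unknown = set(votes) - {"Yes", "No"}
    if unknown:
        raise ValueError(f"unexpected tree outputs: {sorted(unknown)}")
    return "Positive" if votes["Yes"] > votes["No"] else "Negative"


print(combine_tree_results(["Yes", "No", "Yes", "Yes", "No"]))  # Positive
```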
Fig -3: System architecture for disease prediction system
6. RESULTS
Table 1 gives the symptoms of swine flu; using these symptoms and the user input, the disease prediction system predicts the result. Table 2 gives two swine flu cases, one with a positive and one with a negative result. A positive result is displayed when the user's answers are all Yes or, typically, when Yes answers outnumber No answers. Fig 4 and Fig 5 show the positive and negative results respectively.
Fig -4: swine flu positive result
A negative result is displayed when the user's answers are all No or, typically, when No answers outnumber Yes answers.
Fig -5: swine flu negative result
Table 3 gives the symptoms of diabetes, and Table 4 gives two diabetes cases, one with a positive and one with a negative result. Fig 6 and Fig 7 show the positive and negative results for diabetes respectively.
Fig -6: Diabetes Positive Result
Fig -7: Diabetes Negative Result
Table 5 gives the symptoms of dengue, and Table 6 gives two dengue cases, one with a positive and one with a negative result. Fig 8 and Fig 9 show the positive and negative results for dengue respectively.
Fig -8: Dengue Positive Result
Fig -9: Dengue Negative Result
7. CONCLUSION
The proposed disease prediction system can predict a particular disease using a training dataset. In this article, we proposed the disease prediction system as a web/mobile-based online application for patients' healthcare. The Random Forest algorithm was chosen because it gave the best accuracy among the classification algorithms compared in [1]. After the disease is predicted, the recommendation system acts on the predicted result: if the result is positive, a recommended doctor is displayed on the same page; if it is negative, precautions for that disease are displayed.
References
[1] Amit Tate, Bajrangsingh Rajpurohit, Jayanand Pawar, Ujwala Gavhane, Gopal B. Deshmukh, "Comparative Analysis of Classification Algorithms Used for Disease Prediction in Data Mining," International Journal of Engineering and Techniques (IJET), Vol. 2, Issue 6, Nov - Dec 2016, ISSN: 2395-1303, www.ijetjournal.org.
[2] Ahmad Shahin, Walid Moudani, Fadi Chakik, Mohamad Khalil, "Data Mining in Healthcare Information Systems: Case Studies in Northern Lebanon," Doctoral School for Science and Technology, The Lebanese University, Tripoli, Lebanon.
[3] Wenxiang Xiao, Jun Ji, Fengjing Shao, Rencheng Sun, Chunxiao Xing, "Fasting Blood Glucose Change Prediction Model Based on Medical Examination Data and Data Mining Techniques," 2015 IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom 2015 and SC2 2015.
[4] Mihail Popescu, Mohammad Khalilia, "Improving Disease Prediction Using ICD-9 Ontological Features," 2011 IEEE International Conference on Fuzzy Systems (FUZZ), June 2011.
[5] http://www.datasciencecentral.com/profiles/blogs/random-forests-algorithm
[6] Asmaa S. Hussein, Wail M. Omar, Xue Li, Modafar Ati, "Efficient Chronic Disease Diagnosis Prediction and Recommendation System," 2012 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES).
[7] April Morton, Eman Marzban, Georgios Giannoulis, Ayush Patel, Rajender Aparasu, Ioannis A. Kakadiaris, "A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay among Diabetic Patients," 2014 13th International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec. 2014.
[8] http://www.webmd.com

More Related Content

PDF
IRJET- Predicting Heart Disease using Machine Learning Algorithm
PDF
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
PDF
Chronic Kidney Disease Prediction Using Machine Learning
PDF
IRJET- A Literature Review on Heart and Alzheimer Disease Prediction
PDF
IRJET- Hybrid Architecture of Heart Disease Prediction System using Genetic N...
PDF
50120140506016
PPTX
Final ppt
PDF
Plant Leaf Disease Analysis using Image Processing Technique with Modified SV...
IRJET- Predicting Heart Disease using Machine Learning Algorithm
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
Chronic Kidney Disease Prediction Using Machine Learning
IRJET- A Literature Review on Heart and Alzheimer Disease Prediction
IRJET- Hybrid Architecture of Heart Disease Prediction System using Genetic N...
50120140506016
Final ppt
Plant Leaf Disease Analysis using Image Processing Technique with Modified SV...

What's hot (20)

PDF
IRJET- Plant Leaf Disease Detection using Image Processing
PDF
Comparing Data Mining Techniques used for Heart Disease Prediction
PDF
Identification of Disease in Leaves using Genetic Algorithm
PPT
Detection of plant diseases
PDF
Clustering Medical Data to Predict the Likelihood of Diseases
PDF
Two Layer k-means based Consensus Clustering for Rural Health Information System
PDF
IRJET- Disease Prediction using Machine Learning
PDF
A Novel Machine Learning Based Approach for Detection and Classification of S...
PDF
The International Journal of Engineering and Science (The IJES)
PDF
Heart Attack Prediction using Machine Learning
PDF
Prognosticating Autism Spectrum Disorder Using Artificial Neural Network: Lev...
PDF
IRJET - Disease Detection in Plant using Machine Learning
PDF
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
PDF
IRJET- Random Forest Algorithm in Drug Selection in Medical Field
PDF
IRJET- Prediction and Analysis of Heart Disease using SVM Algorithm
PDF
IRJET- Heart Disease Prediction and Recommendation
PDF
IRJET- GDPS - General Disease Prediction System
PDF
IRJET- Heart Failure Risk Prediction using Trained Electronic Health Record
PDF
An Exploration on the Identification of Plant Leaf Diseases using Image Proce...
IRJET- Plant Leaf Disease Detection using Image Processing
Comparing Data Mining Techniques used for Heart Disease Prediction
Identification of Disease in Leaves using Genetic Algorithm
Detection of plant diseases
Clustering Medical Data to Predict the Likelihood of Diseases
Two Layer k-means based Consensus Clustering for Rural Health Information System
IRJET- Disease Prediction using Machine Learning
A Novel Machine Learning Based Approach for Detection and Classification of S...
The International Journal of Engineering and Science (The IJES)
Heart Attack Prediction using Machine Learning
Prognosticating Autism Spectrum Disorder Using Artificial Neural Network: Lev...
IRJET - Disease Detection in Plant using Machine Learning
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
IRJET- Random Forest Algorithm in Drug Selection in Medical Field
IRJET- Prediction and Analysis of Heart Disease using SVM Algorithm
IRJET- Heart Disease Prediction and Recommendation
IRJET- GDPS - General Disease Prediction System
IRJET- Heart Failure Risk Prediction using Trained Electronic Health Record
An Exploration on the Identification of Plant Leaf Diseases using Image Proce...
Ad

Similar to Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classification Algorithm (20)

PDF
Disease Prediction Using Machine Learning
PDF
Performance evaluation of random forest with feature selection methods in pre...
PDF
PREDICTION OF DISEASE WITH MINING ALGORITHMS IN MACHINE LEARNING
PDF
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
PDF
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
PPTX
random forest.pptx
PDF
Decision Tree Models for Medical Diagnosis
PDF
IRJET- Disease Analysis and Giving Remedies through an Android Application
PDF
Heart Disease Prediction Using Random Forest Algorithm
PDF
Machine learning approach for predicting heart and diabetes diseases using da...
PDF
Module 5: Decision Trees
PPTX
An Introduction to Random Forest and linear regression algorithms
PPTX
Random_Forest_Presentation_Detailed.pptx
PDF
Heart failure prediction based on random forest algorithm using genetic algo...
PDF
PCOS_Disease_Prediction_Using_Machine_Learning_Alg.pdf
PPTX
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
PPTX
Machine learning session6(decision trees random forrest)
PDF
Weather Prediction Model using Random Forest Algorithm and Apache Spark
PDF
Heart disease classification using Random Forest
DOCX
Advance KNN classification of brain tumor
Disease Prediction Using Machine Learning
Performance evaluation of random forest with feature selection methods in pre...
PREDICTION OF DISEASE WITH MINING ALGORITHMS IN MACHINE LEARNING
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
random forest.pptx
Decision Tree Models for Medical Diagnosis
IRJET- Disease Analysis and Giving Remedies through an Android Application
Heart Disease Prediction Using Random Forest Algorithm
Machine learning approach for predicting heart and diabetes diseases using da...
Module 5: Decision Trees
An Introduction to Random Forest and linear regression algorithms
Random_Forest_Presentation_Detailed.pptx
Heart failure prediction based on random forest algorithm using genetic algo...
PCOS_Disease_Prediction_Using_Machine_Learning_Alg.pdf
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Machine learning session6(decision trees random forrest)
Weather Prediction Model using Random Forest Algorithm and Apache Spark
Heart disease classification using Random Forest
Advance KNN classification of brain tumor
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
DOCX
573137875-Attendance-Management-System-original
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
Digital Logic Computer Design lecture notes
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
PPT on Performance Review to get promotions
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
web development for engineering and engineering
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
UNIT 4 Total Quality Management .pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
CH1 Production IntroductoryConcepts.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
573137875-Attendance-Management-System-original
CYBER-CRIMES AND SECURITY A guide to understanding
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Digital Logic Computer Design lecture notes
Automation-in-Manufacturing-Chapter-Introduction.pdf
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPT on Performance Review to get promotions
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
web development for engineering and engineering
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx

Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classification Algorithm

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 685

Prediction of Dengue, Diabetes and Swine Flu Using Random Forest Classification Algorithm
Amit Tate1, Ujwala Gavhane2, Jayanand Pawar3, Bajrang Rajpurohit4, Gopal B. Deshmukh5
1,2,3,4UG Student, M.E.S. College of Engineering, Pune, SPPU
5Department of Computer Engineering, M.E.S. College of Engineering, Pune, SPPU

Abstract: - In this article, we propose a disease prediction system using the Random Forest Algorithm (RFA). A training dataset is used to predict a particular disease. The main aim of this article is to predict the disease from symptoms entered by the patient or user, and to recommend a specialized doctor for that disease if the result is positive. Our algorithms can be extended to mobile/online solutions that support patients as well as medical diagnostics. As a first step, we also developed web-based interfaces that help patients calculate the risk level for each medical case.

Keywords: - Random Forest Algorithm (RFA), Machine Learning, Out-Of-Bag (OOB).

1. INTRODUCTION
Supervised learning is the part of machine learning that uses a labelled training dataset. In the proposed system, supervised learning is used to predict the class label from the user's input. Disease prediction has become important in a variety of applications such as health insurance, tailored health communication and public health. Disease prediction is usually performed using publicly available datasets.

The diseases involved in the prediction system are as follows:
A. Swine Flu
B. Diabetes
C. Dengue

Swine flu is a respiratory disease caused by influenza viruses that infect the respiratory tract of pigs and result in a barking cough, decreased appetite, nasal secretions, and listless behavior; the virus can be transmitted to humans. Diabetes is a group of diseases that involve problems with the hormone insulin. Normally, the pancreas (an organ behind the stomach) releases insulin to help your body store and use the sugar and fat from the food you eat. Dengue fever is a painful, debilitating mosquito-borne disease caused by any one of four closely related dengue viruses. These viruses are related to the viruses that cause West Nile infection and yellow fever.

Disease prediction using the random forest algorithm is proposed for dengue, diabetes and swine flu. A training dataset is given to predict each particular disease; the training dataset for each disease is described in Section 5.

Fig -1: Proposed system for disease prediction system using Random Forest Algorithm.

Fig. 1 gives a detailed description of the proposed disease prediction system. First, the user logs in to the system; a user without an account must register first. After logging in, the user/patient can check disease details using the prediction system. The user/patient enters or selects their symptoms for a particular disease (diabetes, swine flu or dengue). Once the user has selected the disease and the symptoms, the system predicts the disease using the training dataset. The result is shown as either Positive or Negative.

If the system gives the user a positive result, it recommends a doctor for that disease and sends the user's details to a doctor specialized in that disease. Patients can make an appointment directly through the recommendation system, and the doctor can respond to the patient quickly. If the system gives the user a negative result, it shows precautions for that disease.

2. RELATED WORK
In [1], the authors introduced the classification algorithms used for disease prediction and compared each algorithm in detail using graphs of prediction time, recall, precision, TP (true positive) rate and FP (false positive) rate. Our system uses the random forest algorithm because that study found it to be the best algorithm for disease prediction [1]. The random forest approach offers the highest accuracy rate, reaching 99.9% with 9 measured factors after the reduction step.

In [2], the authors describe four case studies from the Lebanese health domain: Acute Appendicitis (AP), Premature Birth (PB), Coronary Heart Disease (CHD), and Osteoporosis Disease (OD). For these applications, prediction systems were developed for decision support using data mining techniques.

In [3], the authors proposed a model based on four years of historical medical examination data to predict the likelihood of change in the coming year's FBG (fasting blood glucose), built with traditional data mining techniques including the random forest algorithm and SVM (Support Vector Machine).

3. PROPOSED SYSTEM
The proposed system is a disease prediction system that predicts a disease from the training dataset and the user's input.
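The branching that follows a prediction (positive: recommend a specialist and forward the patient's details; negative: show precautions) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the names `handle_prediction` and `Outcome` are hypothetical, and the doctor and precaution entries are placeholders because the article does not list them.

```python
from dataclasses import dataclass

# Hypothetical placeholder directories: the article does not specify
# which doctors or precautions the real system stores.
SPECIALISTS = {
    "swine flu": "specialist-placeholder-A",
    "diabetes": "specialist-placeholder-B",
    "dengue": "specialist-placeholder-C",
}
PRECAUTIONS = {
    "swine flu": "precautions-placeholder-A",
    "diabetes": "precautions-placeholder-B",
    "dengue": "precautions-placeholder-C",
}


@dataclass
class Outcome:
    """What the results page shows after a prediction (hypothetical model)."""
    result: str                  # "Positive" or "Negative"
    show: str                    # doctor to recommend, or precautions text
    notify_doctor: bool = False  # True when patient details are forwarded


def handle_prediction(disease: str, result: str) -> Outcome:
    """Route a Positive/Negative prediction to the matching page content."""
    if result == "Positive":
        # Positive: recommend a specialist and forward the patient's details.
        return Outcome(result, SPECIALISTS[disease], notify_doctor=True)
    if result == "Negative":
        # Negative: display precautions for the same disease.
        return Outcome(result, PRECAUTIONS[disease])
    raise ValueError(f"unexpected result: {result!r}")


print(handle_prediction("dengue", "Positive"))
```

Keeping the routing separate from the classifier means the same page logic serves all three diseases; only the lookup tables change.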
The proposed system includes patient healthcare, disease prediction (dengue, diabetes and swine flu), doctor recommendation for a particular disease, precautions for the disease, doctor details for patients who need to contact a doctor directly, and appointments with a specialized doctor. For disease prediction, the user answers yes or no for each symptom of a particular disease, and the random forest processes these answers together with the training dataset to predict the output.

4. TECHNOLOGY OVERVIEW
Random forest is an ensemble supervised learning method used for classification and regression. For classification, the algorithm builds a number of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. (Random forests were introduced by Leo Breiman and Adele Cutler as an ensemble of decision trees.) [5]

A random forest is a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The basic principle is that a group of "weak learners" can come together to form a "strong learner". Random forests are well suited to making predictions because adding more trees does not make them overfit, and injecting the right kind of randomness makes them accurate classifiers and regressors. [5]

Single decision trees often have high variance or high bias. Random forests try to moderate both the high-variance and the high-bias problems by averaging, finding a natural balance between the two extremes. Because random forests have few parameters to tune and can be used with default parameter settings, they are a simple tool for producing a reasonable model quickly and efficiently.
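The claim that averaging moderates variance can be made concrete with a standard identity for the variance of an average of correlated variables (not taken from this article). Assume B trees T_1, ..., T_B whose predictions at an input x each have variance σ² and pairwise correlation ρ:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\rho\sigma^2\Big)
  = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

With illustrative values σ² = 1, ρ = 0.3 and B = 100, the averaged variance is 0.3 + 0.7/100 = 0.307, against 1 for a single tree. As B grows, the variance approaches ρσ², which is why random forests also try to decorrelate the trees (bootstrap samples and random feature subsets), while averaging identically distributed trees leaves their expected prediction unchanged.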
Random forests are easy to learn and use for both professionals and lay people: they require little research and programming and can be used by people without a strong statistical background. Simply put, they make it possible to make more accurate predictions while avoiding many basic mistakes common to other methods.

A random forest grows many classification trees. Each tree is grown as follows:
1. If the number of cases in the training dataset is D, sample D cases at random, with replacement, from the original dataset. This sample is the training set for growing the tree.
2. If there are M input variables in the training dataset, a number m ≪ M is specified such that at each node of the tree, m variables are selected at random out of the M, and the best split on these m variables is used to split the node. The value of m is held constant while the forest is grown.
3. Each tree is grown to the largest extent possible; there is no pruning.

The random forest algorithm is an ensemble classifier based on the decision tree model. It generates k different training data subsets from an original dataset using a bootstrap sampling approach, and then k decision trees are built by training on these subsets. A random forest is finally constructed from these decision trees. Each sample of the testing dataset is predicted by all decision trees, and the final classification result is returned according to the votes of these trees. [11]

Fig 2. Process of the construction of the RF algorithm

The original training dataset is formalized as $S = \{(a_i, b_j) \mid i = 1, 2, \dots, D;\ j = 1, 2, \dots, M\}$, where $a$ is a sample and $b$ is a feature variable of $S$. Namely, the original training dataset contains D samples, and there are M feature variables in each sample. The main process of the construction of the RF algorithm is presented in Fig. 2 [1].

The steps of the construction of the random forest algorithm are as follows:

Step 1. Sampling k training subsets. In the first step, k training datasets are sampled from the original training dataset S by bootstrap selection. Namely, D records are selected from S by random sampling with replacement in each sampling round. After this step, k training subsets are constructed as a collection of training subsets $S_{Train} = \{S_1, S_2, \dots, S_k\}$.
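Step 1, together with the OOB sets described in the next paragraph and the per-node feature sampling of rule 2, can be sketched with Python's standard `random` module. This is a minimal sketch: the helper names `bootstrap_subsets` and `random_feature_subset` are hypothetical, samples are represented by their indices into S, and choosing m = 3 for the M = 11 swine flu symptoms (roughly √M, a common default for classification) is an assumption, not a value from the article.

```python
import random


def bootstrap_subsets(D: int, k: int, seed: int = 0):
    """Draw k bootstrap samples of size D (indices into S) and their OOB sets.

    Each S_i is a list of D indices drawn with replacement; OOB_i holds the
    indices never drawn, so S_i and OOB_i are disjoint and together cover S.
    """
    rng = random.Random(seed)
    subsets, oob_sets = [], []
    for _ in range(k):
        s_i = [rng.randrange(D) for _ in range(D)]  # sampling with replacement
        oob_i = sorted(set(range(D)) - set(s_i))    # records not selected
        subsets.append(s_i)
        oob_sets.append(oob_i)
    return subsets, oob_sets


def random_feature_subset(M: int, m: int, rng: random.Random):
    """Pick m of the M feature indices, without replacement, for one split."""
    return rng.sample(range(M), m)


S_train, S_oob = bootstrap_subsets(D=10, k=3)
for s_i, oob_i in zip(S_train, S_oob):
    # The two properties stated for the OOB sets in the text.
    assert set(s_i).isdisjoint(oob_i)
    assert set(s_i) | set(oob_i) == set(range(10))
print(random_feature_subset(M=11, m=3, rng=random.Random(1)))
```

On average about 36.8% of the records end up out of bag for each tree, which is what makes the OOB sets usable as built-in test sets.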
At the same time, the records that are not selected in each sampling round form an Out-Of-Bag (OOB) dataset. In this way, k OOB sets are constructed as a collection $S_{OOB} = \{OOB_1, OOB_2, \dots, OOB_k\}$, where $k \ll D$, $S_i \cap OOB_i = \emptyset$ and $S_i \cup OOB_i = S$. To obtain the classification accuracy of each tree model, these OOB sets are used as testing sets after the training process. [1]

Step 2. Constructing each decision tree model. In an RF model, each meta decision tree is created by a C4.5 or CART algorithm from a training subset $S_i$. In the growth process of each tree, m feature variables of dataset $S_i$ are randomly selected from the M variables. When each tree node is split, the gain ratio of each candidate feature variable is computed, and the feature with the highest gain ratio is chosen as the splitting feature. This splitting process is repeated until a leaf node is generated. Finally, k decision trees are trained from the k training subsets in the same way. [1]

Step 3. Collecting k trees into an RF model. The k trained trees are collected into an RF model, which is defined in Eq. (1):

$$H(X) = \arg\max_{y} \sum_{i=1}^{k} \mathbb{I}\big(h_i(X, \Theta_i) = y\big) \qquad (1)$$

where $h_i(X, \Theta_i)$ is a meta decision tree classifier, $X$ is the input feature vector, $\Theta_i$ is an independent and identically distributed random vector that determines the growth process of the $i$-th tree, and $\mathbb{I}(\cdot)$ is the indicator function, so that the forest outputs the class that receives the most tree votes. [1]
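Step 2 names the gain ratio (the C4.5 split criterion) as the way the splitting feature is chosen. The sketch below computes it for yes/no symptom columns encoded as 1/0. The helper names `entropy`, `gain_ratio` and `best_split` and the toy rows are hypothetical; in a random forest, `candidate_features` would be the m randomly chosen features, and a CART-based forest would use the Gini index instead.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(column, labels):
    """C4.5 gain ratio of splitting `labels` on the values in `column`."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus weighted child entropy.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels, candidate_features):
    """Return the candidate feature index with the highest gain ratio."""
    return max(candidate_features,
               key=lambda f: gain_ratio([r[f] for r in rows], labels))


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
labels = ["Positive", "Positive", "Negative", "Negative"]
print(best_split(rows, labels, [0, 1]))  # 0
```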
5. HOW IT WORKS
The proposed disease prediction system is based entirely on the training dataset; if the dataset is incorrect, the output could be wrong. Table 1 lists the symptoms of swine flu, Table 3 the symptoms of diabetes, and Table 5 the symptoms of dengue. For example, as in Table 2, the user may answer yes to all symptoms or no to all symptoms. If the user selects yes for all symptoms, the result should be positive, and if the user selects no for all symptoms, the result should be negative. The user's input can also take many other combinations, with some answers yes and some no. The random forest algorithm then processes the user input and, using the training dataset, predicts the output as Positive or Negative.

If the output is positive, a specialized doctor is displayed on the same page, the patient can make an appointment directly with that doctor, and the patient's details are sent to the doctor who specializes in that disease. If the output is negative, the precautions for that disease are displayed on the same page. Patients can also use the online healthcare service for the other diseases; it covers the major diseases that spread quickly through the population.

Table -1: Symptoms of swine flu disease
Symptoms for Swine Flu Disease
1. Chills
2. Fever
3. Coughing
4. Sore Throat
5. Fatigue
6. Nausea
7. Vomiting
8. Diarrhea
9. Runny Nose
10. Stuffy Nose
11. Body Aches

Table -2: Default user input for positive or negative output for swine flu
Input from user | Result
yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes | Positive
no, no, no, no, no, no, no, no, no, no, no | Negative

In Table 2, the user input can be yes for every symptom, no for every symptom, or some yes and some no, depending on what the patient enters. Only two test cases are listed here: the first, where the user selects yes for all symptoms, and the second, where the user selects no for all symptoms. More training data is needed to predict accurate output; that dataset is not reproduced in this article, but it is used in the proposed system.

Table -3: Symptoms of diabetes disease
Symptoms for Diabetes Disease
1. Heavy thirst
2. Increased hunger
3. Dry mouth
4. Pain in belly
5. Fatigue
6. Nausea
7. Vomiting
8. Frequent urination
9. Unexplained weight loss
10. Blurred vision
11. Heavy laboured breathing

Table -4: Default user input for positive or negative output for diabetes disease
Input from user | Result
yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes | Positive
no, no, no, no, no, no, no, no, no, no, no | Negative

Table -5: Symptoms of dengue disease
Symptoms for Dengue
1. Sudden high fever
2. Severe headaches
3. Severe joint and muscle pain
4. Pain behind the eye
5. Fatigue
6. Nausea
7. Vomiting
8. Skin rash which appears two to five days after the onset of fever
9. Mild bleeding
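Before the forest can use them, the yes/no answers in Tables 1 and 2 have to become a numeric feature vector. The sketch below assumes 1 = yes and 0 = no, in the symptom order of Table 1; the article does not say how the answers are stored, so the encoding and the name `encode_answers` are assumptions.

```python
# Symptom order follows Table 1 (swine flu).
SWINE_FLU_SYMPTOMS = [
    "Chills", "Fever", "Coughing", "Sore Throat", "Fatigue", "Nausea",
    "Vomiting", "Diarrhea", "Runny Nose", "Stuffy Nose", "Body Aches",
]


def encode_answers(answers, symptoms=SWINE_FLU_SYMPTOMS):
    """Turn a list of "yes"/"no" answers into a 0/1 feature vector."""
    if len(answers) != len(symptoms):
        raise ValueError(f"expected {len(symptoms)} answers, got {len(answers)}")
    vector = []
    for symptom, answer in zip(symptoms, answers):
        a = answer.strip().lower()
        if a not in ("yes", "no"):
            raise ValueError(f"{symptom}: answer must be yes or no, got {answer!r}")
        vector.append(1 if a == "yes" else 0)
    return vector


# The two default cases from Table 2.
all_yes = encode_answers(["yes"] * 11)  # expected label: Positive
all_no = encode_answers(["no"] * 11)    # expected label: Negative
print(all_yes, all_no)
```

The same function serves Tables 3 and 5 by passing the diabetes or dengue symptom list; validating the length catches a form that skipped a symptom.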
Table -6: Default user input for positive or negative output for dengue
Input from user | Result
yes, yes, yes, yes, yes, yes, yes, yes, yes | Positive
no, no, no, no, no, no, no, no, no | Negative

The system architecture of the disease prediction system using the random forest algorithm is described in Fig. 1. The classification process of disease prediction is described below. In this process, the user input is compared with the training dataset. In Fig. 1, the user input is processed by the random forest algorithm, which generates multiple decision trees from the given training dataset; the output takes the form Yes (positive prediction) or No (negative prediction). Each tree gives a single output: tree 1, tree 2, ..., tree N in Fig. 3 each produce a separate result. Result 1, Result 2, ..., Result N are the Yes/No results for the given symptoms, computed by the individual trees from the training dataset. Finally, the results are combined into two categories, Yes and No, and the category chosen by more trees gives the final output: if most trees say Yes, the result is Positive, and if most trees say No, the result is Negative.

Fig -3: System architecture for disease prediction system

6. RESULTS
Table 1 gives the symptoms of swine flu, from which the disease prediction system predicts a result using the user input. Table 2 gives two cases of swine flu input, one for a positive and one for a negative result. A positive result should be displayed when all of the user's inputs are Yes, or probably when Yes answers outnumber No answers. Fig. 4 and Fig. 5 show the positive and negative swine flu results respectively.
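The combination step described for Fig. 3 (each tree answers Yes or No, and the majority decides) can be sketched as follows. The names `combine_votes` and `forest_predict` are hypothetical, trees are modelled as any callable returning "Yes" or "No", and resolving a tie as Negative is an assumption, since the article does not say how ties are handled (an odd number of trees avoids them).

```python
from collections import Counter


def combine_votes(tree_results):
    """Majority vote over per-tree "Yes"/"No" outputs -> "Positive"/"Negative".

    Ties are resolved as "Negative" here; this rule is an assumption.
    """
    if not tree_results:
        raise ValueError("no tree results to combine")
    counts = Counter(tree_results)
    unexpected = set(counts) - {"Yes", "No"}
    if unexpected:
        raise ValueError(f"unexpected tree outputs: {sorted(unexpected)}")
    return "Positive" if counts["Yes"] > counts["No"] else "Negative"


def forest_predict(trees, x):
    """Ask every tree (any callable returning "Yes"/"No") and combine."""
    return combine_votes([tree(x) for tree in trees])


print(combine_votes(["Yes", "No", "Yes", "Yes", "No"]))  # Positive
```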
Fig -4: swine flu positive result

A negative result should be displayed when all of the user's inputs are No, or probably when No answers outnumber Yes answers.

Fig -5: swine flu negative result

Table 3 gives the symptoms of diabetes, and Table 4 gives two cases of diabetes input, one for a positive and one for a negative result. Fig. 6 and Fig. 7 show the positive and negative results for diabetes respectively.

Fig -6: Diabetes Positive Result
Fig -7: Diabetes Negative Result

Table 5 gives the symptoms of dengue, and Table 6 gives two cases of dengue input, one for a positive and one for a negative result. Fig. 8 and Fig. 9 show the positive and negative results for dengue respectively.

Fig -8: Dengue Positive Result
Fig -9: Dengue Negative Result

7. CONCLUSION
The proposed disease prediction system predicts a particular disease using a training dataset. In this article, we proposed the disease prediction system as a web/mobile-based online application for patients' healthcare. The random forest algorithm achieves the best accuracy compared with other classification algorithms. Once the disease has been predicted, the recommendation system acts on the predicted disease: if the result is positive, a recommended doctor is displayed on the same page; if the result is negative, precautions for that disease are displayed.

Reference
[1] Amit Tate, Bajrangsingh Rajpurohit, Jayanand Pawar, Ujwala Gavhane, Gopal B. Deshmukh, "Comparative Analysis of Classification Algorithms Used for Disease Prediction in Data Mining," International Journal of Engineering and Techniques (IJET), Vol. 2, Issue 6, Nov.-Dec. 2016, ISSN: 2395-1303, www.ijetjournal.org.
[2] Ahmad Shahin, Walid Moudani, Fadi Chakik, Mohamad Khalil, "Data Mining in Healthcare Information Systems: Case Studies in Northern Lebanon," Doctoral School for Science and Technology, The Lebanese University, Tripoli, Lebanon.
[3] Wenxiang Xiao, Jun Ji, Fengjing Shao, Rencheng Sun, Chunxiao Xing, "Fasting Blood Glucose Change Prediction Model Based on Medical Examination Data and Data Mining Techniques," 2015 IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom 2015 and SC2 2015.
[4] Mihail Popescu, Mohammad Khalilia, "Improving disease prediction using ICD-9 ontological features," 2011 IEEE International Conference on Fuzzy Systems (FUZZ), June 2011.
[5] http://www.datasciencecentral.com/profiles/blogs/random-forests-algorithm.
[6] Asmaa S. Hussein, Wail M. Omar, Xue Li, Modafar Ati, "Efficient Chronic Disease Diagnosis Prediction and Recommendation System," 2012 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES).
[7] April Morton, Eman Marzban, Georgios Giannoulis, Ayush Patel, Rajender Aparasu, Ioannis A. Kakadiaris, "A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay among Diabetic Patients," 2014 13th International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec. 2014.
[8] http://www.webmd.com.