International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 08 | Aug 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 89
Titanic Survival Analysis using Logistic Regression
Vaishnav Kshirsagar1, Nahush Phalke2
1Graduate student, University of San Francisco, California, USA
2Software Engineer, Accenture, Pune, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The sinking of the Titanic, which killed more than
a thousand passengers and crew, is one of the deadliest
accidents in history. The loss of life was caused largely by
the shortage of lifeboats. A striking observation from the
incident is that some groups were more likely to survive than
others; women and children, for example, were given priority
in the rescue. The main objective of the algorithm is first to
uncover predictable or previously unknown patterns by applying
exploratory data analysis to the available training data, and
then to apply machine learning models and classifiers to
complete the analysis. The resulting model predicts which
passengers are more likely to survive. Finally, the results of
the machine learning algorithm are analyzed on the basis of
performance and accuracy.
Key Words: Logistic Regression, Data Analysis, Kaggle
Titanic Dataset, Data pre-processing, Cross validation,
Confusion Matrix
1. INTRODUCTION
The field of machine learning has allowed analysts to
uncover insights from historical data and past events. The
Titanic disaster is one of the most famous shipwrecks in world
history. The Titanic was a British ocean liner that sank in the
North Atlantic Ocean a few hours after colliding with an iceberg.
While there are facts available to explain why the ship broke
up, there is still speculation regarding the survival rate of
passengers in the Titanic disaster. Over the years, data on
both surviving and deceased passengers has been collected.
The dataset is publicly available on the website Kaggle.com.
This dataset has been studied and analyzed using
various machine learning algorithms such as Random Forest and
SVM. Various languages and tools are used to implement
these algorithms, including Weka, Python, R and Java. The
approach of this research paper is centered on R and Python
for executing the algorithms: Naive Bayes, Logistic Regression,
Decision Tree, and Random Forest. The prime objective of
the research is to analyze the Titanic disaster to determine
the correlation between the survival of passengers and the
characteristics of those passengers using various machine
learning algorithms. In particular, this research work
compares the algorithms on the basis of their percentage
accuracy on a test dataset.
2. ALGORITHM
2.1 Data Pre-processing
Some of the data values in the dataset used for prediction are
missing or unknown. Missing data reduces the accuracy of the
overall prediction model and also shrinks the set of complete
training records, which in turn further reduces accuracy. Data
preprocessing is a technique that transforms raw data into an
understandable format.
Real-world data is often incomplete, inconsistent, and/or
lacking in certain behaviors or trends, and is likely to contain
many errors. Data preprocessing is an established method of
resolving such issues and prepares raw data for further
processing. Here, missing values are replaced by the average
of their column, so missing and unknown passenger data that
can be easily estimated is filled in by this step.
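The mean-replacement step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `impute_with_mean` and the sample ages are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


# Hypothetical age column; None marks a missing value.
ages = [22.0, 38.0, None, 35.0, None, 54.0]
filled = impute_with_mean(ages)
print(filled)
```

Mean replacement only applies to numeric columns; a categorical column would need a different rule, such as the most frequent value.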
2.2 Classification
Logistic Regression:
The second step of the algorithm uses a classifier to classify
the available information. Logistic regression is the
appropriate regression analysis to conduct when the
dependent variable is dichotomous (binary). Like all
regression analyses, logistic regression is a predictive
analysis.
Logistic regression is used to describe data and to explain
the relationship between one dependent binary variable and
one or more nominal, ordinal, interval or ratio-level
independent variables. It fits a regression line relating the
independent variables to the dependent variable; the output of
this line is passed through the logistic function to predict the
value of the dependent variable.
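To make the classification step concrete, the sketch below fits a logistic regression by plain batch gradient descent on the log loss. It is an illustration under stated assumptions, not the authors' implementation: the features (`is_female`, `is_first_class`), the eight toy rows, the learning rate and the epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy rows: [is_female, is_first_class]; label 1 = survived.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 1, 0, 1, 0]
w, b = fit_logistic(X, y)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print(w, b, preds)
```

A row is classified as a survivor when its predicted probability is at least 0.5; that threshold is a common default rather than something the paper specifies.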
2.3 Cross validation
The dataset is divided into two main parts, namely train and test
data. The training data is used to train the model, and the
test data is used to validate it.
The cross-validation technique used here is k-fold.
The method has only one parameter, k, which refers to
the number of groups into which a given data sample is to be
split. As such, the method is called k-fold cross-validation.
When a particular value for k is chosen, it may be
used in place of k in the name of the method, so that k=10
becomes 10-fold cross-validation.
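The splitting described above can be sketched as a small generator that produces the train and test indices of each fold. This is a hedged illustration: the name `k_fold_indices` is hypothetical, the folds are taken as contiguous blocks, and in practice the rows are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Example: 10 samples split into 5 folds of 2 test samples each.
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(test_idx, train_idx)
```

Each sample appears in exactly one test fold, so every row is used for validation once and for training k-1 times.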
2.4 Analysis of confusion matrix
The confusion matrix is used to show the performance of the
algorithm, and the accuracy of the model can be computed from
it. It tabulates the relation between the real and the
predicted outputs, which allows us to check the accuracy and
performance of the algorithm. In this case we use two
attributes at a time when plotting the confusion matrix. Test
data is used to build the confusion matrix.
The values shown in the confusion matrix are the
probability of survival of the individual considering only
those parameters. As shown in Fig 2, the cell in the first column
(age) and the 7th row (sex_male) has the value 0.081, i.e. the
probability of survival for an individual, considering age and
male gender, is 0.081. As it is positive, there is a
possibility that a person with these attributes survives.
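The comparison of real and predicted outputs described at the start of this section can be sketched as a count of the four possible outcomes for a binary label. This is an illustrative sketch only; the label vectors are hypothetical, with 1 taken to mean "survived".

```python
def confusion_matrix(actual, predicted):
    """Return the counts (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # survived, predicted survived
        elif a == 0 and p == 0:
            tn += 1  # died, predicted died
        elif a == 0 and p == 1:
            fp += 1  # died, predicted survived
        else:
            fn += 1  # survived, predicted died
    return tn, fp, fn, tp


# Hypothetical test labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(actual, predicted)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```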
3. RESULTS
Logistic regression gives an accuracy of 95%, based on the
confusion matrix. The metrics used here are accuracy and
false discovery rate. Accuracy measures the correctness of
the model's predictions; higher accuracy is better. It is
calculated as
Accuracy = (TN + TP) / (total number of rows) * 100
The false discovery rate captures the false positives
of the confusion matrix, where the model predicts that a
passenger would survive but in reality the passenger did not;
it is the share of predicted survivors that are false
positives, FP / (FP + TP). Such errors are dangerous because
they make the predictions misleading and reduce the accuracy
of the results. Efforts are being made to increase the
accuracy and reduce the false discovery rate.
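As a worked example of the two metrics, the sketch below computes accuracy and false discovery rate from confusion-matrix counts. The counts are hypothetical and are not the paper's results.

```python
def accuracy_percent(tn, fp, fn, tp):
    """Accuracy = (TN + TP) / total number of rows * 100."""
    total = tn + fp + fn + tp
    return (tn + tp) / total * 100


def false_discovery_rate(fp, tp):
    """FDR = FP / (FP + TP): share of predicted survivors who did not survive."""
    predicted_positive = fp + tp
    if predicted_positive == 0:
        raise ValueError("FDR is undefined when nothing is predicted positive")
    return fp / predicted_positive


# Hypothetical counts for illustration only.
tn, fp, fn, tp = 3, 1, 1, 3
acc = accuracy_percent(tn, fp, fn, tp)
fdr = false_discovery_rate(fp, tp)
print(acc, fdr)
```

With these counts, 6 of the 8 rows are classified correctly and 1 of the 4 predicted survivors is a false positive.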
4. CONCLUSIONS
Logistic regression achieves good accuracy, about 95%.
It works well with a binary dependent variable, that is, a
variable whose output takes one of two values, such as
yes or no, or true or false.
Fig 2: Confusion Matrix
The ROC curve plots the false positive rate along the x-axis
against the true positive rate along the y-axis as the
decision threshold varies. Plotting the curves of various
algorithms on the same data helps to compare their
performance, accuracy and efficiency. It helps to decide which
algorithm is best suited to the user's requirement.
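The construction of an ROC curve can be sketched by sweeping the decision threshold over the predicted scores and recording the false positive rate and true positive rate at each step. The labels and scores below are hypothetical, and the area under the curve (AUC) computed by the trapezoidal rule is an addition for illustration; the paper itself does not report an AUC value.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical true labels (1 = survived) and predicted survival scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```

A curve closer to the top-left corner, and hence a larger area, indicates a classifier that separates the two classes better; this is what makes curves of different algorithms on the same data comparable.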
