Eleftherios Mitsimponas
“DATA ANALYST”

Exploratory Data Analysis
Import the CSV file and start exploring the dataset
Check the dimensions of the dataset
Display the first rows to understand the data
Check the variables and their types
Explore the dataset for missing values and identify their location and number
Handle missing data by replacing the N/A entries with specific values, so that no observation contains missing data
Create new variables to help train the model
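The exploration steps listed above can be sketched in base R roughly as follows. This is only an illustration under assumptions: the inline CSV text stands in for the real file (with a file one would call something like read.csv("train.csv"), a hypothetical name), the column names follow the usual Kaggle Titanic layout, and filling Age with the median and Embarked with the most common port is one possible choice of "specific values", not necessarily the one used in the slides.

```r
# Hypothetical CSV text standing in for the real file
csv_text <- "PassengerId,Survived,Pclass,Age,Embarked
1,0,3,22,S
2,1,1,38,C
3,1,3,,S
4,1,1,35,
5,0,3,,Q"

# Import, treating empty fields as missing
train <- read.csv(text = csv_text, na.strings = c("", "NA"),
                  stringsAsFactors = FALSE)

dim(train)        # dimensions: rows and columns
head(train, 3)    # first rows
str(train)        # variables and their types

# Number of missing values per column, and the row/column of each one
na_count <- colSums(is.na(train))
na_where <- which(is.na(train), arr.ind = TRUE)

# Replace missing values with specific values (assumed: median age, most common port)
train$Age[is.na(train$Age)] <- median(train$Age, na.rm = TRUE)
port_counts <- table(train$Embarked)
train$Embarked[is.na(train$Embarked)] <- names(port_counts)[which.max(port_counts)]
```

After the last two assignments no row contains missing data, so the full dataset stays available for training instead of being cut down to complete cases.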

Create new variables
My predictive model is based on some new variables that I created to make my predictions more accurate.
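The slides do not list the new variables, apart from a later mention of a title variable. As a hedged sketch of what such feature engineering might look like, the block below derives a Title from the passenger name and a FamilySize count. The columns Name, SibSp and Parch, the variable FamilySize and the sample rows are assumptions based on the usual Kaggle Titanic layout, not taken from the slides.

```r
# Hypothetical sample rows in the Kaggle Titanic layout
train <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina"),
  SibSp = c(1, 1, 0),
  Parch = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Title: the text between the comma and the first period of the name
train$Title <- trimws(sub("^[^,]*,\\s*([^.]*)\\..*$", "\\1", train$Name))

# FamilySize: siblings/spouses + parents/children + the passenger
train$FamilySize <- train$SibSp + train$Parch + 1
```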
Random Forest:
A random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.
Out-of-bag (OOB) error:
An estimate of the prediction error of a random forest, computed for each observation using only the trees that did not see that observation during training. The smaller the error, the more accurate the model.
IMPORTANCE OF VARIABLES
Random forest can report which variables are most important for the prediction.
Accuracy: 83.95%
rf.label <- as.factor(train$Survived)
0: perish
1: survive
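To make the out-of-bag idea concrete, here is a minimal sketch that computes an OOB error by hand. It is not the randomForest package and not the model from the slides: it bags plain rpart trees on simulated data, without the per-split variable sampling that a real random forest adds. The data, the number of trees and all variable names are assumptions for illustration.

```r
library(rpart)

set.seed(42)
n <- 300
# Simulated stand-in data (not the Titanic set): y depends on x1 and x2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 + d$x2 + rnorm(n, sd = 0.5) > 0, 1, 0))

n_trees <- 50
# votes[i, k] counts the out-of-bag votes for class k on row i
votes <- matrix(0, nrow = n, ncol = 2, dimnames = list(NULL, levels(d$y)))

for (b in seq_len(n_trees)) {
  in_bag <- sample(n, replace = TRUE)      # bootstrap sample for this tree
  oob <- setdiff(seq_len(n), in_bag)       # rows this tree never saw
  tree <- rpart(y ~ x1 + x2, data = d[in_bag, ], method = "class")
  pred <- as.character(predict(tree, d[oob, ], type = "class"))
  idx <- cbind(oob, match(pred, colnames(votes)))
  votes[idx] <- votes[idx] + 1             # record one vote per OOB row
}

# Majority vote over the trees for which each row was out of bag
has_vote <- rowSums(votes) > 0
oob_pred <- colnames(votes)[max.col(votes[has_vote, , drop = FALSE],
                                    ties.method = "first")]
oob_error <- mean(oob_pred != as.character(d$y[has_vote]))
oob_accuracy <- 1 - oob_error
```

Because every row is scored only by trees that never trained on it, the OOB error behaves like a built-in validation estimate and needs no separate test set.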
Using 10-fold cross-validation, I divide my train.data (891 obs.) into 10 folds of almost equal size:
Fold1 (89 obs.), Fold2 (89 obs.), ..., Fold10 (90 obs.)
We call these groups 1 to 10. The analysis is performed 10 times. The first time, groups 1 to 9 are used to train the algorithm and group 10 is used to test the model; each later run holds out a different group.
makeCluster(6, type = "SOCK")
This call does not cluster the data. It starts a cluster of 6 socket-based worker processes. Each worker uses its own share of the CPU, and the workers run at the same time without waiting for one another.
Performing 10-fold cross-validation, I find the accuracy of the model for each value of cp (the rpart complexity parameter).
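The fold assignment and the train/test rotation described above can be sketched as follows. Only the sizes (891 observations, 10 folds) come from the slides. The data is simulated, the model is a default rpart tree, and the loop is written by hand rather than with whatever cross-validation tooling the original analysis used.

```r
library(rpart)

set.seed(1)
n <- 891    # size of train.data in the slides
k <- 10

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))
fold_sizes <- table(fold_id)    # nine folds of 89 rows and one of 90

# Simulated stand-in data (not the Titanic set)
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 - d$x2 + rnorm(n, sd = 0.5) > 0, 1, 0))

# Train on k - 1 folds, test on the held-out fold, repeat k times
fold_acc <- sapply(seq_len(k), function(i) {
  test_rows <- fold_id == i
  fit <- rpart(y ~ x1 + x2, data = d[!test_rows, ], method = "class")
  pred <- predict(fit, d[test_rows, ], type = "class")
  mean(pred == d$y[test_rows])
})

cv_accuracy <- mean(fold_acc)
```

The k iterations are independent of one another, which is why they are a natural fit for a worker cluster: each worker can fit and score a different fold at the same time.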

Visualization
Visualizing the data is a powerful way to understand your data well and to find the correlations between variables.
This plot helps me find the survival rate based on Pclass and the new title variable.
My final rpart model gives me the most important variables of my predictive model and, as a result, the best accuracy for my model.
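The numbers behind a plot of survival rate by Pclass and title could be tabulated along these lines. The rows below are made up purely for illustration, and Title is assumed to be the engineered title variable. The resulting rates say nothing about the real dataset.

```r
# Hypothetical rows; the values are invented for the example
train <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 1, 0, 1),
  Pclass   = c(3, 1, 3, 3, 1, 3, 1, 2),
  Title    = c("Mr", "Mrs", "Miss", "Mr", "Mrs", "Mr", "Mr", "Miss")
)

# Mean of the 0/1 Survived flag per group = survival rate per group
rate <- aggregate(Survived ~ Pclass + Title, data = train, FUN = mean)
names(rate)[3] <- "SurvivalRate"
```

Each row of rate is one Pclass and Title combination, ready to be passed to a bar chart or a faceted plot.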

Data Analysis project "TITANIC SURVIVAL"
