CONFIDENTIAL: The information in this document belongs to Boston Institute of Analytics LLC. Any unauthorized sharing of this
material is prohibited and subject to legal action under breach of IP and confidentiality clauses.
Insurance Customer
Response Prediction
Agenda
• Dataset Overview
• Exploratory Data Analysis (EDA)
• Data Preprocessing
• Model Selection and Cross-Validation
• Model Training and Model Evaluation
• Results and Interpretation
Dataset Overview
Introduction:
This project analyzes and predicts customer responses to a health insurance
marketing campaign using a dataset of 50,882 instances and 14 variables.
Dataset Description:
The dataset includes features such as ID, City code, Region, Accommodation, and
Recommended insurance type. The target variable, Response, indicates
whether a customer accepted or rejected the recommended insurance. We explore
the dataset’s structure, summarize the unique values of each feature, and examine
their data types.
Data source: Kaggle
Exploratory Data Analysis (EDA)
Examine dataset dimensions:
• The dataset contains 50,882 instances (rows) and 14
variables (columns).
• Provides an overview of the data size and structure.
Preview dataset:
• Display the first few rows to inspect data entries and get an
understanding of features.
Check for missing values:
• Identify missing or incomplete data using functions like
isnull().
• Count the missing entries in each column and
decide how to handle them.
Analyze target variable:
• Investigate the distribution of the target variable ‘Response’
to see how many customers accepted (1) or rejected
(0) the recommended insurance.
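The EDA steps above can be sketched with pandas. This is a minimal illustration on a tiny hand-made frame, not the real Kaggle file; the column names (`Accommodation_Type`, `Health_Indicator`, `Holding_Policy_Duration`) and all values are assumptions chosen to resemble the features the slides mention.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset (the real one has 50,882 rows and 14 columns).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Accommodation_Type": ["Rented", "Owned", "Owned", "Rented", "Owned", "Rented"],
    "Health_Indicator": ["X1", np.nan, "X2", "X1", np.nan, "X3"],
    "Holding_Policy_Duration": ["14+", np.nan, "1.0", "3.0", np.nan, "5.0"],
    "Response": [0, 0, 1, 0, 1, 0],
})

# Dataset dimensions: (rows, columns)
print(df.shape)

# Preview the first few rows
print(df.head())

# Missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Target distribution: absolute counts and shares of each class
response_counts = df["Response"].value_counts()
response_share = df["Response"].value_counts(normalize=True)
print(response_counts)
print(response_share)
```

Looking at the class shares as well as the counts makes any imbalance in `Response` visible early, which matters later when accuracy is used as the metric.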
[Figures: Missing Values; Target Variable]
Explore unique values of categorical features:
• Analyze variables like accommodation type, recommended
insurance type, and health indicator.
Visualize relationships with target variable:
• Use count plots and bar charts to visualize how
categorical features relate to ‘Response’.
• Examine how features like accommodation
type and health indicator differ between accepted and rejected offers.
Generate correlation heatmap:
• Create a correlation heatmap to identify
relationships between numerical and encoded
categorical variables.
• Helps to understand which features are strongly
correlated with each other and with the target
variable.
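A minimal sketch of these three steps, assuming pandas and a small made-up frame (column names and values are hypothetical). The plots themselves are omitted; the correlation matrix computed here is the kind of table one would hand to a plotting function such as seaborn's `heatmap`.

```python
import pandas as pd

# Hypothetical sample; the real feature names and values may differ.
df = pd.DataFrame({
    "Accommodation_Type": ["Rented", "Owned", "Owned", "Rented",
                           "Owned", "Rented", "Owned", "Rented"],
    "Health_Indicator": ["X1", "X2", "X1", "X1", "X3", "X2", "X1", "X3"],
    "Upper_Age": [30, 55, 42, 28, 61, 35, 47, 52],
    "Response": [0, 1, 1, 0, 1, 0, 0, 1],
})

categorical = ["Accommodation_Type", "Health_Indicator"]

# Number of distinct values in each categorical feature
unique_counts = df[categorical].nunique()
print(unique_counts)

# Acceptance rate per category: the numbers behind a bar chart by target
rate_by_accommodation = df.groupby("Accommodation_Type")["Response"].mean()
print(rate_by_accommodation)

# Encode categoricals as integer codes so they can enter the correlation matrix
encoded = df.copy()
for col in categorical:
    encoded[col] = encoded[col].astype("category").cat.codes

corr = encoded.corr()
print(corr["Response"].sort_values(ascending=False))
```

Integer codes impose an arbitrary order on categories, so correlations involving encoded columns are only a rough screening tool.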
[Figure: Compare by target]
Data Preprocessing
Handle Missing Values:
• Identify and fill missing values in the holding-policy columns, such as Holding policy duration.
• Replace NaN values with 0 for new customers who don't have an existing policy.
Fill Missing Values in Categorical Features:
• For Health indicator, fill missing values with a placeholder (x0) that marks missing health data.
Data Type Conversion:
• Convert Holding policy duration and other columns with inconsistent types into numeric values for
model compatibility.
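The imputation and type-conversion steps might look like the sketch below. The column names, the sample values, and in particular the open-ended text bucket `"14+"` in the duration column are assumptions made for illustration; the slides only say that the column has inconsistent types.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the kinds of gaps described above.
df = pd.DataFrame({
    "Holding_Policy_Duration": ["3.0", np.nan, "14+", "1.0"],
    "Holding_Policy_Type": [2.0, np.nan, 3.0, 1.0],
    "Health_Indicator": ["X1", np.nan, "X2", np.nan],
})

# New customers without an existing policy: NaN becomes 0
df["Holding_Policy_Type"] = df["Holding_Policy_Type"].fillna(0)

# Assumption: duration is stored as text, so strip the "+" before converting.
duration = df["Holding_Policy_Duration"].str.replace("+", "", regex=False)
df["Holding_Policy_Duration"] = pd.to_numeric(duration, errors="coerce").fillna(0)

# Placeholder category for missing health data
# (the slides write it as x0; the capitalization here is an assumption).
df["Health_Indicator"] = df["Health_Indicator"].fillna("X0")

print(df)
print(df.dtypes)
```

Using a dedicated placeholder category instead of the most frequent value keeps "missing" distinguishable from real health indicator values.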
Drop Irrelevant Columns:
• Remove columns such as ID, City Code, and Region to avoid high cardinality and unnecessary data in
the model
Feature Encoding:
• Apply Label Encoding to categorical variables like Health Indicator, Accommodation Type, and Reco
Insurance Type for numerical modeling.
Scaling:
• Use StandardScaler to standardize numerical features like Upper Age, Lower Age, Reco Policy
Premium, etc., so that each has a mean of 0 and a standard deviation of 1, which helps
scale-sensitive models such as SVM.
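A minimal sketch of dropping, encoding, and scaling with scikit-learn follows. The frame and its column names are hypothetical. For brevity the scaler is fitted on the whole frame here; fitting it on the training split only would avoid leaking test-set statistics.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample after missing values have been filled.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "City_Code": ["C1", "C2", "C3", "C1"],
    "Region_Code": [101, 102, 103, 104],
    "Accommodation_Type": ["Rented", "Owned", "Owned", "Rented"],
    "Health_Indicator": ["X1", "X0", "X2", "X1"],
    "Upper_Age": [30.0, 50.0, 40.0, 60.0],
    "Reco_Policy_Premium": [10000.0, 20000.0, 15000.0, 25000.0],
    "Response": [0, 1, 0, 1],
})

# Drop identifier-like, high-cardinality columns
df = df.drop(columns=["ID", "City_Code", "Region_Code"])

# Label-encode each categorical feature (classes are numbered in sorted order)
for col in ["Accommodation_Type", "Health_Indicator"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Standardize numeric features to mean 0 and (population) standard deviation 1
numeric_cols = ["Upper_Age", "Reco_Policy_Premium"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

print(df)
```

scikit-learn documents `LabelEncoder` as a tool for target labels; `OrdinalEncoder` is the feature-oriented equivalent and would give the same integer codes here.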
Model Selection and Cross-Validation
Model Comparison:
• Evaluate multiple models including
SVM (Support Vector Machine),
Decision Tree, Random Forest,
XGBoost, and CatBoost to identify the
best-performing model for the
insurance response prediction task.
Cross-Validation:
• Use K-Fold Cross Validation (5-fold) to
assess each model's performance
with accuracy as the evaluation
metric.
• Compute fold-wise accuracy and
mean accuracy for each model to
gauge robustness.
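The comparison loop can be sketched with scikit-learn's `KFold` and `cross_val_score`. Synthetic data from `make_classification` stands in for the preprocessed insurance features, and only the three scikit-learn models are included; XGBoost and CatBoost are left out to keep the example free of extra dependencies, though their scikit-learn-compatible classifiers could be added to the same dictionary. The hyperparameters shown are assumptions, not the deck's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed feature matrix and Response labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "SVM (linear)": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation, shuffled with a fixed seed for reproducibility
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = fold_scores
    print(f"{name}: folds={fold_scores.round(3)}, mean={fold_scores.mean():.3f}")
```

Comparing the spread of the fold scores alongside the mean is what indicates robustness: a model with a slightly lower mean but tighter folds may be the safer choice.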
Model Training and Evaluation
Data Split:
Split the dataset into training (80%) and testing (20%) sets
to train the model and evaluate its performance.
Model Training:
Train the selected model (SVM with a linear kernel) on the
training data using the optimal parameters.
Prediction:
Generate predictions on the test set and evaluate how well
the model generalizes.
Evaluation Metrics:
• Accuracy Score: Calculate overall accuracy on the test data.
• Confusion Matrix: Visualize model performance as counts of true
positives, true negatives, false positives, and false
negatives.
• Classification Report: Analyze detailed metrics, including
precision, recall, and F1-score, for both classes (Accepted
and Rejected).
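The split, training, and evaluation steps above can be sketched as follows. Synthetic data again replaces the real features, the SVM uses scikit-learn defaults apart from the linear kernel (the slides do not list the tuned parameters), and `stratify=y` is an added assumption that keeps class proportions equal in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and Response labels.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Linear-kernel SVM with otherwise default parameters
model = SVC(kernel="linear")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
report = classification_report(y_test, y_pred, target_names=["Rejected", "Accepted"])

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

In the 2x2 matrix returned for labels 0 and 1, the diagonal holds the true negatives and true positives, so accuracy equals the diagonal sum divided by the total.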
Results and Interpretation
Final Model Accuracy:
The SVM model achieved an accuracy of 75% on the test set, showing it performs moderately
well in predicting customer insurance responses based on this dataset.
Confusion Matrix Insights:
The confusion matrix indicates balanced prediction for customers likely to accept or reject insurance
offers, with relatively few misclassifications. This balance reflects the model's ability to reasonably handle
both positive and negative responses in the dataset.
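One way to put an accuracy figure in context is to compare it with a majority-class baseline. The sketch below uses made-up labels with a 75/25 split purely for illustration; the slides do not report the actual class balance of Response, so these numbers are assumptions, not results.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 75 rejected (0) and 25 accepted (1). Not the real distribution.
y_true = np.array([0] * 75 + [1] * 25)
X_dummy = np.zeros((len(y_true), 1))  # features are ignored by the baseline

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_base = baseline.predict(X_dummy)

baseline_accuracy = accuracy_score(y_true, y_base)
baseline_recall = recall_score(y_true, y_base)  # recall on the "accepted" class

print(f"Baseline accuracy: {baseline_accuracy:.2f}")
print(f"Baseline recall (accepted): {baseline_recall:.2f}")
```

Under such an imbalance a model that never predicts an acceptance scores high on accuracy while finding no interested customers, which is why per-class recall from the classification report is worth reading next to the headline accuracy.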
Conclusion:
The model captures patterns in customer demographics and insurance preferences that help
predict purchase likelihood. Further tuning, such as refining features or trying other algorithms, could improve
its performance. That would support more efficient decision-making and help the company better target
high-potential customers.
Questions?
Thank You!
