A/B Testing - Customer Experience Platform experimentation using Pearson’s Chi-Squared Test
Aurangzeb Khan
Senior Data Analyst
rana.aurangzeb@hotmail.com
MBA, University of Wollongong, Australia
Abstract: E-commerce giants design and run frequent campaigns on their touch points, including their
websites, to attract more customers. The purpose of this paper is to investigate the effectiveness
of a newly launched web page and to find out whether the new page results in different consumer
behavior, in particular a different conversion rate. The ‘Chi-Square Test of Independence’ helps us
determine whether the user groups of the old and new web pages differ significantly from each other
in terms of conversion rate.
The Business Problem
As described in Kaggle (Kaggle link is here), an e-commerce company has designed a new web
page for a website to attract more customers. The e-commerce company wants to investigate if
it should implement the new page or keep the old web page.
Consumer/user groups are often exposed to different situations (for example, before and after a
change) and then studied to find out whether there is a significant difference in their behavior,
using metrics such as website visits, click-through rate and conversion rate. The ‘control’
consumer group is exposed to the ‘old page’ and the ‘treatment’ consumer group is exposed to the
‘new page’ of the website. The e-commerce company now wants to know whether the two consumer
groups differ significantly from each other in terms of conversion rate, and hence consumer behavior.
Analytical Problem
To determine whether the two user/consumer groups exposed to the ‘old page’ vs the ‘new page’
(consumer group being a categorical variable) differ in terms of click-through rate and conversion
rate, we recommend using Pearson’s Chi-Squared Test of Independence. The Chi-Square Test is
suitable for testing the independence of a pair of categorical variables, i.e. the click/no-click
behavior of the consumers against the website page design.
The Chi-Square Test also tells us whether an input variable is significantly associated with the
output variable, and hence lets us keep or drop certain variables if we go on to feature selection
for the analysis.
Formula: Chi-Square
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
Image source: Author
Fo: Observed Frequencies
Fe: Expected Frequencies
The sum runs over all cells of the contingency table.
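As a minimal sketch of how the formula is applied, the snippet below computes the expected frequencies and the Chi-Square statistic by hand for the sample counts shown in Table 2.0 later in this paper. The variable names are illustrative and not part of the original analysis, and no continuity correction is applied here.

```python
import numpy as np

# Sample counts from Table 2.0 (rows: old page, new page; columns: click, no-click)
observed = np.array([[17489, 127785],
                     [17264, 128047]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected frequency of a cell under independence:
# row total * column total / grand total
expected = row_totals @ col_totals / grand_total

# Pearson's statistic: sum over all cells of (Fo - Fe)^2 / Fe
chi_square = ((observed - expected) ** 2 / expected).sum()

print(expected)
print(chi_square)
```

The expected table has the same row and column totals as the observed table, which is a quick sanity check on the calculation.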
Steps to conduct the Chi-Square Test of Independence
1. Data wrangling & data consolidation in the shape of contingency table
2. Hypothesis Formulation & Decision Rule
3. Data Visualization
4. Test Statistics calculation
5. Conclusion
1. Data wrangling & Data consolidation in the shape of contingency table
The pair of categorical variables, i.e. the user group and the click/no-click outcome, is displayed
in a contingency table to help us visualize the frequency distribution of the variables.
Below is an example of what the overall data should look like:
                      Click     No-Click   Click + No-Click
Old Page              17,489    127,785    145,274
New Page              17,264    128,047    145,311
Old Page + New Page   34,753    255,832    290,585
Table 1.0: Sample Format of the Data
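The marginal totals in Table 1.0 can be reproduced from the four cell counts. The following is a small sketch using pandas; the names `table` and `with_totals` are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# The four cell counts from Table 1.0
table = pd.DataFrame(
    [[17489, 127785], [17264, 128047]],
    index=["Old Page", "New Page"],
    columns=["Click", "No-Click"],
)

with_totals = table.copy()
# Row totals: all users who saw each page
with_totals["Click + No-Click"] = table.sum(axis=1)
# Column totals, including the grand total in the last column
with_totals.loc["Old Page + New Page"] = with_totals.sum(axis=0)

print(with_totals)
```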
For the Chi-Square Test we need the data in the format below:
            Click     No-Click
Old Page    17,489    127,785
New Page    17,264    128,047
Table 2.0: Sample format for Contingency Table
First of all, we need to import the Python libraries, load the data from Kaggle and take a first
look at it.
Python Code:
# Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import scipy
import matplotlib.pyplot as plt
# The data has been taken from Kaggle, per the link below
# https://www.kaggle.com/zhangluyuan/ab-testing
df = pd.read_csv('ab_data.csv')
df.head()
Table 3.0: Simple Data Visualization
Let's perform a few steps to check whether the data is clean and ready for the Chi-Square Test.
Python Code:
# The control group represents the users of the old page
# The treatment group represents the users of the new page
# Let's see what the data looks like
df.groupby(['group','landing_page']).count()
Table 4.0: Data Aggregation for Visualization
We notice above that some users in the ‘control’ group were shown the ‘new page’, and that some
users in the ‘treatment’ group were shown the ‘old page’. These rows are misclassified with
respect to our objective. Instead of trying to repair them, we keep only the rows where group and
page agree (control/old_page and treatment/new_page), using the Python code below.
Python Code:
# From the 'control' group we only need the old_page rows
# From the 'treatment' group we only need the new_page rows
df_cleaned = df.loc[
    ((df['group'] == 'control') & (df['landing_page'] == 'old_page'))
    | ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page'))
]
df_cleaned.groupby(['group', 'landing_page']).count()
Table 5.0: Cleansed and consolidated Data for both user groups
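To make the filtering step easier to follow, here is a minimal sketch on a tiny synthetic frame (the rows are made up for illustration and are not the Kaggle data). It maps each group to the page it is supposed to see, counts the rows that disagree, and keeps only the aligned rows. The names `toy`, `expected_page` and `aligned` are assumptions introduced for this example.

```python
import pandas as pd

# Synthetic stand-in for the raw data (hypothetical values)
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "group": ["control", "control", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
})

# The page each group is supposed to see
expected_page = toy["group"].map({"control": "old_page", "treatment": "new_page"})

# True where group and landing page agree
aligned = toy["landing_page"] == expected_page

n_mismatched = int((~aligned).sum())
toy_cleaned = toy[aligned]

print(n_mismatched)
print(toy_cleaned)
```

Counting the mismatched rows before dropping them shows how much data the filter removes.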
Finding Duplicates
Python Code:
# Count the duplicated user_id values
print(df_cleaned['user_id'].duplicated().sum())
# Show the user_id of every duplicated row
df_cleaned[df_cleaned.duplicated(['user_id'], keep=False)]['user_id']
# Drop the duplicates from the cleaned data (df_cleaned, not the raw df), keeping the first row
df_cleaned = df_cleaned.drop_duplicates(subset='user_id', keep="first")
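The effect of `keep="first"` is easiest to see on a small example. The sketch below uses a synthetic frame (hypothetical values) in which one user appears twice with different outcomes.

```python
import pandas as pd

# Synthetic data: user 102 appears twice with conflicting 'converted' values
toy = pd.DataFrame({
    "user_id": [101, 102, 102, 103],
    "converted": [0, 1, 0, 1],
})

# Number of rows that repeat an earlier user_id
n_duplicates = int(toy["user_id"].duplicated().sum())

# keep="first" retains the first row seen for each user_id
deduped = toy.drop_duplicates(subset="user_id", keep="first")

print(n_duplicates)
print(deduped)
```

When duplicate rows disagree on `converted`, the choice between `keep="first"` and `keep="last"` changes the click counts, so it is worth inspecting the duplicates before dropping them.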
Preparing the Contingency Table for Chi-Square Test
# Prepare and arrange the data for the Chi-Square contingency table
# 1) Take out the control group
control = df_cleaned[df_cleaned['group'] == 'control']
# 2) Take out the treatment group
treatment = df_cleaned[df_cleaned['group'] == 'treatment']
# 3A) Click: the users who converted in the control group
control_click = control.converted.sum()
# 3B) No-click: the users who did not convert in the control group
control_noclick = control.converted.size - control.converted.sum()
# 4A) Click: the users who converted in the treatment group
treatment_click = treatment.converted.sum()
# 4B) No-click: the users who did not convert in the treatment group
treatment_noclick = treatment.converted.size - treatment.converted.sum()
# 5) Create the NumPy array (rows: control, treatment; columns: click, no-click)
Table = np.array([[control_click, control_noclick],
                  [treatment_click, treatment_noclick]])
print(Table)
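The same 2x2 table can also be built in one step with `pd.crosstab`. This is an alternative sketch rather than the method used above, shown on a tiny synthetic frame (hypothetical values) so that the result is easy to check by eye.

```python
import pandas as pd

# Synthetic stand-in for df_cleaned (hypothetical values, not the Kaggle data)
toy = pd.DataFrame({
    "group": ["control", "control", "control",
              "treatment", "treatment", "treatment"],
    "converted": [1, 0, 0, 1, 1, 0],
})

# Rows: group; columns: converted (1 = click, 0 = no-click)
counts = pd.crosstab(toy["group"], toy["converted"])

# Put the Click column first to match the layout of Table 2.0
counts = counts[[1, 0]]
counts.columns = ["Click", "No-Click"]

table = counts.to_numpy()
print(counts)
```

The resulting array has the same layout as `Table` above and can be passed to the test in the same way.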
2. Hypothesis Formulation & Decision Rule
Null Hypothesis
H0:
Conversion is independent of the user group, i.e. the ‘control’ and ‘treatment’ user groups do not
differ in terms of their conversion rate.
Alternative Hypothesis
H1:
Conversion depends on the user group, i.e. the ‘control’ and ‘treatment’ user groups differ in
terms of their conversion rate.
Level of significance
For this test, we assume α = 0.05, i.e. a confidence level of 95%.
Decision Rule
If the p-value is less than or equal to the level of significance (5%), we reject the Null Hypothesis (H0).
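The decision rule can equivalently be stated in terms of a critical value of the Chi-Square distribution. The sketch below assumes a 2x2 table (one degree of freedom) and uses a made-up test statistic, `example_stat`, purely to show that both rules give the same decision.

```python
from scipy.stats import chi2

alpha = 0.05
dof = (2 - 1) * (2 - 1)   # (rows - 1) * (columns - 1) for a 2x2 table

# Reject H0 when the test statistic exceeds this value
critical_value = chi2.ppf(1 - alpha, dof)

# Hypothetical statistic, for illustration only
example_stat = 2.0
p_value = chi2.sf(example_stat, dof)   # upper-tail probability

reject_by_p_value = p_value <= alpha
reject_by_critical_value = example_stat >= critical_value

print(critical_value)
print(reject_by_p_value, reject_by_critical_value)
```

For one degree of freedom and α = 0.05 the critical value is about 3.84, so any statistic below that leads to the same "fail to reject" decision as a p-value above 5%.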
3. Data Visualization
Let’s print the two-dimensional array that we created in Python:
            Click     No-Click
Old Page    17,471    127,761
New Page    17,274    128,078
Table 6.0: Chi-Square Test Contingency Table
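Before running the test, it helps to see the conversion rates that these counts imply. The following sketch derives them from the counts in Table 6.0; the variable names are illustrative.

```python
import numpy as np

# Counts from Table 6.0 (rows: old page, new page; columns: click, no-click)
table = np.array([[17471, 127761],
                  [17274, 128078]])

visitors = table.sum(axis=1)               # users per page
conversion_rate = table[:, 0] / visitors   # clicks / users

for page, rate in zip(["Old Page", "New Page"], conversion_rate):
    print(f"{page}: {rate:.2%}")
```

The two rates come out at roughly 12.03% for the old page and 11.88% for the new page, a gap of about 0.15 percentage points, which the Chi-Square Test then assesses for significance.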
4. Test Statistics Calculations
To perform the test, let’s import the necessary Python libraries and obtain the following values:
1. Test statistic
2. P-value
3. Degrees of freedom
4. Expected frequencies
Python Code:
from scipy import stats
# correction=True applies Yates' continuity correction: each observed value is
# adjusted by 0.5 towards the corresponding expected value
stat, p, dof, expected = stats.chi2_contingency(Table, correction=True)
print('\nStat : ', stat)
print('\nP-Value : ', p)
print('\nDegree of Freedom : ', dof)
print('\nObserved Frequencies: ', Table)
print('\nExpected Frequencies: ', expected)
Snapshot 1.0: Chi-Square Test Results
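To see how much the continuity correction matters at this sample size, the test can be run with and without it on the counts from Table 6.0. This is a small sketch for comparison; the variable names are assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table 6.0 (rows: old page, new page; columns: click, no-click)
table = np.array([[17471, 127761],
                  [17274, 128078]])

# With Yates' continuity correction (the setting used in this paper)
stat_yates, p_yates, dof, _ = chi2_contingency(table, correction=True)

# Without the correction (plain Pearson statistic)
stat_plain, p_plain, _, _ = chi2_contingency(table, correction=False)

print(f"with correction:    stat={stat_yates:.4f}, p={p_yates:.4f}")
print(f"without correction: stat={stat_plain:.4f}, p={p_plain:.4f}")
```

The correction shrinks the statistic slightly and raises the p-value slightly. With counts this large the two results are close, and both p-values are well above 0.05.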
Python Code:
# Interpret the p-value
alpha = 1.0 - 0.95
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
5. Conclusion:
The p-value is 22.9%, which is greater than the 5% level of significance (alpha), so we do not
reject the Null Hypothesis.
The users of the old and new pages did not behave significantly differently: the difference in
conversion rate between the two groups is not statistically significant. Hence, we have no
evidence that the new web page converts differently from the old one.
Conversion is considered independent of the page because the observed and expected frequencies are
similar; the test finds no significant association between the two variables.
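The statement that the observed and expected frequencies are similar can be checked directly. The sketch below compares the two for the counts in Table 6.0; the names `difference` and `relative` are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table 6.0 (rows: old page, new page; columns: click, no-click)
observed = np.array([[17471, 127761],
                     [17274, 128078]])

# The expected frequencies do not depend on the continuity correction
_, _, _, expected = chi2_contingency(observed, correction=True)

difference = observed - expected   # signed gap per cell
relative = difference / expected   # gap as a fraction of the expected count

print(np.round(expected, 1))
print(np.round(difference, 1))
print(np.round(relative, 4))
```

In a 2x2 table every cell deviates from its expected count by the same absolute amount. Here that amount is a little over 100 users per cell, which is less than 1% of even the smallest expected count.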
THE END
