amshar@microsoft.com
http://www.github.com/amit-sharma/causal-inference-tutorial
Use these correlations to make a predictive model.
Future Activity -> f(number of friends, logins in past month)

            Old Algorithm (A)   New Algorithm (B)
Total CTR   50/1000 (5%)        54/1000 (5.4%)
Low-activity users:
Old Algorithm (A)   New Algorithm (B)
10/400 (2.5%)       4/200 (2%)
High-activity users:
Old Algorithm (A)   New Algorithm (B)
40/600 (6.7%)       50/800 (6.3%)
[Figure: bar chart of CTR by activity level (low-activity, high-activity); y-axis: CTR, 0 to 8]
Is Algorithm A better?
                              Old Algorithm (A)   New Algorithm (B)
CTR for low-activity users    10/400 (2.5%)       4/200 (2%)
CTR for high-activity users   40/600 (6.7%)       50/800 (6.3%)
Total CTR                     50/1000 (5%)        54/1000 (5.4%)
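The reversal in the table can be checked with a few lines of R. This is a minimal sketch that only re-enters the click and visit counts shown above; the data frame and variable names (`ctr`, `totals`, `high_share`) are illustrative and not part of the original tutorial code.

```r
# Click and visit counts copied from the table above.
ctr <- data.frame(
  algorithm = c("A", "A", "B", "B"),
  activity  = c("low", "high", "low", "high"),
  clicks    = c(10, 40, 4, 50),
  visits    = c(400, 600, 200, 800)
)

# CTR within each activity level: A is higher in both groups.
ctr$rate <- ctr$clicks / ctr$visits

# CTR after pooling the activity levels: B is higher overall.
totals <- aggregate(cbind(clicks, visits) ~ algorithm, data = ctr, FUN = sum)
totals$rate <- totals$clicks / totals$visits

# Share of each algorithm's visits that come from high-activity users.
high <- ctr[ctr$activity == "high", ]
high_share <- high$visits / totals$visits[match(high$algorithm, totals$algorithm)]
names(high_share) <- high$algorithm

print(ctr)
print(totals)
print(high_share)
```

Algorithm A has the higher CTR within each activity level, yet B has the higher total CTR because 80% of B's visits come from high-activity users, compared with 60% of A's.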
Average comment length decreases over time.
But for each yearly cohort of users, comment length
increases over time.
http://plato.stanford.edu/entries/causation-mani/
http://plato.stanford.edu/entries/causation-counterfactual/
Dunning (2002), Rosenzweig-Wolpin (2000)
Does new Algorithm B increase CTR for recommendations on
Windows Store, compared to old algorithm A?
$\mathrm{Propensity}(\mathrm{NewAlgo} \mid \mathrm{User}_i) = \mathrm{Logistic}(a_{cat_1}, a_{cat_2}, \ldots, a_{cat_n})$
Compare CTR between users with the same propensity score.
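One way to approximate "users with the same propensity score" is to group users into propensity strata and compare CTR within each stratum. The following is a minimal sketch on simulated data, not the tutorial's own code: the columns `a_cat1` to `a_cat3`, `new_algo`, and `clicked`, the simulation coefficients, and the choice of quintiles are all assumptions made for illustration.

```r
set.seed(1)
n <- 5000

# Hypothetical standardized activity of each user in three app categories.
users <- data.frame(a_cat1 = rnorm(n), a_cat2 = rnorm(n), a_cat3 = rnorm(n))

# Simulated non-random exposure: more active users see the new algorithm more often.
users$new_algo <- rbinom(n, 1, plogis(0.8 * users$a_cat1 + 0.5 * users$a_cat2))

# Simulated clicks that depend on activity only, not on the algorithm.
users$clicked <- rbinom(n, 1, plogis(-2 + 0.8 * users$a_cat1 + 0.5 * users$a_cat2))

# Step 1: logistic regression of exposure on per-category activity.
ps_model <- glm(new_algo ~ a_cat1 + a_cat2 + a_cat3,
                family = binomial(), data = users)
users$propensity <- predict(ps_model, type = "response")

# Step 2: group users with similar propensity scores (quintiles).
breaks <- quantile(users$propensity, probs = seq(0, 1, by = 0.2))
users$stratum <- cut(users$propensity, breaks = breaks, include.lowest = TRUE)

# Step 3: compare CTR between exposed and unexposed users within each stratum.
ctr_by_stratum <- tapply(users$clicked,
                         list(users$stratum, users$new_algo), mean)
diff_by_stratum <- ctr_by_stratum[, "1"] - ctr_by_stratum[, "0"]

# Naive comparison that ignores the propensity score, for contrast.
naive_diff <- mean(users$clicked[users$new_algo == 1]) -
  mean(users$clicked[users$new_algo == 0])
stratified_diff <- mean(diff_by_stratum)

print(c(naive = naive_diff, stratified = stratified_diff))
```

Because the simulated clicks do not depend on `new_algo`, any gap in the naive comparison comes from confounding by activity; stratifying on the estimated propensity is intended to reduce that gap.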
[Figure: two panels. "Ego Network": user u with friends f1 to f5. "Non-Friends": user u with non-friends n1 to n5.]
http://tylervigen.com/spurious-correlations
> nrow(user_app_visits_A)
[1] 1,000,000
> length(unique(user_app_visits_A$user_id))
[1] 10,000
> length(unique(user_app_visits_A$product_id))
[1] 990
> length(unique(user_app_visits_A$category))
[1] 10
> library(dplyr)  # provides summarise()
> user_app_visits_B = read.csv("user_app_visits_B.csv")
> naive_observational_estimate <- function(user_visits){
    # Naive observational estimate:
    # simply the fraction of visits that resulted in a recommendation click-through.
    est = summarise(user_visits,
                    naive_estimate = sum(is_rec_visit) / length(is_rec_visit))
    return(est)
  }
> naive_observational_estimate(user_app_visits_A)
naive_estimate
[1] 0.200768
> naive_observational_estimate(user_app_visits_B)
naive_estimate
[1] 0.226467
> stratified_by_activity_estimate(user_app_visits_A)
Source: local data frame [4 x 2]
activity_level stratified_estimate
1 1 0.1248852
2 2 0.1750483
3 3 0.2266394
4 4 0.2763522
> stratified_by_activity_estimate(user_app_visits_B)
Source: local data frame [4 x 2]
activity_level stratified_estimate
1 1 0.1253469
2 2 0.1753933
3 3 0.2257211
4 4 0.2749867
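The definition of `stratified_by_activity_estimate` is not shown in the output above. A minimal sketch, assuming it mirrors `naive_observational_estimate` but first groups visits by the `activity_level` column, could look like the following; the function body and the toy data frame `toy_visits` are assumptions for illustration, not the tutorial's actual implementation.

```r
library(dplyr)

# Assumed reconstruction: fraction of recommendation visits per activity level.
stratified_by_activity_estimate <- function(user_visits) {
  user_visits %>%
    group_by(activity_level) %>%
    summarise(stratified_estimate = sum(is_rec_visit) / length(is_rec_visit))
}

# Made-up visits: 1 of 4 low-activity and 2 of 4 high-activity visits
# come from a recommendation click-through.
toy_visits <- data.frame(
  activity_level = c(1, 1, 1, 1, 2, 2, 2, 2),
  is_rec_visit   = c(1, 0, 0, 0, 1, 1, 0, 0)
)

toy_estimate <- stratified_by_activity_estimate(toy_visits)
print(toy_estimate)
```

The same pattern with `group_by(category)` would give a per-category estimate, again under the assumption that the visit logs carry a `category` column as the earlier `unique(user_app_visits_A$category)` call suggests.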
> stratified_by_category_estimate(user_app_visits_A)
Source: local data frame [10 x 2]
category stratified_estimate
1 1 0.1758294
2 2 0.2276829
3 3 0.2763157
4 4 0.1239860
5 5 0.1767163
… … …
> stratified_by_category_estimate(user_app_visits_B)
Source: local data frame [10 x 2]
category stratified_estimate
1 1 0.2002127
2 2 0.2517528
3 3 0.3021371
4 4 0.1503150
5 5 0.1999519
… … …
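The stratified estimates printed above can be compared side by side. This sketch only re-enters the printed numbers (all four activity levels and the five categories that are shown); the data frame names `activity` and `category` are illustrative.

```r
# Stratified estimates copied from the console output above.
activity <- data.frame(
  activity_level = 1:4,
  A = c(0.1248852, 0.1750483, 0.2266394, 0.2763522),
  B = c(0.1253469, 0.1753933, 0.2257211, 0.2749867)
)
category <- data.frame(
  category = 1:5,
  A = c(0.1758294, 0.2276829, 0.2763157, 0.1239860, 0.1767163),
  B = c(0.2002127, 0.2517528, 0.3021371, 0.1503150, 0.1999519)
)

# Difference between the new and the old algorithm within each stratum.
activity$diff <- activity$B - activity$A
category$diff <- category$B - category$A

print(activity)
print(category)
```

Within activity levels the two algorithms differ by less than 0.2 percentage points, while within each of the five printed categories B is higher by roughly 2.3 to 2.6 percentage points.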
> naive_observational_estimate(user_app_visits_A)
naive_estimate
[1] 0.200768
> ranking_discontinuity_estimate(user_app_visits_A)
discontinuity_estimate
[1] 0.121362
About 40% of app visits that come from recommendation click-throughs are not causal:
they could have happened even without the recommendation system.
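The 40% figure follows from the two estimates printed above, reading the discontinuity estimate as the causal part of the naive estimate. A short check, with illustrative variable names:

```r
naive         <- 0.200768  # naive_observational_estimate(user_app_visits_A)
discontinuity <- 0.121362  # ranking_discontinuity_estimate(user_app_visits_A)

# Share of recommendation click-through visits not attributed to the recommender.
non_causal_share <- (naive - discontinuity) / naive
print(round(non_causal_share, 3))
```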

Causal inference in online systems: Methods, pitfalls and best practices