Data-Driven Off a Cliff
Anti-patterns in evidence-based decision making
Ketan Gangatirkar & Tom Wilbur
I help people get jobs.
Indeed is the #1 job site worldwide
Headquartered in Austin, Texas
We have tons of ideas
We have tons of bad ideas
Occasionally, we have good ideas
It’s hard to tell the difference
What helps people get jobs?
The only reliable way is to see what works
XKCD http://bit.ly/1JWz6Qh
We set up experiments
We collect results
We use the data to decide what to do
We’ve used data to make good decisions
But having data is not a silver bullet
We’ve also used data to make bad decisions
Science is hard
Problem
Running an experiment can ruin the experiment
Wikipedia http://bit.ly/1LkLPiP
Change Effect on productivity
Brighter light UP
Dimmer light UP
Warmer UP
Cooler UP
Shorter breaks UP
Longer breaks UP
Change Effect on productivity
Brighter light UP (temporarily)
Dimmer light UP (temporarily)
Warmer UP (temporarily)
Cooler UP (temporarily)
Shorter breaks UP (temporarily)
Longer breaks UP (temporarily)
Problem
Statistics are hard
Anscombe’s Quartet
Wikipedia http://bit.ly/2dlTUci
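A quick way to see it for yourself, a sketch using the quartet's published values: compute the summary statistics for all four sets and watch them come out identical.

```python
# Anscombe's quartet (values as published in Anscombe, 1973): four datasets
# whose summary statistics are nearly identical, but whose plots are not.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.array(x, float), np.array(y)
    slope, intercept = np.polyfit(x, y, 1)
    print(f"{name:>3}: mean(y)={y.mean():.2f}  var(y)={y.var(ddof=1):.2f}  "
          f"r={np.corrcoef(x, y)[0, 1]:.3f}  fit: y={slope:.2f}x+{intercept:.2f}")
```

All four rows print the same summary line; only a plot reveals the straight line, the curve, and the two outliers.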
Simpson’s Paradox
Wikipedia http://bit.ly/1OHFSOk
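Here is a minimal numeric illustration (invented numbers, not the chart from the slide): conversion improves in both segments, yet the blended rate drops, because the traffic mix shifts toward the weaker segment.

```python
# Simpson's Paradox with invented conversion data: (conversions, visits).
before = {"desktop": (200, 1000), "mobile": (10, 100)}
after  = {"desktop": (45, 200),   "mobile": (120, 1000)}

def rate(conversions, visits):
    return conversions / visits

def overall(segments):
    total_conversions = sum(c for c, _ in segments.values())
    total_visits = sum(n for _, n in segments.values())
    return total_conversions / total_visits

for seg in before:
    print(f"{seg}: {rate(*before[seg]):.1%} -> {rate(*after[seg]):.1%}")  # both improve
print(f"overall: {overall(before):.1%} -> {overall(after):.1%}")          # yet this drops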
Using data is more than just statistics
Good math. Bad idea.
Bad practices can undermine good math
You don't need me to teach you to be bad at math
I’ll teach you to be bad at everything else
Anti-Lesson 01
Be impatient
The p-value is the standard measure of statistical significance
A p-value applies to a single measurement, not the whole experiment
If you check results on Monday, that's one measurement
If you check results on Tuesday, that's another measurement
Got the result you want?
Declare victory!
Move quickly, because results and p-values can shift fast
80% of "winning" A/B tests stopped early are false positives
Source: http://bit.ly/1LtaLkV
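You can reproduce the flavor of that result with a short simulation, a sketch with assumed traffic numbers rather than Goodson's exact setup: run A/A tests, where the two variants are identical by construction, peek once a day, and stop at the first p < 0.05.

```python
# A/A "peeking" simulation: there is no real difference between the
# variants, yet stopping at the first p < 0.05 still "finds a winner"
# far more than 5% of the time. Traffic numbers are assumed.
import math
import random

def p_value(a, n_a, b, n_b):
    # Two-sided two-proportion z-test.
    pooled = (a + b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (a / n_a - b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
DAYS, USERS_PER_DAY, RATE, RUNS = 20, 500, 0.10, 2000
stopped_early = 0
for _ in range(RUNS):
    a = b = 0
    for day in range(1, DAYS + 1):
        a += sum(random.random() < RATE for _ in range(USERS_PER_DAY))
        b += sum(random.random() < RATE for _ in range(USERS_PER_DAY))
        if p_value(a, day * USERS_PER_DAY, b, day * USERS_PER_DAY) < 0.05:
            stopped_early += 1  # declared significance at some daily peek
            break
print(f"{stopped_early / RUNS:.0%} of A/A tests hit p < 0.05 at some peek")
```

Even with no real effect, a sizable fraction of these tests (typically 20-30% with twenty peeks) "reach significance" at some point; report only the flattering early stops and you are well on your way to Goodson's 80%.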
Anti-Lesson 02
Sampling is easy
Beware the IEdes of March
Story
Building Used Cars Search
Shoppers specifying price, mileage, or year do better
Nudge shoppers to specify price, mileage, or year
+3% conversion
After rollout, conversion > +3%
Why?
We'd taken a shortcut in our test assignment code
Users on the oldest browsers were ignored
Distorted sample → distorted results
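The shortcut looked roughly like this, a hypothetical reconstruction rather than our actual assignment code: users the helper can't handle silently never enter the test, so the sampled population stops matching the real one.

```python
# Hypothetical sketch of a biased test-assignment helper. Returning None
# for "hard" browsers seems harmless, but it removes 20%+ of users, who
# behave differently, from both variants of the experiment.
import hashlib
from typing import Optional

def assign_bucket(user_id: str, user_agent: str) -> Optional[str]:
    if "MSIE 6" in user_agent:
        return None  # the shortcut: IE6 users are never assigned
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"
```

If the dropped users behave differently (ours were more price-sensitive and benefited more from the nudge), every metric you compute is measured on the wrong population.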
Anti-Lesson 03
Look only at one metric
If a little bit is good, a lot is great
Indeed has a heart
Story
❤ > ★ ?
+16% Saves on search results page
Everyone ❤s ❤s!
❤s everywhere!
[Mock Gmail screenshot: the "Starred" folder relabeled "Hearted"]
Not so fast
Did ❤ help people get jobs?
❤ jobs: +16%
Clicks: no change
Applies: no change
Hires: no change
I help people ❤ jobs.
Upsell team
Story
We formed an "upsell team" and measured their results
Success measure: contact + subsequent spend increase = credited upsell
It's working! [Dashboard chart: reported upsells climbing]
So why isn't revenue moving? [Chart: overall revenue flat]
Contact outcomes: + / 0 / −
Roughly ⅓ up, ⅓ flat, ⅓ down: the gains cancel out
What you measure is what you motivate
Redefine success to include all outcomes
Upsell Team revenue +200%
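In code, the dashboard bug was essentially this toy calculation (invented numbers): summing only the positive spend changes instead of the net change.

```python
# Toy sketch (invented numbers) of the dashboard bug: spend change after
# each upsell contact is roughly a third up, a third flat, a third down.
deltas = [+500, +300, 0, 0, -450, -350]

wins_only = sum(d for d in deltas if d > 0)  # what the old dashboard summed
net = sum(deltas)                            # what overall revenue actually did

print(wins_only)  # 800 -> "it's working!"
print(net)        # 0   -> so why isn't revenue moving?
```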
Anti-Lesson 03: Reloaded
Look at all the metrics
It's better for them. Is it better for us?
Job applications: Up
Job clicks: Down
Recommended Jobs traffic: Up
Job views: Sideways
New resumes: Up
Return visits: Down
Logins: Up
Revenue: Down
(and it goes on…)
We didn’t really know what we wanted
Too much noise from too many metrics
I help people get jobs.
Anti-Lesson 04
Be sloppy with your analysis
We engineer features rigorously
Specification
Source control
Code review
Automated tests
Manual QA
Metrics
Monitors
...
But analysis…
Bad analysis won’t take down Indeed.com
200 million job seekers don't care about our sales projections
So we don’t try as hard with analysis code
Specification
Source control
Code review
Automated tests
Manual QA
Metrics
Monitors
...
Dubliners
Story
Indeed reports on economic trends
South Carolinians wanted to move to Dublin
Dublin?
No, the other one
Incorrect IP location mapping
IP blocks for South Carolina got reallocated to London, England
Worse things can happen
Growth and Debt
Story
“Growth in a Time of Debt”
Carmen Reinhart and Kenneth Rogoff
2010
Public debt > 90% of GDP leads to slower economic growth
Governments made policy based on this
Fixing the error eliminated the effect
Source: https://goo.gl/zAcd1e
Genetic Mutation
Story
20% of genetics papers have Excel errors
Source: http://wapo.st/2cWyrpJ
SEPT2 to a geneticist is Septin 2
SEPT2 to Excel is 42615
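You can check the 42615 yourself. Excel stores dates as a serial day count from its epoch, so "SEPT2" auto-parsed as September 2, 2016 (the year of the study) becomes that serial. A quick Python stand-in:

```python
# Excel date serials count days from 1899-12-30 (an epoch that also
# absorbs Excel's intentional 1900 leap-year bug). "SEPT2" parsed as
# the date 2016-09-02 therefore becomes serial 42615.
from datetime import date

print((date(2016, 9, 2) - date(1899, 12, 30)).days)  # 42615
```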
Does your company use spreadsheets?
How do you know they’re correct?
Under-spending Advertisers
Story
Employer budgets ran out before the end of the day
So no evening job seekers saw the jobs
How big was this missed opportunity?
Clicks received: 1,260
Out of budget at: 20:00
% of day without budget: 4/24 ≈ 0.1667
Potential clicks: 1260 / (1 - 0.1667) ≈ 1512
Missed clicks: 1512 × 0.1667 ≈ 252
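That back-of-the-envelope estimate as code, using the numbers from the slide. Note the hidden assumption: clicks arrive at a constant rate all day, so the missing evening hours would have performed like the average hour.

```python
# The slide's naive missed-clicks estimate: extrapolate the day's clicks
# as if they arrived at a constant hourly rate.
clicks_received = 1260
out_of_budget_hour = 20                               # budget gone at 20:00
missing_fraction = (24 - out_of_budget_hour) / 24     # ~0.1667 of the day
potential = clicks_received / (1 - missing_fraction)  # ~1512
missed = potential - clicks_received                  # ~252, i.e. +20%
print(round(potential), round(missed))
```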
Missed Clicks Report
Dear Customer,
You got 1,260 clicks yesterday.
Your daily budget ran out at 8:00pm.
If you funded your budget through the whole day, you'd get another 252 clicks: a +20% improvement!
Get More Clicks
Assumption
[Chart: clicks per hour held constant across the whole day]
Missed = 252 clicks (+20%)
Reality
[Chart: actual clicks per hour decline through the evening]
Missed = 100 clicks (+8%)
Naive analysis → bad recommendation
Anti-Lesson 05
Only look for expected outcomes
Zero results pages from misspelled locations
Goals: fewer ZRPs, more job clicks
Zero-results pages: -2.7%
Job clicks: +8%
Ad revenue: +1,410%
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
+1,410%
Ad revenue
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
ads
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making
Ad revenue after fix
Treatment on the homepage → effect on the search page
Anti-Lesson 06
Metrics, not stories
I help people get jobs.
How do I know if people got jobs?
I need employers to tell me
One employer hired 4500 people in 45 minutes!
Nope
Accurate recording of outcomes helps us
It doesn’t help employers
They don't care about using the product "right"
Go away!
There is no “user story”
Right metrics + wrong story = wrong conclusion
Anti-Lesson 06: Parte Deux
Story over metrics
Stories are seductive
Even incorrect stories are seductive
Taste Buds
Story
Taste map
Totally wrong
Every bite you eat proves it’s wrong
People still believe it
Job Alerts
Story
Success for emails is well understood
New subscriptions: Good
Email opens: Good
Clicking on stuff: Good
Unsubscribing: Bad
I help people get emails.
I help people get jobs.
What does job seeker success look like?
01 Search for jobs
02 Sign up for alerts
03 Click on some jobs
04 Apply to some jobs
05 Get a job!
06 Unsubscribe from emails
People with new jobs don't need job alerts
The standard story for email fails here
Light and Dark Redux
Story
It’s a persuasive story
But the original study was flawed
Hawthorne Revisited
"… the variance in productivity could be fully accounted for by the fact that the lighting changes were made on Sundays and therefore followed by Mondays when workers' productivity was refreshed by a day off."
https://en.wikipedia.org/wiki/Hawthorne_effect
We con people with stories
We con ourselves with stories
Anti-Lesson 07
Believe in yourself
Believing in yourself can be good
“My startup will succeed.”
Often it’s bad
“I’d never fall for a scam like that.”
“I knew it all along.”
“I’m too smart to make that mistake.”
Every story of mistakes is deceptive
We tell stories with 20/20 hindsight
When we live the story, we live in the fog
You won’t think you’re making a mistake
Search your past for mistakes
Painful, embarrassing mistakes
If you didn’t find any, you’re exceptional
Either you’re making mistakes you find
Or you’re making mistakes you don’t find
How do you defend against mistakes?
The first step is admitting you have a problem
There are 174 cognitive biases
[citation needed]
Data can help you make better decisions
Or more confidently make bad decisions
Data can’t make you a better decision-maker
Good data + bad decision-maker = bad decision
Our anti-lessons teach you
how to use data badly
Do the opposite to do better
Lesson 01: Be patient
Lesson 02: Sampling is hard
Lesson 03: Focus on a few, carefully chosen metrics
Lesson 04: Be rigorous with your analysis
Lesson 05: Watch out for side effects
Lesson 06: Use metrics and stories
Lesson 07: Plan for fallibility
Learn from our mistakes
Be prepared for your own
Learn More
Engineering blog & talks http://indeed.tech
Open Source http://opensource.indeedeng.io
Careers http://indeed.jobs
Twitter @IndeedEng
Questions?
Contact us
ketan@indeed.com | twilbur@indeed.com
Seriously, that was the end
Contact us
ketan@indeed.com | twilbur@indeed.com
There are no more slides
Contact us
ketan@indeed.com | twilbur@indeed.com
Stop here
Contact us
ketan@indeed.com | twilbur@indeed.com

Editor's Notes

  • #2: Good evening, thanks for coming to our @IndeedEng Tech Talk tonight.
  • #3: This is “Data-driven off a cliff, anti-patterns in evidence-based decision making”. I’m Tom Wilbur, and I’m a product manager at Indeed, and...
  • #4: I help people get jobs.
  • #5: Indeed is the #1 job site worldwide. We serve over 200M monthly unique users, across more than 60 countries and in 29 languages.
  • #6: The primary place that jobseekers start on Indeed is here - the search experience. It’s simple -- you type in some keywords and a location and you get a ranked list of jobs that are relevant to you.
  • #7: Indeed is headquartered here in Austin, Texas, the capital of the Lone Star State. Austin is also the location of our largest engineering office, and we have engineering offices around the world in Tokyo, Seattle, San Francisco and Hyderabad. So we have tons of smart engineers and product teams working around the clock to make a better Indeed. https://en.wikipedia.org/wiki/Flag_of_Texas#/media/File:Flag_of_Texas.svg
  • #8: We have tons of ideas.... BUT
  • #9: We have tons of bad ideas, too.
  • #10: Now occasionally we do have good ideas, but
  • #11: It’s hard to tell the difference. What we really want to know, is --
  • #12: What helps people get jobs? We believe...
  • #13: The only reliable way to know is to just try stuff and see what works. (NEXT TO JOKE)
  • #14: (pause) So at Indeed,
  • #15: We set up experiments. We run A/B tests on our site where users are randomly assigned to different experiences.
  • #16: We collect results. We observe the users’ behavior. Our LogRepo system adds about 6TB of new data every day.
  • #17: And we use that data to decide what to do. To see which features and capabilities do help people get jobs, and which don’t.
  • #18: We’ve used data to make good decisions,
  • #19: But having a ton of data is not a silver bullet.
  • #20: We’ve also used data to make bad decisions. Because the truth is,
  • #21: Science is hard. (NEXT TO JOKE)
  • #22: (pause) For example, one serious problem is that the very act of just
  • #23: Running an experiment, can ruin the experiment itself. Let me tell you a quick story.
  • #24: There was a famous experiment conducted in the late 1920s at an electrical factory outside of Chicago, Illinois. Called the Hawthorne Works. The factory managers wanted to improve worker productivity, so they decided to try some changes to the worker environment.
  • #25: They changed the lighting conditions, sometimes brighter, sometimes dimmer. They changed the temperature in the factory, and length of breaks. Initially they were excited, as their early experiments resulted in improvements in worker productivity.
  • #26: Brighter lights? Productivity goes up! Dimmer lights? Productivity goes up! Warmer? Up! Cooler? Up. Shorter breaks, longer breaks, it seemed that everything they tried improved worker productivity. And on top of that, none of these improvements stuck.
  • #27: It all quickly faded. Ultimately the conclusion of the researchers was that the very fact of changing the conditions, of running the test, of observing the results, affected the workers’ behavior. This effect is now known as -- the Hawthorne Effect. Those of us that run experiments to optimize websites all over the world know this well. When we see a change in user behavior, we often ask the question, “but will it last? Is that change real, or is it just the Hawthorne Effect?” So science is hard. And if that wasn’t enough,
  • #29: Statistics are hard. There are plenty of ways where an analysis can produce surprising if not contradictory results.
  • #30: For example, consider “Anscombe’s quartet”. In 1973, statistician Francis Anscombe described four very different sets of 11 points that all have the same basic statistical properties -- mean, variance, correlation, and as the blue line shows, regression. This demonstrates that looking at a statistical calculation isn’t at all sufficient to understand your data, especially when there are outliers. https://en.wikipedia.org/wiki/Anscombe%27s_quartet
  • #31: Another example is Simpson’s Paradox. This is where a statistician goes back in time with his toaster and starts accidentally changing the future and the more he tries to fix it, the worse it gets. There are no donuts, and people have lizard-tongues, and that’s just no way to make data-driven decisions. Wait no, that’s Homer Simpson’s Paradox from Treehouse of Horror V. Sorry.
  • #32: Edward Simpson’s Paradox is something else. This result describes the situation where individual groups of data tell a different story than when the data are combined. On this chart for example, the four blue dots and four red dots each show a positive trend, but when combined, you get the black dotted line that shows a negative trend overall. Imagine if you saw that revenue for mobile was increasing, and revenue for desktop was increasing, but overall revenue appears to be decreasing. Now what do you do? Usually this situation means you don’t understand underlying causal relationships in your data. Because statistics are hard.
  • #33: But using data correctly is more than just statistics. If you apply good math to a bad idea...
  • #34: Just because it’s mathematically correct, doesn’t mean you won’t seriously regret the outcome of that test. http://www.glamour.com/images/health-fitness/2011/06/0606-tequila_at.jpg
  • #35: So bad practices can undermine good math.
  • #36: You don’t need me to teach you how to be bad at math.
  • #37: But tonight, I’ll teach you to be bad at everything else. On top of the inherent challenges of science, statistics and bad ideas, we’ll share with you our powerful techniques of how to make data-driven decisions… the wrong way.
  • #38: So, we’ll start with Anti-Lesson number 1. Be Impatient. One of the best ways to be bad at evidence-based decision making is to be impatient.
  • #39: A p-value is the standard measure of statistical significance. It represents the probability that the observed result would happen if the null hypothesis were true, or, informally, the chance that what you’ve measured is just random chance. For a successful A/B test, we want to see positive results with a p-value below some threshold, typically 5% or .05.
  • #40: But the p-value is calculated per measurement, not for the whole experiment. It only tells you how confident to be in your results given the circumstances of the test thus far.
  • #42: If you check results on Tuesday, that’s another measurement. Now your boss is asking if it’s significant yet. So you keep checking and checking,
  • #43: And your data scientist is muttering, saying you should just wait to get to the necessary sample size she estimated. It’s really frustrating. (pause) There’s a better way.
  • #44: Got the result you want? On that test that you knew was a good idea. Are the results already positive after only two days? And when you checked the p-value on your phone while in line at Starbucks, was it less than 0.05?
  • #45: Declare victory! Turn off the test and roll it 100%. Don’t waste your valuable time with that statistical wah wah wah about regression to the mean and probability of null hypothesis something.
  • #47: http://www.qubit.com/sites/default/files/pdf/mostwinningabtestresultsareillusory_0.pdf (Martin Goodson, Research Lead at Qubit, a UK web consultancy) In fact, Martin Goodson shows that if you were to do a check for significance every day, and stop positive tests as soon as they show significance, 80% of those “winning” A/B tests are likely false-positives. Are bogus results. And that’s why being impatient is a great way to make bad decisions.
  • #48: Another great way to do data-driven product development wrong is to believe that sampling is easy. I mean, it’s hard and time-consuming to make sure that you’ve got representative users in your A/B tests.
  • #49: Let me illustrate this with a story I call, “Beware the IEdes of March.” And you’ll see how well this anti-pattern worked for me.
  • #50: At a previous company where I worked, we were building Used Car search experiences for major media brands, and we were doing A/B tests to try to increase the probability that we successfully connect a car shopper to a dealer with matching inventory.
  • #51: One of the things we had observed when we analyzed successful user behavior, was that shoppers specifying price, mileage or year in their search do better. They’re more successful at finding cars they are interested in. So we had a hypothesis --
  • #52: Could we encourage shoppers to specify price, mileage or year, and improve conversion?
  • #53: We tried a couple ideas, including moving the price, mileage and year facets up in the search UI to make it easier to find, and we also tried a tooltip nudge, directly encouraging users to add these terms to their search.
  • #54: Of all the variants, the tooltip nudge wins, we saw a 3% lift in unique conversion (with a p-value of .04). So we decided to roll it out.
  • #57: It turns out, we’d taken a shortcut in our test assignment code. This was the summer of 2009, when IE had 60%+ of the US browser market, and my company, like many others, was sick and tired of supporting IE6 (the browser that PC World called “the least secure software on the planet”). So to work around a problem in our code that assigned users to test variants, we just didn’t handle IE6.
  • #58: So the users on the oldest browsers got ignored. This turned out to be 20%+ of users. And even worse, we learned those 20% didn’t behave the same as the remaining 80%. From later analysis and user research, we came to believe that users on the oldest browsers also shopped differently, for different cars. They were on average more price sensitive and benefitted more from that nudge.
  • #59: We’d depended on a distorted sample of the population. We went through all the effort to run a test, and a technical shortcut we took meant that we didn’t measure the results accurately. And we made an ill-informed decision. Because we thought sampling was easy.
  • #60: Which brings us to the third way I’ll teach you how to do data-driven decision making wrong. Look only at one metric. If there’s one thing we know in life, it’s that
  • #61: If a little bit is good, a lot is great. Anything worth doing is worth overdoing. I mean, there’s never a downside to that, is there?
  • #62: http://www.magpictures.com/resources/presskits/bsf/10.jpg
  • #63: Our first story I want to share about looking only at one metric, is called “Indeed has a heart,” and it’s about a test we ran in our mobile app.
  • #64: As jobseekers explore available jobs, they have the option to Save a job so they can easily come back to it later. We decided to test changing the icon associated with a Save from a star to a heart. We did this on job details page,
  • #65: And on the search results page.
  • #66: So, were hearts better than stars?
  • #67: They were! We observed a 16% increase in Saves on the search results page.
  • #68: Now, everyone loves hearts! We rolled our test out 100%. But why stop there? The obvious thing to do is
  • #69: To have hearts everywhere!
  • #70: Stars on your Amazon reviews?
  • #71: Nope! Hearts now.
  • #72: We sent our test results to Google, and in the next version of Gmail the Starred folder will be replaced with Hearted!
  • #73: And we’ve got a bill in front of the new state legislature. We’re all gonna live and work in the Lone Heart State!
  • #74: [sigh] Not so fast. Changing the stars to hearts improved the one metric we were looking at - usage of the “Save this job” feature, but
  • #75: Did Hearts help people get jobs?
  • #76: Sadly, no. There was no discernible impact on job seeker success. When we analyzed longer-term behavior of jobseekers, there was no evidence of an improvement in the primary metrics -- clicks, applies, hires. Which is unfortunate, because that’s our goal, not
  • #77: To help people heart jobs. What we had done was to focus only on one metric.
  • #78: If you really want to do evidence-based decision making wrong, you should make sure you look only at one metric in situations beyond your A/B tests. This anti-lesson can do damage all across your company. For example, at Indeed, we have a talented client services team that works with our customers to keep them engaged and highlight the value they’re receiving. Growing revenue from existing customers is clearly important, and we had a hypothesis that if we had a team focused only on that, we could be more successful.
  • #79: So, we formed a dedicated “upsell team” and measured their results on a dashboard.
  • #80: What we looked for was when there was an upsell contact with a customer, and then subsequently the customer’s spend went up, we credited the rep for that increase on the dashboard. This was also tied to a bonus program. So we started off, and
  • #81: the dashboard told us it’s working! Reported upsells on the dashboard showed lots of wins, 10s of thousands of $$.
  • #82: But when we stepped back, revenue for the total pool of accounts wasn’t increasing.
  • #83: As it turned out, not every contact between a rep and a customer results in an increase in spend. Our naive dashboard looked only at one metric - the positive outcomes.
  • #84: But in reality, Some are neutral. Some are negative. And so it didn’t measure the right result. In fact, when you’re showing people a metric about their performance,
  • #85: What you measure is what you motivate. In talking to the reps, because our dashboard only looked at the positive outcomes, they were less interested in contacting customers who were planning to lower their spend. The incentives were only about getting to an increase, nothing else mattered. So we made a change.
  • #86: We redefined success to include all the outcomes, updated the dashboard and continued the experiment of the upsell team. After that one change, we saw more diverse interactions, and better results!
  • #87: The Upsell Team’s revenue increased by 200%, and we decided to continue the experiment and grow the team. So we saw two examples there about how looking only at one metric, especially when it’s an easily-computed feature metric or maybe the first metric you thought of, is a great way to do evidence-based decision making wrong. Now, that anti-lesson has a flip-side, too -- Caveats: not an A/B test, lots of confounding factors, small sample size, team got better at their job, grain of salt, etc. But we also can directly observe the actors in this story, so we focus on how the metric affected behavior.
  • #88: Because another secret to making bad data-driven decisions is to look at all the metrics. For this anti-lesson, we’ll return to Indeed’s mobile app.
  • #89: We were comparing our mobile app to other companies’ apps and noticed a growing adoption of a particular way to indicate a menu. They were using what’s now popularly known as the “hamburger menu”. One of our product managers stole the idea...
  • #90: (pause) And we decided to test a hamburger menu to improve Indeed’s mobile app.
  • #91: It’s better for them, is it better for us? Let’s look at the results.
  • #92: [read through list, growing more confused] <click> at Logins (pause) What we realized was...
  • #93: We didn’t really know what we wanted. We didn’t start our test with a goal in mind for what the hamburger menu was supposed to do. So when the metrics came back with conflicting answers, we couldn’t know if the change was any good.
  • #94: There was too much noise from too many metrics. We ended up leaving this test running for a looong time hoping the right decision would become clear. It didn’t. We had lots of discussions and email threads and meetings where “seriously we need to make a decision about the hamburger test.” In the end, we turned it off, so there’s no hamburger in the Indeed mobile app. In this case, by not starting with a clear goal, and by looking at all the metrics, we spent a lot of time and energy and failed at making a good evidence-based decision.
  • #95: Tom: Now I’d like to introduce my colleague Ketan who will teach us about even more exciting ways to make bad decisions. Ketan?
  • #100: Who’s got time for rigorous analysis? Just give me an Excel spreadsheet.
  • #103: We often don’t even see it as code
  • #120: https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-alarming-number-of-scientific-papers-contain-excel-errors/ http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
  • #123: https://www.buzzfeed.com/scott/disaster-girl
  • #127: Do you have a spec?
  • #128: Do you have a spec?
  • #129: Do you have a spec?
  • #139: http://go.indeed.com/RZtb4csgenvm
  • #140: http://go.indeed.com/RZ3rtm7ddtot http://go.indeed.com/RZbcaq1a72dd <<<
  • #141: http://go.indeed.com/RZ2sqmo2u6kk http://go.indeed.com/RZg3llr4991l << TODO
  • #143: http://go.indeed.com/RZ2sqmo2u6kk http://go.indeed.com/RZg3llr4991l
  • #144: There are no keyword ads on this page
  • #150: http://go.indeed.com/RZ6afqh4pci2 changed to: http://go.indeed.com/RZmmebqb3uvj
  • #170: http://winetimeshk.com/admin/wp-content/uploads/2015/08/tongue-map.gif
  • #206: We didn't.
  • #208: Some of the mistakes I'm telling you about were painful and embarrassing for us.
  • #209: Or you think you are
  • #220: …. But there’s one big lesson remaining
  • #222: …. But there’s one big lesson remaining
  • #223: …. But there’s one big lesson remaining