1
It’s the cornerstone of many of the biggest businesses in the US, including
Google & Amazon, and the backbone of most scientific undertakings.

2
There’s plenty of cheerleading for data so I want to spend today on
cautionary tales & advice, in the hope of helping keep data more on the Iron
Man side of the Robert Downey Jr spectrum.

3
I love using numbers & testing to understand the world, I’m not a data hater by any
means.
If you want to know about medieval Transylvania or the Ottoman invasion of
Hungary, I’m your woman. That didn’t get me far on the job market.
I got into this industry partly to do something different, partly to do games. But despite that intention it
hasn’t been as different as I expected – user acquisition for games and catalogs is
fundamentally the same. But Kongregate, because we’re an open platform with over
70k games and now a mobile game publisher, has provided some incredibly rich
opportunities for data mining.

4
Part of the reason I’m telling you this is to make my first point:
And for an organization to do data right you can’t toss analysis back and forth
over a wall to quants. It takes intimate knowledge of a game (and the
development) to do good analysis and multiple perspectives and theories are
good.

5
Sometimes it’s immediately obvious – we had a mobile game we launched
recently, an endless runner, that wasn’t filtering purchases from jailbroken
phones and was showing an ARPPU of $500, not very plausible and easily
caught. But most issues are much more subtle – tracking pixels not firing
correctly for a particular game on a particular browser, tutorial steps being
completed twice by some players but not by others, clients reporting strange
timestamps, etc.
For this reason you should never rely on any analytic system where you can’t
go in and inspect individual records. If you can’t check the detail you’ll never
be able to find and fix problems. We use Google Analytics for reference and
corroboration but nothing very crucial, and are using it less and less because
of this.
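As a sketch of that kind of record-level inspection, here's roughly what catching the jailbroken-purchase case looks like in code. The field names (user_id, amount_usd, jailbroken) are hypothetical, not our actual schema.

```python
# Minimal sketch of a record-level sanity check like the jailbroken-purchase
# case above. Field names are hypothetical, not an actual schema.
def suspicious_purchases(purchases, max_plausible=100.0):
    """Return individual purchase records worth inspecting by hand."""
    return [
        p for p in purchases
        if p["jailbroken"] or p["amount_usd"] > max_plausible
    ]

records = [
    {"user_id": 1, "amount_usd": 4.99, "jailbroken": False},
    {"user_id": 2, "amount_usd": 499.99, "jailbroken": True},
]
flagged = suspicious_purchases(records)
print(flagged)  # only the jailbroken $499.99 record
```

The point isn't this particular filter, it's that you can only write and verify checks like this when you can see individual records.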

6
This looks like 4 separate pictures photoshopped together to create an
appealing color grid, right?

7
Wrong.
So much of data is like these pictures – a set-up that appears
straightforwardly to be one thing from one angle, turns out to be completely
different from another.

8
Except of course you know I’m setting you up

9
I mentioned lifetime conversion and showed daily ARPPU. Lifetime
conversion may be similar between the two games, but daily conversion is
40% higher for game 1.
This is why $/DAU is not a very interesting stat on its own. If someone
quotes just D1 retention and $/DAU that’s not enough information to judge
how a game monetizes.
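The arithmetic behind that point, with made-up numbers: $/DAU is just daily conversion times daily ARPPU, so very different monetization profiles can land on the same figure.

```python
# $/DAU decomposes into daily conversion x daily ARPPU. The numbers below
# are invented for illustration: two games with identical $/DAU but very
# different conversion/ARPPU profiles.
def dollars_per_dau(daily_conversion, daily_arppu):
    return daily_conversion * daily_arppu

game1 = dollars_per_dau(0.028, 10.0)  # more buyers, spending less each
game2 = dollars_per_dau(0.020, 14.0)  # fewer buyers, spending more each
print(round(game1, 2), round(game2, 2))  # 0.28 0.28
```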

10
It’s a living, changing system. Flat views are not enough.

11
So here are a series of likely traps analysts can fall into. I know I have.
They’re not in a particular order of importance because they’re all important.
We tend to think of playerbases as monolithic but really they are
aggregations of all sorts of subgroups.
It’s sort of like watching a meal go through a snake.
Though with time cohorts it’s easy to lose track of events and changes in the
game, so you can’t rely on those, either.

12
13
14
You may have noticed that win rates got a bit wacky towards the later missions of the graph
of the last chart – this is a sample size issue.
Even games that overall have very substantial playerbases like Tyrant may end up with
small sample sizes when you’re looking at uncommon behavior in subgroups.
Early test market data is often tantalizing & fascinating, but it’s often the most unreliable
because you’re combining small sample sizes and a non-representative subgroup – the
people who discover you first are the most hard-core.

15
Spending follows power-law, not normal (bell-curve), distributions, which affects everything.
For a sufficiently heavy-tailed power-law distribution the mean isn't even theoretically well defined, because the rare top values contribute without bound.
The sample size depends on the frequency of the event – tutorial completion & D1
retention should be fine with just a few hundred users, % buyer with 500+, but I don’t like
to look at ARPPU with much less than 5,000. These are just my rules of thumb based on
experience and probably have no mathematical basis.
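A quick simulation of why I want big samples for ARPPU-type metrics: with a heavy-tailed (Pareto) spend distribution, small-sample averages bounce around far more than large-sample ones. The shape parameter and sample sizes here are illustrative, not fit to any real data.

```python
import random
import statistics

random.seed(7)
ALPHA = 1.2  # heavy-tailed Pareto shape; illustrative, not fit to real data

def sample_mean_spend(n):
    # Mean of n draws from a Pareto distribution (minimum value 1).
    return statistics.fmean(random.paretovariate(ALPHA) for _ in range(n))

small = [sample_mean_spend(100) for _ in range(20)]
large = [sample_mean_spend(5000) for _ in range(20)]
print(f"n=100   sample means span {min(small):.2f} to {max(small):.2f}")
print(f"n=5000  sample means span {min(large):.2f} to {max(large):.2f}")
```

Run it a few times with different seeds and the small-sample means swing wildly whenever one whale lands in the draw.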

16
17
18
If you ask small questions you’ll usually get small answers. And the dirty
secret of testing is that most tests are inconclusive anyway. It’s hard to move
important metrics. So prioritize tests that significantly affect the game, like
energy limits and regeneration, over button colors.

19
Your existing players are used to things working a certain way – a change in
UI that makes things clearer for a new player may annoy or confuse a veteran
player. Where possible I like to test disruptive changes on new players only,
and then roll out the test to other players if the test proves successful. A
pricing change that increases non-buyer conversion might reduce repeat
buyer revenue.
For example if you’re A/B testing your store, don’t assign people to the test
unless they interact with the store. It’s often easier to split people as they
arrive in your game, or at some other early point, but a) you might end up
with an unequal distribution of interaction with the tested feature, and
b) any signal from the test group would get lost in the noise of a larger
sample.
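A sketch of assignment-on-interaction (the test name and id scheme are made up): bucket a player only when they first open the store, and derive the bucket from a hash of their id so it stays stable across sessions.

```python
import hashlib

# Sketch of bucketing players only when they actually open the store.
# The test name and user-id format are hypothetical.
def store_bucket(user_id: str, test_name: str = "store_layout_test") -> str:
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return "variant" if int(digest, 16) % 2 else "control"

assignments = {}

def on_store_opened(user_id: str) -> str:
    # Only players who reach the store ever enter the test population.
    if user_id not in assignments:
        assignments[user_id] = store_bucket(user_id)
    return assignments[user_id]

first = on_store_opened("player-42")
second = on_store_opened("player-42")
print(first, first == second)  # assignment is stable across visits
```

Players who never open the store never appear in `assignments`, so they never dilute either arm of the test.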

20
Early results tend to be both volatile and fascinating – differences are
exaggerated or totally change direction. People tend to remember the early,
interesting results rather than the actual results. People also often want to
end the test early if they see a big swing, which is a bad idea.
We tested to see what gain we were getting from bonusing larger currency
packages, which had to be judged on total revenue to make sure we were
capturing both transaction size and repeat-purchase effects. To make sure
the 15% lift was real we broke buyers into cohorts by how much they’d spent
($0-$10, $100-$200, $200-$500, etc.) and checked the distribution in each
test group. On the bonus side of the test we saw fewer buyers under $20 and
30%+ gains in all the cohorts above $100, so we were confident that the gain
was not being driven by a few big spenders.
Again this should be worked backward from the frequency and distribution of
the metric you’re judging the test on. There are online calculators to help you
figure out what you need to get to statistical significance given an expected
lift. My advice (if you have the playerbase and patience) is to then double or
triple that. Why do I want my sample sizes so much bigger than the
minimum?
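A back-of-envelope version of those calculators, using the standard two-proportion sample-size formula (alpha 0.05 two-sided, 80% power). The 2% buyer rate and 15% hoped-for lift are illustrative numbers, not real data.

```python
import math
from statistics import NormalDist

# Rough per-arm sample size for detecting a relative lift in a conversion
# rate, via the standard two-proportion formula. Baseline and lift below
# are illustrative, not real data.
def samples_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    p_test = p_base * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_test * (1 - p_test)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_base - p_test) ** 2)

n = samples_per_arm(0.02, 0.15)  # 2% buyer rate, hoping for a 15% lift
print(n)  # tens of thousands of players per arm
```

Note how fast the required sample grows as the base rate or the expected lift shrinks, which is why rare events like purchases need so many more players than retention does.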

21
It comes down to some of the issues with judging results by statistical
significance itself. It doesn’t mean what you probably think it means.
Statistical significance tests assume that there is some true difference in lift,
and that if you test there will be a bell curve distribution of results, with the
true lift as the average. Your 5% result could be right on the mean, or it could
be an outlier on either end. If it’s statistically significant then the chance is
low (usually 5% or less) that there’s no lift at all. But the true lift could be 1%
or 10%.
It’s possible you’d get two outlier results in the same direction, but that
becomes less and less likely, and it becomes more likely that your test results
represent the true mean. And the size of the
effect you are testing does matter as it helps you understand the relative importance of
different factors, and what to prioritize testing next.
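A small simulation of that spread, with hypothetical parameters: even when there is a genuine 5% lift on a 40% base rate, the observed lift from any single test scatters noticeably around the true value.

```python
import random

random.seed(1)

# Simulated A/B tests with a genuine 5% relative lift on a 40% base rate.
# All parameters are hypothetical, chosen only to show how observed lifts
# scatter across repeated runs of the same test.
BASE, TRUE_LIFT, N = 0.40, 0.05, 2000

def observed_lift():
    control = sum(random.random() < BASE for _ in range(N)) / N
    variant = sum(random.random() < BASE * (1 + TRUE_LIFT)
                  for _ in range(N)) / N
    return variant / control - 1

lifts = sorted(observed_lift() for _ in range(200))
print(f"observed lifts span {lifts[0]:+.1%} to {lifts[-1]:+.1%}")
```

Any one of those 200 runs is a "statistically valid" test, yet they disagree with each other, which is exactly why I pad the calculator's minimum sample size.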

22
For example we’ve had a lot of tests that increased registration but reduced
retention, so much so that we now judge tests on % retained registrations,
because that’s what we really care about. That isn’t always possible, though.

23
A good example of this is adding a Facebook login button to our website. If a
player comes back on a different browser they need to be able to log in.

24
25
This is about how you think about your business.

26

Data Gone Wrong - GDCNext 2013
