Modern Data Stack for Game Analytics
Disclaimer: All thoughts are my own, based on my experience and the environments I have worked in for over a decade.
Outline and Takeaways
Outline:
● About myself and Microsoft
● Data Analytics Framework History
● Game Analytics Intro
● Modern Data Stack Overview
● Architectures from the Industry
● The Coalition Data Stack
Takeaways:
● All modern game studios require analytics
● Privacy is critical
● Modern data analytics: open source and commercial products
● Reference architecture for the game platform
● Challenges in designing an analytics solution
- 11+ years in Analytics
- Moscow, Montenegro, Winnipeg, Vancouver, Victoria, Seattle, Boston
- 5 years @Amazon, now @Microsoft, The Coalition
- Tableau, Snowflake, Microsoft, AWS user groups and meetups
Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The Coalition)
● Gears of War: Ultimate Edition (XB1, Win10 | 2015 | Metascore 82/73)
● Gears of War 4 (XB1, Win10 | 2016 | Metascore 84/86)
● Gears Pop! (Mobile | 2019 )
● Gears of War 5 (XB1, Win10 | 2019 | Metascore 85/82)
● Gears Tactics (XB1, PC | 2020 | Metascore 81)
● Gears 5: Hivebusters (XB1, PC | 2020 | Metascore 82)
What the heck -
Analytics!?
What is Analytics?
Gaming Data Consumers
● Leadership
● Producers
● Artists
● Game Play Engineers
● QA Engineers
● Community Managers
Microsoft Privacy and Online Safety
https://privacy.microsoft.com/en-ca/privacystatement
https://support.xbox.com/en-CA/help/family-online-safety/online-safety/privacy
3 Game Analytics Goals
Strategic Analytics - takes the global view of how the game should evolve, based on analysis of user behavior and the business model.
Tactical Analytics - informs game design in the short term.
Operational Analytics - analysis and evaluation of the immediate situation.
Telemetry as a source of Player data
The word Telemetry is derived from the Greek roots tele,
"remote", and metron, "measure".
Games are state machines - a person creates a continual loop of actions and responses which keeps the game state changing. These loops often keep the user engaged over a period of time.
Telemetry helps discover who is performing what action, when, and where in the game. It cannot explain why.
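To make the who/what/when/where idea concrete, here is a minimal sketch of a single telemetry event. The field names, the `make_event` helper, and the sample values are assumptions for illustration only; they are not the schema used by any specific game.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class TelemetryEvent:
    """One telemetry record (hypothetical schema): who did what, when, and where."""
    player_id: str    # who
    event_name: str   # what, for example "WeaponUse"
    timestamp: str    # when (ISO 8601, UTC)
    map_name: str     # where: level or map
    position: tuple   # where: (x, y, z) in map coordinates
    payload: dict = field(default_factory=dict)  # event-specific details


def make_event(player_id, event_name, map_name, position, **payload):
    """Stamp the event with the current UTC time and return it."""
    now = datetime.now(timezone.utc).isoformat()
    return TelemetryEvent(player_id, event_name, now, map_name, position, payload)


def to_json(event):
    """Serialize the event to one JSON line, ready to send to a collector."""
    return json.dumps(asdict(event))


# Sample values are made up for the example.
event = make_event("player-42", "WeaponUse", "Foundation", (10.5, 3.0, 0.0), weapon="Lancer")
print(to_json(event))
```

Note that nothing in the record captures the player's motivation, which is the "why" that telemetry alone cannot provide.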
3 types of metrics
● Gameplay metrics: user behavior in the game
● Community metrics: user engagement in communities and social media
● Customer metrics: user as a customer, acquisition and retention
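As one example of a customer metric, the sketch below computes day-N retention from install dates and session dates. The function name, the input shapes, and the sample data are assumptions for illustration; studios define retention in several slightly different ways.

```python
from datetime import date


def day_n_retention(installs, sessions, n=1):
    """Share of players who played again exactly n days after installing.

    installs: dict of player_id -> install date
    sessions: iterable of (player_id, session date)
    """
    if not installs:
        return 0.0
    returned = {
        player
        for player, day in sessions
        if player in installs and (day - installs[player]).days == n
    }
    return len(returned) / len(installs)


# Made-up sample data: four installs, two of them return on day 1.
installs = {
    "a": date(2023, 5, 1),
    "b": date(2023, 5, 1),
    "c": date(2023, 5, 2),
    "d": date(2023, 5, 2),
}
sessions = [
    ("a", date(2023, 5, 2)),  # a returns on day 1
    ("b", date(2023, 5, 4)),  # b returns on day 3 only
    ("c", date(2023, 5, 3)),  # c returns on day 1
    ("c", date(2023, 5, 3)),  # duplicate session, counted once
]
print(day_n_retention(installs, sessions, n=1))  # 0.5
```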
Action Third-Person Shooters (TPS) Metrics
● Weapon use
● trajectory
● item/asset use
● character/kit choice
● level/map choice
● loss/win
● heatmaps
● team scores
● map lethality
● map balance
● vehicle use metrics
● special moves
● jumps and many more.
Death map, Halo 3
https://coolinfographics.com/blog/2009/1/12/halo-3-heatmaps.html
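A death map like the one referenced above can be approximated by binning death positions into grid cells and counting per cell. This is a minimal sketch; the `death_heatmap` function, the cell size, and the coordinates are assumptions, not how the Halo 3 heatmaps were actually produced.

```python
from collections import Counter


def death_heatmap(deaths, cell_size=10.0):
    """Count deaths per square grid cell.

    deaths: iterable of (x, y) map coordinates (hypothetical units).
    Returns a Counter keyed by (col, row) cell index.
    """
    grid = Counter()
    for x, y in deaths:
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += 1
    return grid


# Made-up death positions taken from telemetry events.
deaths = [(3.0, 4.0), (7.5, 9.9), (12.0, 4.0), (15.0, 8.0), (18.0, 2.0), (25.0, 31.0)]
heatmap = death_heatmap(deaths)
hottest_cell, count = heatmap.most_common(1)[0]
print(hottest_cell, count)  # (1, 0) 3
```

The per-cell counts can then be rendered as color intensity over the map image; a high count in one cell is a hint about map lethality or balance.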
Role of Data Engineer
My role as a DE is to make sure that we have an infrastructure in place to collect, transform, and consolidate data for customer, community, and gameplay metrics.
The infrastructure supports Strategic and Tactical Analytics during development and post-production.
Key Milestones in the Analytics Industry
● Relational Databases
● Custom software
● MPP Data warehouse
● Enterprise ETL
● Enterprise BI Tools
● Data Mining Tools
● Big Data: Hadoop, Hive, Spark (on-premise)
● Data Lake
● DataScience, R, Python
● Cloud Computing
● AWS Redshift, Azure SQL DW (Synapse), Google BigQuery, Snowflake, Databricks
● ML frameworks
● ETL -> ELT
Synapse Analytics
Traditional (Legacy) Data Stack
Traditional approach
[Diagram: Source Layer (Game Client) -> Data Processing (Batch ETL) -> Storage (Data Warehouse) -> Business (Business Intelligence)]
Business Intelligence:
● Ad-hoc queries
● Pixel Perfect Reports
● Cross tables (Pivot)
Data Storage Layer | Data Warehouse
SMP - Symmetric Multi-Processing
● Traditionally single-server systems
● Data stored locally
● Processors share a single OS, memory, and I/O devices
● Scale-up only - physical limits on scaling to accommodate the workload
MPP - Massively Parallel Processing
● Multi-node (multi-server) systems
● Data stored externally
● Scale-out - add more compute nodes, each with dedicated CPU, memory, and I/O subsystems
● No single point of contention
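The scale-out idea behind MPP can be sketched in a few lines: rows are distributed across nodes by a hash of a distribution key, each node aggregates its own share, and the partial results are combined. This is a single-process illustration under assumed names (`partition`, `local_aggregate`, `mpp_aggregate`) and made-up data; a real MPP warehouse runs the per-node step in parallel on separate machines.

```python
import zlib
from collections import defaultdict


def partition(rows, key, num_nodes):
    """Distribute rows across nodes by a stable hash of the distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        node_id = zlib.crc32(str(row[key]).encode()) % num_nodes
        nodes[node_id].append(row)
    return nodes


def local_aggregate(rows):
    """Work done independently on one compute node: kills per weapon."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["weapon"]] += row["kills"]
    return totals


def mpp_aggregate(rows, num_nodes=4):
    """Partition, aggregate on each node, then combine the partial results."""
    combined = defaultdict(int)
    for node_rows in partition(rows, "player_id", num_nodes):
        for weapon, kills in local_aggregate(node_rows).items():
            combined[weapon] += kills
    return dict(combined)


# Made-up sample rows.
rows = [
    {"player_id": "p1", "weapon": "Lancer", "kills": 3},
    {"player_id": "p2", "weapon": "Gnasher", "kills": 5},
    {"player_id": "p3", "weapon": "Lancer", "kills": 4},
    {"player_id": "p1", "weapon": "Lancer", "kills": 1},
]
print(mpp_aggregate(rows, num_nodes=4))
```

The result does not depend on the number of nodes, which is what makes adding compute nodes a safe way to absorb a larger workload.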
Data Storage Layer | Data Lake
Modern Data Stack
Analytics architecture evolution
- Before 2010: mostly data warehouses (SMP, MPP).
- With the rise of Hadoop: a shift towards the data lake, which decouples compute and storage but lacks ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
- Lakehouse = Data Warehouse + Data Lake.
Lakehouse Options
● Transaction Support (ACID)
● Schema Enforcement
● Upserts/Deletes
Key solutions:
● Apache Hudi (Hadoop Upserts Deletes and Incrementals), originally from Uber Engineering
● Apache Iceberg, originally from Netflix
● Delta Lake, originally from Databricks
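The three lakehouse features listed above can be illustrated with a toy in-memory table. This sketch is not the API of Hudi, Iceberg, or Delta Lake; the `SCHEMA`, `enforce_schema`, and `commit` names are assumptions made up for the example, and real table formats implement the same ideas on top of files in object storage.

```python
SCHEMA = {"player_id": str, "map_name": str, "kills": int}  # hypothetical table schema


def enforce_schema(row, schema=SCHEMA):
    """Schema enforcement: reject rows with missing, extra, or wrongly typed columns."""
    if set(row) != set(schema):
        raise ValueError(f"columns {sorted(row)} do not match schema {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    return row


def commit(table, upserts=(), deletes=(), key="player_id"):
    """Apply upserts and deletes as one all-or-nothing change.

    The change is built on a copy, so a failed validation leaves `table` untouched.
    """
    new_table = dict(table)
    for row in upserts:
        valid_row = enforce_schema(row)
        new_table[valid_row[key]] = valid_row  # insert, or overwrite the row with this key
    for row_key in deletes:
        new_table.pop(row_key, None)
    return new_table


table = commit({}, upserts=[
    {"player_id": "p1", "map_name": "Foundation", "kills": 3},
    {"player_id": "p2", "map_name": "Foundation", "kills": 1},
])
# Update p1 and delete p2 in a single commit.
table = commit(
    table,
    upserts=[{"player_id": "p1", "map_name": "Foundation", "kills": 5}],
    deletes=["p2"],
)
print(table)

try:
    # "kills" has the wrong type, so the whole commit is rejected.
    commit(table, upserts=[{"player_id": "p3", "map_name": "Foundation", "kills": "many"}])
except TypeError as error:
    print("rejected:", error)
```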
Modern Data Stack
[Diagram: Source Layer (Game Client) -> Data Processing (Streaming; Batch ETL/ELT) -> Storage -> Science & Experimentation (Data Science, Machine Learning) -> Business (Business Intelligence)]
Modern Key Layers and roles
[Diagram: Source Layer (Game Client) -> Data Processing (Streaming; Batch ETL/ELT) -> Storage -> Science & Experimentation (Data Science, Machine Learning) -> Business (Business Intelligence)]
Roles:
● Data Engineer
● ML Engineer
● Data Scientist
● BI Engineer
● Product Manager - manages the data product
Modern Data Stack with Open Source
[Diagram: Source Layer -> Data Processing -> Storage -> Science & Experimentation -> Business. Surviving labels: Game Client, Spark Pool, MLlib]
Modern Data Stack with Microsoft Azure
[Diagram: Source Layer -> Data Processing -> Storage -> Science & Experimentation -> Business. Labels: Event Hub, Stream Analytics, Batch (ETL/ELT), Spark pools, MLlib, Azure ML (not in Synapse), Serverless Pool, Dedicated SQL pool, Azure Data Lake v2, Azure Synapse Studio]
Modern Data with Azure Data Explorer (ADX)
[Diagram: Source Layer -> Data Processing -> Storage -> Science & Experimentation -> Business. Labels: ADX | Ingesting, Azure Data Explorer | Storage, ADX | Data Science, Kusto / ADX]
Solution: Fortnite on AWS
https://aws.amazon.com/solutions/case-studies/EPICGames/
Solution: Sega
Solution: WildLife Games
https://databricks.com/session_eu20/using-machine-learning-at-scale-a-gaming-industry-experience
Solution: Future Games Studio
https://youtu.be/9uec5ujkuCA
Solution: Electronic Arts Studio
https://youtu.be/ot1Qzdszvsc
Solution: Mobile Game Analytics from GCP
https://cloud.google.com/architecture/mobile-gaming-analysis-telemetry
The Coalition Data Stack
How it was
[Diagram: Source Layer -> Data Processing -> Storage -> Science & Experimentation -> Business. Labels: Game Client/Server, Azure Cloud, On-Premise, The Coalition Data Lake]
How it is going
[Diagram: Source Layer -> Data Processing -> Storage -> Science & Experimentation -> Business. Labels: Game Client/Server, Event Grid, Streaming, Azure Data Factory | Batch (ELT), Spark Structured Streaming*, Azure Data Lake Storage V2 (Compute), The Coalition Data Lake, Spark MLlib]
*Spark Structured Streaming: not in production. We are testing it.
**Spark MLlib and MLflow: part of the future vision.
Data Engineering Design Flow as a Funnel
Event Names:
● Weapon Use
● Damage
● Shooting
● Flock
● Map Name
● HeartBeat
● and so on
Raw Tables (Bronze)
● Method: Append
● Transformation: Minimal
Staging Tables (Silver)
● Method: Append
● Transformation: JSON schema
Fact Tables (Gold)
● Method: Merge
● Transformation: Heavy
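The funnel above can be sketched as three small functions, one per layer. This is a plain-Python illustration under assumptions: the function names, the `REQUIRED_FIELDS` schema, and the choice of "weapon uses per player and map" as the gold fact are all made up for the example, and a production pipeline would run equivalent steps in Spark or a similar engine.

```python
import json

REQUIRED_FIELDS = {"event_name", "player_id", "map_name"}  # hypothetical JSON schema


def to_bronze(raw_lines):
    """Bronze: append payloads exactly as received, with minimal transformation."""
    return [{"raw": line} for line in raw_lines]


def to_silver(bronze_rows):
    """Silver: parse JSON and append only rows that match the expected schema."""
    silver_rows = []
    for row in bronze_rows:
        try:
            event = json.loads(row["raw"])
        except json.JSONDecodeError:
            continue  # malformed payloads stay in bronze only
        if isinstance(event, dict) and REQUIRED_FIELDS.issubset(event):
            silver_rows.append(event)
    return silver_rows


def merge_into_gold(gold, silver_rows):
    """Gold: merge weapon-use counts per (player, map) into the fact table."""
    for event in silver_rows:
        if event["event_name"] != "WeaponUse":
            continue
        key = (event["player_id"], event["map_name"])
        gold[key] = gold.get(key, 0) + 1  # update in place by key, not append
    return gold


# Made-up raw payloads, including one malformed line.
raw_lines = [
    '{"event_name": "WeaponUse", "player_id": "p1", "map_name": "Foundation"}',
    '{"event_name": "WeaponUse", "player_id": "p1", "map_name": "Foundation"}',
    '{"event_name": "HeartBeat", "player_id": "p1", "map_name": "Foundation"}',
    '{not valid json',
]
bronze = to_bronze(raw_lines)
silver = to_silver(bronze)
gold = merge_into_gold({}, silver)
print(len(bronze), len(silver), gold)  # 4 3 {('p1', 'Foundation'): 2}
```

The row counts shrink at each layer, which is the funnel shape: everything lands in bronze, only valid events reach silver, and gold holds a compact, merged fact.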
Key Challenges
● Cross-team collaboration: SDE and DE, BI and DE, DE and DS
● Low data volume before launch
● Schema evolution
● Cost and budgeting
● Security best practices (for example, credentials)
● Privacy and compliance (GDPR, HIPAA; lack of data for ML/AI)
● Data quality at scale (Deequ, Great Expectations)
● Responsible AI
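For the data quality challenge, tools such as Deequ and Great Expectations let you declare checks on a dataset and report which ones pass. The sketch below shows the same idea in plain Python; it does not use either library's API, and the check names, columns, and sample rows are assumptions for illustration.

```python
def check_not_null(rows, column):
    """Completeness: every row has a non-null value in `column`."""
    return all(row.get(column) is not None for row in rows)


def check_unique(rows, column):
    """Uniqueness: no value in `column` appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def check_in_set(rows, column, allowed):
    """Validity: every value in `column` is one of the allowed values."""
    return all(row.get(column) in allowed for row in rows)


def run_checks(rows):
    """Run all checks and return a dict of check name -> passed."""
    known_events = {"WeaponUse", "Damage", "HeartBeat"}
    return {
        "event_id is not null": check_not_null(rows, "event_id"),
        "event_id is unique": check_unique(rows, "event_id"),
        "event_name is known": check_in_set(rows, "event_name", known_events),
    }


# Made-up sample rows; event_id 2 is duplicated on purpose.
rows = [
    {"event_id": 1, "event_name": "WeaponUse"},
    {"event_id": 2, "event_name": "Damage"},
    {"event_id": 2, "event_name": "HeartBeat"},
]
report = run_checks(rows)
for name, passed in report.items():
    print("PASS" if passed else "FAIL", name)
```

Running checks like these before data moves from staging to fact tables keeps bad rows from reaching reports.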
Summary
● There is no bad solution or vendor
● Focus on the business outcome (working backwards)
● Engineering excellence (dev/prod, CI/CD)
● You can build a solution using code (Python, Java, Scala, SQL, and so on) or a GUI (with some restrictions)
● Security and privacy best practices
For more information visit: https://www.thecoalitionstudio.com/join-us/
The Coalition is looking for talented and diverse people to join our squad, with exciting opportunities across our Art, Design, Engineering, and
Production teams.