DATA DRIVEN DECISIONS AT SCALE
Jim Forsythe
OUR JOURNEY OVER THE LAST 10 YEARS
2009 2014 Today
DATA IS ESSENTIAL TO CREATING SIMPLE, EASY, AWESOME CUSTOMER EXPERIENCES
We collect, store, and use all data in accordance with our privacy disclosures to users and applicable laws.
MAKING DATA DRIVEN DECISIONS AT SCALE IS CRITICAL
2 MAJOR CHALLENGES:
DATA PROCESSING AT SCALE + MAKING DATA ACTIONABLE
CHALLENGE #1
NEED TO PROCESS DATA AT MASSIVE SCALE
Raw Event Data: Millions of Transactions per Second
Event Processing: Hundreds of Jobs Processing TBs of data
Data Lake: PBs of Data across hundreds of buckets
SOURCE SYSTEMS
PRODUCTS &
SYSTEMS
X1 & Flex
Mobile & Web
XFI Services
EVENT COLLECTORS
Streaming
Batch
DATA
COLLECTORS
PRODUCTS &
SYSTEMS
X1 & Flex
Mobile & Web
XFI Services
BUILD RICH DATA SETS
Streaming
Batch
DATA
COLLECTION
PRODUCTS &
SYSTEMS
PROCESSING &
ENRICHMENT
X1 & Flex
Mobile & Web
XFI Services
FOUNDATION OF DATA PLATFORM
Streaming
Batch
DATA
COLLECTION
PRODUCTS &
SYSTEMS
PROCESSING &
ENRICHMENT
DATA LAKE
PARQUET
X1 & Flex
Mobile & Web
XFI Services
DELTA SESSIONIZATION
Job 1
Ingest Data
Optimize
Streaming
Join
Optimize
Job 2
Sessionize
Optimize
Job 3
Enrich & Optimize
Batch
Join
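The slide names a Sessionize job but does not say how sessions are cut. A minimal sketch of one common approach, gap-based sessionization, is shown below in plain Python rather than Spark or Delta. The 30-minute inactivity threshold, the `sessionize` function, and the `(device_id, timestamp)` event shape are all assumptions for illustration and are not taken from the talk.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds; the slides do not state one.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Assign a per-device session index to each (device_id, timestamp) event.

    A new session starts whenever the time since the device's previous
    event exceeds `gap` seconds.
    """
    # Group timestamps by device (hypothetical key; any entity id would do).
    by_device = defaultdict(list)
    for device_id, ts in events:
        by_device[device_id].append(ts)

    sessions = []
    for device_id, timestamps in by_device.items():
        timestamps.sort()
        session_index = 0
        previous = None
        for ts in timestamps:
            if previous is not None and ts - previous > gap:
                session_index += 1
            sessions.append((device_id, session_index, ts))
            previous = ts
    return sessions


# Illustrative events only: timestamps are seconds from an arbitrary origin.
events = [("stb-1", 0), ("stb-1", 600), ("stb-1", 4000), ("stb-2", 100)]
result = sessionize(events)
```

In a production pipeline the same grouping and gap logic would typically run as a distributed job over the ingested table; this sketch only illustrates the session-cutting rule itself.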
DATA LAKE IS THE FOUNDATION
DATA LAKE
PRODUCTS/SERVICES
DATA PLATFORM
DATA
ENGINEERING
CUSTOMER
CHALLENGE #2
GETTING TO AGILE DATA DRIVEN DECISIONS
DATA LAKE
DATA
ENGINEERING
PRODUCTS/SERVICES
DATA PLATFORM
ANALYTICS DATA SCIENCE AB TESTING
CUSTOMER
DEMOCRATIZE ANALYTICS
Analytics
Enable Self Service
Traditional Model: Analytics/Insights; Requesting Reports
Democratized Model: Analytics/Insights; Requesting New Data
SOLVING SOME OF OUR TOUGHEST PROBLEMS
Data Science
Focus on Customer Experience
CUSTOMER PROBLEMS
DATA LAKE
DATA SCIENCE
MOVE FAST & LEARN
AB Testing
Focus on what we launch, not what we build
TREATMENT
CONTROL
USERS
RESULTS
EXP PLATFORM
EVALUATION ON COMMON DATA
50%
USERS
50%
USERS
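The 50%/50% split between treatment and control can be sketched as a deterministic hash-based assignment. This is an illustrative assumption about how such a split might be made, not the experimentation platform described in the talk; the `assign_variant` function, the experiment name, and the user id format are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing the experiment name together with the user id keeps a user's
    assignment stable within one experiment and independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical experiment name and user ids, for illustration only.
assignments = [assign_variant(f"user-{i}", "new-guide-layout") for i in range(10000)]
treatment_count = assignments.count("treatment")
```

Because both groups are then evaluated on the same underlying data, differences in outcomes can be attributed to the launched change rather than to differences in measurement.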
INCREASE CUSTOMER VALUE
PRODUCTS/SERVICES
MEASURE OUTCOMES
MEASURE OUTCOMES
DATA LAKE
DATA
ENGINEERING
ANALYTICS DATA SCIENCE AB TESTING
DATA PLATFORM
VALUE
CUSTOMER
FROM DATA TO INSIGHTS AND ACTIONS
Data to improve the product (A/B Testing)
DATA SCIENCE & ENGINEERING
Data Informed Big Bets
PRODUCT ORGANIZATION
Data for training decision models
APPLIED AI
UNLEASH INNOVATION WITH DATA
Thank you!

