www.scout24.com
The Scout24 Data Landscape Manifesto:
Building an Opinionated Data Platform
Predictive Analytics World Berlin | 13.11.2018 | Sean Gustafson
5 core geographies and an overall presence in 18 countries
80m household reach
2 major household brand names
Scout24 AG
• SDAX
• €489 million revenue (2017)
Our technical evolution
How we started in 2007
[Architecture diagram (2007): Web Tier, Middle Tier, Core DB; CRM; Staging and DWH; BI Tool; roles: Analyst, BI Dev]
How things got complicated in 2011
[Architecture diagram (2011): Web, API and APP/MySQL services alongside the Middle Tier and Core DB; CRM; Staging and DWH; BI Tool; roles: Analyst, BI Dev]
How we sliced the monolith in 2013
[Architecture diagram (2013): Web, API and several APP/MySQL services; Core DB; EXP (Mongo); SEA (Elastic); Sync APP; REST API; Hadoop; CRM; Staging and DWH; BI Tool; roles: Analyst, BI Dev, DE]
How a central data team doesn’t scale
[Architecture diagram (2015): many APP and APP/MySQL services, some on AWS; Web, API, Core DB, EXP (Mongo), SEA (Elastic), Sync APP, REST API, Hadoop; CRM; Staging and DWH; BI Tool; roles: Analyst, BI Dev, DE]
How we re-architected our Data Landscape
[Architecture diagram (2017): Core DB and many APP services on AWS, CRM, REST API; Central Data Lake on S3; Presto; BI Tool; Spark, R, etc.; roles: Analyst, BI Dev, DE, DS]
Our organizational evolution
[Diagram: the 2017 architecture with the teams working on it: Data Platform Engineering, Analysts (in residence), Data Scientists (in residence), Central Analysts Team; roles: Analyst, BI Dev, DE, DS]
Scout24 wants to become a truly data-driven company
Fast and easy data-driven product development… …supported by Data & Analytics (DnA)
Scout24 wants to become a truly data-driven company
Everywhere in the company… …without bloating DnA
Our cultural revolution
SCOUT24 DATA LANDSCAPE MANIFESTO
ROLES, RESPONSIBILITIES, AND VALUES FOR A DATA-DRIVEN COMPANY AT SCALE
#1 Preamble
Data is a key asset of our company.
#2 Our Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Diagram: Data Landscape with DnA and the Data Platform]
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Diagram: Data Landscape with DnA, the Data Platform, a data Producer and a data Consumer]
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
[Diagram: Data Landscape with DnA, the Data Platform, and a Producer publishing Data and Metadata]
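The deck does not say which metadata Scout24 required from producers. As a minimal sketch, assuming a hypothetical set of required fields (`name`, `owner`, `description`, `location`, `schema`), a platform could refuse to publish a dataset whose metadata is incomplete. The class, function, team name and S3 path below are illustrative assumptions, not Scout24's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical required fields: the manifesto only asks for metadata that makes
# data "easy to find and consume"; it does not prescribe a schema.
REQUIRED_FIELDS = ("name", "owner", "description", "location", "schema")


@dataclass
class DatasetMetadata:
    name: str
    owner: str          # producing team, accountable for data quality
    description: str    # what the data means, for consumers searching the lake
    location: str       # where the data lives in the central Data Lake
    schema: dict[str, str] = field(default_factory=dict)  # column -> type


def missing_metadata(meta: DatasetMetadata) -> list[str]:
    """Return the required fields left empty, so publishing can be rejected."""
    return [name for name in REQUIRED_FIELDS if not getattr(meta, name)]


# Hypothetical dataset whose producer forgot the description.
meta = DatasetMetadata(
    name="listings_daily",
    owner="team-listings",
    description="",
    location="s3://example-data-lake/listings_daily/",
    schema={"listing_id": "bigint", "dt": "date"},
)
print(missing_metadata(meta))  # ['description']
```

Rejecting incomplete metadata at publish time keeps the producer, not DnA, responsible for fixing it, which matches the division of responsibility in #4.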
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Diagram: Data Landscape with DnA, the Data Platform, a Producer and a Consumer]
#6 Exception: Core KPIs
We, Data & Analytics, take full ownership of and responsibility for the few top company-wide core KPIs.
[Diagram: Data Landscape with DnA, the Data Platform, a Producer, a Consumer and a Core metric]
#7 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if doing so enables better insights.
[Diagram: Data Landscape with DnA, the Data Platform, a Producer, a Consumer and a Core metric]
The Ultimate Goal
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
[Diagram: Data Landscape with DnA, the Data Platform, Producers publishing Data and Metadata, Consumers, and a Core metric]
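Data Warehouse vs. Data Platform
Data Warehouse    | Data Platform
------------------|------------------------------------------------
Centralized       | Federated
Control           | Autonomy
Perfection        | Scale
Pull              | Push
Product is Data   | Product is Platform
Reporting         | Reporting, Ad hoc Analytics, Machine Learning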
How do we convince product teams to go along?
→ ‘Nudge’ them to participate
→ Promote the platform
→ Refuse new use cases in the Data Warehouse
Result: product teams take on much more responsibility
Design ‘nudges’ into the Platform
Make using the Data Lake easier than any alternative:
- automatic table publishing and partition detection
- backup and disaster recovery
- access control for restricted data
- optimized file formats (e.g. Parquet) for efficiency
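The deck does not describe how automatic table publishing and partition detection were implemented. A minimal sketch, assuming tables are stored under Hive-style `column=value` prefixes (for example `listings/dt=2018-11-13/`) and registered in a Hive-compatible metastore: scan the object keys below a table prefix, collect the distinct partition directories, and emit `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statements. The table name, bucket and keys are hypothetical; in practice the key list would come from an S3 listing.

```python
# Sketch only: assumes Hive-style partition directories such as dt=2018-11-13.
# Table, bucket and object keys are hypothetical examples.

Partition = tuple[tuple[str, str], ...]  # e.g. (("dt", "2018-11-13"),)


def detect_partitions(table_prefix: str, object_keys: list[str]) -> dict[Partition, str]:
    """Map each partition found under table_prefix to its directory prefix."""
    prefix = table_prefix.rstrip("/") + "/"
    partitions: dict[Partition, str] = {}
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # object belongs to a different table
        # Directory segments between the table prefix and the file name.
        segments = key[len(prefix):].split("/")[:-1]
        if not segments or not all("=" in s for s in segments):
            continue  # e.g. a _SUCCESS marker at the table root
        spec = tuple((col, val) for col, val in (s.split("=", 1) for s in segments))
        partitions[spec] = prefix + "/".join(segments) + "/"
    return partitions


def add_partition_statements(table: str, bucket: str, partitions: dict[Partition, str]) -> list[str]:
    """Build Hive DDL registering each partition (values are not escaped)."""
    statements = []
    for spec, location in sorted(partitions.items()):
        cols = ", ".join(f"{col}='{val}'" for col, val in spec)
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({cols}) "
            f"LOCATION 's3://{bucket}/{location}'"
        )
    return statements


keys = [
    "listings/dt=2018-11-12/part-00000.parquet",
    "listings/dt=2018-11-12/part-00001.parquet",
    "listings/dt=2018-11-13/part-00000.parquet",
    "listings/_SUCCESS",
    "leads/dt=2018-11-13/part-00000.parquet",
]
parts = detect_partitions("listings", keys)
for stmt in add_partition_statements("listings", "example-data-lake", parts):
    print(stmt)
```

Using `IF NOT EXISTS` keeps the step idempotent, so it can be re-run after every upload without failing on partitions that are already registered; producers only write files, and the platform takes care of making them queryable.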
Learnings and lessons
→ Change needs to be technological, organizational, and cultural
→ Build features whose benefits counteract resistance
→ Communication is key
Most importantly:
Have a strong opinion about how your company should use data, and build a platform that pushes toward that vision.
