What Comes After the Star Schema?
Dimensional Modeling for Enterprise Data Hubs 
Josh Wills 
Senior Director of Data Science
About Me 
Two Fish 
Once Upon A Time…
Our data looked like this… 
…but the data analyst wanted this. 
The Impedance Mismatch 
The Unicorn Solution 
The Great Compromise 
Ease of Use 
Division of Labor 
Given Away Freely 
The Central Idea of an Entire Industry 
Modern BI and Reporting 
The Next Hard Problem
Operationalizing Insights 
The Analytics Maturity Curve 
Example: Modeling Risk 
The Impedance Mismatch, Redux 
The Unicorn Solution, Redux 
This All Feels Very Familiar 
It’s Time To Find A New Compromise 
Data Science 101 
Introducing Supernovas
A Simple Star Schema for Search 
The Cartesian Explosion 
A Supernova Schema for Search 
Beyond Analytic SQL Functions: Nested SQL Sessions
Event Series Analytics 
http://github.com/jwills/exhibit
Supernova for BI and Reporting
A New Kind of Data Model 
Context/Backend   RDBMS               NoSQL
Operational       Entity-Relational   Generic Objects
Analytical        Star                Supernova
From Star To Supernova 
Supernova 101: Affinity Analysis 
Supernova Design Rules 
1. Identify the root dimension. 
2. Identify the facts about the root dimension that we want to analyze.
3. De-normalize critical child dimensions. 
4. Define the time window. 
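
The four rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the root dimension is a hypothetical `user_id`, the facts are hypothetical search and click events, the child dimension is a hypothetical query lookup table, and the time window is an arbitrary month. None of these names or values come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (user_id, event_date, fact_type, query_id).
EVENTS = [
    ("u1", date(2014, 1, 1), "search", "q1"),
    ("u1", date(2014, 1, 2), "click", "q1"),
    ("u2", date(2014, 1, 2), "search", "q2"),
    ("u1", date(2014, 3, 1), "search", "q2"),  # falls outside the window used below
]

# Hypothetical child dimension, keyed by query_id.
QUERIES = {
    "q1": {"text": "hadoop", "category": "tech"},
    "q2": {"text": "shoes", "category": "retail"},
}


def build_supernova(events, child_dim, start, end):
    """Return one nested record per root-dimension value (here, user_id)."""
    records = defaultdict(lambda: defaultdict(list))
    for user_id, day, fact_type, query_id in events:
        # Rule 4: keep only facts inside the time window.
        if not (start <= day <= end):
            continue
        # Rule 3: copy the child dimension's attributes into the fact itself.
        fact = {"date": day, "query_id": query_id, **child_dim[query_id]}
        # Rules 1 and 2: nest each kind of fact under its root-dimension key.
        records[user_id][fact_type].append(fact)
    return {user: dict(facts) for user, facts in records.items()}


supernova = build_supernova(EVENTS, QUERIES, date(2014, 1, 1), date(2014, 1, 31))
```

Each entry of `supernova` holds every fact about one user as nested lists, so a question that spans fact types (for example, searches followed by clicks) can be answered from a single record without joining fact tables.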
Creating a Supernova 
Injecting Child Dimensions 
Taking Advantage of Multi-Insert 
Supernova for Operational Analytics
A New Kind of Data Model 
Context/Backend   RDBMS               NoSQL
Operational       Entity-Relational   Generic Objects
Analytical        Star                Supernova
Data Science and the Holy Grail 
Replicating the Online Environment 
Feature Engineering: Within / Across 
1. Generate normalization and segmentation features.
2. Generate normalization constants and segments using the result of Step 1.
3. Generate input features using the original data and the result of Step 2.
4. Generate a model using the result of Step 3.
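
A minimal sketch of the four steps, assuming that "within" means a computation on a single record (Steps 1 and 3) and "across" means an aggregation over all records (Steps 2 and 4); that reading of the slide title is an assumption. The customer records, the `z_total` feature, and the midpoint-threshold "model" are all hypothetical stand-ins chosen to keep the example short.

```python
from statistics import mean, pstdev

# Hypothetical nested records: one per customer, with order amounts and a
# known label (1 = churned, 0 = retained).
CUSTOMERS = [
    {"id": "a", "orders": [30.0, 40.0], "label": 0},
    {"id": "b", "orders": [5.0], "label": 1},
    {"id": "c", "orders": [40.0, 50.0, 60.0], "label": 0},
    {"id": "d", "orders": [15.0], "label": 1},
]

# Step 1 (within each record): compute the raw value that will be normalized.
totals = {c["id"]: sum(c["orders"]) for c in CUSTOMERS}

# Step 2 (across records): compute normalization constants from Step 1.
mu = mean(list(totals.values()))
sigma = pstdev(list(totals.values()))

# Step 3 (within each record): build input features from the original data
# plus the constants from Step 2.
features = {
    c["id"]: {
        "z_total": (totals[c["id"]] - mu) / sigma,
        "n_orders": len(c["orders"]),
    }
    for c in CUSTOMERS
}


# Step 4 (across records): fit a model on the Step 3 features. The "model"
# here is just a threshold halfway between the two class means of z_total.
def class_mean(label):
    return mean(features[c["id"]]["z_total"] for c in CUSTOMERS if c["label"] == label)


threshold = (class_mean(0) + class_mean(1)) / 2


def predict(feature_row):
    """Predict churn (1) when normalized spend falls below the threshold."""
    return 1 if feature_row["z_total"] < threshold else 0


predictions = {c["id"]: predict(features[c["id"]]) for c in CUSTOMERS}
```

The alternation matters for storage design: Steps 1 and 3 touch one record at a time, while Steps 2 and 4 need a pass over the whole dataset.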
Integrating New Signals 
Building a Wall 
Driving Innovation 
Thank you! 
Josh Wills, Senior Director of Data Science, Cloudera @josh_wills
