Why Data Modeling Matters
Felipe Cardeneti Mendes
Data Modeling for Performance
A building is only as strong as its foundation
The data model shapes how data gets stored and accessed in a database.
■ Trade-off between scalability and flexibility:
○ A flexible querying model is easier to work with, at the cost of performance
○ A performance-oriented model sacrifices flexibility, but benefits from near-infinite scaling
■ A weak foundation often leads to:
○ Bottlenecks and higher maintenance (thus $$$ ;-)
○ Difficulty adapting to evolving business needs
■ Our goal: Build solid and strong skyscrapers!
Real Story
https://www.youtube.com/watch?v=G71MnVtDtS4
#1 Understand Your Needs
Ad-hoc
■ Unknown access patterns
■ Lots of flexibility
■ VERY large datasets
Analytics, Business Intelligence, Reporting
Search
■ Prone to index overloads
■ Limited flexibility
■ Moderate datasets
Full-text search, Knowledge bases, Type-ahead…
Transactional
■ Defined structure
■ Good flexibility
■ Small datasets
Financial, Tax Filing, Reservations
Performance
■ Access patterns are well established
■ Not much flexibility beyond what was set upfront
■ Any scale
Real-time Machine Learning, Social Networks, IoT / Time-series, Recommendation Engines, Fraud Detection, Event Logging, Gaming, Catalogs…
#2 Use the Right Tool for the Job
■ Our focus: Performance
■ Understand the goals of each solution:
● Data warehouses focus on throughput over latency
  ○ Poor insert performance and slow disks are the norm
● OLAP systems (incl. search engines) are optimized for complex aggregations
  ○ Immediate, real-time updates are an anti-pattern
● Relational databases have limited scaling
  ○ Locking, JOINs and ACID guarantees inevitably affect query performance
● NoSQL OLTP databases focus on highly concurrent point queries
  ○ Some optimize for reads (DynamoDB, MongoDB)
  ○ Some optimize for writes (Cassandra, RocksDB)
  ○ Some optimize for both (ScyllaDB)
#3 Plan for the Future
What's wrong with this? Consider 100K devices reporting
measurements every 15 seconds, with 5 years of data retention.
CREATE TABLE timeseries (
device_id uuid,
time timestamp,
temperature float,
pressure float,
air_speed float,
(...)
PRIMARY KEY(device_id, time)
);
1 year = 86,400 × 365 = 31,536,000 seconds
Single device = 1 year / 15 s ≈ 2.1M rows per device per year
5 years of retention ≈ 10.5M rows per device, all in a single partition
Read performance degrades over time!
Maybe acceptable… Focus on your queries!
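The sizing arithmetic can be checked with a short script. This is only a back-of-the-envelope sketch: the constants come from the slides, the function name rows_per_device is made up for illustration, and a year is assumed to be 365 days.

```python
# Back-of-the-envelope partition sizing for the timeseries table.
# Constants are taken from the slides; the helper name is hypothetical.
SECONDS_PER_DAY = 86_400
REPORT_INTERVAL_S = 15   # one measurement every 15 seconds
RETENTION_YEARS = 5
DEVICES = 100_000


def rows_per_device(years: int, interval_s: int = REPORT_INTERVAL_S) -> int:
    """Rows accumulated in one device's partition after `years` (365-day years)."""
    return years * 365 * SECONDS_PER_DAY // interval_s


rows_1y = rows_per_device(1)
rows_5y = rows_per_device(RETENTION_YEARS)
total_rows = rows_5y * DEVICES

print(f"rows per device after 1 year:  {rows_1y:,}")
print(f"rows per device after 5 years: {rows_5y:,}")
print(f"rows across all devices:       {total_rows:,}")
```

With PRIMARY KEY(device_id, time), every row for a device lands in the same partition, so the per-device figure is also the partition's row count.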
If we rarely read all rows, range queries help with latencies:
SELECT COUNT(1) FROM timeseries WHERE
    device_id = ? AND time >= ? AND time < ?;
At the same time, there's very little reason to group many rows
together:
CREATE TABLE timeseries (
    bucket int,
    (...)
    PRIMARY KEY((device_id, bucket), time)
);
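The slides do not say how the bucket value is derived. One possible approach, sketched here as an assumption, is a daily bucket: the number of whole days since the Unix epoch in UTC. The helper names day_bucket and buckets_for_range are hypothetical. A writer computes the bucket from the measurement time, and a reader lists the buckets that cover its time range and queries each (device_id, bucket) partition.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one bucket per UTC day. The slides leave the bucketing rule open.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_DAY = timedelta(days=1)
REPORT_INTERVAL_S = 15  # from the slides

# With daily buckets, each (device_id, bucket) partition holds a bounded row count.
ROWS_PER_BUCKET = 86_400 // REPORT_INTERVAL_S


def day_bucket(ts: datetime) -> int:
    """Whole days since the Unix epoch for a timezone-aware timestamp."""
    return (ts - EPOCH) // ONE_DAY


def buckets_for_range(start: datetime, end: datetime) -> list[int]:
    """Buckets a reader must query to cover the half-open range [start, end)."""
    if end <= start:
        return []
    # The last instant inside the range is one microsecond before `end`.
    last = day_bucket(end - timedelta(microseconds=1))
    return list(range(day_bucket(start), last + 1))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    end = datetime(2024, 1, 3, tzinfo=timezone.utc)
    print(ROWS_PER_BUCKET, buckets_for_range(start, end))
```

The bucket width is a trade-off: narrower buckets keep partitions small but make wide range reads touch more partitions, so it should follow the queries the application actually runs.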
Lack of Planning
#4 Simplify Access Patterns
■ Minimize expensive queries
● Joins, complex filters
■ Pre-aggregate when possible
● Machine Learning – Pre-computed Counts, Sums, Similarity Scores, Summary Statistics
● Dashboards – Denormalized data structures to minimize complexity/latencies
■ Denormalize
● A single query should retrieve all data it requires
● Reduces client-side processing
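As a rough illustration of pre-aggregation, the sketch below keeps running summary statistics per (device_id, bucket) at write time, so a dashboard read becomes a single lookup rather than a scan over raw rows. It uses an in-memory dict as a stand-in for a summary table; the names RunningStats, record and dashboard_row are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningStats:
    """Pre-computed count, sum, min and max for one (device_id, bucket)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


# In-memory stand-in for a denormalized summary table (an assumption).
summaries: dict[tuple[str, int], RunningStats] = {}


def record(device_id: str, bucket: int, temperature: float) -> None:
    """Write path: update the summary alongside the raw measurement."""
    summaries.setdefault((device_id, bucket), RunningStats()).add(temperature)


def dashboard_row(device_id: str, bucket: int) -> Optional[dict]:
    """Read path: one lookup returns everything the dashboard needs."""
    stats = summaries.get((device_id, bucket))
    if stats is None:
        return None
    return {
        "count": stats.count,
        "mean": stats.total / stats.count,
        "min": stats.minimum,
        "max": stats.maximum,
    }


if __name__ == "__main__":
    for reading in (20.0, 22.0, 21.0):
        record("dev-1", 0, reading)
    print(dashboard_row("dev-1", 0))
```

The cost moves to the write path, which is usually acceptable when reads of the summary far outnumber writes or when read latency is the KPI that matters.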
#5 Refine and Build!
■ Apply open-loop load testing
● Brian Taylor's talk on How to Maximize Database Concurrency
■ Define performance KPIs upfront
● Database Performance at Scale – Chapter 10: Monitoring
■ Hunt for potential imbalances
● Load imbalance → Uneven access patterns
● Data imbalance → Large partitions/rows
● Hotspots → Low cardinality/Retry Storms/Spammers
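To show why open-loop testing matters, here is a toy simulation and not a real load generator. It assumes a single server with a fixed service time, and the function names are hypothetical. In the open-loop case requests are scheduled at a fixed rate regardless of completions and latency is measured from the intended send time; in the closed-loop case each request waits for the previous one to finish.

```python
def open_loop_latencies(interval_ms: int, service_ms: int, n_requests: int) -> list[int]:
    """Latency per request when arrivals follow a fixed schedule.

    Latency is measured from the intended send time, so queueing delay
    caused by an overloaded server shows up in the numbers.
    """
    server_free_at = 0
    latencies = []
    for i in range(n_requests):
        intended = i * interval_ms
        start = max(intended, server_free_at)  # wait if the server is busy
        server_free_at = start + service_ms
        latencies.append(server_free_at - intended)
    return latencies


def closed_loop_latencies(service_ms: int, n_requests: int) -> list[int]:
    """Latency per request when each one is sent only after the previous
    one completes: the client slows down with the server, hiding overload."""
    return [service_ms] * n_requests


if __name__ == "__main__":
    # Target: one request every 10 ms, but the server needs 15 ms per request.
    print("open loop:  ", open_loop_latencies(10, 15, 5))
    print("closed loop:", closed_loop_latencies(15, 5))
```

Under overload the open-loop latencies keep growing while the closed-loop numbers stay flat, which is why closed-loop results can look healthier than what users would experience.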
Keep in touch!
Felipe Cardeneti Mendes
Technical Director
ScyllaDB
felipe@scylladb.com
@felipemendes.dev
Attributions
Icon made by juicy_fish from www.flaticon.com
Icons made by Freepik from www.flaticon.com
Icon made by PixelPerfect from www.flaticon.com

Data Modeling for Performance Masterclass: Why Data Modeling Matters
