Real-Time or Analytics Workloads... Why Not Both?
Felipe Cardeneti Mendes, Solution Architect at ScyllaDB
Felipe Mendes
■ Published Author
■ ScyllaDB Committer
■ IT Specialist & Solution Architect
■ Open Source Enthusiast
ScyllaDB as an Analytics Engine
An IoT Application
■ 1,000,000 sensors, representing homes in an area
■ 1 reading per minute
■ 365 days (1 year storage requirement)
Total amount of data points: 526 billion temperature readings
Analytics over the entire data?
How long would it take at normal speeds?
■ 200,000 points/second: 730 hours (30 days)
■ 1 million points/second: 146 hours (almost a week)
We need more if analytics are a part of the pipeline
We need ScyllaDB
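The totals on these two slides follow directly from the sensor count and sampling rate. A minimal sketch of the arithmetic, using only the numbers stated above (the variable and function names are illustrative):

```python
# Dataset size: one reading per minute, per sensor, for a year.
SENSORS = 1_000_000
READINGS_PER_DAY = 24 * 60
DAYS = 365

total_points = SENSORS * READINGS_PER_DAY * DAYS


def scan_hours(points: int, points_per_second: int) -> float:
    """Hours needed to read `points` at a constant scan rate."""
    return points / points_per_second / 3600


print(f"total points: {total_points:,}")
for rate in (200_000, 1_000_000):
    print(f"{rate:,} points/s -> {scan_hours(total_points, rate):.0f} hours")
```

The total comes to 525.6 billion points, which the slide rounds to 526 billion.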
Easy Peasy
Scanning 3 months of data
Finished Scanning. Succeeded 132,480,000,000 rows. Failed 0 rows. RPC Failures: 0. Took 110,892.91 ms
Processed 1,194,666,083 rows/s
Absolute min: 19.71, date 2019-08-27, sensorID 473869
Absolute max: 135.21, date 2019-08-27, sensorID 473869
Easy Peasy
Scanning the entire dataset
Finished Scanning. Succeeded 525,599,474,400 rows. Failed 0 rows. RPC Failures: 0. Took 542,191.31 ms
Processed 969,398,554 rows/s
Absolute min: 68.00, date 2019-05-28, sensorID 82114
Absolute max: 79.99, date 2019-03-19, sensorID 152594
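The reported rates can be cross-checked against the row counts and elapsed times in the two log outputs. A small sketch, assuming the "Took ... ms" value is the wall-clock time of the whole scan (the helper name is illustrative):

```python
def rows_per_second(rows: int, elapsed_ms: float) -> float:
    """Average scan rate from a row count and an elapsed time in milliseconds."""
    return rows / (elapsed_ms / 1000.0)


# (rows scanned, elapsed ms, rate reported in the log)
runs = {
    "3 months": (132_480_000_000, 110_892.91, 1_194_666_083),
    "entire dataset": (525_599_474_400, 542_191.31, 969_398_554),
}

for label, (rows, elapsed_ms, reported) in runs.items():
    rate = rows_per_second(rows, elapsed_ms)
    drift = abs(rate - reported) / reported
    print(f"{label}: computed {rate:,.0f} rows/s, relative difference {drift:.1e}")
```

The 3-month row count also matches the data model: 1,000,000 sensors x 1,440 readings per day x 92 days = 132,480,000,000 rows.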
We can efficiently process about 1.2 billion points per second
(we’ll process whatever you need, too!)
Concurrency Challenges
The Latency Problem
P99 climbs to unacceptable values
The Throughput Problem
Throughput gradually drops, load distribution becomes unfair
Why Does Contention Happen?
Primarily a lack of system resources (disk I/O, CPU time)
■ Not necessarily a problem
■ Introduces queueing
Addressing the Problem?
Easy then, let's simply ...
■ Divide and conquer!
■ Division in Space (Multi DC)
■ Division in Time (off-peak OLAP)
Workload Prioritization
Isolation and Performance Optimization
Background Tasks
User Tasks
Shares
Different workloads require different priorities
■ Meet SLAs
■ Flexible Configuration
■ Adaptability to Changing Conditions
Getting Started
■ Set up authentication, assign Roles to each workload
■ Prioritization is tied to the authenticated user's role
■ Define your Service Levels
■ For a primary workload:
CREATE SERVICE LEVEL main WITH shares = 600
■ For a secondary one:
CREATE SERVICE LEVEL secondary WITH shares = 200
■ Profit!
Prioritization and Isolation are NOT enough!
Workload Characteristics
Workload Characteristics: Time
The timeout dilemma:
1. Timeout should follow: T_server ≤ T_client
2. For Real-time:
■ Can’t be too high
■ Incurs retries or dropped requests
■ Excessive retries result in wasted resources
3. For Analytics:
■ Can’t be too low
■ Otherwise batch jobs will likely fail
■ High throughput will typically increase latencies due to contention
Workload Characteristics: Shedding
Overload response:
1. Interactive workload:
■ Throttling won't help
■ Delaying the response to user A will not cause some user B to stop sending requests
■ Unbounded concurrency
2. Batch workload:
■ Just throttle
■ Gives us a knob that controls the pace of the analytics workload
■ Bounded concurrency
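The two overload responses can be contrasted in a few lines. This is a conceptual sketch only, not ScyllaDB's implementation: `LoadShedder` and `Throttle` are hypothetical names, and a semaphore stands in for the server's concurrency limit.

```python
import threading
from typing import Optional


class LoadShedder:
    """Interactive-style admission: reject work beyond the concurrency limit."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_admit(self) -> bool:
        # Non-blocking: returns False at once when saturated (the request is shed).
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


class Throttle:
    """Batch-style admission: callers wait for a free slot (back-pressure)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def admit(self, timeout: Optional[float] = None) -> bool:
        # Blocking: the caller is slowed down rather than rejected outright.
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()


# Three interactive requests arrive while only two slots exist:
shedder = LoadShedder(limit=2)
admitted = [shedder.try_admit() for _ in range(3)]
print(admitted)  # the third request is rejected instead of queued
```

Shedding suits the interactive case because new users keep arriving regardless of how slowly earlier ones were served. Throttling suits the batch case because the same bounded set of workers simply waits and then continues.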
Introducing Workload Characterization
Ideally – we want the database to behave differently:
■ For Real-time:
■ Have a low timeout
■ Load shedding (fail excessive requests), as the database can NOT slow down interactive workloads.
■ Dedicate most of the resources to this workload.
■ For Batch:
■ Relatively higher timeout
■ Apply back-pressure via throttling
■ Use mostly unused resources
Introducing Workload Characterization
Why not just hint the database with specifics?
■ For Real-Time:
■ Have low timeout (30ms) → timeout=30ms
■ Load shedding → AND workload_type=interactive
■ Dedicate most of the resources → AND shares=800
■ For Analytics:
■ Have relatively high timeout (5s) → timeout=5s
■ Throttling → AND workload_type=batch
■ Use mostly unused resources → AND shares=200
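To get a feel for what shares=800 versus shares=200 means, the sketch below treats shares as proportional weights. This is a simplifying assumption for illustration: it models only the case where both workloads compete for the same resources at the same time, and `resource_split` is a hypothetical helper, not a ScyllaDB API.

```python
from typing import Dict


def resource_split(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of contended resources per workload, proportional to its shares."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


split = resource_split({"interactive": 800, "batch": 200})
for name, fraction in split.items():
    print(f"{name}: {fraction:.0%} of resources under contention")
```

When one workload is idle, the other is free to use the spare capacity, which is what "use mostly unused resources" refers to for the batch side.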
Takeaways
■ ScyllaDB powers both Analytics & Real-time intensive workloads
■ Workload Prioritization helps with:
■ Infrastructure Consolidation
■ Resource Optimization
■ Performance Isolation
■ Workload Characterization provides:
■ Workload-specific settings
■ Distinct overload and timeout responses
Stay in Touch
Felipe Mendes
felipemendes@scylladb.com
@cardeneti82118
fee-mendes
Find me on LinkedIn
