A ScyllaDB Community
Object Storage in ScyllaDB
Ran Regev
Software Team Leader
Ran Regev
■ Software developer
■ 25 years in different domains
■ Communication (embedded devices)
■ Traffic (waze-like)
■ Cyber-security (threat simulation and mitigation)
■ Medical (equipment, not embedded)
■ ISO C++ committee member
Agenda
■ What is Object Storage
■ In a nutshell only
■ ScyllaDB’s usage of Object Storage
■ Backup
■ Tiering
Object Storage
[Figure: a ScyllaDB node with local NVMe reaching objects in object storage through an API]
⏫ Unlimited Storage
⏫ Durable
⏫ Cheap
⏬ High Latency
Backup and Restore
■ Current Approach, old and new
■ Old: rclone
■ New: Native ScyllaDB
[Figure: a ScyllaDB node with NVMe storage; the Scylla process serves user queries while a separate rclone process handles SSTables on the same disk]
1. Uncontrolled contention on the disk
2. Data is downloaded from object storage, saved, and then read by ScyllaDB
Native ScyllaDB
[Figure: a ScyllaDB node with NVMe storage; one ScyllaDB process serves user queries and handles the SSTables]
Restore
■ Challenges
■ Fast Restore
■ The Holy Grail: Restore To Any Topology
Challenges
■ Given an existing backup, restore to any topology
■ Make sure the data is spread efficiently
■ Restore as fast as possible
[Figure: Original Cluster (node1, node2, node3) holding SSTables 1A, 1B, 1C, 2A, 2B, 2C, 3C; Target Cluster (node1 to nodeN) receiving the same SSTables]
Backup and Restore
■ Understanding the Complexity
■ Moving Parts (variables)
■ Token Ranges of
● SSTables
● Nodes
■ Data Replication
■ Vnodes vs. Tablets
Complexity: Node example
[Figure: the token ranges owned by a node, and SSTables each with the range of tokens it holds]
Token ranges owned by the node (start, end):
25,002 to 98,968
104,254 to 187,965
547,831 to 602,337
SSTables, each with the range of tokens it holds:
25,104 to 26,200
70,687 to 71,404
90,000 to 94,291
97,875 to 98,627
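The relationship in the node example can be sketched in a few lines of Python. This is only an illustration using the token values from the slide; the function name `covered_by` is hypothetical, and treating ranges as inclusive (start, end) pairs is an assumption, not ScyllaDB's actual representation.

```python
# Token ranges owned by the node, as (start, end) pairs taken from the slide.
NODE_RANGES = [(25_002, 98_968), (104_254, 187_965), (547_831, 602_337)]

# Token range held by each SSTable, taken from the slide.
SSTABLE_RANGES = [(25_104, 26_200), (70_687, 71_404), (90_000, 94_291), (97_875, 98_627)]


def covered_by(sstable, node_ranges):
    """Return the node range that fully contains the SSTable range, or None."""
    first, last = sstable
    for start, end in node_ranges:
        if start <= first and last <= end:
            return (start, end)
    return None


if __name__ == "__main__":
    for sstable in SSTABLE_RANGES:
        print(sstable, "->", covered_by(sstable, NODE_RANGES))
```

With these values every SSTable falls inside the node's first owned range. An SSTable that straddles two owned ranges, or extends past them, would return `None` and need different handling.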
Complexity: Data Replication - fact
RF = 3
The same token (data) exists on different SSTables on multiple nodes
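To see why one token ends up on several nodes, here is a simplified Python model of ring-based replica placement with RF = 3. The ring, the node names, and the function `replicas` are all hypothetical. The model assumes one token per node and ignores racks and datacenters, so real placement with vnodes or tablets is more involved.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node) pairs sorted by token. Not from the talk.
RING = [(100, "node1"), (200, "node2"), (300, "node3"), (400, "node4")]


def replicas(token, ring=RING, rf=3):
    """Walk the ring clockwise from the token and collect rf distinct nodes."""
    tokens = [t for t, _ in ring]
    distinct_nodes = len({node for _, node in ring})
    rf = min(rf, distinct_nodes)  # cannot place more replicas than nodes
    i = bisect_left(tokens, token) % len(ring)  # first node token >= token, wrapping
    owners = []
    while len(owners) < rf:
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners


if __name__ == "__main__":
    print(replicas(150))
    print(replicas(450))
```

Each of the returned nodes writes the token into its own SSTables, which is why a backup of the whole cluster contains the same data several times.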
Complexity: Data Replication - Multiple Backups
[Figure: SSTables]
The same token is kept in more than one SSTable
Complexity: Data Replication - Multiple Restore
[Figure: SSTables]
One SSTable may contain data belonging to more than one node.
Complexity: vNodes Vs. Tablets
[Figure: SSTables under vNodes and under Tablets]
Complexity Conclusion
■ Naive, inefficient Restore
■ One node downloads all SSTables and rebuilds the database: mutation by mutation with load-and-stream.
■ Non-naive, faster and more efficient Restore
■ Storing the token ranges of SSTables is required.
■ Reading the target topology
■ Preprocessing the combined information: stored token ranges and target topology
■ All nodes participate in the restoration; each node does the bare minimum required.
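The preprocessing step, combining stored SSTable token ranges with the target topology, might look roughly like the Python sketch below. It is not ScyllaDB's implementation: `overlaps`, `plan_restore`, the SSTable names, and the topology are hypothetical, and ranges are assumed to be inclusive (start, end) pairs.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) token ranges share any token."""
    return a[0] <= b[1] and b[0] <= a[1]


def plan_restore(sstable_ranges, topology):
    """Map each target node to the SSTables that hold tokens it owns.

    sstable_ranges: {sstable_name: (first_token, last_token)}, stored with the backup
    topology: {node_name: [(start, end), ...]}, read from the target cluster
    """
    plan = {node: [] for node in topology}
    for name, sst_range in sorted(sstable_ranges.items()):
        for node, owned in topology.items():
            if any(overlaps(sst_range, r) for r in owned):
                plan[node].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical backup metadata and target topology, for illustration only.
    sstables = {"sst-a": (0, 99), "sst-b": (100, 250), "sst-c": (260, 299)}
    topology = {"node1": [(0, 149)], "node2": [(150, 299)]}
    for node, names in plan_restore(sstables, topology).items():
        print(node, names)
```

In this example `sst-a` and `sst-c` each map to a single node, so that node alone needs to fetch them. `sst-b` maps to both nodes, which is the case where one SSTable holds data belonging to more than one node.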
Fast Restore Approaches
■ “Download & Stream” locally only
■ 1:1 Restore
■ Instant Readiness
Fast Restore: Preamble - load and stream
[Figure: SSTables]
All SSTables are downloaded by one node and mutations are streamed to their destinations
Fast Restore: “Download & Stream” locally only
[Figure: SSTables]
Fast Restore: 1:1 Restore
[Figure: SSTables]
Each SSTable goes exactly to where it belongs. No load and stream at all.
Fast Restore: Instant Readiness
[Figure: SSTables]
Instead of downloading and streaming, serve from remote
ScyllaDB future with Object Storage
■ Tiering
■ Time Windowed S3
■ General Purpose Tiering
Tiering
■ Tiering is NOT backup.
■ Tiering means serving data from object storage, as if it were local.
■ Examples of differences
■ Backups are not compacted (they may be compacted into themselves)
■ Backups are not read except when restoring data.
■ Tiering enables more storage at lower cost.
■ At the expense of latency.
■ Tiering enables “fast backup” for tiered data.
■ We only need to mark a remote SSTable as being used for backup - no copy.
Tiering: General Solution
[Figure: SSTables]
SSTables should be moved from and to object storage based on their usage, aka “temperature”
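One way to picture a temperature-based policy is the Python sketch below. The talk does not specify an algorithm, so everything here is an assumption: the `SSTableStats` fields, the `reads_last_hour` counter, and both thresholds are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    name: str
    location: str         # "local" or "remote" (object storage)
    reads_last_hour: int  # hypothetical usage counter standing in for "temperature"


def plan_moves(stats, hot_threshold=100, cold_threshold=5):
    """Return (to_remote, to_local) lists of SSTable names.

    Two thresholds leave a gap between them, so an SSTable whose usage sits
    near a single boundary does not bounce back and forth between tiers.
    """
    to_remote = [s.name for s in stats
                 if s.location == "local" and s.reads_last_hour <= cold_threshold]
    to_local = [s.name for s in stats
                if s.location == "remote" and s.reads_last_hour >= hot_threshold]
    return to_remote, to_local


if __name__ == "__main__":
    stats = [
        SSTableStats("sst-1", "local", 0),     # cold, local: move out
        SSTableStats("sst-2", "local", 50),    # warm, local: stay
        SSTableStats("sst-3", "remote", 500),  # hot, remote: bring back
        SSTableStats("sst-4", "remote", 1),    # cold, remote: stay
    ]
    print(plan_moves(stats))
```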
Tiering: Complexity
[Figure: SSTables]
● Algorithm(s) for Hot and Cold Data
● Tiered data is (probably) not an entire SSTable, but rather smaller fragments of data
● Storage calculations - we need to keep metadata that points to object storage, and even for metadata we have limited local space
● We can count on Object Storage durability when promising Replication Factor
Challenges
TWCS-S3
[Figure: a Scylla node with time windows labeled Today, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, …, N days]
■ Serving from Remote (new feature)
■ Move Time To Tier SSTables to Object Storage, as part of their life cycle management
■ No cold and hot data - just cold.
■ We may need to keep some parts of the SSTable (e.g. index) on local disk
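The time-windowed rule is simpler than general tiering because age alone decides. A minimal Python sketch follows; the function `should_tier` and the 7-day default are illustrative assumptions, not ScyllaDB configuration.

```python
from datetime import date, timedelta


def should_tier(window_end, today, time_to_tier_days=7):
    """True once the SSTable's time window is older than the time-to-tier."""
    return (today - window_end) > timedelta(days=time_to_tier_days)


if __name__ == "__main__":
    today = date(2025, 1, 10)
    for days_old in (0, 1, 7, 8, 30):
        window_end = today - timedelta(days=days_old)
        print(days_old, "days old ->", "object storage" if should_tier(window_end, today) else "local")
```

No usage tracking is needed here: once a window passes the cutoff, its SSTables are treated as cold and moved as part of their life cycle.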
Native Support for any Vendor
■ Amazon (exists)
■ Google
■ Microsoft
■ Oracle
■ More?
Object Storage Vendors
■ Amazon is not the only object storage provider
■ Google, Microsoft, and Oracle also offer Object Storage solutions
■ At ScyllaDB, we strive to support each vendor as natively as possible
■ And thus gain efficiency
■ And thus propose the best solutions for our customers
Stay in Touch
Ran Regev
ran.regev@scylladb.com
regevran
