Scaling for Performance

High Performance NoSQL Masterclass
Felipe Cardeneti Mendes

● Solution Architect at ScyllaDB
● Published Author
● Linux and Open Source enthusiast

Agenda
● About ScyllaDB
● Getting started with NoSQL databases
● Deployment and Production Readiness
● Observability Tips and Tricks

About ScyllaDB

ScyllaDB Database Architecture
Unique Close-to-Metal Architecture
+ Horizontal & Vertical Scaling
+ Built in C++ (no Java overhead)
+ System and Data Center Aware
+ Sharding Per Core
+ Shard-Aware Drivers
+ Auto-Performance Tuning: Network, Processor, NUMA, Storage

Shard per Core
(Diagram: threads vs. shards)
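To make the shard-per-core idea concrete, here is a minimal Python sketch under loose assumptions: each "core" is modeled as an asyncio task that exclusively owns its slice of the data and receives requests as messages, so no locks are needed. NUM_SHARDS, shard_for (an MD5 stand-in hash), shard_loop, request and demo are hypothetical names; real ScyllaDB shards run on separate cores with their own memory and use a different key-to-shard mapping.

```python
import asyncio
import hashlib

NUM_SHARDS = 4  # hypothetical: one shard per CPU core


def shard_for(key: str) -> int:
    # Stand-in hash for illustration only; not ScyllaDB's real token-to-shard mapping.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


async def shard_loop(inbox: asyncio.Queue, store: dict) -> None:
    # Each shard is the only owner of its store, so no locks are needed.
    while True:
        op, key, value, reply = await inbox.get()
        if op == "put":
            store[key] = value
            reply.set_result(None)
        else:  # "get"
            reply.set_result(store.get(key))


async def request(inboxes: list, op: str, key: str, value=None):
    # Route the request to the owning shard as a message and await its reply.
    reply = asyncio.get_running_loop().create_future()
    await inboxes[shard_for(key)].put((op, key, value, reply))
    return await reply


async def demo() -> tuple:
    inboxes = [asyncio.Queue() for _ in range(NUM_SHARDS)]
    stores = [{} for _ in range(NUM_SHARDS)]
    workers = [asyncio.create_task(shard_loop(q, s)) for q, s in zip(inboxes, stores)]
    for i in range(100):
        await request(inboxes, "put", f"user:{i}", i)
    value = await request(inboxes, "get", "user:42")
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return value, stores
```

Running `asyncio.run(demo())` stores 100 keys, each one only in the store of the shard that owns it.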

Asynchronous Architecture
(Diagram: in a synchronous architecture, each request waits for its answer before the next one is sent; in an asynchronous architecture, requests are sent without waiting for responses, yielding time savings.)
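The time savings in the diagram can be reproduced with a small asyncio sketch. fake_query is a hypothetical stand-in for a database round trip (asyncio.sleep simulates the latency): the synchronous style awaits each answer before sending the next request, while the asynchronous style issues all requests at once with asyncio.gather.

```python
import asyncio
import time


async def fake_query(i: int, latency: float = 0.05) -> int:
    # Simulated round trip: asyncio.sleep stands in for network + server time.
    await asyncio.sleep(latency)
    return i


async def synchronous_style(n: int) -> list[int]:
    # Wait for each answer before sending the next request.
    return [await fake_query(i) for i in range(n)]


async def asynchronous_style(n: int) -> list[int]:
    # Send every request up front, then collect the answers.
    return await asyncio.gather(*(fake_query(i) for i in range(n)))


def timed(coro) -> tuple[list[int], float]:
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With ten 50 ms requests, the synchronous style takes roughly 0.5 s and the asynchronous one roughly 0.05 s, because the waits overlap.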

Specialized Cache
+ Cassandra: key cache and row cache on top of the Linux page cache and SSTables, split on-heap / off-heap, requiring complex tuning
+ ScyllaDB: a unified cache on top of SSTables
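A unified cache means a single cache layer, managed by the database itself, sits directly in front of the on-disk SSTables instead of several independently tuned layers. The sketch below illustrates only that idea, using a plain LRU read-through cache; UnifiedRowCache and read_from_sstables are hypothetical names, and this is not how ScyllaDB's cache is implemented internally.

```python
from collections import OrderedDict
from typing import Callable, Optional


class UnifiedRowCache:
    """Hypothetical read-through LRU cache placed directly in front of SSTable reads."""

    def __init__(self, capacity: int, read_from_sstables: Callable[[str], Optional[str]]):
        self.capacity = capacity
        self.read_from_sstables = read_from_sstables
        self.entries = OrderedDict()  # key -> row, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        row = self.read_from_sstables(key)  # miss: go to storage once
        self.entries[key] = row
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used row
        return row
```

Because there is only one layer, there is only one memory budget to reason about, rather than separate key cache, row cache, and page cache sizes.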

Ecosystem Compatibility
+ CQL native protocol
+ JMX management protocol
+ Management command line / REST
+ SSTable file format
+ Configuration file format
+ CQL language

Getting Started with NoSQL Databases

Modern Business Challenges
+ Keep CapEx & OpEx in check
+ Reduce complexity
+ Scale as the data grows
+ Queries in milliseconds
+ Leverage massive amounts of data
+ Predictable, consistent performance

Oh… The CAP Theorem

Workload Types
OLAP: Decision Support
+ More complex queries over large amounts of data
+ Latency important but not critical
+ Seconds to hours
OLTP: Fundamental business tasks
+ Simple queries
+ Latency critical
+ Milliseconds per transaction
(Chart: time vs. complexity of query, for OLAP and OLTP)

OLAP Characteristics
Main Characteristics
+ Bounded concurrency
+ Scans and aggregations
+ Rely on MapReduce paradigms
Examples
+ How many users are from the US?
+ How many Twitter posts were made in 2022?
+ Which devices haven’t communicated back within the past hour?
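An OLAP question such as "How many users are from the US?" has to touch every row. A MapReduce-style sketch in Python, with hypothetical in-memory rows: each partition is counted locally (map) and the partial counts are then merged (reduce).

```python
from collections import Counter
from functools import reduce

# Hypothetical rows, grouped the way a full-table scan would read them (one list per partition).
partitions = [
    [{"user_id": 1, "country": "US"}, {"user_id": 2, "country": "BR"}],
    [{"user_id": 3, "country": "US"}, {"user_id": 4, "country": "DE"}],
    [{"user_id": 5, "country": "US"}],
]


def map_partition(rows) -> Counter:
    # Map: each partition counts its own rows locally (in parallel, in a real system).
    return Counter(row["country"] for row in rows)


def reduce_counts(left: Counter, right: Counter) -> Counter:
    # Reduce: merge partial counts into one result.
    return left + right


totals = reduce(reduce_counts, map(map_partition, partitions), Counter())
print(totals["US"])  # 3
```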

OLTP Characteristics
Main Characteristics
+ Unbounded concurrency
+ Designed for speed and simplicity
+ Often user-facing APIs
Examples
+ When did a given user last log in?
+ What were the last 10 temperature measurements for a given device?
+ How many likes does a given post have?
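By contrast, an OLTP query like "the last 10 temperature measurements for a given device" should touch a single partition. The sketch below is a hypothetical in-memory model of a table keyed by (device_id, ts), with device_id as the partition key and ts as the clustering key; DeviceReadings and its methods are illustrative names, not a driver API.

```python
import bisect
from collections import defaultdict


class DeviceReadings:
    """Hypothetical model of a table with partition key device_id and clustering key ts."""

    def __init__(self):
        self.partitions = defaultdict(list)  # device_id -> sorted list of (ts, temperature)

    def insert(self, device_id: str, ts: int, temperature: float) -> None:
        # Rows within a partition are kept sorted by the clustering key.
        bisect.insort(self.partitions[device_id], (ts, temperature))

    def last_n(self, device_id: str, n: int = 10) -> list:
        # Single-partition read: touch one partition only, return newest first.
        rows = self.partitions.get(device_id, [])
        return rows[-n:][::-1]
```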

Why not both? Meet Workload Prioritization
+ OLTP 100 shares, OLAP 100 shares: a ratio of 100:100 (1:1) gives both workloads equal shares of processing/resources to complete tasks
+ OLTP 100 shares, OLAP 50 shares: a ratio of 100:50 (2:1) gives transactions 2X as many shares of processing/resources as analytics to complete tasks
(Diagram: a scheduler deciding which task to run)
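One way to turn share ratios into a decision about which task to run is stride scheduling, a classic proportional-share algorithm. The sketch below assumes every workload always has work queued; it is not ScyllaDB's scheduler, only an illustration of how 100:100 and 100:50 shares translate into CPU time. stride_schedule is a hypothetical name.

```python
def stride_schedule(shares: dict[str, int], slots: int) -> dict[str, int]:
    """Return how many time slots each workload gets under stride scheduling."""
    big = 1_000_000  # large constant so strides stay integers
    stride = {name: big // s for name, s in shares.items()}  # more shares -> smaller stride
    passes = {name: 0 for name in shares}
    ran = {name: 0 for name in shares}
    for _ in range(slots):
        # Run the workload with the lowest pass value; min() breaks ties by insertion order.
        chosen = min(passes, key=passes.get)
        passes[chosen] += stride[chosen]
        ran[chosen] += 1
    return ran
```

With 100:50 shares, OLTP receives two of every three time slices; with 100:100, the slices alternate.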

Wide Column Databases Write Path
LSM storage engine’s write path:
(Diagram sequence: writes → commit log → compaction)
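The write path above can be sketched as a toy Python LSM store, assuming an in-memory list stands in for the on-disk commit log and sorted lists stand in for SSTables. TinyLSM and memtable_limit are hypothetical; real engines add bloom filters, indexes, and crash recovery from the commit log.

```python
import bisect


class TinyLSM:
    """Toy LSM write path: commit log append, memtable update, flush to sorted immutable runs."""

    def __init__(self, memtable_limit: int = 4):
        self.commit_log = []  # append-only log; a file on disk in a real database
        self.memtable = {}    # recent writes kept in memory
        self.sstables = []    # immutable sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. in-memory update: readable immediately
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. persist the memtable as a sorted, immutable SSTable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))  # binary search on the sorted keys
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Each flush adds another SSTable, so reads gradually have to consult more files; that is the pressure compaction relieves.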

What is compaction?
Hidden Gems
+ This technique of keeping sorted files and merging them is well known and often called a Log-Structured Merge (LSM) tree
+ First published in 1996; its earliest known popular application is the Lucene search engine (1999)
Characteristics
+ High write performance.
+ Immediately readable.
+ Reasonable read performance.
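Compaction merges several sorted SSTables into one, keeping only the newest version of each key. A minimal sketch with heapq.merge, assuming SSTables are sorted lists of (key, value) pairs ordered oldest first; compact is a hypothetical name, and real compaction also purges deleted and expired data, which this sketch omits.

```python
import heapq


def compact(sstables):
    """Merge sorted SSTables (oldest first) into one sorted run, keeping the newest value per key."""
    # Tag entries with -age so that, for equal keys, the newest table's entry comes out first.
    tagged = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    ]
    merged = []
    for key, _neg_age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an older, overwritten version: drop it
        merged.append((key, value))
    return merged
```

Because every input is already sorted, the merge is a single sequential pass, which is what keeps compaction I/O-friendly.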

What is a compaction strategy?
▪ Which files to compact, and when?
▪ This is called the compaction strategy
▪ The goal of the strategy is low amplification:
○ Avoid read requests needing many SSTables (read amplification)
○ Avoid overwritten/deleted/expired data staying on disk (space amplification)
○ Avoid excessive temporary disk space needs (space amplification)
○ Avoid compacting the same data again and again (write amplification)
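The three amplification goals can be expressed as simple ratios. These helper functions are hypothetical and the figures in the usage lines are made up; they only show what each ratio measures.

```python
def read_amplification(sstables_touched: int, reads: int) -> float:
    # Average number of SSTables consulted per read request.
    return sstables_touched / reads


def space_amplification(disk_bytes: int, live_data_bytes: int) -> float:
    # Disk usage relative to live data (excludes overwritten, deleted, or expired data).
    return disk_bytes / live_data_bytes


def write_amplification(bytes_written_to_disk: int, bytes_written_by_clients: int) -> float:
    # Bytes written to disk (flushes plus every compaction rewrite) per byte written by clients.
    return bytes_written_to_disk / bytes_written_by_clients


# Made-up example: 10 GB written by clients, rewritten twice more by compaction.
print(write_amplification(30, 10))  # 3.0
```

A compaction strategy is essentially a trade-off among these three: lowering one usually raises another.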

Which one to choose?
Know your workload

Deployment and Production Readiness

It all starts with Data Modeling
Do’s
+ Denormalize
+ Query-oriented approach
+ High data distribution / cardinality
Don’ts
+ Create hotspots
+ Large partitions/rows/cells/collections/etc.
+ Low-cardinality tables/views/indexes
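To see why low cardinality creates hotspots, the sketch below hashes partition keys onto a hypothetical 6-node cluster. MD5 is a stand-in for the real partitioner, node_for is a hypothetical helper, and the user data is synthetic: partitioning by country (two distinct values) can load at most two nodes, while partitioning by user ID spreads rows across all of them.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    # Stand-in for token-based placement; real clusters use Murmur3 tokens and vnodes.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


# Synthetic users: 70% in the US, 30% in BR.
users = [(f"user-{i}", "US" if i % 10 < 7 else "BR") for i in range(10_000)]

by_country = Counter(node_for(country) for _, country in users)  # low-cardinality key
by_user = Counter(node_for(user_id) for user_id, _ in users)     # high-cardinality key
```

With country as the partition key, at least 7,000 of the 10,000 rows land on a single node; with user ID, each node gets roughly a sixth.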

Test, test, test…!
Unit Testing
+ Test your workload and access patterns in a Docker container
+ Use specialized stress tools to simulate your workload
○ cassandra-stress
○ nosqlbench
○ YCSB
+ OBSERVE the results (more on that later)
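When observing stress-test results, latency is usually judged by percentiles such as P99 rather than by averages. A minimal nearest-rank percentile helper, assuming latencies are collected as a plain list of milliseconds; percentile is a hypothetical name and the samples are synthetic.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


samples = [2] * 95 + [12] * 5  # synthetic: 95 fast requests, 5 slow ones
print(percentile(samples, 95))  # 2
print(percentile(samples, 99))  # 12
```

Here 95% of requests take 2 ms, yet the P99 of 12 ms would miss a 10 ms P99 target; an average (2.5 ms) would hide that.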

Application Development
Functional Testing
+ Use prepared statements
+ Configure your routing and load balancing policy correctly
○ DCAware and TokenAware policies
○ Shard-aware drivers if using ScyllaDB!
+ Make use of asynchronous APIs
+ Ensure your client is NOT a bottleneck
+ Paging matters: tune the page size to your workload
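A token-aware policy lets the client work out which replicas own a partition and send the request straight to one of them, avoiding an extra coordinator hop. The sketch below models only the ring lookup: token_of uses MD5 as a stand-in for the Murmur3 partitioner, and TokenRing with its SimpleStrategy-like replica walk is hypothetical, not a driver API.

```python
import bisect
import hashlib


def token_of(partition_key: str) -> int:
    # Stand-in partitioner: real drivers compute the Murmur3 token of the serialized key.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Hypothetical ring: each node owns the range ending at each of its tokens."""

    def __init__(self, node_tokens: dict):
        pairs = sorted((t, node) for node, tokens in node_tokens.items() for t in tokens)
        self.tokens = [t for t, _ in pairs]
        self.owners = [node for _, node in pairs]

    def replicas(self, token: int, rf: int) -> list:
        # The first ring token >= the key's token owns it (wrapping around);
        # further replicas are the next distinct nodes clockwise.
        start = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        result = []
        for step in range(len(self.tokens)):
            node = self.owners[(start + step) % len(self.tokens)]
            if node not in result:
                result.append(node)
                if len(result) == rf:
                    break
        return result
```

A token-aware client would send the request for a key to `replicas(token_of(key), rf)[0]` (or another listed replica) directly; a shard-aware driver goes one step further and targets the owning shard on that node.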

Test, test, test…!
Readiness Testing
+ What’s the unreplicated data set size?
+ How many operations per second do I need to achieve?
○ Of these, what is the read/write distribution?
○ What’s the average payload size?
○ What are my latency requirements?
+ How many regions should the data replicate to?
+ Do I need indexes or views to satisfy my queries?
+ What are the target deployment location(s)?
+ Is the use case’s growth predictable or unpredictable?
+ What are the data retention requirements?
+ Is the use case storage-bound or CPU-bound?

Oh mighty sizing…!
A Sizing Exercise
+ 500k ops/sec with 1KB rows
+ 5TB data set size
+ P99 reads and writes < 10ms
+ Target deployment: AWS
Simple Math
+ 5TB × RF=3 / 6 nodes = 2.5TB per node
+ 500k ops/sec / 12.5K ops/core = 40 physical cores
+ Result: 6 nodes of i4i.8xlarge
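The sizing arithmetic, written out. The 12.5K ops/core throughput figure is the planning assumption from the exercise, not a measured value, and the helper names are hypothetical.

```python
import math


def per_node_storage_tb(dataset_tb: float, replication_factor: int, nodes: int) -> float:
    # Every row is stored replication_factor times, spread evenly across the nodes.
    return dataset_tb * replication_factor / nodes


def cores_needed(target_ops_per_sec: int, ops_per_core: int) -> int:
    # Round up: a fraction of a core still needs a whole core.
    return math.ceil(target_ops_per_sec / ops_per_core)


print(per_node_storage_tb(5, 3, 6))   # 2.5
print(cores_needed(500_000, 12_500))  # 40
```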

Observability Tips and Tricks

ScyllaDB Monitoring Architecture

Keep in touch!
Solutions Architect
ScyllaDB
felipemendes@scylladb.com
Find me on LinkedIn
