Scalability & Performance
Scalability
An overloaded term that has been perverted by technical marketing
The ability of a database to improve performance when adding more resources
Performance
Also an overloaded term, with different meanings depending on the context
See the definition next
Throughput Response time
S C A L A B I L I T Y & P E R F O R M A N C E
Database Performance Metrics
Throughput:
The number of operations per time unit (e.g., transactions per second, operations per second, queries per second)
Response time:
The time from submitting an operation (e.g., transaction, query, individual row operation) until receiving the answer
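As a minimal sketch of the two metrics defined above, the following Python snippet derives throughput and response time from a list of operation timestamps. The sample data, the function names, and the 2-second measurement window are hypothetical and serve only as illustration.

```python
from statistics import mean


def throughput(operations, window_seconds):
    """Operations completed per second over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return len(operations) / window_seconds


def response_times(operations):
    """Per-operation response time: completion time minus submission time."""
    return [completed - submitted for submitted, completed in operations]


# Hypothetical sample: (submitted_at, completed_at) pairs, in seconds.
sample = [(0.0, 0.5), (0.25, 0.5), (1.0, 1.75), (1.5, 2.0)]

tps = throughput(sample, window_seconds=2.0)
avg_rt = mean(response_times(sample))
print(f"throughput = {tps} ops/s, mean response time = {avg_rt} s")
```

The two numbers are independent: the same throughput can be reached with very different response times, which is why both are reported.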
Scalability
The ability of a database to deliver better performance by adding more resources
Speedup
The ability of a database to reduce response time by adding more resources
Vertical vs Horizontal Scalability
Vertical: adding more resources (CPU, memory and disk) to a centralized database yields more throughput
Horizontal: adding more nodes to a distributed database (in a cluster) yields more throughput
Scalability Factor
Do all databases scale the same?
Can we measure and compare scalabilities?
The scalability factor measures scalability: scale up for vertical scalability and scale out for horizontal scalability
The scale out factor is the throughput of a cluster of n nodes normalized by the throughput of a single node
In other words, it is the ratio between the throughput of a database with n cluster nodes and the throughput of the same database with one node
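The scale out factor can be computed directly from measured throughputs. The sketch below assumes throughput is measured with the same workload on every cluster size; the function name and the measurements are hypothetical.

```python
def scale_out_factor(throughput_n_nodes, throughput_one_node):
    """Throughput of an n-node cluster divided by single-node throughput."""
    if throughput_one_node <= 0:
        raise ValueError("single-node throughput must be positive")
    return throughput_n_nodes / throughput_one_node


# Hypothetical measurements: cluster size -> transactions per second.
measured_tps = {1: 1000.0, 2: 1900.0, 4: 3000.0, 8: 4000.0}

factors = {
    n: scale_out_factor(tps, measured_tps[1]) for n, tps in measured_tps.items()
}
for n, factor in factors.items():
    print(f"{n} nodes: scale out factor {factor:.2f}")
```

A factor equal to the number of nodes means each node contributes as much as the first one; a factor below 1 means the cluster is slower than a single node.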
Types of Scalability
What is the optimal scalability?
What is the worst scalability?
Scalability can be logarithmic or linear, but it can also be null or even negative
Some databases have negative scalability, as adding more nodes to the system yields
a throughput lower than with a single node
Many databases have sublinear scalability
Often, scalability is null for write workloads and logarithmic for read/write workloads
Linear scalability is the optimal case: with a cluster of n nodes, you get n times the
throughput of a single node
For instance, if a single node delivers 1,000 transactions per second, a cluster of 100
nodes delivers a throughput of 100,000 transactions per second
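The types above can be told apart from a measured scale out factor. The sketch below is one possible labeling, not a standard procedure: the 5% tolerance is an assumption to absorb measurement noise, and logarithmic behavior falls under the "sublinear" label.

```python
def classify_scalability(factor, n, tolerance=0.05):
    """Label a scale out factor measured on a cluster of n nodes (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two nodes to talk about scale out")
    if factor < 1 - tolerance:
        return "negative"   # the cluster is slower than a single node
    if factor <= 1 + tolerance:
        return "null"       # extra nodes add no throughput
    if factor < n * (1 - tolerance):
        return "sublinear"  # some gain, but less than n times
    return "linear"         # about n times the single-node throughput


# The example from the slide: 1,000 tps on one node, 100,000 tps on 100 nodes.
print(classify_scalability(100_000 / 1_000, n=100))
# Hypothetical measurements on a 4-node cluster.
for factor in (0.8, 1.0, 2.5):
    print(factor, classify_scalability(factor, n=4))
```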
Logarithmic Scale Out
Results from wasting capacity due to redundant work and/or contention
Open source databases such as MariaDB rely on cluster replication (see our blog post on Cluster Replication)
Cluster replication yields logarithmic scalability: since the writes are executed by all nodes, only the read fraction of
the workload can provide scalability
Shared disk databases also have logarithmic scalability: the need for a concurrency control protocol that locks disk
pages to be written results in substantial contention that increases with the cluster size
T Y P E S O F S C A L A B I L I T Y
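Why only the read fraction scales under cluster replication can be illustrated with a simplified capacity model. The model is an assumption, not taken from the slides: every operation costs the same, every write is executed by all n nodes, reads are balanced evenly across nodes, and coordination overhead is ignored. Each node then spends a fraction w of the total load on writes plus (1 - w)/n on reads, which bounds the scale out factor at 1 / (w + (1 - w)/n).

```python
def replication_scale_out(n, write_fraction):
    """Scale out factor of full replication under the simplified model above.

    Each node executes all writes plus 1/n of the reads, so the work per
    node, relative to a single-node system, is w + (1 - w) / n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return 1.0 / (write_fraction + (1.0 - write_fraction) / n)


# Hypothetical workload with 20% writes.
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} nodes: scale out factor {replication_scale_out(n, 0.2):.2f}")
```

Under these assumptions a write-only workload gets a factor of 1 regardless of cluster size (null scalability), a read-only workload gets a factor of n, and a mixed workload flattens out below 1/w as nodes are added.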
Linear Scalability
Key-value stores (see our blog post on NoSQL) typically provide linear scalability because they are very simple,
without addressing the hard problem of scaling transactional management (the so-called ACID properties)
Transactional databases that exhibit linear scalability are very few (but since this blog series is vendor agnostic, we
don't discuss them)
Types of Speed Up
Speed up can also show different behaviors, from null to linear
Linear speed up means that the response time obtained with one node is divided by n with n nodes
Null speed up means, for instance, that a given query always exhibits the same response time with one or more
nodes
Null speed up happens in a database without a parallel/OLAP query engine (i.e., without intra-query parallelism):
with inter-query parallelism only, each node is able to process a subset of the queries, but each query can only be
executed by a single node
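Speed up can be quantified the same way as scalability, using response times instead of throughputs. The sketch below uses hypothetical function names and a hypothetical query that takes 8 seconds on one node.

```python
def speed_up(response_time_one_node, response_time_n_nodes):
    """Ratio of single-node response time to n-node response time."""
    if response_time_n_nodes <= 0:
        raise ValueError("response time must be positive")
    return response_time_one_node / response_time_n_nodes


def linear_response_time(response_time_one_node, n):
    """Response time expected on n nodes if speed up is linear."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return response_time_one_node / n


rt_one_node = 8.0  # seconds, hypothetical

# Linear speed up: the response time is divided by the number of nodes.
rt_four_nodes = linear_response_time(rt_one_node, 4)
print(rt_four_nodes, speed_up(rt_one_node, rt_four_nodes))

# Null speed up: the query still runs on a single node, so nothing changes.
print(speed_up(rt_one_node, rt_one_node))
```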
Main Takeaways
The two main metrics for measuring the performance of a database are throughput
and response time
Throughput measures the number of operations (transactions, queries, inserts) per
unit of time
Response time measures how long it takes to execute a particular operation
Scalability is the ability of the database to handle bigger loads with more resources
In a distributed database, we talk about horizontal scalability where more
resources mean more nodes
In a centralized database, we talk about vertical scalability where more resources
mean more CPU, memory, and disk
Speed up is related to scalability but is a different concept
It refers to the ability to reduce response time by adding more resources
Again, it can be horizontal for a distributed database or vertical for a centralized
database
Scalability and speed up can be of different kinds
Negative and null scalability are of no interest
Logarithmic scalability can be better, but only for a few nodes and a high proportion
of reads
Linear scalability is optimal since each new node contributes the same in terms of
additional load that can be handled
References
[Özsu & Valduriez 2020] Tamer Özsu, Patrick Valduriez. Principles of Distributed Database Systems, 4th Edition. Springer, 2020.
Relevant Posts from the Blog
How To Measure Scalability and Performance
Cluster Replication
Shared Nothing
Architectures
NoSQL
About
About the authors:
Dr. Ricardo Jimenez-Peris is the CEO and
founder of LeanXcale. Before founding
LeanXcale, he was a researcher in distributed
database systems for over 25 years, authored
over 100 scientific publications, and served
as director of the Distributed Systems Lab
and as a university professor on distributed
systems.
Dr. Patrick Valduriez is a researcher at INRIA,
co-author of the book “Principles of Distributed
Database Systems” that has educated legions
of students and engineers in this field and more
recently, Scientific Advisor of LeanXcale.
About this blog series:
This blog series aims at educating database
practitioners in topics commonly not well
understood, often due to false or confusing
marketing messages. The blog provides the
foundations and tools to let the reader
actually evaluate database systems, learn
their real capabilities and be able to compare
the performance of the different alternatives
for their targeted workload. The blog is vendor
agnostic and does not mention specific
vendors, although open source databases are
sometimes mentioned to illustrate concepts.
About LeanXcale:
LeanXcale is a startup making a NewSQL
database. Since the blog is vendor
agnostic, we do not talk about LeanXcale
itself. Readers interested in LeanXcale
can visit the LeanXcale web site.

Understanding Distributed Databases Scalability
