Database Scalability:
The Shard Conflict
July 2014
Database Scalability: The Shard Conflict
This presentation tackles a particularly challenging situation that often occurs when creating a distributed database.
In this presentation you will learn:
• What a ‘shard conflict’ is
• How to identify ‘shard conflicts’
• How to resolve ‘shard conflicts’ in a distributed database
• How ‘shard conflicts’ affect query processing
Traditional Databases vs. Distributed Databases
Traditional Monolithic DB
• Made up of tables of data that are related to one another
• All of the data is located in one place and is easily accessible
• The data relationship is stored deep in the database and can be easily analyzed and queried using conventional methods
Modern Distributed DB
• Data distribution is necessary for scalability
• Information is spread across various servers (instances)
• Related data can be distributed into different partitions, or shards, making related query requests difficult to process
So, What Is a ‘Shard Conflict’?
At ScaleBase, we have coined the term ‘shard conflict’ to describe a situation where:
• A given statement, executed as is and unchanged on one or all partitions, cannot be relied upon to yield a correct result.
Let’s take a look at the following examples…
Identifying the Conflict
Example #1
Choosing ‘id’ as the shard key presents a shard conflict when employees are joined to their departments, because there is no guarantee that each employee is in the same shard as its corresponding department.
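To make Example #1 concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration rather than ScaleBase's implementation: two in-memory SQLite databases stand in for shards, rows are placed with a simple modulo function (`shard_for`), and the table layout and sample data are invented. Both tables are sharded by their own ‘id’, and the same join is then run unchanged on every shard.

```python
import sqlite3

NUM_SHARDS = 2  # assumption: two shards, rows placed by key modulo shard count


def shard_for(key: int) -> int:
    """Hypothetical placement function: pick a shard from the shard key."""
    return key % NUM_SHARDS


def make_shard() -> sqlite3.Connection:
    """Create one in-memory database that plays the role of a shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER)"
    )
    return conn


# Invented sample data: (id, name) and (id, first_name, department_id).
DEPARTMENTS = [(1, "Sales"), (2, "Engineering")]
EMPLOYEES = [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 2)]

shards = [make_shard() for _ in range(NUM_SHARDS)]
for dept in DEPARTMENTS:
    shards[shard_for(dept[0])].execute("INSERT INTO department VALUES (?, ?)", dept)
for emp in EMPLOYEES:
    # Shard key is employee.id, so an employee may land away from its department.
    shards[shard_for(emp[0])].execute("INSERT INTO employee VALUES (?, ?, ?)", emp)

JOIN = (
    "SELECT e.first_name, d.name "
    "FROM employee e JOIN department d ON e.department_id = d.id"
)

# Run the statement unchanged on every shard and merge the results.
rows = sorted(row for shard in shards for row in shard.execute(JOIN))
print(rows)  # only 2 of the 4 employees find their department
```

With this data, Bob and Cid are stored on a different shard than their departments, so their rows silently drop out of the merged result. No error is raised, which is what makes the conflict easy to miss.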
Resolving the Conflict
Example #2
The Method:
• Choose ‘department_id’ as the ‘Employee Table’ shard key
The Outcome:
• The join query is optimized because all department-related data is stored in the same partition
• No joins cross partition boundaries
• Statements can now safely be executed on all partitions
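The effect of Example #2 can be sketched the same way. This is an illustrative Python example under stated assumptions, not ScaleBase's mechanism: in-memory SQLite databases act as shards, `shard_for` is a hypothetical modulo placement function, and the schema and rows are invented. The department table is placed by ‘id’ and the employee table by ‘department_id’, using the same placement function for both.

```python
import sqlite3

NUM_SHARDS = 2  # assumption: two shards, rows placed by key modulo shard count


def shard_for(key: int) -> int:
    """Hypothetical placement function shared by both tables."""
    return key % NUM_SHARDS


def make_shard() -> sqlite3.Connection:
    """Create one in-memory database that plays the role of a shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER)"
    )
    return conn


# Invented sample data: (id, name) and (id, first_name, department_id).
DEPARTMENTS = [(1, "Sales"), (2, "Engineering")]
EMPLOYEES = [(1, "Ann", 1), (2, "Bob", 1), (3, "Cid", 2), (4, "Dee", 2)]

shards = [make_shard() for _ in range(NUM_SHARDS)]
for dept in DEPARTMENTS:
    shards[shard_for(dept[0])].execute("INSERT INTO department VALUES (?, ?)", dept)
for emp in EMPLOYEES:
    # Shard key is employee.department_id (emp[2]), so every employee
    # is stored on the same shard as its department row.
    shards[shard_for(emp[2])].execute("INSERT INTO employee VALUES (?, ?, ?)", emp)

JOIN = (
    "SELECT e.first_name, d.name "
    "FROM employee e JOIN department d ON e.department_id = d.id"
)

# The unchanged statement can now be run on every shard and the results merged.
rows = sorted(row for shard in shards for row in shard.execute(JOIN))
print(rows)
```

Because the join column on each side is also that table's shard key, every matching pair of rows lives on one shard, and the merged per-shard results contain all four employees.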
Wait a Minute...There’s Still a Conflict
‘Select e.first_name, e.last_name, m.first_name, m.last_name
from employee e join employee m on e.manager_id=m.id’
Joining the ‘Employee Table’ with itself to find each employee’s manager: there is no guarantee that an employee and their manager are in the same shard.
The ‘Employee Table’ cannot be sharded by both ‘id’ and ‘manager_id’ at the same time.
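The self-join conflict can be reproduced with a short Python sketch. The setup is assumed for illustration only: in-memory SQLite databases act as shards, the employee table is sharded by ‘department_id’ with a hypothetical modulo placement function, and the rows are invented. The query is the one quoted on this slide; a single unsharded database serves as the reference answer.

```python
import sqlite3

NUM_SHARDS = 2  # assumption: two shards, rows placed by key modulo shard count

SCHEMA = (
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, manager_id INTEGER, department_id INTEGER)"
)

# Invented rows: (id, first_name, last_name, manager_id, department_id).
# Cid works in department 2 but reports to Ann in department 1.
EMPLOYEES = [
    (1, "Ann", "Lee", None, 1),
    (2, "Bob", "Ray", 1, 1),
    (3, "Cid", "Fox", 1, 2),
    (4, "Dee", "Kim", 3, 2),
]

# The statement from the slide, unchanged.
SELF_JOIN = (
    "Select e.first_name, e.last_name, m.first_name, m.last_name "
    "from employee e join employee m on e.manager_id=m.id"
)


def new_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


# Reference: all rows in one monolithic database.
single = new_db()
single.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
reference = set(single.execute(SELF_JOIN))

# Sharded: rows placed by department_id (emp[4]).
shards = [new_db() for _ in range(NUM_SHARDS)]
for emp in EMPLOYEES:
    shards[emp[4] % NUM_SHARDS].execute("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", emp)
sharded = {row for shard in shards for row in shard.execute(SELF_JOIN)}

missing = reference - sharded
print(missing)  # employee/manager pairs that span two shards
```

Sharding by ‘department_id’ keeps the department join local, but the pair (Cid, Ann) spans two shards and disappears from the per-shard results. Re-sharding by ‘id’ or by ‘manager_id’ would only move the problem, since a manager row would need to sit with every one of its reports.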
‘Shard Conflict’ Effects on Query Processing
• The examples show that when a foreign key relates two tables, sharding both tables on a common key can resolve certain (but not all) conflicts
• Distributed data can become quite complex if not handled correctly
• Shard conflicts are not always obvious, and can yield incorrect results that go unnoticed
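Since these conflicts can go unnoticed, one way to catch some of them early is a simple static check on join predicates. The Python sketch below is a deliberately conservative rule of thumb, not ScaleBase's detection logic: the `SHARD_KEYS` catalog and the function name are hypothetical, and the check assumes both tables use the same placement function. It treats an equi-join as shard-local only when each side of the predicate is the shard key of its table.

```python
from typing import Dict, Tuple

# Hypothetical catalog: table name -> column used as its shard key.
SHARD_KEYS: Dict[str, str] = {"employee": "department_id", "department": "id"}


def join_is_shard_local(
    left: Tuple[str, str],
    right: Tuple[str, str],
    shard_keys: Dict[str, str] = SHARD_KEYS,
) -> bool:
    """Return True when an equi-join `left = right` joins on both shard keys.

    Each side is a (table, column) pair. If both columns are the shard keys
    of their tables, matching rows are placed on the same shard, so the
    statement can run unchanged on every shard. Anything else is flagged
    as a potential shard conflict.
    """
    left_table, left_col = left
    right_table, right_col = right
    return (
        shard_keys.get(left_table) == left_col
        and shard_keys.get(right_table) == right_col
    )


# employee.department_id = department.id: both sides are shard keys.
dept_join_ok = join_is_shard_local(("employee", "department_id"), ("department", "id"))

# employee.manager_id = employee.id: neither side is the employee shard key.
self_join_ok = join_is_shard_local(("employee", "manager_id"), ("employee", "id"))

print(dept_join_ok, self_join_ok)
```

The check accepts the employee-to-department join and flags the employee self-join, matching the two outcomes in the examples. It may also flag joins that happen to be safe for other reasons, which is the price of keeping the rule simple.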
ScaleBase Can Help
ScaleBase is a modern, distributed MySQL database management
system. It is optimized for the cloud and deploys in minutes to enable you
to scale out to an unlimited number of users, data and transactions.
It is a horizontally scalable database cluster built on MySQL that
dynamically optimizes workloads and availability by logically distributing
data across public, private and geo-distributed clouds.
Contact Us
sales@scalebase.com
or
Download free software
ScaleBase Software
http://www.scalebase.com/software/
Use your relational DBA skills
and get NoSQL capabilities
Start Using ScaleBase Today
Check out ScaleBase’s software
• ScaleBase on Amazon
• ScaleBase on Rackspace
