Databases
Sargun Dhillon
@Sargun
What is a database?
A database is an organized collection of data
What are databases
for?
Applications
Internet Applications
Experiencing exploding growth
Internet Traffic vs. Penetration
[Chart, 2000-2012: IP Traffic (PB/mo, axis 0 to 40000) and Global Penetration (%, axis 0 to 100)]
Number of Internet Users in 2012
Average Distance to Every Human
Extrapolating
We have not yet reached Peak “Web” and we won’t see
it for some time
Applications
How are they built?
Basic Application
Useful Application
Add Persistence
Scale Out
Scale Out with Correctness
What is a Transaction?
A Unit of Work
Transaction Scheduling
Concurrent Operations
Non-Conflicting Concurrency
Parallel Execution
ACID
ACID = Atomicity
A transaction executes or it does not
ACID = Consistency
Correctness; Require the database to follow set of
invariants
ACID = Isolation
Prevent inter-actor visibility during concurrent operations
ACID = Durability
Once you write, it will survive
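A minimal sketch of atomicity and consistency, assuming Python's built-in sqlite3 module as a stand-in for any ACID database. The accounts table, the names, and the non-negative balance invariant are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint is the invariant the database must keep.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the invariant, so the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict of name to balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

An overdraft attempt such as transfer(conn, 'alice', 'bob', 150) leaves both rows untouched: the transaction executes entirely or not at all.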
Lifecycle of a Transaction
Vertical Scalability
Moore’s Law can take us places
Biggest AWS Database
• vCPUs: 32
• Memory: 244 GiB
• Storage: 3 TB
• IOPS: 30,000
• Networking: 10 Gigabit
• Resiliency: Multi-AZ
• SLA: 99.95%
• Backend: PostgreSQL
$141,052.66/yr
Scaling Beyond
Sharding?
Do we have a natural
sharding key?
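A minimal sketch of hash-based sharding, assuming the user name is the sharding key. The function name and the choice of SHA-256 are illustrative assumptions, not something the slides prescribe.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every operation on one user lands on one shard. Anything that spans users (a friendship, for example) may touch two shards, which is where the coordination questions begin.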
Add a Coordinator?
Two-phase commit?
Three-phase commit?
Paxos?
Enhanced Three-phase commit?
Wat?
Egalitarian Paxos?
Do we really want to
run NxM databases?
Partial Availability
Failure detectors are
hard
Database Failure
Cascading App Failure
Recovery
Hotspots?
(The “Bieber” problem)
Scaling SSI databases
is a hard problem
What if we want
multi-datacenter?
No latency win for
mutable data
Must sacrifice recency
for latency win
Complex Routing
Semantics
Multi-master requires
at least 1 RTT
-F1: A Distributed SQL Database That Scales, Google
“Because the data is synchronously replicated
across multiple datacenters, and because
we’ve chosen widely distributed datacenters,
the commit latencies are relatively high (50-150
ms).”
-Kohavi and Longbotham 2007
“Every 100 ms increase in load time of
Amazon.com decreased sales by 1%.”
(~$120M of losses per 100 ms)
“Average partition duration ranged from 6 minutes for
software-related failures to more than 8.2 hours for
hardware-related failures (median 2.7 and 32 minutes;
95th percentile of 19.9 minutes and 3.7 days,
respectively).”
-The Network is Reliable
WANs Fail
Is there another way?
Eventually
Consistent
Systems
-F1: A Distributed SQL Database That Scales, Google
“We also have a lot of experience with eventual
consistency systems at Google. In all such
systems, we find developers spend a
significant fraction of their time building
extremely complex and error-prone
mechanisms to cope with eventual consistency
and handle data that may be out of date. We
think this is an unacceptable burden to place
on developers and that consistency problems
should be solved at the database level. ”
CAP Theorem
“A shared-data system can have at most
two of the three following properties:
Consistency, Availability, and tolerance to
network Partitions.”
-Dr. Eric Brewer
On Consistency
• ACID Consistency: Any transaction, or operation
will bring the database from one valid state to
another
• CAP Consistency: All nodes see the same data at
the same time (synchrony)
On Partition Tolerance
• The network will be allowed to lose arbitrarily many
messages sent from one node to another.
• Database systems, in order to be useful, must
communicate over the network
• Clients count
There is no such thing as
a 100% reliable network:
Can’t choose CA
http://codahale.com/you-cant-sacrifice-partition-tolerance
We Can Have Both*
(*Just not at the same time)
PNUTS
• Paper released by Yahoo! Research in 2008
• Operations:
• Read-Any
• Read-Critical(Required-Version)*
• Read-Latest
• Write
• Test-and-set-write(Required-Version)
* Will fall back to CP operation
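A toy model of the operations listed above, assuming one record with a master copy and a single lagging replica. The class and method names are hypothetical; this is a sketch of the semantics, not PNUTS's implementation.

```python
class Record:
    """One record: an up-to-date master copy and a replica that may lag."""

    def __init__(self, value):
        self.master = (1, value)   # (version, value)
        self.replica = (1, value)

    def write(self, value):
        """Write through the master and return the new version."""
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def test_and_set_write(self, required_version, value):
        """Write only if the master is still at required_version."""
        if self.master[0] != required_version:
            return None
        return self.write(value)

    def replicate(self):
        """Asynchronous replication catching up (invoked by hand here)."""
        self.replica = self.master

    def read_any(self):
        """Cheapest read: whatever the local replica has, possibly stale."""
        return self.replica

    def read_critical(self, required_version):
        """Return a copy at least as new as required_version."""
        if self.replica[0] >= required_version:
            return self.replica
        return self.master  # fall back to the master when the replica lags

    def read_latest(self):
        """Always go to the master."""
        return self.master
```

The caller picks the consistency level per operation, which is the sense in which both are available, just not at the same time.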
Weak Consistency
“This is a specific form of weak
consistency; the storage system
guarantees that if no new
updates are made to the object,
eventually all accesses will
return the last updated value.”
Definition of “Eventual Consistency” from “Eventually
Consistent - Revisited” - Werner Vogels
Eventual Consistency
in the LAN
Less Relevant Today
Good at Building
LANs at Scale
Facebook Fabric
Microsoft VL2
Google Jupiter
Less Interesting
Eventual Consistency
in the WAN
Low-latency
everywhere
Write Anywhere
Beat the speed of light
Build for WAN locality
Typical Pattern
with
COTS EC Store
System Model
Use Case:
Social Network
Models:
Users, Posts, Friends
Schema
CREATE TABLE test.users (
user_name text PRIMARY KEY,
friends set<text>,
posts set<text>
)
State
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-------
sargun | {'BOSS'} | null
Let’s Post!
(But First)
Remove Boss
*****:test> UPDATE users SET
friends = friends - {'BOSS'}
WHERE user_name = 'sargun' ;
Hidden Failure
Dropped Unfriending
State at DC2 & DC3
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-------
sargun | {'BOSS'} | null
Post Message
*****:test> UPDATE users SET
posts = posts + {'PARTY'} WHERE
user_name = 'sargun' ;
State at DC2 & DC3
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-----------
sargun | {'BOSS'} | {'PARTY'}
Worse Than Banking
Unbounded Financial Loss
No
Happens-Before (h.b.)
Relationship
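The failure above can be replayed in a few lines. This is a simulation under stated assumptions: three in-memory dictionaries stand in for the datacenters, and the `reachable` argument is a hypothetical way to model a replication message that was silently dropped.

```python
def new_state():
    """Initial row, matching the State slide."""
    return {"sargun": {"friends": {"BOSS"}, "posts": set()}}


dcs = {name: new_state() for name in ("DC1", "DC2", "DC3")}


def apply(state, op):
    """Apply one set operation: (user, field, 'add' or 'remove', item)."""
    user, field, action, item = op
    if action == "add":
        state[user][field].add(item)
    else:
        state[user][field].discard(item)


def write(origin, op, reachable):
    """Apply op at the origin, then replicate only to reachable datacenters."""
    apply(dcs[origin], op)
    for name in reachable:
        apply(dcs[name], op)


def visible_to(dc, user):
    """(friend, post) pairs that datacenter `dc` would show."""
    row = dcs[dc][user]
    return {(friend, post) for friend in row["friends"] for post in row["posts"]}


# The unfriending is dropped on its way to DC2 and DC3; the post is not.
write("DC1", ("sargun", "friends", "remove", "BOSS"), reachable=[])
write("DC1", ("sargun", "posts", "add", "PARTY"), reachable=["DC2", "DC3"])
```

DC1 shows the post to nobody, while DC2 and DC3 show it to BOSS. Nothing told the remote replicas that the post depended on the unfriending having been applied first.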
Solution: Wait For Acks
Very Little Benefit
Over
CP system
Quorum Systems
RYOW at an
Incredible Cost
Why not just do
Paxos*?
Single-Decree Paxos Variant such as EPaxos, Cheap Paxos, or
Multi-Paxos
Quorum
Quorum
Participating Quorums
Must Overlap
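A small check of the overlap rule, assuming N replicas with read quorums of size R and write quorums of size W. The function names are illustrative; the brute-force version exists only to confirm the arithmetic shortcut.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Every read quorum intersects every write quorum exactly when R + W > N."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating every pair of quorums."""
    nodes = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(nodes, r)
        for write_q in combinations(nodes, w)
    )
```

With N = 3, choosing R = 2 and W = 2 overlaps, while R = 1 and W = 2 does not, so a read may miss the latest write.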
Just Perform
Paxos Reconfiguration
to
Recover from Failure
Is there an alternative?
Strong
Eventual
Consistency
Strong Eventual Consistency
“Any set of nodes that have received
the same (unordered) set of updates
will be in the same state.”
How do you even use this?
Vector Clocks
Vector Clocks
• Extension of Lamport Clocks
• Used to detect cause and effect in distributed
systems
• Can determine concurrency of events, and
causality violations
• Preserves h.b. relationships
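A minimal vector clock sketch, assuming clocks are plain dictionaries from node name to counter, with missing entries treated as zero. The function names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Pointwise maximum: the clock after learning about both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def leq(a, b):
    """True if every counter in a is at most the matching counter in b."""
    return all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())


def happens_before(a, b):
    """a happened before b: a is dominated by b and they are not equal."""
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    """Neither event could have caused the other."""
    return not leq(a, b) and not leq(b, a)
```

Two writes made on opposite sides of a partition compare as concurrent, which is the signal that a merge, rather than an overwrite, is needed.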
CRDTs
• CRDTs:
• Convergent Replicated Data Types
• Commutative Replicated Data Types
• Enables data structures to be always writeable on both sides of a partition,
and replay after healing a partition
• Enable distributed computation across monotonic functions
• Two Types:
• CvRDTs
• CmRDTs
CRDTs
CvRDTs
• State / value based CRDTs
• Minimal state
• Don’t require active garbage collection
Set CvRDT
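A minimal sketch of a state-based set, assuming the simplest case: a grow-only set whose merge is set union. The class name is illustrative.

```python
class GSet:
    """Grow-only set: the state is the set itself and merge is union."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        """Return a new state containing item."""
        return GSet(self.items | {item})

    def merge(self, other):
        """Union is commutative, associative, and idempotent."""
        return GSet(self.items | other.items)

    def value(self):
        return set(self.items)
```

Because union has those three properties, replicas can exchange states in any order, any number of times, and still converge.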
CmRDTs
• Op / method based CRDTs
• Size grows monotonically
• Uses version vectors to determine order of
operations
Counter CmRDT
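A minimal sketch of an operation-based counter, assuming each operation carries a unique ID so that a replica can ignore redelivered operations. The operation format and names are hypothetical.

```python
import uuid


class OpCounter:
    """Op-based counter: replicas exchange operations, not state."""

    def __init__(self):
        self.total = 0
        self.seen = set()  # IDs of applied operations; this only ever grows

    def prepare(self, amount):
        """Create an operation to broadcast to every replica."""
        return (uuid.uuid4().hex, amount)

    def apply(self, op):
        """Apply an operation once; addition commutes, so order is irrelevant."""
        op_id, amount = op
        if op_id in self.seen:
            return
        self.seen.add(op_id)
        self.total += amount
```

The `seen` set is the monotonically growing part: without some form of garbage collection it retains every operation ID ever applied.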
CRDTs in the Wild
• Sets
• Observe-remove set
• Grow-only sets
• Counters
• Grow-only counters
• PN-Counters
• Flags
• Maps
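A sketch of an observed-remove set, assuming a state-based design that tags every add with a unique ID and keeps tombstones for removed tags. This is one textbook formulation, not the Riak implementation.

```python
import uuid


class ORSet:
    """Observed-remove set: a remove affects only the adds it has seen."""

    def __init__(self):
        self.adds = set()      # (element, tag) pairs
        self.removes = set()   # tombstones for removed (element, tag) pairs

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        """Tombstone every tag of element that this replica has observed."""
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

    def value(self):
        return {e for (e, t) in self.adds - self.removes}
```

A remove that has reached a replica stays removed after merging, while a concurrent re-add on another replica survives because its tag was never observed by the remove. The tombstones are one reason garbage collection gets tricky.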
Data structures that are
CRDTs
• Probabilistic, convergent data structures
• HyperLogLog
• Bloom filter
• Co-recursive folding functions
• Maximum-counter
• Running Average
• Operational Transform
CRDTs
• Incredibly powerful primitive
• Not only useful for in-database manipulation but
client-database interaction
• You can compose them, and build your own
• Garbage collection is tricky
Riak
In Action
Model
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
“Primary Key”
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
Causal Context
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
Update
curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
-H "Content-Type: application/json" \
-H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq" \
-d '
{
"update": {
"friends_set": {
"remove": "Boss"
}
}
}'
Updated Entries
(during partition)
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq",
"type": "map",
"value": {
"friends_set": [],
"posts_set": []
}
}
Update
curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
-H "Content-Type: application/json" \
-H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq" \
-d '
{
"update": {
"posts_set": {
"add": "Party"
}
}
}'
Updated Entries
(After Healing)
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
"type": "map",
"value": {
"friends_set": [],
"posts_set": [
"Party"
]
}
}
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
"type": "map",
"value": {
"friends_set": [],
"posts_set": [
"Party"
]
}
}
Currently:
Replicates entire value
Future Work:
δ-CRDT
Ship only Deltas
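A sketch of the delta idea, assuming a grow-only counter whose state is a dictionary from node name to count. The function names are hypothetical, and this illustrates the general technique rather than any planned Riak design.

```python
def merge(a, b):
    """Pointwise maximum of two counter states (or of a state and a delta)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def increment(state, node):
    """Return (new_state, delta); the delta holds only the entry that changed."""
    delta = {node: state.get(node, 0) + 1}
    return merge(state, delta), delta


def value(state):
    """The counter's value is the sum over all nodes."""
    return sum(state.values())
```

Merging the small delta into a remote replica gives the same result as merging the entire new state, so only the delta needs to cross the network.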
Eventual Consistency
In Summary
SEC Enables
Distributed
Scalable
Scalability
Processors
Fault-Tolerant
Applications
Eventual Consistency (CAP)
Without Consistency (ACID)
Gives EC a Bad Name
Invariant         | Operation | AP / CP
------------------+-----------+--------
Specify unique ID | Any       | CP
Generate unique ID| Any       | AP
>                 | INCREMENT | AP
>                 | DECREMENT | CP
<                 | INCREMENT | CP
<                 | DECREMENT | AP
Secondary Index   | Any       | AP
Materialized View | Any       | AP
AUTO_INCREMENT    | INSERT    | CP
Linearizability   | CAS       | CP
Operations Requiring
Weak Consistency
vs.
Strong Consistency
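The `>` rows listed above can be checked with a little arithmetic. This sketch assumes two replicas that accept updates independently during a partition and then exchange them; the function names and the numbers in the usage note are illustrative.

```python
def merged_value(start, deltas_a, deltas_b):
    """Value once both partitioned replicas have seen all updates."""
    return start + sum(deltas_a) + sum(deltas_b)


def locally_valid(start, deltas, floor):
    """True if one replica, on its own, always kept the value above floor."""
    value = start
    for delta in deltas:
        value += delta
        if value <= floor:
            return False
    return True
```

Starting from 10 with the invariant "value > 0", concurrent increments of +5 and +3 merge to 18 and the invariant holds without coordination. Concurrent decrements of 6 and 6 each look valid locally (4 > 0) but merge to -2, which is why DECREMENT under a `>` invariant needs a CP operation.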
BASE not ACID
•Basically Available: There will be a response
per request (failure, or success)
•Soft State: Any two reads against the system
may yield different data (when measured
against time)
•Eventually Consistent: The system will
eventually become consistent when all
failures have healed, and time goes to infinity
Brand New Technology
Still being invented
Technology Timeline
• 1996 - Log structured merge tree
• 2000 - CAP Theorem
• 2007 - Amazon Dynamo Paper
• 2011 - INRIA CRDT Technical Report
• 2014 - Riak DT map: a composable, convergent
replicated dictionary
Further Reading
• Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area
Storage with COPS
• PNUTS: Yahoo!’s Hosted Data Serving Platform
• F1: A Distributed SQL Database That Scales
• Spanner: Google's Globally-Distributed Database
• The Network is Reliable: An informal survey of real-world communications
failures
• A comprehensive study of Convergent and Commutative Replicated Data
Types
• Riak DT Map: A Composable, Convergent Replicated Dictionary
Get in Touch
• If you’re interested in cheating the speed of light
• Come use our software
• If you’re interested in solving today’s computer science
problems
• Come work for us
• If you’d like to learn more about distributed systems at
scale
• Maybe you have a better idea
Sargun Dhillon
@Sargun
sdhillon@basho.com
The Case
for
Eventual Consistency
