SPANNER
Google’s globally distributed database
INTRODUCTION



•   Built and Deployed at Google
•   Scalable
•   Multi-version
•   Globally distributed
•   Synchronously-replicated
OVERVIEW



•   General Purpose Transactions (ACID)
•   Directory Placement
•   SQL query language
•   Schematized tables, Semi-relational data model
SPECIAL FEATURES



• Lock-free distributed read transactions
• External consistency of distributed transactions
• Integration of concurrency control, replication,
  and 2PC
• Interval-based global time – TrueTime – GPS and
  atomic clock powered
• More control to applications
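The lock-free reads listed above depend on the store keeping multiple timestamped versions of each value. A minimal sketch of that idea, assuming a toy in-memory store (the class and method names `MultiVersionStore`, `write`, and `snapshot_read` are hypothetical and are not Spanner's API):

```python
import bisect
from collections import defaultdict


class MultiVersionStore:
    """Toy multi-version key-value store: each write is kept under its commit timestamp."""

    def __init__(self):
        # key -> parallel lists of commit timestamps and values, sorted by timestamp
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        """Record a new version; earlier versions stay readable."""
        i = bisect.bisect_right(self._timestamps[key], commit_ts)
        self._timestamps[key].insert(i, commit_ts)
        self._values[key].insert(i, value)

    def snapshot_read(self, key, read_ts):
        """Return the newest value with commit_ts <= read_ts, without taking locks."""
        i = bisect.bisect_right(self._timestamps[key], read_ts)
        return self._values[key][i - 1] if i > 0 else None
```

Because a read at timestamp `read_ts` only looks at versions committed at or before that time, later writes cannot disturb it, which is why such a read needs no locks in this toy model.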
EXAMPLE – SOCIAL NETWORK
SINGLE MACHINE
MULTIPLE MACHINES
MULTIPLE DATACENTERS
IMPLEMENTATION
SERVER ORGANISATION
TRUETIME & CONCURRENCY
Synchronizing Snapshots
GLOBAL CONSISTENCY



 ‘As a distributed-systems developer, you’re taught
from — I want to say childhood — not to trust time.
 What we did is find a way that we could trust time
   — and understand what it meant to trust time.’
 ‘We wanted something that we were confident in.
   It’s a time reference that’s owned by Google.’
                   — Andrew Fikes
IMPLEMENTATION



• Set of time master machines per data center
• A time slave daemon per machine
• Most masters have GPS, Armageddon masters
  have atomic clocks
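Each machine's daemon synchronizes with the time masters and, between synchronizations, advertises an uncertainty that grows with the assumed worst-case drift of its local clock. A rough sketch of that sawtooth follows; the function name `uncertainty_ms` and the default numbers (1 ms base uncertainty, 200 µs/s drift) are illustrative assumptions, not values stated in these slides:

```python
def uncertainty_ms(seconds_since_sync, base_ms=1.0, drift_us_per_s=200.0):
    """Half-width of the time interval a daemon advertises, in milliseconds.

    base_ms stands in for time-master uncertainty plus communication delay;
    drift_us_per_s is an assumed worst-case local clock drift rate.
    """
    if seconds_since_sync < 0:
        raise ValueError("seconds_since_sync must be non-negative")
    return base_ms + drift_us_per_s * seconds_since_sync / 1000.0
```

With these assumed defaults, the uncertainty is 1 ms right after a synchronization and 7 ms after 30 seconds without one, then drops back at the next synchronization.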
GLOBAL CONSISTENCY



•   Global wall-clock time == External Consistency
•   Commit order respects global wall-time order
•   Timestamp order respects global wall-time order
•   Given that timestamp order == commit order
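One way to make timestamp order match commit order is to pick a timestamp at the top of the current uncertainty interval and then wait until that timestamp has definitely passed before making the writes visible. The sketch below is a toy model under stated assumptions: `TrueTime` here is a hypothetical stand-in built on the local clock with a fixed uncertainty `epsilon`, and `commit` and `apply_writes` are illustrative names, not Spanner's implementation.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy stand-in for TrueTime: local clock plus/minus a fixed uncertainty."""

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon
        self._clock = clock

    def now(self):
        """Return an interval assumed to contain the true current time."""
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True once t has definitely passed."""
        return self.now().earliest > t


def commit(tt, apply_writes, sleep=time.sleep):
    """Pick a commit timestamp, then wait out the uncertainty before applying writes."""
    s = tt.now().latest          # no earlier than the true current time
    while not tt.after(s):       # commit wait: roughly 2 * epsilon
        sleep(tt.epsilon / 4)
    apply_writes(s)              # in a real system, locks are released only after the wait
    return s
```

In this model, a transaction that starts after `commit` returns will always draw a larger timestamp, because the earlier timestamp is already behind every possible reading of the clock.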
TIMESTAMPS – GLOBAL CLOCK
TIMESTAMP INVARIANTS
TRUETIME
TIMESTAMPS & TRUETIME
COMMIT WAIT & REPLICATION
PAXOS PROTOCOL



• Used in situations requiring durability
  (replicating a file or database)
• Makes progress even during periods of partial
  unresponsiveness
• Roles: Client, Acceptor (Voter), Proposer,
  Learner, Leader
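The interaction between Proposers and Acceptors can be sketched as single-decree Paxos with in-process calls instead of network messages. This is a simplified illustration, assuming no message loss and no separate Learner or Leader election; the names `Acceptor` and `propose` are hypothetical and are not taken from Spanner's code.

```python
class Acceptor:
    """Remembers the highest proposal number promised and the last value accepted."""

    def __init__(self):
        self.promised = -1
        self.accepted = None  # (proposal_number, value) or None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered proposal already accepted.
    previous = [p for p in granted if p is not None]
    if previous:
        value = max(previous, key=lambda p: p[0])[1]
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes >= majority else None
```

Once a value has been chosen, a later proposal with a higher number ends up re-proposing that same value, which is the safety property the roles above exist to protect.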
EVALUATION
WHAT I THINK OF THE SYSTEM
THE END

Spanner - Google distributed database

Editor's Notes

  • #4: Storage for Google’s ad data: designed to replace a sharded MySQL database and used by F1, a rewrite of Google’s advertising backend. Designed to scale up to millions of machines across hundreds of data centers. Write transactions use strict 2PL and are each assigned a timestamp; each version is automatically timestamped with its commit time, so timestamps reflect serialisation order. Data is replicated across continents. Spanner automatically reshards and migrates data across machines to balance load and respond to failures.
  • #5: Can be used as a simple database. Bucketing structures called directories are the unit of data placement and of data movement between Paxos groups. NoSQL is out, NewSQL is in: the query language is inspired by Dremel, an interactive analysis tool, and is SQL with extensions to support protocol-buffer-valued fields. Associations are depicted using table hierarchies: a directory table (Users) with an interleaved table (Albums). Not purely relational: each table must have a primary key, the model is implemented on top of key-value stores, and the database is partitioned by clients into one or more hierarchies of tables.
  • #6: Lock-free. Externally consistent reads and writes, and globally consistent reads across the database at a timestamp: transactions get globally meaningful commit timestamps. Correctness and performance. Enabling technology. Applications can specify constraints to control which datacenters contain which data, how far data is from its users (to control read latency), how far replicas are from each other (to control write latency), and how many replicas are maintained (to control durability, availability, and read performance). Data can also be dynamically and transparently moved between datacenters by the system to balance resource usage across datacenters.
  • #12: Universe: a Spanner deployment (test/playground, development/production, and production-only). Zone: the rough analog of a deployment of Bigtable servers, and the unit of physical isolation. A zone has one zonemaster and between one hundred and several thousand spanservers; the zonemaster assigns data to spanservers, which in turn serve data to clients. Location proxies are used by clients to locate spanservers. The universe master (a console for interactive debugging) and the placement driver (automated movement of data across zones) are singletons. Software stack: each spanserver is responsible for 100 to 1000 instances of a data structure called a tablet, stored as B-tree-like files and a write-ahead log on a distributed filesystem called Colossus.
  • #14: Google’s cluster-management software provides an implementation of the TrueTime API, which returns an interval with bounded time uncertainty. TrueTime uses two forms of time reference because they have different failure modes. GPS reference-source vulnerabilities include antenna and receiver failures, local radio interference, correlated failures (e.g., design faults such as incorrect leap-second handling and spoofing), and GPS system outages. Atomic clocks can fail in ways uncorrelated to GPS and each other, and over long periods of time can drift significantly due to frequency error.
  • #15: All masters’ time references are regularly compared against each other. Each master also cross-checks the rate at which its reference advances time against its own local clock, and evicts itself if there is substantial divergence.
  • #16: Used to implement features such as externally consistent transactions, lock-free read-only transactions, and non-blocking reads in the past. These features enable, for example, the guarantee that a whole-database audit read at a timestamp t will see exactly the effects of every transaction that has committed as of t. The Spanner implementation supports read-write transactions, read-only transactions (predeclared snapshot-isolation transactions), and snapshot reads. A snapshot read is a read in the past that executes without locking.
  • #17: Strict two-phase locking for write transactions. Assign the timestamp while locks are held.
  • #18: Strict two-phase locking for write transactions. Assign the timestamp while locks are held. Spanner also enforces the following external consistency invariant: if the start of a transaction T2 occurs after the commit of a transaction T1, then the commit timestamp of T2 must be greater than the commit timestamp of T1.
  • #19: “Global wall-clock time” with bounded uncertainty
  • #20: Between synchronizations, a daemon advertises a slowly increasing time uncertainty. ε is derived from conservatively applied worst-case local clock drift; it also depends on time-master uncertainty and communication delay to the time masters. S is the time of invocation of the event.
  • #21: “Global wall-clock time” with bounded uncertainty
  • #22: The Paxos family of protocols includes a spectrum of trade-offs between the number of processors, number of message delays before learning the agreed value, the activity level of individual participants, number of messages sent, and types of failures. Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network (a result proved in a paper by Fischer, Lynch and Paterson), Paxos guarantees safety (freedom from inconsistency), and the conditions that could prevent it from making progress are difficult to provoke.
    Paxos is normally used in situations requiring durability (for example, to replicate a file or a database), in which the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. A reconfiguration mechanism is also available, and can be used to drop a permanently failed replica or to add new replicas to the group.
    Client: The Client issues a request to the distributed system and waits for a response, for instance a write request on a file in a distributed file server.
    Acceptor (Voter): The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum.
    Proposer: A Proposer advocates a client request, attempting to convince the Acceptors to agree on it, and acting as a coordinator to move the protocol forward when conflicts occur.
    Learner: Learners act as the replication factor for the protocol. Once a Client request has been agreed on by the Acceptors, the Learner may take action (i.e., execute the request and send a response to the client). To improve availability of processing, additional Learners can be added.
    Leader: Paxos requires a distinguished Proposer (called the leader) to make progress. Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates. However, the safety properties are still preserved in that case.
  • #24: Spanner is a creation so large, some have trouble wrapping their heads around it. But the end result is easily explained: With Spanner, Google can offer a web service to a worldwide audience, but still ensure that something happening on the service in one part of the world doesn’t contradict what’s happening in another.