Introducing TiDB
Kevin Xu (@pingcap; @kevinsxu)
Liu Tang (@siddontang)
August 20, 2018
Agenda
● History and Community
● Technical Walkthrough
● Use Case with Mobike
● Live Demo: TiDB on GCP with Kubernetes
● Q&A
A Little About PingCAP...
● Founded in April 2015 by 3 infrastructure engineers
● TiDB platform: (Ti = Titanium)
○ TiDB (stateless SQL layer compatible with MySQL)
○ TiKV (distributed transactional key-value store)
○ TiSpark (Apache Spark plug-in on top of TiKV)
○ Placement Driver (metadata cluster)
● Open source from Day 1
○ Inspired by Google Spanner / F1
○ GA 1.0: October 2017
○ GA 2.0: April 2018
TiDB Core Features
● Hybrid OLTP & OLAP (Minimize ETL)
● Horizontal Scalability (Designed for infinity...)
● MySQL Compatible
● Distributed Transaction (ACID Compliant)
● High Availability
● Cloud-Native
○ *Just open-sourced TiDB-Operator leveraging Kubernetes*
○ On InfoWorld: https://www.infoworld.com/article/3297700/kubernetes/introducing-the-kubernetes-operator-for-tidb.html
2018 PingCAP
Community
Stars
● TiDB: 14,500+
● TiKV: 3,500+
Contributors
● TiDB: 195+
● TiKV: 80+
Architecture
[Diagram: overall architecture, including a three-node PD cluster]
TiDB and TiSpark (Compute)
TiDB: OLTP + Ad Hoc OLAP
Node1 Node2 Node3 Node4
MySQL Network Protocol
SQL Parser
Cost-based Optimizer
Distributed Executor (Coprocessor)
ODBC/JDBC MySQL Client
Any ORM that supports MySQL
TiDB
TiKV
TiDB: Relational -> KV
“User” Table
ID | Name   | Email
1  | Edward | h@pingcap.com
2  | Tom    | tom@pingcap.com
...
In TiKV (a sorted map over the key range (-∞, +∞), divided into regions)
user/1 -> Edward,h@pingcap.com
user/2 -> Tom,tom@pingcap.com
...
Index Structure
Row:
Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64)
Value: [col1, col2, col3, col4]
Index:
Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
Value: [null]
Keys are ordered as byte arrays in TiKV, which makes range SCAN possible
A timestamp, issued by the Placement Driver, is appended to every key
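The key layout above can be illustrated with a minimal sketch. It assumes simplified prefix bytes (`t`, `_r`, `_i`) and a sign-flipped big-endian int64 encoding; the helper names `encode_int64`, `row_key`, `index_key`, and `SortedKV` are hypothetical and do not reproduce TiDB's actual codec. It also leaves out the timestamp suffix. The point is only that keys built this way sort by table and then by row ID, so a range scan returns a table's rows in order.

```python
import bisect
import struct

# Assumed prefix bytes, for illustration only.
TABLE_PREFIX = b"t"
ROW_PREFIX = b"_r"
INDEX_PREFIX = b"_i"


def encode_int64(n: int) -> bytes:
    """Encode a signed int64 so that byte order matches numeric order."""
    # Adding 2**63 flips the sign bit, so negatives sort before positives.
    return struct.pack(">Q", (n + (1 << 63)) & 0xFFFFFFFFFFFFFFFF)


def row_key(table_id: int, row_id: int) -> bytes:
    """tablePrefix_rowPrefix_tableID_rowID"""
    return TABLE_PREFIX + ROW_PREFIX + encode_int64(table_id) + encode_int64(row_id)


def index_key(table_id: int, index_id: int, column_value: bytes, row_id: int) -> bytes:
    """tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID"""
    # The column value is concatenated raw here; a real codec would need an
    # order-preserving, self-delimiting encoding for variable-length values.
    return (TABLE_PREFIX + INDEX_PREFIX + encode_int64(table_id)
            + encode_int64(index_id) + column_value + encode_int64(row_id))


class SortedKV:
    """A tiny in-memory stand-in for a sorted key-value map."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: bytes, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

For example, after storing rows 1 and 2 of a table with ID 10, `scan(row_key(10, 1), row_key(10, 3))` returns both rows in row-ID order and skips rows that belong to other tables.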
TiSpark: Complex OLAP
[Diagram: a Spark driver and several Spark executors, each with TiSpark embedded, sit above the distributed storage layer (TiKV nodes and the Placement Driver). TiSpark retrieves data locations from PD over gRPC and retrieves the real data from TiKV over gRPC.]
TiSpark: Features
● Complex calculation pushdown
● Key-range pruning
● Index support:
○ Clustered index / non-clustered index
○ Index-only query optimization
● Cost-based optimization:
○ Statistics gathered from TiDB as histograms
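Key-range pruning can be sketched as follows. This is an illustration under stated assumptions, not TiSpark's implementation: regions are modeled as half-open key ranges, and the names `Region` and `prune_regions` are hypothetical. Given the key range a query needs, only regions that overlap it have to be read.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    region_id: int
    start: bytes           # inclusive; b"" stands for -infinity
    end: Optional[bytes]   # exclusive; None stands for +infinity


def prune_regions(regions: List[Region], query_start: bytes,
                  query_end: bytes) -> List[Region]:
    """Return the regions whose range overlaps [query_start, query_end)."""
    selected = []
    for region in regions:
        starts_before_query_end = region.start < query_end
        ends_after_query_start = region.end is None or region.end > query_start
        if starts_before_query_end and ends_after_query_start:
            selected.append(region)
    return selected
```

With three regions covering `(-∞, g)`, `[g, p)`, and `[p, +∞)`, a query over `[h, k)` touches only the middle region, so the other two are never contacted.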
Join Support
Chosen by the cost-based optimizer, which weighs network cost, memory cost, and CPU cost:
● Hash Join (fastest; if table <= 50 million rows)
● Sort Merge Join (join on indexed column or ordered data source)
● Index Lookup Join (join on indexed column; ideally after filter, result < 10,000 rows)
TiKV and Placement Driver (Storage)
TiKV: The Foundation
[Diagram: three TiKV instances, each stacking RocksDB, Raft, Transaction, Txn KV API, and Coprocessor API layers. The Raft layers of the instances form a Raft group. The client, the TiKV instances, and the PD cluster communicate over gRPC.]
SQL -> Parser -> Coprocessor
PD: Dynamic Split and Merge
[Diagram: replicas on TiKV_1 and TiKV_2. Split: Region A becomes Region A and Region B on each store. Merge: Region A and Region B become a single Region A again.]
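Split and merge can be sketched as operations on key ranges. This is a simplified model, assuming a region is split at its middle key once it holds more than some number of keys; the threshold and the names `Region`, `split_region`, and `merge_regions` are hypothetical, and Raft replication of each region is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    start: str        # inclusive start key
    end: str          # exclusive end key
    keys: List[str]   # sorted keys currently stored in [start, end)


def split_region(region: Region, max_keys: int) -> List[Region]:
    """Split a region at its middle key if it holds more than max_keys keys."""
    if len(region.keys) <= max_keys:
        return [region]
    mid = len(region.keys) // 2
    split_key = region.keys[mid]
    left = Region(region.start, split_key, region.keys[:mid])
    right = Region(split_key, region.end, region.keys[mid:])
    return [left, right]


def merge_regions(left: Region, right: Region) -> Region:
    """Merge two adjacent regions back into one."""
    if left.end != right.start:
        raise ValueError("only adjacent regions can be merged")
    return Region(left.start, right.end, left.keys + right.keys)
```

Splitting and then merging the two halves yields the original region, which is why the two operations can be applied dynamically as data grows and shrinks.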
PD: Hotspot Removal
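[Diagram: * marks the Raft leader. Before scheduling, TiKV_1 holds the leaders of both Region A and Region B and takes the whole workload, while TiKV_2 holds only followers. After hotspot scheduling (Raft leader transfer), TiKV_1 leads Region A and TiKV_2 leads Region B, so the workload is spread across both stores.]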
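Hotspot scheduling by leader transfer can be sketched as a small balancing loop. The sketch assumes the number of leaders on a store is a stand-in for its workload, which is a simplification. The function name `rebalance_leaders` and the data shapes are hypothetical and do not reflect PD's scheduler API. No data moves in this model: leadership only shifts to a store that already holds a replica.

```python
from typing import Dict, List


def rebalance_leaders(leaders: Dict[str, str],
                      replicas: Dict[str, List[str]]) -> Dict[str, str]:
    """Spread Raft leaders across stores.

    leaders:  region -> store currently holding that region's leader
    replicas: region -> stores holding a replica of that region
    """
    result = dict(leaders)
    stores = sorted({store for holders in replicas.values() for store in holders})

    def leader_count(store: str) -> int:
        return sum(1 for s in result.values() if s == store)

    while True:
        busiest = max(stores, key=leader_count)
        idlest = min(stores, key=leader_count)
        # Stop once leader counts differ by at most one.
        if leader_count(busiest) - leader_count(idlest) < 2:
            return result
        # A leader can only move to a store that already has a replica.
        movable = [region for region, store in sorted(result.items())
                   if store == busiest and idlest in replicas[region]]
        if not movable:
            return result
        result[movable[0]] = idlest
```

Starting from both leaders on `TiKV_1`, one call leaves one leader on each store, so client traffic for the two regions no longer lands on a single node.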
Geo-Replication
[Diagram: each region has three replicas, on Seattle_1, Seattle_2, and New York_1; * marks the Raft leader. Before: Region A is led from Seattle and Region B from New York_1. After: both leaders are in Seattle, and New York_1 holds only follower replicas.]
Transaction Model
● Timestamp Oracle service (from Google’s Percolator paper)
● 2-Phase commit protocol (2PC)
● Problem: the timestamp oracle is a single point of failure
● Solution: Placement Driver HA cluster
○ Replicated using Raft
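The interplay of the timestamp oracle and two-phase commit can be sketched in a single process. This is a heavily simplified, Percolator-style illustration, not TiKV's implementation: there is no primary lock, no crash recovery, and no networking, and the class names `TimestampOracle`, `Store`, and `Txn` are hypothetical. Each transaction takes a start timestamp, reads a snapshot at that timestamp, locks its keys in the prewrite phase, and then takes a commit timestamp to make its writes visible.

```python
import itertools


class TimestampOracle:
    """Hands out strictly increasing timestamps."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_ts(self) -> int:
        return next(self._counter)


class Store:
    """Multi-version key-value store with per-key locks."""

    def __init__(self):
        self.data = {}    # key -> list of (commit_ts, value)
        self.locks = {}   # key -> start_ts of the transaction holding the lock

    def get(self, key, ts):
        """Return the newest value committed at or before ts, or None."""
        versions = [(c, v) for c, v in self.data.get(key, []) if c <= ts]
        return max(versions, key=lambda cv: cv[0])[1] if versions else None


class Txn:
    def __init__(self, store: Store, tso: TimestampOracle):
        self.store, self.tso = store, tso
        self.start_ts = tso.next_ts()
        self.writes = {}

    def get(self, key):
        return self.store.get(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self) -> bool:
        # Phase 1 (prewrite): lock every key, aborting on any conflict.
        locked = []
        for key in self.writes:
            newer = any(c > self.start_ts for c, _ in self.store.data.get(key, []))
            if key in self.store.locks or newer:
                for k in locked:          # roll back the locks taken so far
                    del self.store.locks[k]
                return False
            self.store.locks[key] = self.start_ts
            locked.append(key)
        # Phase 2 (commit): write the new versions and release the locks.
        commit_ts = self.tso.next_ts()
        for key, value in self.writes.items():
            self.store.data.setdefault(key, []).append((commit_ts, value))
            del self.store.locks[key]
        return True
```

If two transactions start before either commits and both write the same key, the first commit succeeds and the second is rejected during prewrite, because a version newer than its start timestamp already exists. Every timestamp comes from one oracle, which is why that oracle has to be made highly available.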
Guaranteeing Correctness
● Formal proof using TLA+ (a formal specification and verification language to reason about and prove aspects of complex systems)
○ Raft
○ TSO/Percolator
○ 2PC
● See details: https://github.com/pingcap/tla-plus
TiKV -> CNCF (To Be Announced…)
Who’s Using TiDB?
200+ companies
Two Major Use Cases
1. MySQL Scalability
2. Hybrid OLTP/OLAP Architecture
Mobike + TiDB
● 200 million users
● 200 cities
● 9 million smart bikes
● ~30 TB / day
Scenario #1: Locking/Unlocking
● Locking and unlocking of smart bikes generates massive data
● Smooth experience is key to user retention
● TiDB supports this system by alerting administrators within minutes when the success rate of locking/unlocking drops
● Quickly find malfunctioning bikes
Scenario #2: Real-Time Analysis
● Synchronize TiDB with MySQL instances using Syncer (proprietary tool)
● TiDB + TiSpark empower real-time analysis with horizontal scalability
● No need for Hadoop + Hive
Scenario #3: Mobike Store
● An innovative loyalty program that must be on 24 x 7 x 365
● TiDB handles:
○ High concurrency for peak or promotional season
○ Permanent storage
○ Horizontal scalability
● No interruption as business evolves
Test, Use, Contribute!
Thank You!
Twitter: @PingCAP; @kevinsxu; @siddontang
Kevin Xu (kevin@pingcap.com); Liu Tang (tl@pingcap.com)

Introducing TiDB @ SF DevOps Meetup
