When Apache Spark meets TiDB
=> TiSpark
maxiaoyu@pingcap.com
Who am I
● Shawn Ma@PingCAP
● Tech Lead of OLAP Team
● Working on OLAP related products and features
● Previously tech lead of Big Data infra team@Netease
● Focus on SQL on Hadoop and Big Data related stuff
Agenda
● A little bit about TiDB / TiKV
● What is TiSpark
● Architecture
● Benefit
● What’s Next
What’s TiDB
● Open source distributed RDBMS
● Inspired by Google Spanner
● Horizontal Scalability
● ACID Transaction
● High Availability
● Auto-Failover
● SQL at scale
● Widely used across industries, including Internet, Gaming, Banking, Finance, Manufacturing and more (200+ users)
A little bit about TiDB and TiKV
[Architecture diagram: a stateless SQL layer of TiDB nodes on top of a distributed storage layer of TiKV nodes replicated via Raft; the Placement Driver (PD) handles the control flow (balance / failover, metadata and timestamp requests); the layers talk to each other over gRPC.]
TiKV: The whole picture
[Diagram: each TiKV node hosts a Store containing multiple Regions, e.g. Store 1 holds Regions 1, 3, 4 and 5; each Region is replicated across Stores as a Raft group; clients reach the TiKV nodes via RPC, and a Placement Driver cluster (PD 1, PD 2, PD 3) oversees placement.]
TiKV is powered by RocksDB
What is TiSpark
● TiSpark = Spark SQL on TiKV
○ Spark SQL running directly on top of a distributed database storage engine
● Hybrid Transactional/Analytical Processing (HTAP) rocks
○ Provides strong OLAP capability together with TiDB
What is TiSpark
● Complex Calculation Pushdown
● Key Range pruning
● Index support
○ Clustered index / Non-Clustered index
○ Index Only Query
● Cost Based Optimization
○ Histogram
○ Picks the right access path
Architecture
[Diagram: the Spark driver and each Spark executor embed a TiSpark component; the driver retrieves data locations from the Placement Driver (PD) over gRPC, while executors retrieve the actual data from TiKV, the distributed storage layer, also over gRPC.]
Architecture
● On Spark Driver
○ Translate metadata from TiDB into Spark meta info
○ Transform the Spark SQL logical plan, pick out the elements that can be handled by the storage layer (TiKV), and rewrite the plan
○ Locate data based on Region info from the Placement Driver and split it into partitions
● On Spark Executor
○ Encode the Spark SQL plan into TiKV coprocessor requests
○ Decode TiKV / coprocessor results and transform them into Spark SQL Rows
How everything was made possible
● Extension points for Spark SQL internals
● Extra Strategies allow us to inject our own physical executor, and that is what TiSpark leverages (see the sketch below)
● We try our best to keep Spark internals untouched to avoid compatibility issues
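A minimal sketch of this extension point, assuming Spark 2.x: ExperimentalMethods.extraStrategies lets an external library register its own planning Strategy. MyPushdownStrategy is a hypothetical stand-in for TiSpark's TiStrategy, not its actual code.

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

case object MyPushdownStrategy extends Strategy {
  // Return custom physical operators for plan fragments we can push down;
  // returning Nil tells Spark to fall back to its built-in planner.
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case _ => Nil
  }
}

val spark = SparkSession.builder().appName("demo").getOrCreate()
spark.experimental.extraStrategies = Seq(MyPushdownStrategy)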
How everything was made possible
● A fat Java client module, paying the price of bypassing TiDB
○ Schema parsing, type system, encoding / decoding, coprocessor support
○ An almost fully featured TiKV client (without write support for now)
○ Predicates / Index - Key Range related logic
○ Aggregates pushdown related
○ Limit, Order, Stats related
● A thin layer inside Spark SQL
○ TiStrategy for Spark SQL plan transformation
○ Other utilities for mapping Spark SQL constructs to the TiKV client library
○ Physical operators like IndexScan
○ Thin enough not to cause much compatibility trouble with Spark SQL
Too Abstract? Let’s get concrete
select class, avg(score) from student
where school = 'engineering' and lottery(name) = 'picked'
and studentId >= 8000 and studentId < 10100
group by class;
● Above is a TiDB table named student
● It has a clustered index on studentId and a secondary index on the school column
● lottery is a Spark SQL UDF that takes a name and outputs 'picked' if an RNG so decides
Construct Tasks
Predicates Processing
WHERE school = 'engineering' and lottery(name) = 'picked'
and studentId >= 8000 and studentId < 10100
[Diagram: the table is split by studentId into Region 1 [0, 5000), Region 2 [5000, 10000) and Region 3 [10000, 15000); the predicates studentId >= 8000 and studentId < 10100 combine into the key range [8000, 10100).]
Predicates are converted into key ranges based on indexes
[Diagram: the key range is cut at Region boundaries into Spark Task 1 (Region 2, range [8000, 10000)) and Spark Task 2 (Region 3, range [10000, 10100)), each carrying a coprocessor (COP) request.]
1. Append the remaining predicates if they are supported by the coprocessor
2. Push back whatever must be computed by Spark SQL, e.g. UDFs and prefix-index predicates
3. Cut the ranges into tasks according to Region/Range (see the sketch below)
4. Encode each task into a coprocessor request
[Diagram labels: requests travel over gRPC via the Spark workers; school = 'engineering' is appended to the coprocessor request, while lottery(name) = 'picked' is pushed back to Spark.]
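A minimal sketch, with hypothetical types, of step 3 above: the predicate-derived key range is intersected with Region boundaries so that each non-empty overlap becomes one coprocessor task.

case class KeyRange(start: Long, end: Long) // half-open: [start, end)
case class Region(id: Int, range: KeyRange)

def splitByRegion(pred: KeyRange, regions: Seq[Region]): Seq[(Region, KeyRange)] =
  regions.flatMap { r =>
    val s = math.max(pred.start, r.range.start)
    val e = math.min(pred.end, r.range.end)
    if (s < e) Some(r -> KeyRange(s, e)) else None // keep non-empty overlaps only
  }

val regions = Seq(
  Region(1, KeyRange(0, 5000)),
  Region(2, KeyRange(5000, 10000)),
  Region(3, KeyRange(10000, 15000)))

// studentId >= 8000 and studentId < 10100 => key range [8000, 10100)
splitByRegion(KeyRange(8000, 10100), regions)
// => Seq((Region 2, [8000, 10000)), (Region 3, [10000, 10100)))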
WHERE school = 'engineering' and lottery(name) = 'picked'
and (studentId >= 8000 and studentId < 10100)
Index Scan
[Diagram: executors batch-scan the index data for student_school across TiKV Regions according to the predicate range, yielding row IDs (e.g. [1, 5), 5, 7, 9, 10, 88); the row keys are then sorted and cut into ranges according to the key range in each Region before the row data for student is read.]
● A secondary index is encoded as a key-value pair
○ The key is a comparable-bytes encoding of all index columns in their defined order
○ The value is the row ID pointing to the table row data
● Reading data via a secondary index usually requires a double read:
○ First, read the secondary index in range, just like reading primary keys in the previous slide
○ Shuffle the row IDs according to Region
○ Sort all retrieved row IDs and combine them into ranges where possible
○ Encode the row IDs into row keys for the table
○ Send those mini requests in batches, concurrently (see the sketch below)
● The second read can be optimized away
○ If all required columns are already covered by the index itself
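A minimal sketch of the sort-and-combine step in the double read above: row IDs retrieved from the secondary index are sorted and coalesced into contiguous half-open ranges before being re-encoded as row-key scans. Names are illustrative.

def rowIdsToRanges(rowIds: Seq[Long]): Seq[(Long, Long)] = // half-open: [start, end)
  rowIds.sorted.foldLeft(List.empty[(Long, Long)]) {
    case ((s, e) :: tail, id) if id == e => (s, e + 1) :: tail // extend current range
    case (acc, id)                       => (id, id + 1) :: acc // start a new range
  }.reverse

// The row IDs from the slide's example, sorted: 1, 2, 3, 4, 5, 7, 8, 10, 88
rowIdsToRanges(Seq[Long](5, 7, 1, 2, 3, 4, 8, 10, 88))
// => Seq((1, 6), (7, 9), (10, 11), (88, 89))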
Index Selection
WHERE school = 'engineering' and lottery(name) = 'picked'
and (studentId >= 8000 and studentId < 10100) or studentId in (10323, 10327)
[Diagram: using histograms, the clustered index on StudentId plus its related predicates matches an estimated 1K rows, while the secondary index on School plus its related predicates matches an estimated 800 rows; since 1K * clustered-index access cost < 800 * secondary-index access cost, the clustered index is chosen.]
● If all referenced columns are covered by the index, then instead of retrieving actual rows we apply an index-only query, and the cost function differs accordingly
● If no histogram exists, TiSpark falls back to pseudo selection logic (a cost sketch follows below)
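A minimal sketch, with hypothetical cost constants, of the comparison above: estimated matching rows (from the histogram) times a per-row access cost, where the secondary index pays extra per row for the double read.

case class AccessPath(name: String, estRows: Long, perRowCost: Double) {
  def cost: Double = estRows * perRowCost // simple linear cost model
}

val clustered = AccessPath("clustered index on StudentId", 1000, 1.0)
val secondary = AccessPath("secondary index on School", 800, 2.0) // double read
val chosen = Seq(clustered, secondary).minBy(_.cost)
// 1000 * 1.0 = 1000.0 < 800 * 2.0 = 1600.0, so the clustered index wins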
Construct Schema Transformation Rules
TiDB has a completely different type system and type-inference rules
Aggregates Processing
select class, avg(score) from student
...
group by class;
[Diagram: the same three Regions split by studentId as before: Region 1 [0, 5000), Region 2 [5000, 10000), Region 3 [10000, 15000).]
[Diagram: the Spark SQL plan AVG(score) GROUP BY class is received in TiStrategy and AVG is rewritten into SUM and COUNT; map tasks send the partial aggregation to the Regions as coprocessor requests over gRPC via the Spark workers; the TiKV result schema [group-by keys as bytes, SUM as Decimal, COUNT as BigInt] is mapped to the Spark schema [SUM, COUNT, class], which Spark derives by its own type-inference rules; reduce tasks then compute SUM(score) / COUNT(score) GROUP BY class.]
● After the coprocessor preprocessing, TiSpark still relies on Spark's normal aggregation strategy (a sketch follows below)
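A minimal sketch of the rewrite above: each coprocessor returns a partial (SUM, COUNT) per group, and the final aggregation merges the partials and divides. Types and names are illustrative.

case class Partial(sum: BigDecimal, count: Long) // one per map task and group

def mergeAvg(partials: Seq[Partial]): BigDecimal = {
  val total = partials.map(_.sum).sum   // merge the SUMs
  val rows  = partials.map(_.count).sum // merge the COUNTs
  total / BigDecimal(rows)              // AVG = SUM / COUNT
}

// e.g. two map tasks (one per Region) for a single GROUP BY key:
mergeAvg(Seq(Partial(BigDecimal(450), 5), Partial(BigDecimal(270), 3))) // => 90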
Benefit
● Analytical / transactional support all on one platform
○ No ETL needed; query data in real time
○ High-throughput, consistent snapshot reads from the database
○ Simplify your platform and reduce maintenance cost
● Embrace Apache Spark and its ecosystem
○ Support for complex transformation and analytics beyond SQL
○ Cooperation with other projects in the ecosystem (like Apache Zeppelin)
○ Apache Spark bridges your data sources
Ease of Use
● Works on your existing Spark cluster
○ Just a single jar, like any other Spark connector
● Usable as a standalone application, or from spark-shell, thrift-server, pyspark and R
● Works just like any other data source
val ti = new org.apache.spark.sql.TiContext(spark)
// Map all TiDB tables from database sampleDB as Spark SQL tables
ti.tidbMapDatabase("sampleDB")
spark.sql("select count(*) from sampleTable").show
What’s Next
● Batch write support (writing directly in TiKV native format)
● JSON type support (TiDB already supports the type)
● Partition table support (both Range and Hash)
● Join optimization based on ranges and partition tables
● (Maybe) join reorder using TiDB's own histograms
● A separate columnar storage project using Spark as its execution engine (not released yet)
Thanks!
Contact me:
maxiaoyu@pingcap.com
www.pingcap.com
https://github.com/pingcap/tispark
https://github.com/pingcap/tidb
https://github.com/pingcap/tikv