The Glass Half Full
Using Programmable Hardware Accelerators in
Analytical Databases
Zsolt István
IMDEA Software Institute
1
IMDEA Software
Institute
• 16 Faculty in the areas of:
• Program Analysis and Verification
• Languages and Compilers
• Security and Privacy
• Theoretical Computer Science
• Distributed Systems and Databases
• ~10 Post-docs, ~25 PhD Students,
~10 Interns
• Located in UPM Montegancedo Campus,
Madrid
• We are hiring! https://software.imdea.org/
▪ OLAP – Online Analytical Processing
▪ Large datasets – up to TBs
▪ Ad-hoc querying to extract insight, recurring
reporting – Possibly complex operations
▪ Read-mostly workloads, updates in batches
▪ OLTP – Online Transaction Processing
▪ Smaller datasets
▪ Queries known, relate to business actions
▪ Makes heavy use of indexes
▪ Reads and updates intermixed
3
Context: Analytical Databases
4
Databases were a $25 billion market in 2018…
Could we specialize machines to them?
https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/
▪ Fully custom machine for databases
▪ Processors – special ISA microprocessors
▪ Memory – magnetic bubbles and CCDs
▪ Semiconductor technology and
general purpose CPUs took over
5
Database Computer – ’70s
“The first goal is to design it with the
capability of handling a very large on-line
database of 10^10 bytes or beyond since
special-purpose machines are not likely to
be cost-effective for small databases.”
Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases.
IEEE Trans. Computers 28(6): 414-429 (1979)
▪ Based on VAX multi-
processor system
▪ By the time the software
and hardware were
developed, CPUs had
become much faster
▪ Couldn’t keep up with
Moore’s law
6
Gamma Machine – ’80s
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High
Performance Dataflow Database Machine. VLDB 1986: 228-237
7
Data/Compute Gap
CPU Scaling Commodity in Cloud
Specialized Hardware Revival
8
Renewed interest in Specialized Hardware
ASICs, FPGAs, CPUs
Field Programmable Gate Array (FPGA)
▪ Free choice of architecture
▪ Fine-grained pipelining,
communication, distributed memory
▪ Tradeoff: all “code” occupies chip
space
▪ Evolving platform: larger chips, more
heterogeneity
9
Re-programmable Specialized Hardware
[Figure: operators Op 1, Op 2, Op 3]
10
Integration Options
1) On the side
2) In data-path
3) Co-processor
[Figure: placement of the accelerator (Accel.) relative to the data in each option]
▪ Accelerator
▪ Amazon F1
▪ In data path
▪ Microsoft Catapult
▪ Co-processor
▪ Intel Xeon+FPGA
11
In the Cloud Today
[Figure: Intel Xeon+FPGA Gen.1 (CPU in Socket1, FPGA in Socket2) and Intel Xeon+FPGA Gen.2 (CPU and FPGA in Socket1)]
12
The Glass Half Empty…
▪ 1) On-the-side acceleration introduces overhead
▪ Much related work offers no real speedup once we factor in data
movement, transformation, software overhead…
13
The Glass Half Empty…
[Figure: bar chart of query execution time (axis 0 to 120) for Software vs. With Acceleration, split into Compute and Data Movement; accelerator annotated 2x]
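The arithmetic behind this point can be sketched in a few lines. This is a minimal model, assuming the end-to-end time with an on-the-side accelerator is the accelerated compute time plus a fixed data movement cost; the function names and the numbers are hypothetical and are not read off the chart.

```python
def accelerated_time(compute, speedup, data_movement):
    """End-to-end time when compute is offloaded but data has to be moved first."""
    return compute / speedup + data_movement


def effective_speedup(compute, speedup, data_movement):
    """Speedup seen by the query, relative to the software-only compute time."""
    return compute / accelerated_time(compute, speedup, data_movement)


# Hypothetical numbers: 100 time units of software compute, a 2x accelerator.
software_only = 100.0
no_movement = effective_speedup(software_only, 2.0, 0.0)     # ideal case
with_movement = effective_speedup(software_only, 2.0, 50.0)  # movement eats the gain
print(no_movement, with_movement)
```

Under these assumed numbers, a data movement cost equal to half the original compute time cancels the 2x accelerator entirely.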
▪ 2) “All or nothing” behavior makes query planning difficult
▪ Example: fixed capacity hash table on FPGA
▪ Constant time access for reads and writes
▪ What happens if data doesn’t fit?
▪ Can’t always know the number of keys a priori
14
The Glass Half Empty…
▪ 3) Analytical databases becoming more optimized / not much
compute in core SQL
▪ X100 [CIDR05] showed that <10% of compute time is spent on SQL
operators +,-,*,SUM,AVG in analytical queries
▪ Columnar stores often memory bound (10s of GB/s)
15
The Glass Half Empty…
▪ On the side acceleration introduces overhead
▪ “All or nothing” behavior makes query planning difficult
▪ Analytical databases becoming more optimized / not much
compute in core SQL
16
The Glass Half Empty…
▪ On the side acceleration introduces overhead
✓ Reduce data movement bottlenecks
17
The Glass Half Full…
▪ IBEX: Database storage engine with processing offload
▪ Filter and pre-aggregate for analytic workloads
18
Processing in data path: Smart Flash
[Figure: Database Server, IBEX, SSD]
IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. Istvan
and G. Alonso, VLDB’14
→ Larger bandwidth, more IOPS
(Samsung YourSQL, MIT BlueDBM)
▪ Opportunity to extend SSDs/Flash
with complex offload
Samsung “smart” SSD
19
Processing in data path: Distributed Processing
Workers(Compute)
Storage
+ Provisioning
+ Scalability
Caribou: Distributed
storage with processing
• Specialized HW nodes
• 10Gbps access
• 25W power consumption
Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.
20
Smart Storage in Databases: Filter push-down
[Figure: comparison with the Intel Hyperscan library (Xeon E5-2680 v2), annotated 2.8x]
SELECT … FROM customer
WHERE age < 35 AND purchases > 2
AND address LIKE '%PO. Box 123%'
▪ Challenge: guarantee that filtering never slows down retrieval
▪ Algorithms can be re-imagined to become bandwidth-bound
instead of compute-bound
▪ Extend the state of the art: parameterization without re-programming [FCCM16]
▪ Many options: Regular expressions, comparisons, decompression, …
[FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. Istvan, D. Sidler, G. Alonso. FCCM’16
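As a rough software illustration of filter push-down, the sketch below applies the predicates of the query above inside a simulated storage scan, so only matching rows reach the database. Everything here is hypothetical (the `storage_scan` function, the predicate helpers, the sample rows); it is not the Caribou or IBEX interface. The predicates are passed in as parameters to a fixed scan routine, which loosely mirrors the idea of parameterization without re-programming.

```python
import re

# Hypothetical sample rows; column names follow the query on the slide.
CUSTOMERS = [
    {"name": "Ana", "age": 29, "purchases": 5, "address": "PO. Box 123, Madrid"},
    {"name": "Ben", "age": 41, "purchases": 7, "address": "PO. Box 123, Zurich"},
    {"name": "Cleo", "age": 33, "purchases": 1, "address": "PO. Box 123, Paris"},
    {"name": "Dov", "age": 22, "purchases": 9, "address": "12 Main Street"},
    {"name": "Eva", "age": 30, "purchases": 3, "address": "c/o PO. Box 1234"},
]


def less_than(column, bound):
    return lambda row: row[column] < bound


def greater_than(column, bound):
    return lambda row: row[column] > bound


def contains(column, needle):
    # SQL LIKE '%needle%' is a substring test; escape so '.' stays literal.
    pattern = re.compile(re.escape(needle))
    return lambda row: pattern.search(row[column]) is not None


def storage_scan(rows, predicates):
    """Simulated storage-side scan: yields only rows that pass every predicate."""
    for row in rows:
        if all(pred(row) for pred in predicates):
            yield row


predicates = [
    less_than("age", 35),
    greater_than("purchases", 2),
    contains("address", "PO. Box 123"),
]
matching = list(storage_scan(CUSTOMERS, predicates))
print([row["name"] for row in matching])
```

In this toy data set two of five rows survive the filter, so the database side receives less than half of the table.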
✓ Reduce data movement bottlenecks
▪ “All or nothing” behavior makes query planning difficult
✓ Hybrid processing
21
The Glass Half Full…
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
22
IBEX’s Hybrid Group-by
[Figure: Ibex and CPU with SW-only Group-By: Input table → Projection → Selection → Filtered data → Group-by → Final groups]
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
23
IBEX’s Hybrid Group-by
[Figure: Ibex and CPU with HW-only Group-By: Input table → Projection → Selection → Group-by → Final groups]
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
▪ What if the number of groups does not fit on the FPGA?
▪ Send partial aggregates – finalize in SW
▪ Worst case: same as no acceleration
▪ Best case: all in HW!
24
IBEX’s Hybrid Group-by
[Figure: Ibex and CPU with Hybrid Group-by: Input table → Projection → Selection → Filtered data → Group-by → Partial groups → Group-by → Final groups]
Challenge: How to split across accelerator and software?
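The hybrid idea can be sketched in software as two stages: a fixed-capacity aggregation stage standing in for the accelerator, and a software stage that merges whatever partial aggregates it receives. This is a minimal sketch under stated assumptions: the flush-everything-when-full policy and the names `hw_group_by` and `sw_finalize` are illustrative choices, not the mechanism IBEX uses. For `avg`, each partial carries a (sum, count) pair so that partials can be merged correctly.

```python
from collections import defaultdict


def hw_group_by(rows, capacity):
    """Simulated accelerator stage with room for at most `capacity` groups.

    When a new group arrives and the table is full, the whole table is
    emitted as partial aggregates and aggregation restarts (assumed policy).
    """
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= capacity:
            yield from table.items()
            table = {}
        total, count = table.get(key, (0.0, 0))
        table[key] = (total + value, count + 1)
    yield from table.items()


def sw_finalize(partials):
    """Software stage: merge partial (sum, count) pairs and compute the average."""
    merged = defaultdict(lambda: [0.0, 0])
    for key, (total, count) in partials:
        merged[key][0] += total
        merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# select avg(salary) from employees group by department (hypothetical rows)
employees = [("db", 100), ("os", 200), ("db", 300), ("ml", 400), ("os", 600)]

fits = list(hw_group_by(employees, capacity=10))      # best case: all in "HW"
overflows = list(hw_group_by(employees, capacity=2))  # partials finalized in SW
print(sw_finalize(fits), sw_finalize(overflows))
```

Both runs produce the same averages; only the number of partial aggregates handed to software changes, which is what removes the "all or nothing" behavior.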
✓ Reduce data movement bottlenecks
✓ Hybrid Processing
▪ Analytical databases becoming more optimized / not much
compute in core SQL
✓ Emerging compute-intensive workloads
25
The Glass Half Full
▪ Databases adopting new ways of analyzing the data
▪ SAP Hana, Oracle, SQL Server, etc.
▪ Specialized hardware can help both with model building [Kara18] and
inference [Owaida18]
▪ Benefits for “classical” algorithms as well
[Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)
[Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300
26
The Rise of Machine Learning
CPU, FPGA co-processor
27
doppioDB: a hybrid database engine
[Figure: Database Engine (MonetDB) combining software operators and hardware operators]
▪ Goal: extend the capabilities of analytical databases
▪ FPGA works on the same data as software (cache-coherent access)
▪ Can combine SW and HW operators inside the same query
▪ Challenge: ensure high utilization of FPGA, use in many queries
[Figure: hardware operators access DRAM (DB Tables) directly: no data copy, transformation, partitioning, etc.]
K-means – Algorithm
◼ Goal: partition unlabeled data into several
clusters, where the number of clusters is
the “k” in k-means.
◼ Two steps in each iteration:
◼ Assignment: assign each data point to the
closest centroid according to a distance
metric
◼ Centroid update: the centroids are
recalculated by averaging all the data
points within each cluster
◼ Long process if the data set and number of
iterations are large
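The two steps can be written down directly. The following is a minimal pure-Python sketch, assuming squared Euclidean distance as the metric and caller-supplied initial centroids; keeping the old centroid for an empty cluster is a choice made for this sketch only.

```python
def assign(points, centroids):
    """Assignment step: index of the closest centroid for every point."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [
        min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Centroid update step: average of the points assigned to each cluster."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dims):
            sums[j][d] += p[d]
    # An empty cluster keeps its previous centroid (assumption of this sketch).
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Repeat both steps until the centroids stop moving or the budget runs out."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, labels = kmeans(points, [[1.0, 1.0], [9.0, 8.0]])
print(centroids, labels)
```

Each iteration touches every point and every centroid, which is why the run time grows with both the data set size and the number of iterations.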
28
Design – Execution Walk-Through
1. Receives K-Means parameters
2. Fetches the initial centroids and the data
3. Calculates the distance between a data point and all the centroids and assigns it to the closest centroid
4. Accumulates data points per cluster and counts how many data points are assigned to each cluster
5. Collects partial results from each pipeline
6. Division for updating new centroid
7. Writes back the final results
[Figure: steps 1–7 annotated on the operator, which reads from and writes to DRAM (DB Tables)]
Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
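One iteration of the walk-through can be mimicked in software to show why per-pipeline partial results are enough. This is a sketch under assumptions: the round-robin distribution of points over pipelines, the function names, and the sequential execution of the pipelines are all illustrative and say nothing about how the FPGA operator schedules its work.

```python
def pipeline(chunk, centroids):
    """Steps 3-4 for one pipeline: assign points, accumulate sums and counts."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        closest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[closest] += 1
        for d in range(dims):
            sums[closest][d] += p[d]
    return sums, counts


def iteration(points, centroids, n_pipelines):
    """One K-means iteration with the data split across several pipelines."""
    # Step 2: distribute the data (round-robin is an assumption of this sketch).
    chunks = [points[i::n_pipelines] for i in range(n_pipelines)]
    # Steps 3-4: run sequentially here; the pipelines are independent.
    partials = [pipeline(chunk, centroids) for chunk in chunks]
    k, dims = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        # Step 5: collect the partial results from each pipeline.
        count = sum(counts[j] for _, counts in partials)
        total = [sum(sums[j][d] for sums, _ in partials) for d in range(dims)]
        # Step 6: division; an empty cluster keeps its old centroid.
        new_centroids.append(
            [t / count for t in total] if count else list(centroids[j])
        )
    return new_centroids  # Step 7 would write these back to memory.


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
start = [[1.0, 1.0], [9.0, 8.0]]
print(iteration(points, start, n_pipelines=2))
```

Because sums and counts are additive, the result does not depend on how many pipelines the data is split across.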
29
30
Uses of Parallelism
[Figure: two cases: K is known / centroids known; need to determine K (Elbow method)]
▪ K-Means algorithm
▪ FPGA outperforms several cores of the CPU
▪ Can use parallelism in two ways – cover more queries
▪ Text: Regular expression matching, Edit distance, …
▪ Database ops.: Skyline queries, Group-by aggregations, …
▪ Statistics: Histograms, Count-min sketch, Bloom filters, …
▪ Machine learning: Clustering (K-means), Stochastic Gradient
Descent, Decision Trees, …
▪ Data management: Hash tables, hash functions, …
▪ [Your algorithm here]
31
Wide range of algorithms can benefit from hardware
✓ Reduce data movement bottlenecks
✓ Hybrid Processing
✓ Emerging compute-intensive workloads
32
The Glass Half Full…
Future Challenges…
▪ Managing Programmable Hardware accelerators
▪ Is this the job of the OS or does the DB have to take control?
▪ How to share programmable hardware across tenants?
▪ Compilation/synthesis of hardware accelerators
▪ Can we derive accelerators from user queries?
▪ Intermediary DSL or building blocks we could use?
For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering
Bulletin, March 2019.
