SQL+GPU+SSD=∞
Wasserschwein@Shinagawa
Self Introduction
▌Name: Wasserschwein@Shinagawa
▌PostgreSQL: 9 years (2006~)
▌Work: Security, FDW, etc.
▌Hobby: Mixing heterogeneous technologies with PostgreSQL
PostgreSQL Conference Japan - LT: SQL+GPU+SSD=∞2
PG-Strom: What I’m making
  - GPGPU: very powerful computing capability
  - Database: very functional and well-used
What’s PG-Strom – Brief overview
▌Core ideas
① GPU native code generation on the fly
② Asynchronous, massively parallel execution
▌Advantages
  - Transparent acceleration with 100% query compatibility
  - Commodity hardware and lower system integration cost
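[Diagram: the query passes through the PostgreSQL Parser, Planner, and Executor. PG-Strom hooks in through the Custom-Scan/Join interface, generates CUDA source code, builds it with nvrtc, and runs it through the CUDA driver with DMA data transfer and massively parallel execution.]
Query: SELECT * FROM l_tbl JOIN r_tbl ON l_tbl.lid = r_tbl.rid;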
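To make core idea ① more concrete, here is a minimal sketch of generating CUDA C source text from scan qualifiers at run time. It is not PG-Strom's actual code generator: the `Qual` type, the `generate_kernel` function, the kernel signature, and the assumption that every column is a float array are all hypothetical, and the sketch stops at producing source text rather than compiling it with nvrtc.

```python
from typing import NamedTuple


class Qual(NamedTuple):
    """One scan qualifier of the form `column op constant` (hypothetical model)."""
    column: str
    op: str
    value: float


ALLOWED_OPS = {"<", "<=", ">", ">=", "==", "!="}


def generate_kernel(quals, kernel_name="scan_qual"):
    """Return CUDA C source text for a kernel that evaluates all quals per row."""
    for q in quals:
        if q.op not in ALLOWED_OPS or not q.column.isidentifier():
            raise ValueError(f"unsupported qualifier: {q}")
    # One device pointer per referenced column; float columns are assumed.
    columns = sorted({q.column for q in quals})
    params = [f"const float *{c}" for c in columns]
    params += ["int nrows", "unsigned char *visible"]
    param_list = ", ".join(params)
    # AND all qualifiers together; an empty list keeps every row.
    cond = " && ".join(
        f"({q.column}[i] {q.op} {float(q.value)!r}f)" for q in quals
    ) or "true"
    lines = [
        f'extern "C" __global__ void {kernel_name}({param_list})',
        "{",
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "    if (i < nrows)",
        f"        visible[i] = {cond};",
        "}",
    ]
    return "\n".join(lines)


# Roughly what `WHERE x > 10 AND y <= 2.5` could turn into.
source = generate_kernel([Qual("x", ">", 10), Qual("y", "<=", 2.5)])
print(source)
```

Each GPU thread evaluates the qualifiers for one row, which is where the massively parallel execution in core idea ② would come from.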
Supported Workload – Scan, Join, Aggregation
▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat;
  - t0: 100M rows; t1 to t10: 100K rows each; all data was preloaded.
▌Environment:
  - PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)
  - CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA Tesla K20c (2,496 cores)
[Chart: Time consumption per component (PostgreSQL v9.5β vs PG-Strom). X axis: # of tables involved (2 to 8), with one PgSQL bar and one Strom bar for each. Y axis: query response time [sec], 0 to 300. Series: Scan, Join, Aggregate, Others.]
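For readers who want to see the shape of the benchmark query on this slide, here is a small sketch that runs the two-table variant against tiny in-memory tables. It uses SQLite through Python's `sqlite3` module purely for illustration, not PostgreSQL or PG-Strom, and the column layout (`id`, `cat`, `aid`, `x`, `atext`) is an assumption because the slide does not show the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t0 (id INTEGER PRIMARY KEY, cat TEXT, aid INTEGER, x REAL);
    CREATE TABLE t1 (aid INTEGER PRIMARY KEY, atext TEXT);
    """
)
# t0 is the large fact-like table, t1 a small table joined on the shared column `aid`.
conn.executemany(
    "INSERT INTO t0 VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 10.0), (2, "a", 2, 20.0), (3, "b", 1, 30.0), (4, "b", 3, 50.0)],
)
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "one"), (2, "two")])

# NATURAL JOIN matches on the only common column name, `aid`;
# the row with aid = 3 has no partner in t1 and drops out.
result = conn.execute(
    "SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat ORDER BY cat"
).fetchall()
print(result)
```

The query touches all three components measured in the chart: a scan of `t0`, a join with `t1`, and an aggregation per `cat`.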
Next target is I/O acceleration – from TPC-DS results
[Chart: Time consumption per workload (PostgreSQL v9.5beta + PG-Strom), shown as a share from 0% to 100%. Series: Scan, Join, Aggregate, Others.]
So, how can the GPU accelerate I/O?
NOTICE
The story I would like to introduce next is...
just my idea at this moment.
So, I will put my effort into implementing it.
A rough x86_64 hardware architecture
[Diagram labels: GPU, SSD, CPU + RAM (two sockets), PCI-E, SAS. Annotation: usual I/O bottleneck.]
Simplified diagram for introduction
[Diagram labels: GPU, SSD, CPU + RAM, PCI-E. Annotation: OK, it’s storage.]
NVM EXPRESS SSD
PCI-E direct-attached SSD devices – low latency and higher bandwidth
  - Samsung SSD 950 PRO
  - Intel SSD 750
  - HGST Ultrastar SN100
  - Intel SSD DC P3700
Data Flow in analytic queries
① Data load from storage to CPU/RAM
② Remove invisible rows (Select)
③ Remove unreferenced columns (Projection)
④ Join with other tables (Join)
⇒ All of this is the job of the CPU
[Diagram labels: GPU, SSD, CPU + RAM, PCI-E, Table.]
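The four steps above can be sketched as a plain host-side pipeline. This is only a toy model under stated assumptions: rows are Python dictionaries, `cpu_pipeline` and its parameters are hypothetical names, and nothing here reflects how the PostgreSQL executor is actually implemented. The point is that every row of the table is loaded into CPU/RAM before any of it can be discarded.

```python
def cpu_pipeline(table, inner, qual, columns, join_key):
    """Run steps 1 to 4 on the host; return joined rows and the rows loaded."""
    loaded = list(table)                                  # (1) storage -> CPU/RAM
    visible = [row for row in loaded if qual(row)]        # (2) select
    projected = [{c: row[c] for c in columns}             # (3) projection
                 for row in visible]
    lookup = {r[join_key]: r for r in inner}              # inner side of the join
    joined = [{**row, **lookup[row[join_key]]}            # (4) join
              for row in projected if row[join_key] in lookup]
    return joined, len(loaded)


# Six rows on "storage"; only two survive the filter and the join.
table = [{"id": i, "aid": i % 3, "x": i * 10, "pad": "unused"} for i in range(6)]
inner = [{"aid": 0, "name": "zero"}, {"aid": 1, "name": "one"}]

joined, rows_loaded = cpu_pipeline(
    table, inner, qual=lambda r: r["x"] >= 30, columns=["aid", "x"], join_key="aid"
)
print(joined, rows_loaded)
```

All six rows, including the `pad` column that the query never references, pass through host memory even though only two joined rows come out.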
SSD-to-GPU Direct
Data transfer between SSD and GPU that bypasses CPU/RAM.
Also available on NVMe, not only Fusion-IO.
Data Flow in analytic queries (1/3) – Basic
① The table is read from the SSD into the GPU by SSD-to-GPU Direct
② GPU code generated from SQL on the fly removes invisible rows according to the scan qualifiers
③ Only visible rows are moved to CPU + RAM
[Diagram labels: GPU, SSD, CPU + RAM, PCI-E, Table.]
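A toy model of the basic flow, assuming each inner list stands in for one block transferred from the SSD to the GPU. The function name `ssd_to_gpu_scan` is hypothetical, and the "GPU" is simulated by ordinary Python. The sketch only shows which rows cross over to the host.

```python
def ssd_to_gpu_scan(blocks, qual):
    """Filter rows 'on the GPU'; return rows sent to the host and rows read."""
    rows_read = 0
    to_host = []
    for block in blocks:              # one block = one simulated SSD -> GPU transfer
        rows_read += len(block)
        # Device-side evaluation of the scan qualifier.
        to_host.extend(row for row in block if qual(row))
    return to_host, rows_read


blocks = [[1, 5, 9], [12, 3], [20]]
to_host, rows_read = ssd_to_gpu_scan(blocks, qual=lambda v: v > 8)
print(to_host, rows_read)
```

Six rows are read from storage, but only the three that satisfy the qualifier ever reach CPU + RAM.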
Data Flow in analytic queries (2/3) – Advanced
① The table is read from the SSD into the GPU by SSD-to-GPU Direct
② GPU code generated from SQL on the fly removes invisible rows and unreferenced columns according to the scan qualifiers and projection
③ Only visible rows and referenced columns are moved to CPU + RAM
[Diagram labels: GPU, SSD, CPU + RAM, PCI-E, Table.]
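A back-of-the-envelope estimate shows why doing both selection and projection on the GPU matters. The function below is a hypothetical helper, and the row count, selectivity, and column widths are made-up illustrative numbers, not measurements from the talk.

```python
def host_transfer_bytes(nrows, selectivity, column_widths, referenced):
    """Estimate bytes moved to CPU+RAM after GPU-side select and projection."""
    visible_rows = round(nrows * selectivity)
    row_width = sum(column_widths[c] for c in referenced)
    return visible_rows * row_width


# Illustrative table: 1M rows, 100 bytes per row, 10% of rows pass the qualifier.
widths = {"id": 8, "cat": 4, "x": 8, "memo": 80}

everything = host_transfer_bytes(1_000_000, 1.0, widths, list(widths))
select_only = host_transfer_bytes(1_000_000, 0.1, widths, list(widths))
select_and_project = host_transfer_bytes(1_000_000, 0.1, widths, ["cat", "x"])
print(everything, select_only, select_and_project)
```

Under these assumed numbers the host receives 100 MB with no GPU-side reduction, 10 MB with selection only, and 1.2 MB with selection plus projection.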
Data Flow in analytic queries (3/3) – Ultimate
① The GPU holds the inner relations (JOIN targets)
② The table is read from the SSD into the GPU by SSD-to-GPU Direct
③ GPU code generated from SQL on the fly generates joined tuples on the GPU side
④ Tuples are already joined by the time the data read from storage reaches CPU + RAM
[Diagram labels: GPU, SSD, CPU + RAM, PCI-E, Table.]
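The ultimate flow can be modeled as a hash join against inner relations that are already resident on the device. This is a sketch under assumptions: `gpu_join_scan` is a hypothetical name, the hash table is an ordinary dictionary standing in for GPU memory, and a hash join is one plausible technique rather than something the slide specifies.

```python
def gpu_join_scan(blocks, inner, key, qual):
    """Yield joined tuples for rows that pass qual, block by block."""
    # Build side: inner relation assumed to be loaded onto the GPU beforehand.
    hash_table = {}
    for r in inner:
        hash_table.setdefault(r[key], []).append(r)
    # Probe side: blocks streamed from the SSD straight to the GPU.
    for block in blocks:
        for row in block:
            if not qual(row):
                continue
            for match in hash_table.get(row[key], ()):
                yield {**row, **match}


blocks = [
    [{"rid": 1, "x": 5}, {"rid": 2, "x": 50}],
    [{"rid": 1, "x": 70}, {"rid": 9, "x": 80}],
]
inner = [{"rid": 1, "name": "a"}, {"rid": 2, "name": "b"}]

joined = list(gpu_join_scan(blocks, inner, key="rid", qual=lambda r: r["x"] >= 50))
print(joined)
```

Only the joined tuples leave the generator, which mirrors the idea that the host sees already-joined data instead of raw table blocks.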
Primitive Technologies
▌NVIDIA GPUDirect enhancement on the NVMe device driver
  - Interaction between the NVMe and NVIDIA drivers is needed
▌Usage statistics of shared_buffers per relation
  - To avoid SSD-to-GPU Direct on relations that are already preloaded
▌Add a new access mode to shared_buffers
  - Nobody can make the buffer dirty during an SSD-to-GPU Direct transfer
We welcome all developers who want to join the PG-Strom project.
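The shared_buffers usage statistics item in the list above implies a per-relation decision about which path to take. A minimal sketch of such a policy follows. The function name, the return labels, and the 50% threshold are all hypothetical choices for illustration; the slide does not say how the decision would actually be made.

```python
def choose_scan_path(cached_blocks, total_blocks, threshold=0.5):
    """Pick a scan path from the share of a relation cached in shared_buffers."""
    if total_blocks <= 0:
        return "shared_buffers"          # empty relation: nothing to read from SSD
    cached_ratio = cached_blocks / total_blocks
    # A mostly preloaded relation is cheaper to read from RAM than from the SSD.
    if cached_ratio >= threshold:
        return "shared_buffers"
    return "ssd_to_gpu_direct"


print(choose_scan_path(900, 1000))   # mostly cached relation
print(choose_scan_path(10, 1000))    # mostly on storage
```

A real implementation would also need the new access mode mentioned above, so that buffers cannot become dirty while a direct transfer is in flight.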
Coming Soon?
