SlideShare a Scribd company logo
GPU version of PostGIS and GiST-Index
~A new horizon of geospatial data analytics~
HeteroDB,Inc
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>
about Us
 KaiGai Kohei (海外浩平)
 Chief Architect & CEO of HeteroDB
 Contributor of PostgreSQL (2006-)
 Primary Developer of PG-Strom (2012-)
 Interested in: Big-data, GPU, NVME/PMEM, ...
about Myself
about HeteroDB
 Established: 4th-Jul-2017
 Location: Shinagawa, Tokyo, Japan
 Businesses:
✓ Development of high-performance data-processing
software on top of heterogeneous architecture.
✓ Technology consulting service on GPU&DB area.
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
2
PG-Strom
What is PG-Strom?
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
3
 Adds custom-scan/join/aggregate path with GPU execution
 SQL to GPU binary code generation & Just-in-time compilation
 Designed for IoT/M2M grade log-data processing
 Closely connected NVME-SSD and GPU using P2P data transfer [I/O acceleration]
 Geospatial analytics with GPU-version of PostGIS and GiST-index support
PG-Strom: An extension of PostgreSQL to pull out maximum capability of
GPU and NVME for processing of terabytes scale data
App
GPU
off-loading
for IoT/Big-Data
for ML/Analytics
PG-Strom’s Features
➢ JIT of GPU code with SQL &
Transparent SQL acceleration
➢ SSD-to-GPU Direct SQL
➢ Columnar Store (Arrow_Fdw)
➢ Asymmetric Partition-wise
JOIN/GROUP BY
➢ GPU Memory Store
➢ PostGIS support
➢ GiST-Index on GPU
➢ BRIN-Index support
➢ NVME-over-Fabric support
➢ Data-frame exchange for
Python scripts
NEW
GPU’s overview
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
4
Over 10years history in HPC, then massive popularization in Machine-Learning
NVIDIA Tesla V100
Super Computer
(TITEC; TSUBAME3.0) Computer Graphics Machine-Learning
More than thousands cores and TB/s grade memory bandwidth on a chip
for highly computing intensive workloads.
Simulation
Target: Fast data search of mobile devices
▌Mobile devices
 Location (Longitude, Latitude) is updated very frequently.
 Its log data often contains (device_id, timestamp, location (point), other attributes)
▌Area definitions
 Relatively small items, and almost static data.
 Polygon often has very complicated form, thus heavy “collision detection”.
▌Purpose
 Area marketing, Logistics analytics, Advertisement delivery, Emergency Alert, etc…
Latest Location
(Point)
Area definition
(Polygon)
Mobile device
Extract mobile devices
within the target areas
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
5
GPS
GPU-version PostGIS
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
6
about PostGIS (1/2)
 Extension to add geometry type, functions and operators.
 First release at 2005, then contentious development over 15 years.
 More than 400 functions / operators
 R-tree on GiST-index framework
 Parallel query execution
© GAIA RESOURCES © OSGeo © KTGIS.net
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
7
about PostGIS (2/2)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
8
St_Distance(a,b)
St_Crosses(a,b)
St_Contains(a,b)
PostGIS’s optimization (1/2)
▌Bounding Box
 The least rectangle that contains a polygon (that is often very complicated).
 (x1,y1) - (x2,y2) form by FP32 (= 16bytes)
 PostGIS assigns a bounding-box when it stores geometry values
 It allows to skip heavy operations prior to geolocational operations
obviously
disjointed
Tokyo
Germany
France
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
9
PostGIS’s optimization (2/2)
▌R-Tree on GiST-Index
 GiST (Generalized Search Tree) - A framework of index at PostgreSQL
 PostGIS implements R-tree for Geometry type
 Check for contains (‘@’ operator), overlaps (‘&&’ operator)
 Works efficiently to check relationship between very large number of points
and many polygons.
# Data published by Geospatial Information Authority of Japan
$ shp2pgsql N03-20_200101.shp | psql gistest
gistest=# ¥d+
List of relations
Schema | Name | Type | Owner | Size |
--------+-------------+----------+--------+------------+
public | geo_japan | table | kaigai | 243 MB |
gistest=# ¥di+
List of relations
Schema | Name | Type | Owner | Table | Size |
--------+--------------------+-------+--------+-----------+---------+
public | geo_japan_pkey | index | kaigai | geo_japan | 2616 kB |
public | geo_japan_geom_idx | index | kaigai | geo_japan | 14 MB |
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
10
Why GPU is capable for SQL acceleration?
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
11
▌GPU’s characteristics
 SIMT (Single-Instruction Multiple-Threads) architecture
 More than thousands processor cores and 1.0TB/s grade memory band
➔ Designed for “same operations on large number of values”
▌SQL characteristics
 “same operations on large number of values”,
like evaluation of WHERE-clause, JOIN or GROUP BY.
SQL runs WHERE, JOIN, GROUP BY on very large number of rows
Thousands processor units on GPU
evaluates thousands rows in parallel
CPU Parallel
GPU Parallel
Performing PostGIS functions on GPU (1/2)
postgres=# ¥d _gistest
Table "public._gistest"
Column | Type | Collation | Nullable | Default
--------+----------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('_gistest_id_seq'::regclass)
a | geometry | | |
b | geometry | | |
postgres=# explain verbose select * from _gistest where st_contains(a,b);
QUERY PLAN
------------------------------------------------------------------------------------
Custom Scan (GpuScan) on public._gistest (cost=4251.50..4251.50 rows=1 width=196)
Output: id, a, b
GPU Filter: st_contains(_gistest.a, _gistest.b)
GPU Preference: None
Kernel Source: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.4.gpu
Kernel Binary: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.5.ptx
(6 rows)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
12
Performing PostGIS functions on GPU (2/2)
$ less /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.4.gpu
:
#include "cuda_postgis.h"
#include "cuda_gpuscan.h"
DEVICE_FUNCTION(cl_bool)
gpuscan_quals_eval(kern_context *kcxt,
kern_data_store *kds,
ItemPointerData *t_self,
HeapTupleHeaderData *htup)
{
void *addr __attribute__((unused));
pg_geometry_t KVAR_2;
pg_geometry_t KVAR_3;
assert(htup != NULL);
EXTRACT_HEAP_TUPLE_BEGIN(addr, kds, htup);
EXTRACT_HEAP_TUPLE_NEXT(addr);
pg_datum_ref(kcxt,KVAR_2,addr); // pg_geometry_t
EXTRACT_HEAP_TUPLE_NEXT(addr);
pg_datum_ref(kcxt,KVAR_3,addr); // pg_geometry_t
EXTRACT_HEAP_TUPLE_END();
return EVAL(pgfn_st_contains(kcxt, KVAR_2, KVAR_3));
}
:
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
13
Load the geometry
values from column-A
and column-B
Calls GPU-revision of
st_contains() for each thread
GPU code automatically
generated for evaluation
of WHERE-clause
Basic performance (1/2)
--- GpuScan (GPU-rev PostGIS)
=# SELECT count(*) FROM ft
WHERE st_contains('polygon ((10 10,90 10,90 12,12 12,12 88,90 88,90 90,¥
10 90,10 10))’, st_makepoint(x,y));
count
--------
236610
(1 row)
Time: 44.680 ms
--- Vanilla PostGIS
=# SET pg_strom.enabled = off;
SET
=# SELECT count(*) FROM tt
WHERE st_contains('polygon ((10 10,90 10,90 12,12 12,12 88,90 88,90 90,¥
10 90,10 10))', st_makepoint(x,y));
count
--------
236610
(1 row)
Time: 622.314 ms
Count number of points in the specified
area from 5 million points.
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
14
Basic performance (2/2)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
15
(100,100)
(0,0)
(90,90)
(90,10)
(90,12)
(12,12)
(12,88) (90,88)
(10,90)
(10,10)
GPU-version of PostGIS
 geometry st_makepoint(float8, float8[, float8[, float8]])
 float8 st_distance(geometry,geometry)
 bool st_dwithin(geometry,geometry,float8)
 bool st_contains(geometry,geometry)
 bool st_crosses(geometry,geometry)
 text st_relate(geometry,geometry)
 ...and more in the future version
Current Status of the supported functions
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
16
Extract Points within Polygon
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
17
Number of Polygons x Points combination is too large even for GPU
Latest Location
(Point)
Area definition
(Polygon)
Mobile device
GPS
100K-10M devices
100 - 100K polygons
10M-1T
Combinations?
GiST-Index on GPU
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
18
How GiST-Index (R-tree) works
▌How GiST-Index (R-tree) works
✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers
✓ R4 is a rectangle that contains all of (R11,R12) and their pointers
✓ R12 is a rectangle that contains the target geometry and its ItemPointer
✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched.
✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree.
(xmin,ymin)
(xmax,ymax)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
19
Search Key (Lon,Lat)
〇 〇
How GiST-Index (R-tree) works
▌How GiST-Index (R-tree) works
✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers
✓ R4 is a rectangle that contains all of (R11,R12) and their pointers
✓ R12 is a rectangle that contains the target geometry and its ItemPointer
✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched.
✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree.
(xmin,ymin)
(xmax,ymax)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
20
Search Key (Lon,Lat)
〇 〇
× 〇 ×
How GiST-Index (R-tree) works
▌How GiST-Index (R-tree) works
✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers
✓ R4 is a rectangle that contains all of (R11,R12) and their pointers
✓ R12 is a rectangle that contains the target geometry and its ItemPointer
✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched.
✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree.
(xmin,ymin)
(xmax,ymax)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
21
Search Key (Lon,Lat)
〇 〇
× 〇 ×
〇
×
How GiST-Index (R-tree) works
▌How GiST-Index (R-tree) works
✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers
✓ R4 is a rectangle that contains all of (R11,R12) and their pointers
✓ R12 is a rectangle that contains the target geometry and its ItemPointer
✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched.
✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree.
(xmin,ymin)
(xmax,ymax)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
22
Search Key (Lon,Lat)
〇 〇
× 〇 ×
〇
×
× ×
GPU version of GiST-Index
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
23
▌Overview
 PG-Strom may utilize GiST-Index for joining a table with area definition (polygons; small)
and a table with locational data (points; large).
 Both the area definition table and its index are loaded onto GPU on GpuJoin.
 GpuJoin looks at the GiST-index first for rough pruning, using bounding-box
 Then, evaluate the “collision detection” with polygon value on the table
 Above operations by thousands cores of GPU in parallel,
so we expected its search performance is better, but...
collision detection of Polygons x Points as a part of GpuJoin
GiST-Index (R-tree)
Area Definitions
(Polygon)
A table with
Location Data
(Points)
Thousands
threads search
R-tree index
in parallel
Simple Test: Random points and St_Contains (2020-Sep)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
24
(123.0, 20.0)
(154.2, 46.2)
SELECT n03_001,n03_004,count(*)
FROM geo_japan j, geopoint p
WHERE st_contains(j.geom, st_makepoint(x,y))
AND j.n03_001 like ‘Tokyo’
GROUP BY n03_001,n03_004;
10M of randomly
generated
geolocational
points
n03_001 | n03_004 | count
---------+-------------+-------
Tokyo | Akiruno | 105
Tokyo | Miyake | 76
Tokyo | Mitaka | 17
Tokyo | Setagaya-ku | 67
Tokyo | Chuo-ku | 12
Tokyo | Nakano-ku | 18
Tokyo | Hachijo | 105
: : :
Tokyo | Toshima-ku | 14
Tokyo | Adachi-ku | 55
Tokyo | Aogashima | 7
Tokyo | Ome | 117
(63 rows)
CPU-rev: 30.539s
GPU-rev: 33.841s (Slow!)
Area definition data
by the Geospatial Information
Authority of Japan
Background) GPU’s Thread Scheduling
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
25
 GPU groups its processor cores by Streaming Multiprocessor(SM)
✓ Cores, Registers and L1 cache (shared memory) are managed per streaming-multiprocessor
 A SM can run up to 1024 threads simultaneously on the shared 64 cores/SM
✓ Like a very large scale hyper-threading in CPU
 Threads are scheduled per warp (32threads).
called SIMT (Single-Instruction Multiple-Thread) architecture
✓ Uniformed workloads, like matrix operations, fully utilizes the processor cores
✓ If a particular thread tends to consume large cycles, other threads in a warp must wait.
GPU Block Diagram
(Tesla V100; 80SMs)
Streaming Multiprocessors
(64CUDA cores/SM)
●●●…●●●
●●●…●●●
●●●…●●●
●●●…●●●
●●●…●●●
●●●…●●●
Thread Group
(1~1024 Threads)
Warp (32 threads)
Background) Internal design of GpuJoin
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
26
SELECT * FROM A, B
WHERE A.id = B.id;
Table-A
512 threads fetches
512 rows at once.
Table-B
● ● ● ● ● ●
〇 × 〇 × 〇 ×
GpuHashJoin / GpuNestLoop
N = __syncthreads_count(...)
Write out the JOIN results (if any)
thread-0 allocates result buffer for N-items
Fetch next frame,
and repeat the
above steps.
GpuHashJoin
• Hash calculation
• Search Hash table
• Evaluation of JOIN condition
GpuNestLoop
• Evaluation of JOIN condition
➔little differences in processing
cycles between the threads
References
Issues of index-search in GPU
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
27
Large variance of thread’s processing time, that leads synchronization wait!
Table-A
● ● ● ● ● ●
Reference
Index-A
× × × ×
×
〇
N = __syncthreads_count(...)
Reference
Walk down to the leaf
node of R-tree, but no
matched item
No matched items,
by just looking at
the root node
of R-tree
Walk down to the leaf
node of R-tree, and
evaluate JOIN conditions
Other GPU cores must be idle by the
completion of the longest operations.
Naive implementation of GPU GiST-Index Search (2020-Sep)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
28
Little utilization ratio of GPU cores, due to inter-core synchronization
● ● ● ● ● ● ●
●
Every thread loads a row from the input buffer,
and extract the key for GiST-Index.
Hit on the GiST-Index?
Search the GiST-Index by the key
Is the JOIN-condition true?
Evaluation of the JOIN-condition
write out JOIN-results (if found)
nitems = __syncthreads_count(found);
Repeat
found=true
found=false
No
No
A thread that found a matched entry can
block other 511 threads in the same
thread-group until completion of the
evaluation of JOIN-condition.
Very low efficiency of GPU core usage.
A new more optimal implementation (2020-Nov)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
29
Minimization of the synchronization point, to pull up utilization ratio
● ● ● ● ● ● ●
●
Every thread loads a row from the input buffer,
and extract the key for GiST-Index.
Search the GiST-Index by the key
Hit on the GiST-Index?
Allocation of the temporary buffer,
to save the pointers that hit GiST-Index above.
Consumption of temporary
buffer exceeds 512 items.
__syncthreads()
Evaluation of the JOIN-condition
Is the JOIN-condition true?
No
No
nitems = __syncthreads_count(found)
Write out JOIN results
Repeat
found=false
found=true
As long as temporary buffer has space,
threads continue to fetch rows and
search GiST-Index.
➔ Then, evaluate JOIN-conditions
by all the threads at once.
Simple Test: Random points and St_Contains (2020-Nov; the latest)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
30
(123.0, 20.0)
(154.2, 46.2)
SELECT n03_001,n03_004,count(*)
FROM geo_japan j, geopoint p
WHERE st_contains(j.geom, st_makepoint(x,y))
AND j.n03_001 like ‘Tokyo’
GROUP BY n03_001,n03_004;
10M of randomly
generated
geolocational
points
n03_001 | n03_004 | count
---------+-------------+-------
Tokyo | Akiruno | 105
Tokyo | Miyake | 76
Tokyo | Mitaka | 17
Tokyo | Setagaya-ku | 67
Tokyo | Chuo-ku | 12
Tokyo | Nakano-ku | 18
Tokyo | Hachijo | 105
: : :
Tokyo | Toshima-ku | 14
Tokyo | Adachi-ku | 55
Tokyo | Aogashima | 7
Tokyo | Ome | 117
(63 rows)
CPU-rev: 30.539s
GPU-rev: 0.316s
Area definition data
by the Geospatial Information
Authority of Japan
100 times
faster!!
EXPLAIN of the simple test (1/2) - CPU version
postgres=# EXPLAIN (analyze, costs off)
SELECT n03_001,n03_004,count(*)
FROM geo_japan j, geopoint p
WHERE st_contains(j.geom, st_makepoint(x,y))
AND j.n03_001 like ‘Tokyo’
GROUP BY n03_001,n03_004;
QUERY PLAN
--------------------------------------------------------------------------------------------
Finalize GroupAggregate (actual time=30709.855..30710.080 rows=63 loops=1)
Group Key: j.n03_001, j.n03_004
-> Gather Merge (actual time=30709.838..30732.270 rows=244 loops=1)
Workers Planned: 4
Workers Launched: 3
-> Partial GroupAggregate (actual time=30687.466..30687.572 rows=61 loops=4)
Group Key: j.n03_001, j.n03_004
-> Sort (actual time=30687.452..30687.475 rows=638 loops=4)
Sort Key: j.n03_001, j.n03_004
Sort Method: quicksort Memory: 73kB
-> Nested Loop (actual time=71.496..30686.278 rows=638 loops=4)
-> Parallel Seq Scan on geopoint p (actual time=0.012..207.553 rows=2500000 loops=4)
-> Index Scan using geo_japan_geom_idx on geo_japan j ¥
(actual time=0.012..0.012 rows=0 loops=10000000)
Index Cond: (geom ~ st_makepoint(p.x, p.y))
Filter: (((n03_001)::text ~~ ‘Tokyo’::text) AND
st_contains(geom, st_makepoint(p.x, p.y)))
Rows Removed by Filter: 0
Planning Time: 0.156 ms
Execution Time: 30732.422 ms
(21 rows)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
31
EXPLAIN of the simple test (2/2) - GPU version
postgres=# EXPLAIN (analyze, costs off)
SELECT n03_001,n03_004,count(*)
FROM geo_japan j, geopoint p
WHERE st_contains(j.geom, st_makepoint(x,y))
AND j.n03_001 like ‘Tokyo’
GROUP BY n03_001,n03_004;
QUERY PLAN
--------------------------------------------------------------------------------------------
GroupAggregate (actual time=329.118..329.139 rows=63 loops=1)
Group Key: j.n03_001, j.n03_004
-> Sort (actual time=329.107..329.110 rows=63 loops=1)
Sort Key: j.n03_001, j.n03_004
Sort Method: quicksort Memory: 29kB
-> Custom Scan (GpuPreAgg) (actual time=328.902..328.911 rows=63 loops=1)
Reduction: Local
Combined GpuJoin: enabled
-> Custom Scan (GpuJoin) on fgeopoint p (never executed)
Outer Scan: fgeopoint p (never executed)
Depth 1: GpuGiSTJoin(plan nrows: 10000000...60840000, actual nrows: 10000000...2553)
HeapSize: 7841.91KB (estimated: 3113.70KB), IndexSize: 13.28MB
IndexFilter: (j.geom ~ st_makepoint(p.x, p.y)) on geo_japan_geom_idx
Rows Fetched by Index: 4952
JoinQuals: st_contains(j.geom, st_makepoint(p.x, p.y))
-> Seq Scan on geo_japan j (actual time=0.164..17.723 rows=6173 loops=1)
Filter: ((n03_001)::text ~~ ‘Tokyo’::text)
Rows Removed by Filter: 112726
Planning Time: 0.344 ms
Execution Time: 340.415 ms
(20 rows)
Portion executed on GPU
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
32
Conclusion (1/2)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
33
▌GPU-version PostGIS
 Enhancement for parallel execution of PostGIS functions
 Also support GiST-Index (R-tree) of geometry type
 100 times faster results to pickup “points in area” type workloads.
▌Expected Use scenarios
 Area marketing analytics
 Real-time advertisement delivery
 Push event notifications, etc...
➔ GPU+PostGIS allows to run “computing intensive” workloads on your workstation or
cloud instance as like you are usually doing.
▌Resources
 GitHub: https://github.com/heterodb/pg-strom
 Document: http://heterodb.github.io/pg-strom/ja/
 Contact: Tw: @kkaigai / ✉ kaigai@heterodb.com
Conclusion (2/2)
PGconf.online 2021 - GPU version of PostGIS and GiST-Index
34
▌Resources
 GitHub: https://github.com/heterodb/pg-strom
 Document: http://heterodb.github.io/pg-strom/ja/
 Contact: Tw: @kkaigai / ✉ kaigai@heterodb.com
PG-Strom project welcomes your participation. Please contact us.
20210301_PGconf_Online_GPU_PostGIS_GiST_Index

More Related Content

PDF
20201128_OSC_Fukuoka_Online_GPUPostGIS
PDF
20181212 - PGconfASIA - LT - English
PDF
20210928_pgunconf_hll_count
PDF
20181016_pgconfeu_ssd2gpu_multi
PDF
20181116 Massive Log Processing using I/O optimized PostgreSQL
PDF
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PDF
20201006_PGconf_Online_Large_Data_Processing
PDF
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
20201128_OSC_Fukuoka_Online_GPUPostGIS
20181212 - PGconfASIA - LT - English
20210928_pgunconf_hll_count
20181016_pgconfeu_ssd2gpu_multi
20181116 Massive Log Processing using I/O optimized PostgreSQL
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
20201006_PGconf_Online_Large_Data_Processing
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...

What's hot (20)

PDF
pgconfasia2016 plcuda en
PDF
20181025_pgconfeu_lt_gstorefdw
PDF
SQL+GPU+SSD=∞ (English)
PDF
20180920_DBTS_PGStrom_EN
PDF
20170602_OSSummit_an_intelligent_storage
PDF
20160407_GTC2016_PgSQL_In_Place
PDF
PG-Strom
PDF
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
PDF
20171206 PGconf.ASIA LT gstore_fdw
PDF
20150318-SFPUG-Meetup-PGStrom
PDF
PG-Strom - GPU Accelerated Asyncr
PDF
GPGPU Accelerates PostgreSQL (English)
PDF
20190909_PGconf.ASIA_KaiGai
PDF
Let's turn your PostgreSQL into columnar store with cstore_fdw
PDF
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PDF
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PDF
PostgreSQL with OpenCL
PDF
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PPTX
Parallel K means clustering using CUDA
PDF
PGConf.ASIA 2019 - PGSpider High Performance Cluster Engine - Shigeo Hirose
pgconfasia2016 plcuda en
20181025_pgconfeu_lt_gstorefdw
SQL+GPU+SSD=∞ (English)
20180920_DBTS_PGStrom_EN
20170602_OSSummit_an_intelligent_storage
20160407_GTC2016_PgSQL_In_Place
PG-Strom
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
20171206 PGconf.ASIA LT gstore_fdw
20150318-SFPUG-Meetup-PGStrom
PG-Strom - GPU Accelerated Asyncr
GPGPU Accelerates PostgreSQL (English)
20190909_PGconf.ASIA_KaiGai
Let's turn your PostgreSQL into columnar store with cstore_fdw
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PostgreSQL with OpenCL
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
Parallel K means clustering using CUDA
PGConf.ASIA 2019 - PGSpider High Performance Cluster Engine - Shigeo Hirose
Ad

Similar to 20210301_PGconf_Online_GPU_PostGIS_GiST_Index (20)

ODP
Introduction To PostGIS
PDF
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PDF
Postgres Vision 2018: PostGIS and Spatial Extensions
 
PDF
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
PDF
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
PDF
GeoMesa on Apache Spark SQL with Anthony Fox
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PPTX
Open source based software ‘gxt’ mangosystem
PDF
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
ODP
FOSS4G 2010 PostGIS Raster: an Open Source alternative to Oracle GeoRaster
PDF
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
PDF
Location based services for Nokia X and Nokia Asha using Geo2tag
PPTX
PostgreSQL 13 New Features
PDF
Datomic rtree-pres
PDF
Datomic R-trees
PPTX
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
PDF
Evolution of Spark APIs
PPT
NVIDIA CUDA
PDF
State of the Art Web Mapping with Open Source
PPTX
2017 PLSC Track: Using a Standard Version of ArcMap with External VRS Recieve...
Introduction To PostGIS
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
Postgres Vision 2018: PostGIS and Spatial Extensions
 
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
GeoMesa on Apache Spark SQL with Anthony Fox
Presto: Optimizing Performance of SQL-on-Anything Engine
Open source based software ‘gxt’ mangosystem
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
FOSS4G 2010 PostGIS Raster: an Open Source alternative to Oracle GeoRaster
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
Location based services for Nokia X and Nokia Asha using Geo2tag
PostgreSQL 13 New Features
Datomic rtree-pres
Datomic R-trees
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Evolution of Spark APIs
NVIDIA CUDA
State of the Art Web Mapping with Open Source
2017 PLSC Track: Using a Standard Version of ArcMap with External VRS Recieve...
Ad

More from Kohei KaiGai (20)

PDF
20221116_DBTS_PGStrom_History
PDF
20221111_JPUG_CustomScan_API
PDF
20211112_jpugcon_gpu_and_arrow
PDF
20210731_OSC_Kyoto_PGStrom3.0
PDF
20210511_PGStrom_GpuCache
PDF
20201113_PGconf_Japan_GPU_PostGIS
PDF
20200828_OSCKyoto_Online
PDF
20200806_PGStrom_PostGIS_GstoreFdw
PDF
20200424_Writable_Arrow_Fdw
PDF
20191211_Apache_Arrow_Meetup_Tokyo
PDF
20191115-PGconf.Japan
PDF
20190926_Try_RHEL8_NVMEoF_Beta
PDF
20190925_DBTS_PGStrom
PDF
20190516_DLC10_PGStrom
PDF
20190418_PGStrom_on_ArrowFdw
PDF
20190314 PGStrom Arrow_Fdw
PDF
20181212 - PGconf.ASIA - LT
PDF
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
PDF
20181210 - PGconf.ASIA Unconference
PDF
20180920_DBTS_PGStrom_JP
20221116_DBTS_PGStrom_History
20221111_JPUG_CustomScan_API
20211112_jpugcon_gpu_and_arrow
20210731_OSC_Kyoto_PGStrom3.0
20210511_PGStrom_GpuCache
20201113_PGconf_Japan_GPU_PostGIS
20200828_OSCKyoto_Online
20200806_PGStrom_PostGIS_GstoreFdw
20200424_Writable_Arrow_Fdw
20191211_Apache_Arrow_Meetup_Tokyo
20191115-PGconf.Japan
20190926_Try_RHEL8_NVMEoF_Beta
20190925_DBTS_PGStrom
20190516_DLC10_PGStrom
20190418_PGStrom_on_ArrowFdw
20190314 PGStrom Arrow_Fdw
20181212 - PGconf.ASIA - LT
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181210 - PGconf.ASIA Unconference
20180920_DBTS_PGStrom_JP

Recently uploaded (20)

PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
Spectroscopy.pptx food analysis technology
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Electronic commerce courselecture one. Pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
Cloud computing and distributed systems.
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
Advanced methodologies resolving dimensionality complications for autism neur...
Review of recent advances in non-invasive hemoglobin estimation
Spectroscopy.pptx food analysis technology
Digital-Transformation-Roadmap-for-Companies.pptx
The Rise and Fall of 3GPP – Time for a Sabbatical?
Diabetes mellitus diagnosis method based random forest with bat algorithm
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Electronic commerce courselecture one. Pdf
20250228 LYD VKU AI Blended-Learning.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Understanding_Digital_Forensics_Presentation.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation_ Review paper, used for researhc scholars
Cloud computing and distributed systems.
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Network Security Unit 5.pdf for BCA BBA.

20210301_PGconf_Online_GPU_PostGIS_GiST_Index

  • 1. GPU version of PostGIS and GiST-Index ~A new horizon of geospatial data analytics~ HeteroDB,Inc Chief Architect & CEO KaiGai Kohei <kaigai@heterodb.com>
  • 2. about Us  KaiGai Kohei (海外浩平)  Chief Architect & CEO of HeteroDB  Contributor of PostgreSQL (2006-)  Primary Developer of PG-Strom (2012-)  Interested in: Big-data, GPU, NVME/PMEM, ... about Myself about HeteroDB  Established: 4th-Jul-2017  Location: Shinagawa, Tokyo, Japan  Businesses: ✓ Development of high-performance data-processing software on top of heterogeneous architecture. ✓ Technology consulting service on GPU&DB area. PGconf.online 2021 - GPU version of PostGIS and GiST-Index 2 PG-Strom
  • 3. What is PG-Strom? PGconf.online 2021 - GPU version of PostGIS and GiST-Index 3  Adds custom-scan/join/aggregate path with GPU execution  SQL to GPU binary code generation & Just-in-time compilation  Designed for IoT/M2M grade log-data processing  Closely connected NVME-SSD and GPU using P2P data transfer [I/O acceleration]  Geospatial analytics with GPU-version of PostGIS and GiST-index support PG-Strom: An extension of PostgreSQL to pull out maximum capability of GPU and NVME for processing of terabytes scale data App GPU off-loading for IoT/Big-Data for ML/Analytics PG-Strom’s Features ➢ JIT of GPU code with SQL & Transparent SQL acceleration ➢ SSD-to-GPU Direct SQL ➢ Columnar Store (Arrow_Fdw) ➢ Asymmetric Partition-wise JOIN/GROUP BY ➢ GPU Memory Store ➢ PostGIS support ➢ GiST-Index on GPU ➢ BRIN-Index support ➢ NVME-over-Fabric support ➢ Data-frame exchange for Python scripts NEW
  • 4. GPU’s overview PGconf.online 2021 - GPU version of PostGIS and GiST-Index 4 Over 10years history in HPC, then massive popularization in Machine-Learning NVIDIA Tesla V100 Super Computer (TITEC; TSUBAME3.0) Computer Graphics Machine-Learning More than thousands cores and TB/s grade memory bandwidth on a chip for highly computing intensive workloads. Simulation
  • 5. Target: Fast data search of mobile devices ▌Mobile devices  Location (Longitude, Latitude) is updated very frequently.  Its log data often contains (device_id, timestamp, location (point), other attributes) ▌Area definitions  Relatively small items, and almost static data.  Polygon often has very complicated form, thus heavy “collision detection”. ▌Purpose  Area marketing, Logistics analytics, Advertisement delivery, Emergency Alert, etc… Latest Location (Point) Area definition (Polygon) Mobile device Extract mobile devices within the target areas PGconf.online 2021 - GPU version of PostGIS and GiST-Index 5 GPS
  • 6. GPU-version PostGIS PGconf.online 2021 - GPU version of PostGIS and GiST-Index 6
  • 7. about PostGIS (1/2)  Extension to add geometry type, functions and operators.  First release at 2005, then contentious development over 15 years.  More than 400 functions / operators  R-tree on GiST-index framework  Parallel query execution © GAIA RESOURCES © OSGeo © KTGIS.net PGconf.online 2021 - GPU version of PostGIS and GiST-Index 7
  • 8. about PostGIS (2/2) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 8 St_Distance(a,b) St_Crosses(a,b) St_Contains(a,b)
  • 9. PostGIS’s optimization (1/2) ▌Bounding Box  The least rectangle that contains a polygon (that is often very complicated).  (x1,y1) - (x2,y2) form by FP32 (= 16bytes)  PostGIS assigns a bounding-box when it stores geometry values  It allows to skip heavy operations prior to geolocational operations obviously disjointed Tokyo Germany France PGconf.online 2021 - GPU version of PostGIS and GiST-Index 9
  • 10. PostGIS’s optimization (2/2) ▌R-Tree on GiST-Index  GiST (Generalized Search Tree) - A framework of index at PostgreSQL  PostGIS implements R-tree for Geometry type  Check for contains (‘@’ operator), overlaps (‘&&’ operator)  Works efficiently to check relationship between very large number of points and many polygons. # Data published by Geospatial Information Authority of Japan $ shp2pgsql N03-20_200101.shp | psql gistest gistest=# ¥d+ List of relations Schema | Name | Type | Owner | Size | --------+-------------+----------+--------+------------+ public | geo_japan | table | kaigai | 243 MB | gistest=# ¥di+ List of relations Schema | Name | Type | Owner | Table | Size | --------+--------------------+-------+--------+-----------+---------+ public | geo_japan_pkey | index | kaigai | geo_japan | 2616 kB | public | geo_japan_geom_idx | index | kaigai | geo_japan | 14 MB | PGconf.online 2021 - GPU version of PostGIS and GiST-Index 10
  • 11. Why GPU is capable for SQL acceleration? PGconf.online 2021 - GPU version of PostGIS and GiST-Index 11 ▌GPU’s characteristics  SIMT (Single-Instruction Multiple-Threads) architecture  More than thousands processor cores and 1.0TB/s grade memory band ➔ Designed for “same operations on large number of values” ▌SQL characteristics  “same operations on large number of values”, like evaluation of WHERE-clause, JOIN or GROUP BY. SQL runs WHERE, JOIN, GROUP BY on very large number of rows Thousands processor units on GPU evaluates thousands rows in parallel CPU Parallel GPU Parallel
  • 12. Performing PostGIS functions on GPU (1/2) postgres=# ¥d _gistest Table "public._gistest" Column | Type | Collation | Nullable | Default --------+----------+-----------+----------+-------------------------------------- id | integer | | not null | nextval('_gistest_id_seq'::regclass) a | geometry | | | b | geometry | | | postgres=# explain verbose select * from _gistest where st_contains(a,b); QUERY PLAN ------------------------------------------------------------------------------------ Custom Scan (GpuScan) on public._gistest (cost=4251.50..4251.50 rows=1 width=196) Output: id, a, b GPU Filter: st_contains(_gistest.a, _gistest.b) GPU Preference: None Kernel Source: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.4.gpu Kernel Binary: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.5.ptx (6 rows) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 12
  • 13. Performing PostGIS functions on GPU (2/2) $ less /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_21028.4.gpu : #include "cuda_postgis.h" #include "cuda_gpuscan.h" DEVICE_FUNCTION(cl_bool) gpuscan_quals_eval(kern_context *kcxt, kern_data_store *kds, ItemPointerData *t_self, HeapTupleHeaderData *htup) { void *addr __attribute__((unused)); pg_geometry_t KVAR_2; pg_geometry_t KVAR_3; assert(htup != NULL); EXTRACT_HEAP_TUPLE_BEGIN(addr, kds, htup); EXTRACT_HEAP_TUPLE_NEXT(addr); pg_datum_ref(kcxt,KVAR_2,addr); // pg_geometry_t EXTRACT_HEAP_TUPLE_NEXT(addr); pg_datum_ref(kcxt,KVAR_3,addr); // pg_geometry_t EXTRACT_HEAP_TUPLE_END(); return EVAL(pgfn_st_contains(kcxt, KVAR_2, KVAR_3)); } : PGconf.online 2021 - GPU version of PostGIS and GiST-Index 13 Load the geometry values from column-A and column-B Calls GPU-revision of st_contains() for each thread GPU code automatically generated for evaluation of WHERE-clause
  • 14. Basic performance (1/2) --- GpuScan (GPU-rev PostGIS) =# SELECT count(*) FROM ft WHERE st_contains('polygon ((10 10,90 10,90 12,12 12,12 88,90 88,90 90,¥ 10 90,10 10))’, st_makepoint(x,y)); count -------- 236610 (1 row) Time: 44.680 ms --- Vanilla PostGIS =# SET pg_strom.enabled = off; SET =# SELECT count(*) FROM tt WHERE st_contains('polygon ((10 10,90 10,90 12,12 12,12 88,90 88,90 90,¥ 10 90,10 10))', st_makepoint(x,y)); count -------- 236610 (1 row) Time: 622.314 ms Count number of points in the specified area from 5 million points. PGconf.online 2021 - GPU version of PostGIS and GiST-Index 14
  • 15. Basic performance (2/2) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 15 (100,100) (0,0) (90,90) (90,10) (90,12) (12,12) (12,88) (90,88) (10,90) (10,10)
  • 16. GPU-version of PostGIS  geometry st_makepoint(float8, float8[, float8[, float8]])  float8 st_distance(geometry,geometry)  bool st_dwithin(geometry,geometry,float8)  bool st_contains(geometry,geometry)  bool st_crosses(geometry,geometry)  text st_relate(geometry,geometry)  ...and more in the future version Current Status of the supported functions PGconf.online 2021 - GPU version of PostGIS and GiST-Index 16
  • 17. Extract Points within Polygon PGconf.online 2021 - GPU version of PostGIS and GiST-Index 17 Number of Polygons x Points combination is too large even for GPU Latest Location (Point) Area definition (Polygon) Mobile device GPS 100K-10M devices 100 - 100K polygons 10M-1T Combinations?
  • 18. GiST-Index on GPU PGconf.online 2021 - GPU version of PostGIS and GiST-Index 18
  • 19. How GiST-Index (R-tree) works ▌How GiST-Index (R-tree) works ✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers ✓ R4 is a rectangle that contains all of (R11,R12) and their pointers ✓ R12 is a rectangle that contains the target geometry and its ItemPointer ✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched. ✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree. (xmin,ymin) (xmax,ymax) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 19 Search Key (Lon,Lat) 〇 〇
  • 20. How GiST-Index (R-tree) works ▌How GiST-Index (R-tree) works ✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers ✓ R4 is a rectangle that contains all of (R11,R12) and their pointers ✓ R12 is a rectangle that contains the target geometry and its ItemPointer ✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched. ✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree. (xmin,ymin) (xmax,ymax) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 20 Search Key (Lon,Lat) 〇 〇 × 〇 ×
  • 21. How GiST-Index (R-tree) works ▌How GiST-Index (R-tree) works ✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers ✓ R4 is a rectangle that contains all of (R11,R12) and their pointers ✓ R12 is a rectangle that contains the target geometry and its ItemPointer ✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched. ✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree. (xmin,ymin) (xmax,ymax) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 21 Search Key (Lon,Lat) 〇 〇 × 〇 × 〇 ×
  • 22. How GiST-Index (R-tree) works ▌How GiST-Index (R-tree) works ✓ R1 is a rectangle [(Xmin,Ymin) – (Xmax,Ymax)] that contains all of (R3, R4, R5) and their pointers ✓ R4 is a rectangle that contains all of (R11,R12) and their pointers ✓ R12 is a rectangle that contains the target geometry and its ItemPointer ✓ Sequentially evaluates the entry for each tree-node. Dive into the next depth only if matched. ✓ Not fast as like B-tree, because of sequential evaluation for each depth in R-tree. (xmin,ymin) (xmax,ymax) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 22 Search Key (Lon,Lat) 〇 〇 × 〇 × 〇 × × ×
  • 23. GPU version of GiST-Index PGconf.online 2021 - GPU version of PostGIS and GiST-Index 23 ▌Overview  PG-Strom may utilize GiST-Index for joining a table with area definition (polygons; small) and a table with locational data (points; large).  Both the area definition table and its index are loaded onto GPU on GpuJoin.  GpuJoin looks at the GiST-index first for rough pruning, using bounding-box  Then, evaluate the “collision detection” with polygon value on the table  Above operations by thousands cores of GPU in parallel, so we expected its search performance is better, but... collision detection of Polygons x Points as a part of GpuJoin GiST-Index (R-tree) Area Definitions (Polygon) A table with Location Data (Points) Thousands threads search R-tree index in parallel
  • 24. Simple Test: Random points and St_Contains (2020-Sep) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 24 (123.0, 20.0) (154.2, 46.2) SELECT n03_001,n03_004,count(*) FROM geo_japan j, geopoint p WHERE st_contains(j.geom, st_makepoint(x,y)) AND j.n03_001 like ‘Tokyo’ GROUP BY n03_001,n03_004; 10M of randomly generated geolocational points n03_001 | n03_004 | count ---------+-------------+------- Tokyo | Akiruno | 105 Tokyo | Miyake | 76 Tokyo | Mitaka | 17 Tokyo | Setagaya-ku | 67 Tokyo | Chuo-ku | 12 Tokyo | Nakano-ku | 18 Tokyo | Hachijo | 105 : : : Tokyo | Toshima-ku | 14 Tokyo | Adachi-ku | 55 Tokyo | Aogashima | 7 Tokyo | Ome | 117 (63 rows) CPU-rev: 30.539s GPU-rev: 33.841s (Slow!) Area definition data by the Geospatial Information Authority of Japan
  • 25. Background) GPU’s Thread Scheduling PGconf.online 2021 - GPU version of PostGIS and GiST-Index 25  GPU groups its processor cores by Streaming Multiprocessor(SM) ✓ Cores, Registers and L1 cache (shared memory) are managed per streaming-multiprocessor  A SM can run up to 1024 threads simultaneously on the shared 64 cores/SM ✓ Like a very large scale hyper-threading in CPU  Threads are scheduled per warp (32threads). called SIMT (Single-Instruction Multiple-Thread) architecture ✓ Uniformed workloads, like matrix operations, fully utilizes the processor cores ✓ If a particular thread tends to consume large cycles, other threads in a warp must wait. GPU Block Diagram (Tesla V100; 80SMs) Streaming Multiprocessors (64CUDA cores/SM) ●●●…●●● ●●●…●●● ●●●…●●● ●●●…●●● ●●●…●●● ●●●…●●● Thread Group (1~1024 Threads) Warp (32 threads)
  • 26. Background) Internal design of GpuJoin PGconf.online 2021 - GPU version of PostGIS and GiST-Index 26 SELECT * FROM A, B WHERE A.id = B.id; Table-A 512 threads fetches 512 rows at once. Table-B ● ● ● ● ● ● 〇 × 〇 × 〇 × GpuHashJoin / GpuNestLoop N = __syncthreads_count(...) Write out the JOIN results (if any) thread-0 allocates result buffer for N-items Fetch next frame, and repeat the above steps. GpuHashJoin • Hash calculation • Search Hash table • Evaluation of JOIN condition GpuNestLoop • Evaluation of JOIN condition ➔little differences in processing cycles between the threads References
  • 27. Issues of index-search in GPU PGconf.online 2021 - GPU version of PostGIS and GiST-Index 27 Large variance of thread’s processing time, that leads synchronization wait! Table-A ● ● ● ● ● ● Reference Index-A × × × × × 〇 N = __syncthreads_count(...) Reference Walk down to the leaf node of R-tree, but no matched item No matched items, by just looking at the root node of R-tree Walk down to the leaf node of R-tree, and evaluate JOIN conditions Other GPU cores must be idle by the completion of the longest operations.
  • 28. Naive implementation of GPU GiST-Index Search (2020-Sep) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 28 Little utilization ratio of GPU cores, due to inter-core synchronization ● ● ● ● ● ● ● ● Every thread loads a row from the input buffer, and extract the key for GiST-Index. Hit on the GiST-Index? Search the GiST-Index by the key Is the JOIN-condition true? Evaluation of the JOIN-condition write out JOIN-results (if found) nitems = __syncthreads_count(found); Repeat found=true found=false No No A thread that found a matched entry can block other 511 threads in the same thread-group until completion of the evaluation of JOIN-condition. Very low efficiency of GPU core usage.
  • 29. A new more optimal implementation (2020-Nov) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 29 Minimization of the synchronization point, to pull up utilization ratio ● ● ● ● ● ● ● ● Every thread loads a row from the input buffer, and extract the key for GiST-Index. Search the GiST-Index by the key Hit on the GiST-Index? Allocation of the temporary buffer, to save the pointers that hit GiST-Index above. Consumption of temporary buffer exceeds 512 items. __syncthreads() Evaluation of the JOIN-condition Is the JOIN-condition true? No No nitems = __syncthreads_count(found) Write out JOIN results Repeat found=false found=true As long as temporary buffer has space, threads continue to fetch rows and search GiST-Index. ➔ Then, evaluate JOIN-conditions by all the threads at once.
  • 30. Simple Test: Random points and St_Contains (2020-Nov; the latest) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 30 (123.0, 20.0) (154.2, 46.2) SELECT n03_001,n03_004,count(*) FROM geo_japan j, geopoint p WHERE st_contains(j.geom, st_makepoint(x,y)) AND j.n03_001 like ‘Tokyo’ GROUP BY n03_001,n03_004; 10M of randomly generated geolocational points n03_001 | n03_004 | count ---------+-------------+------- Tokyo | Akiruno | 105 Tokyo | Miyake | 76 Tokyo | Mitaka | 17 Tokyo | Setagaya-ku | 67 Tokyo | Chuo-ku | 12 Tokyo | Nakano-ku | 18 Tokyo | Hachijo | 105 : : : Tokyo | Toshima-ku | 14 Tokyo | Adachi-ku | 55 Tokyo | Aogashima | 7 Tokyo | Ome | 117 (63 rows) CPU-rev: 30.539s GPU-rev: 0.316s Area definition data by the Geospatial Information Authority of Japan 100 times faster!!
  • 31. EXPLAIN of the simple test (1/2) - CPU version postgres=# EXPLAIN (analyze, costs off) SELECT n03_001,n03_004,count(*) FROM geo_japan j, geopoint p WHERE st_contains(j.geom, st_makepoint(x,y)) AND j.n03_001 like ‘Tokyo’ GROUP BY n03_001,n03_004; QUERY PLAN -------------------------------------------------------------------------------------------- Finalize GroupAggregate (actual time=30709.855..30710.080 rows=63 loops=1) Group Key: j.n03_001, j.n03_004 -> Gather Merge (actual time=30709.838..30732.270 rows=244 loops=1) Workers Planned: 4 Workers Launched: 3 -> Partial GroupAggregate (actual time=30687.466..30687.572 rows=61 loops=4) Group Key: j.n03_001, j.n03_004 -> Sort (actual time=30687.452..30687.475 rows=638 loops=4) Sort Key: j.n03_001, j.n03_004 Sort Method: quicksort Memory: 73kB -> Nested Loop (actual time=71.496..30686.278 rows=638 loops=4) -> Parallel Seq Scan on geopoint p (actual time=0.012..207.553 rows=2500000 loops=4) -> Index Scan using geo_japan_geom_idx on geo_japan j ¥ (actual time=0.012..0.012 rows=0 loops=10000000) Index Cond: (geom ~ st_makepoint(p.x, p.y)) Filter: (((n03_001)::text ~~ ‘Tokyo’::text) AND st_contains(geom, st_makepoint(p.x, p.y))) Rows Removed by Filter: 0 Planning Time: 0.156 ms Execution Time: 30732.422 ms (21 rows) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 31
  • 32. EXPLAIN of the simple test (2/2) - GPU version postgres=# EXPLAIN (analyze, costs off) SELECT n03_001,n03_004,count(*) FROM geo_japan j, geopoint p WHERE st_contains(j.geom, st_makepoint(x,y)) AND j.n03_001 like ‘Tokyo’ GROUP BY n03_001,n03_004; QUERY PLAN -------------------------------------------------------------------------------------------- GroupAggregate (actual time=329.118..329.139 rows=63 loops=1) Group Key: j.n03_001, j.n03_004 -> Sort (actual time=329.107..329.110 rows=63 loops=1) Sort Key: j.n03_001, j.n03_004 Sort Method: quicksort Memory: 29kB -> Custom Scan (GpuPreAgg) (actual time=328.902..328.911 rows=63 loops=1) Reduction: Local Combined GpuJoin: enabled -> Custom Scan (GpuJoin) on fgeopoint p (never executed) Outer Scan: fgeopoint p (never executed) Depth 1: GpuGiSTJoin(plan nrows: 10000000...60840000, actual nrows: 10000000...2553) HeapSize: 7841.91KB (estimated: 3113.70KB), IndexSize: 13.28MB IndexFilter: (j.geom ~ st_makepoint(p.x, p.y)) on geo_japan_geom_idx Rows Fetched by Index: 4952 JoinQuals: st_contains(j.geom, st_makepoint(p.x, p.y)) -> Seq Scan on geo_japan j (actual time=0.164..17.723 rows=6173 loops=1) Filter: ((n03_001)::text ~~ ‘Tokyo’::text) Rows Removed by Filter: 112726 Planning Time: 0.344 ms Execution Time: 340.415 ms (20 rows) Portion executed on GPU PGconf.online 2021 - GPU version of PostGIS and GiST-Index 32
  • 33. Conclusion (1/2) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 33 ▌GPU-version PostGIS  Enhancement for parallel execution of PostGIS functions  Also support GiST-Index (R-tree) of geometry type  100 times faster results to pickup “points in area” type workloads. ▌Expected Use scenarios  Area marketing analytics  Real-time advertisement delivery  Push event notifications, etc... ➔ GPU+PostGIS allows to run “computing intensive” workloads on your workstation or cloud instance as like you are usually doing. ▌Resources  GitHub: https://github.com/heterodb/pg-strom  Document: http://heterodb.github.io/pg-strom/ja/  Contact: Tw: @kkaigai / ✉ kaigai@heterodb.com
  • 34. Conclusion (2/2) PGconf.online 2021 - GPU version of PostGIS and GiST-Index 34 ▌Resources  GitHub: https://github.com/heterodb/pg-strom  Document: http://heterodb.github.io/pg-strom/ja/  Contact: Tw: @kkaigai / ✉ kaigai@heterodb.com PG-Strom project welcomes your participation. Please contact us.