[2C5] Map-D: A GPU Database for Interactive Big Data Analytics

map-D
data refined
www.map-d.com
Todd Mostak
todd@map-d.com
@datarefined
#mapd
1830 Sansome St.
San Francisco, CA 94104

map-D? A super-fast database built into GPU memory
Do? World’s fastest real-time big data analytics and interactive visualization
Demo? Twitter analytics platform: 1 billion+ tweets in milliseconds

The importance of interactivity
People have struggled for a long time to build interactive
visualizations of big data that can deliver insight
Interactivity means:
• Hypothesis testing can occur at “speed of thought”
How interactive is interactive enough?
• According to a study by Jeffrey Heer and Zhicheng Liu, “an injected
delay of half a second per operation adversely affects user
performance in exploratory data analysis.”
• Some types of latency are more detrimental than others:
• For example, linking and brushing are more sensitive to delay than zooming

Strategies for interactivity
• Sampling:
• Ex. BlinkDB
• Issues:
• Need statistically robust method for sampling
• Sampling can miss “long-tail” phenomena
• Pre-computation
• Ex. imMens (datacubing)
• Issues:
• Can only show what the curator thought was relevant
• Can only store a certain number of binned attributes
• Must be curated!
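
The two trade-offs listed above can be illustrated with a small sketch. The data, the column layout (hour, language, hashtag), and the helper names are all hypothetical and chosen only for illustration; the sampling scheme shown is simple systematic sampling, not the method used by BlinkDB, and the cube is a plain counter, not the imMens data structure.

```python
from collections import Counter

# Hypothetical toy data: one (hour_of_day, language, hashtag) tuple per tweet.
tweets = [(i % 24, "en" if i % 3 else "es", "#common") for i in range(100_000)]
# Two rare "long-tail" rows at arbitrary positions.
tweets[1234] = (1234 % 24, "en", "#rare")
tweets[56789] = (56789 % 24, "en", "#rare")


def systematic_sample(rows, step):
    """Keep every `step`-th row (a simple, deterministic sampling scheme)."""
    return rows[::step]


def estimate_count(sample, step, hashtag):
    """Scale the count seen in the sample back up to the full data size."""
    return step * sum(1 for _, _, tag in sample if tag == hashtag)


def build_cube(rows):
    """Pre-compute counts binned by (hour, language) only."""
    return Counter((hour, lang) for hour, lang, _ in rows)


sample = systematic_sample(tweets, 100)
rare_exact = sum(1 for _, _, tag in tweets if tag == "#rare")
rare_estimate = estimate_count(sample, 100, "#rare")

cube = build_cube(tweets)
total_from_cube = sum(cube.values())
```

In this toy case the exact count of `#rare` is 2 while the sample-based estimate is 0, because neither rare row falls in the sample. The cube answers any question about hour and language instantly, but it holds no hashtag dimension at all, so a hashtag query cannot be answered from it without going back to the raw data.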

The Arrival of In-Memory Systems
• Traditional RDBMSs were too slow to serve as a back-end
for interactive visualizations.
• Queries over a billion records could take minutes, if not
hours
• But in-memory systems can execute such queries in a fraction
of the time.
• Both full DBMS and “pseudo”-DBMS solutions
• But still often too slow

Core Innovation
SQL-enabled column store database built into the memory
architecture on GPUs and CPUs
Code developed from scratch to take advantage of:
• Memory and computational bandwidth of multiple GPUs
• Heterogeneous architectures (CPUs and GPUs)
• Fast RDMA between GPUs on different nodes
• GPU Graphics pipeline
Two-level buffer pool across GPU and CPU memory
Shared scans: multiple queries of the same data can share
memory bandwidth
The system can scan data at 2TB/sec per node, or 10TB/sec per
node of logical throughput with shared scans
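
The shared-scan idea can be sketched in a few lines: each chunk of a column is read once and every pending query is evaluated against it, so the logical amount of data processed is the physical amount multiplied by the number of queries sharing the pass. This is a minimal CPU-side illustration with hypothetical names (`shared_scan`, `chunk_size`), not Map-D's GPU implementation.

```python
def shared_scan(column, predicates, chunk_size=4):
    """Evaluate several predicates during a single pass over `column`.

    Returns (match count per predicate, number of values physically read).
    """
    counts = [0] * len(predicates)
    values_read = 0
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]  # each chunk is read once...
        values_read += len(chunk)
        for i, predicate in enumerate(predicates):  # ...and used by every query
            counts[i] += sum(1 for value in chunk if predicate(value))
    return counts, values_read


column = list(range(10))
queries = [
    lambda v: v % 2 == 0,  # query 1: even values
    lambda v: v > 6,       # query 2: values above 6
    lambda v: v < 3,       # query 3: values below 3
]
counts, values_read = shared_scan(column, queries)

# Logical work: what three independent scans would have had to read.
logical_values = values_read * len(queries)
```

Here 10 values are read physically while 30 are processed logically. Read the same way, the slide's 2TB/sec physical and 10TB/sec logical figures would correspond to roughly five queries sharing each pass; that ratio is an inference from the two numbers, not something the slides state.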

The Hardware
[Diagram: two nodes (Node 0 and Node 1) connected through a switch. Each node contains two CPUs (CPU 0 and CPU 1) joined by QPI, four GPUs (GPU 0 to GPU 3) attached over PCI, IB links, a RAID controller, and four devices labeled S1 to S4.]

The Two-Level Buffer Pool
[Diagram: three tiers, from top to bottom: GPU memory, CPU memory, SSD.]
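
One way to picture the tiers is as a pair of caches in front of persistent storage. The sketch below assumes an exclusive design with LRU eviction, where a chunk evicted from GPU memory is demoted to CPU memory and a chunk evicted from CPU memory is simply dropped because it still lives on SSD. The eviction policy, the class name `TwoLevelBufferPool`, and the slot counts are assumptions for illustration; the slides do not describe Map-D's actual policy.

```python
from collections import OrderedDict


class TwoLevelBufferPool:
    """Toy two-level pool: GPU tier, then CPU tier, backed by an 'SSD' dict."""

    def __init__(self, gpu_slots, cpu_slots, ssd):
        self.gpu = OrderedDict()  # hottest chunks, least recently used first
        self.cpu = OrderedDict()  # chunks demoted from the GPU tier
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.ssd = ssd            # chunk_id -> data; always holds everything
        self.served_from = {"gpu": 0, "cpu": 0, "ssd": 0}

    @staticmethod
    def _put(tier, slots, key, value):
        """Insert into a tier; return the evicted (key, value) pair, if any."""
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > slots:
            return tier.popitem(last=False)  # evict least recently used
        return None

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            self.served_from["gpu"] += 1
            return self.gpu[key]
        if key in self.cpu:
            value = self.cpu.pop(key)        # promote from CPU to GPU
            self.served_from["cpu"] += 1
        else:
            value = self.ssd[key]            # cold read from storage
            self.served_from["ssd"] += 1
        evicted = self._put(self.gpu, self.gpu_slots, key, value)
        if evicted is not None:
            # Demote to CPU memory; anything pushed out of CPU is dropped.
            self._put(self.cpu, self.cpu_slots, *evicted)
        return value


pool = TwoLevelBufferPool(gpu_slots=2, cpu_slots=2,
                          ssd={"a": 1, "b": 2, "c": 3, "d": 4})
for chunk_id in ["a", "b", "c", "a", "a"]:
    pool.get(chunk_id)
```

After this access pattern, three reads came from SSD, one from CPU memory, and one from GPU memory, which shows how repeated access to a chunk moves it toward the fastest tier.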

Shared Nothing Processing
Multiple GPUs, with data partitioned between them
[Diagram: Node 1, Node 2, and Node 3 each apply the same filter, text ILIKE 'rain', to their own partition of the data.]
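
A minimal sketch of the shared-nothing pattern: rows are partitioned across nodes, each node filters only its own rows, and a coordinator combines the partial results. The `ilike` helper follows PostgreSQL-style wildcard semantics, under which a pattern without `%` matches only the whole string; the example therefore uses `'%rain%'` for a substring match. Whether Map-D treats a bare `'rain'` as a substring match is not stated in the slides. All names and sample tweets are hypothetical.

```python
import re


def ilike(text, pattern):
    """Case-insensitive LIKE with PostgreSQL-style % and _ wildcards."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None


def partition(rows, n_nodes):
    """Round-robin rows across nodes; each node owns its slice exclusively."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def node_filter_count(local_rows, pattern):
    """Work done independently on one node: filter and count local rows."""
    return sum(1 for text in local_rows if ilike(text, pattern))


tweets = ["Rain in SF today", "sunny", "RAIN",
          "train delays", "rain", "dry spell"]
nodes = partition(tweets, 3)

# Each node scans only its own partition; the coordinator adds the results.
partial_counts = [node_filter_count(rows, "%rain%") for rows in nodes]
total = sum(partial_counts)
```

Because no node needs another node's data to evaluate the filter, the only cross-node traffic is the small partial count from each node.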

Product: GPU-powered end-to-end big data analytics and visualization platform
• Complex analytics
• Image processing
• Visualization
• GPU in-memory SQL database
• OpenGL
• H.264/VP8 streaming
• GPU pipeline
• Machine learning
• Graph analytics
• License
• Simple
• # of GPUs
• Mobile/server versions
• Scale to cluster of GPU nodes
• SQL compiler
• Shared scans
• User-defined functions
• Hybrid GPU/CPU execution
• OpenCL and CUDA

Map-D hardware architecture
Tiers shown: Small Data, Large Data, Big Data
• Single GPU, 12GB memory: Map-D code integrated into GPU memory
• Single CPU, 768GB memory: Map-D code integrated into CPU memory
• NVIDIA Tegra mobile chip, 4GB memory: Map-D code integrated into chip memory
• 8 cards = 4U box; 4 sockets = 4U box: Map-D code runs on GPU + CPU memory
• 36U rack: ~400GB GPU, ~12TB CPU
• Mobile Map-D running small datasets: native app, web-based service
• Next-gen flash: 40TB, 100GB/s
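
The rack-level figures can be cross-checked against the per-device numbers with simple arithmetic. The slide does not say how the 36U rack is populated; the sketch below assumes four 8-card units and four 4-socket units, which is one combination that reproduces the quoted totals. That composition is an assumption, not a stated configuration.

```python
# Figures taken from the slide.
GPU_CARD_GB = 12
CARDS_PER_BOX = 8
CPU_SOCKET_GB = 768
SOCKETS_PER_BOX = 4
BOX_HEIGHT_U = 4
RACK_HEIGHT_U = 36

# ASSUMPTION (not on the slide): the rack holds 4 GPU boxes and 4 CPU boxes.
gpu_boxes = 4
cpu_boxes = 4

gpu_total_gb = gpu_boxes * CARDS_PER_BOX * GPU_CARD_GB      # 4 * 8 * 12
cpu_total_gb = cpu_boxes * SOCKETS_PER_BOX * CPU_SOCKET_GB  # 4 * 4 * 768
cpu_total_tb = cpu_total_gb / 1024

rack_units_used = (gpu_boxes + cpu_boxes) * BOX_HEIGHT_U
rack_units_spare = RACK_HEIGHT_U - rack_units_used

print(f"GPU memory: {gpu_total_gb} GB")
print(f"CPU memory: {cpu_total_tb:.0f} TB")
print(f"Rack units used: {rack_units_used} of {RACK_HEIGHT_U}")
```

Under this assumption the totals come to 384GB of GPU memory and 12TB of CPU memory, in line with the slide's ~400GB and ~12TB, with 4U of the rack left over.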

map-D
www.map-d.com
@datarefined
info@map-d.com
