map-D
data refined
www.map-d.com
Todd Mostak | todd@map-d.com
1830 Sansome St., San Francisco, CA 94104
#mapd @datarefined
map-D? Super-fast database built into GPU memory
Do? World’s fastest real-time big data analytics, interactive visualization
Demo? Twitter analytics platform: 1 billion+ tweets in milliseconds
The importance of interactivity
People have long struggled to build interactive visualizations of big data that can deliver insight
Interactivity means:
• Hypothesis testing can occur at the “speed of thought”
How interactive is interactive enough?
• According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”
• Some types of latency are more detrimental than others:
  • For example, linking and brushing are more sensitive to delay than zooming
Strategies for interactivity
• Sampling
  • Ex. BlinkDB
  • Issues:
    • Needs a statistically robust sampling method
    • Sampling can miss “long-tail” phenomena
• Pre-computation
  • Ex. imMens (data cubing)
  • Issues:
    • Can only show what the curator thought was relevant
    • Can only store a certain number of binned attributes
    • Must be curated!
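The long-tail issue with sampling can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the numbers are assumptions, not anything taken from BlinkDB. It computes the probability that a uniform sample drawn without replacement contains none of a handful of rare rows.

```python
def prob_sample_misses(n_rows, n_rare, sample_size):
    """Probability that a uniform sample (without replacement)
    contains none of the n_rare rare rows."""
    p = 1.0
    for k in range(sample_size):
        # Chance that the (k+1)-th draw is also a non-rare row.
        p *= (n_rows - n_rare - k) / (n_rows - k)
    return p


# Hypothetical numbers: 10 rare rows among 1,000,000, and a 0.1% sample.
p_miss = prob_sample_misses(n_rows=1_000_000, n_rare=10, sample_size=1_000)
print(f"chance the sample misses every rare row: {p_miss:.3f}")
```

With these made-up numbers the sample misses all ten rare rows about 99% of the time, which is why a dashboard built on uniform samples can hide exactly the outliers an analyst is looking for.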
The Arrival of In-Memory Systems
• Traditional RDBMSs used to be too slow to serve as a back end for interactive visualizations.
• Queries over a billion records could take minutes, if not hours
• In-memory systems, however, can execute such queries in a fraction of the time.
• Both full DBMS and “pseudo”-DBMS solutions
• But they are still often too slow
Enter Map-D
the technology
Core Innovation
SQL-enabled column-store database built into the memory architecture of GPUs and CPUs
Code developed from scratch to take advantage of:
• Memory and computational bandwidth of multiple GPUs
• Heterogeneous architectures (CPUs and GPUs)
• Fast RDMA between GPUs on different nodes
• GPU graphics pipeline
Two-level buffer pool across GPU and CPU memory
Shared scans: multiple queries of the same data can share memory bandwidth
The system can scan data at 2 TB/sec per node, with 10 TB/sec per node of logical throughput with shared scans
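The idea behind shared scans can be sketched in a few lines. This is a CPU-side toy under stated assumptions, not Map-D's GPU implementation: `shared_scan` and the predicate names are hypothetical. Several queries are evaluated during a single pass over one column, so each value is read from memory once however many queries are waiting on it.

```python
from typing import Callable, Dict, Sequence


def shared_scan(column: Sequence[int],
                predicates: Dict[str, Callable[[int], bool]]) -> Dict[str, int]:
    """Count matches for several queries in one pass over the column."""
    counts = {name: 0 for name in predicates}
    for value in column:                 # each value is read once
        for name, pred in predicates.items():
            if pred(value):              # every pending query sees that value
                counts[name] += 1
    return counts


column = list(range(10))
result = shared_scan(column, {
    "lt5": lambda v: v < 5,
    "even": lambda v: v % 2 == 0,
})
print(result)
```

Taken at face value, the figures quoted above (2 TB/sec scanned, 10 TB/sec logical) correspond to about five queries sharing each pass over the data.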
The Hardware
[Figure: two nodes (Node 0 and Node 1) connected through a switch. Each node shows two CPUs (CPU 0 and CPU 1) joined by QPI, four GPUs (GPU 0 to GPU 3) and two IB adapters on two PCI buses, and a RAID controller with S1 to S4.]
The Two-Level Buffer Pool
[Figure: three storage tiers: GPU memory, CPU memory, SSD]
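A minimal sketch of how such a tiered pool might behave. The slide names only the tiers, so everything else is an assumption: the class name, the slot counts, and the LRU eviction and demotion policy are chosen purely for illustration.

```python
from collections import OrderedDict


class TwoLevelBufferPool:
    """Toy model: a small "gpu" tier in front of a larger "cpu" tier,
    both backed by a dict standing in for SSD. LRU eviction is assumed."""

    def __init__(self, gpu_slots, cpu_slots, ssd):
        self.gpu = OrderedDict()
        self.cpu = OrderedDict()
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.ssd = ssd

    def get(self, chunk_id):
        """Return (data, name of the tier that served the request)."""
        if chunk_id in self.gpu:
            self.gpu.move_to_end(chunk_id)       # mark as most recently used
            return self.gpu[chunk_id], "gpu"
        if chunk_id in self.cpu:
            data, tier = self.cpu.pop(chunk_id), "cpu"
        else:
            data, tier = self.ssd[chunk_id], "ssd"
        self._put_gpu(chunk_id, data)            # promote to the fast tier
        return data, tier

    def _put_gpu(self, chunk_id, data):
        self.gpu[chunk_id] = data
        if len(self.gpu) > self.gpu_slots:
            old_id, old_data = self.gpu.popitem(last=False)  # evict LRU chunk
            self.cpu[old_id] = old_data                      # demote to CPU tier
            if len(self.cpu) > self.cpu_slots:
                self.cpu.popitem(last=False)                 # copy remains on SSD


pool = TwoLevelBufferPool(gpu_slots=1, cpu_slots=1, ssd={"a": 1, "b": 2, "c": 3})
trace = [pool.get(k)[1] for k in ["a", "a", "b", "a", "c", "b"]]
print(trace)
```

The trace shows which tier served each request: repeated access to a hot chunk stays in the fast tier, a recently evicted chunk comes back from the middle tier, and only cold chunks go all the way to SSD.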
Shared Nothing Processing
Multiple GPUs, with data partitioned between them
[Figure: Node 1, Node 2 and Node 3 each apply the same filter, text ILIKE 'rain', to their own partition]
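The shared-nothing pattern can be sketched as partition, filter locally, then merge. The helper names below are hypothetical, the "nodes" are plain Python lists, and the filter is treated as a case-insensitive substring match (that is, as `ILIKE '%rain%'`), which is an assumption about what the slide intends.

```python
def partition(rows, n_parts):
    """Round-robin rows into n_parts disjoint partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]


def local_filter_count(rows, needle):
    """Work done by one node, touching only its own partition."""
    needle = needle.lower()
    return sum(1 for text in rows if needle in text.lower())


def distributed_count(rows, needle, n_nodes=3):
    parts = partition(rows, n_nodes)
    # Each partial result depends on one partition only, so nodes never
    # need to exchange rows while filtering.
    partials = [local_filter_count(p, needle) for p in parts]
    return sum(partials)                  # merge step on the coordinator


tweets = ["Rain again", "sunny day", "TRAINING run", "no clouds", "light rain"]
total = distributed_count(tweets, "rain")
print(total)
```

Because only the small partial counts cross node boundaries, the merge cost stays negligible next to the scan itself.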
the product
Product: GPU-powered end-to-end big data analytics and visualization platform
[Figure labels: Complex analytics; Image processing; Visualization; GPU in-memory SQL database; OpenGL; H.264/VP8 streaming; GPU pipeline; Machine learning; Graph analytics; License; Simple; # of GPUs; Mobile/server versions; Scale to cluster of GPU nodes; SQL compiler; Shared scans; User-defined functions; Hybrid GPU/CPU execution; OpenCL and CUDA]
Map-D hardware architecture
• Single GPU, 12 GB memory: Map-D code integrated into GPU memory
• Single CPU, 768 GB memory: Map-D code integrated into CPU memory
• NVIDIA Tegra mobile chip, 4 GB memory: Map-D code integrated into chip memory
• 8 cards = 4U box; 4 sockets = 4U box: Map-D code runs on GPU + CPU memory
• 36U rack: ~400 GB GPU, ~12 TB CPU
• Next-gen flash: 40 TB, 100 GB/s
• Mobile Map-D running small datasets
[Other figure labels: Native App; Web-based service; Small Data; Large Data; Big Data]
map-D 
www.map-d.com 
@datarefined 
info@map-d.com

[2C5]Map-D: A GPU Database for Interactive Big Data Analytics
