GPU Algorithms
David Hauck
github.com/davidhauck
@david_hauck_mke
davidhauck40.blogspot.com
dhauck@skylinetechnologies.com
Graphics Processing Unit
Intro to Cuda
Why?
General Purpose Graphics Processing Unit
TERMS
HOST
DEVICE
PCI Bus
1: Copy initial data to DEVICE
2: Run DEVICE executable
3: Copy results back to HOST
Still Running on CPU
GPU is a Resource
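A minimal sketch of what "GPU is a resource" can look like in practice: an ordinary host program asks the CUDA runtime which devices it can use. The function names `countDevices` and `listDevices` are made up for this illustration; `cudaGetDeviceCount` and `cudaGetDeviceProperties` are CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of CUDA devices visible to this host process,
// or 0 if the runtime reports an error (for example, no driver installed).
int countDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    return (err == cudaSuccess) ? count : 0;
}

// Prints a one-line summary of each device. All of this code runs on the CPU.
void listDevices() {
    int count = countDevices();
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            printf("device %d: %s, %zu MB global memory\n",
                   i, prop.name, prop.totalGlobalMem >> 20);
        }
    }
}
```

Calling `listDevices()` from `main` is enough to try it. Nothing here executes on the GPU; the host only inspects it.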
MEMORY CONSCIOUSNESS
POINTERS
HOST: int *a;
DEVICE: int *d_a;
HOST: arr = malloc(size);
DEVICE: cudaMalloc(&d_arr, size);
HOST: free(arr);
DEVICE: cudaFree(d_arr);
HOST: memcpy(dest, source, size);
DEVICE: cudaMemcpy(dest, src, size, …);
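The host/device pairs above can be sketched as one small function. This is an illustration under assumptions, not code from the talk: the name `allocateAndFree` is hypothetical, and the error handling is one possible style.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Allocates n ints on the host and n ints on the device, then releases both.
// Returns true only if every step succeeded.
bool allocateAndFree(size_t n) {
    size_t size = n * sizeof(int);

    int *arr = (int *)malloc(size);        // host memory
    if (arr == NULL) return false;

    int *d_arr = NULL;                     // device memory
    if (cudaMalloc(&d_arr, size) != cudaSuccess) {
        free(arr);
        return false;
    }

    // d_arr holds a device address: pass it to CUDA calls and kernels,
    // but do not dereference it in host code.

    bool ok = (cudaFree(d_arr) == cudaSuccess);
    free(arr);
    return ok;
}
```

The `d_` prefix is only a naming habit, but it helps keep track of which pointers belong to which memory.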
1: HOST → DEVICE
2: EXECUTE
3: DEVICE → HOST
1: HOST → DEVICE
3: DEVICE → HOST
cudaMemcpy();
1: HOST → DEVICE
cudaMemcpy(dest, source, size, cudaMemcpyHostToDevice);
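As a rough sketch of steps 1 and 3, the function below copies an array to the device and straight back. The name `roundTrip` and the buffer name `d_buf` are invented for this example; the copy directions `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost` are the runtime's own constants.

```cuda
#include <cuda_runtime.h>

// Copies n ints from src to the device and straight back into dst.
// Returns true if the allocation and both copies succeed.
bool roundTrip(const int *src, int *dst, size_t n) {
    size_t size = n * sizeof(int);
    int *d_buf = NULL;
    if (cudaMalloc(&d_buf, size) != cudaSuccess) return false;

    // 1: HOST -> DEVICE (destination first, then source, like memcpy)
    cudaError_t up = cudaMemcpy(d_buf, src, size, cudaMemcpyHostToDevice);

    // 2: EXECUTE would go here

    // 3: DEVICE -> HOST (skipped if the first copy failed)
    cudaError_t down = (up == cudaSuccess)
        ? cudaMemcpy(dst, d_buf, size, cudaMemcpyDeviceToHost)
        : up;

    cudaFree(d_buf);
    return down == cudaSuccess;
}
```

Note that the destination is passed as a plain pointer, not as the address of a pointer.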
EXECUTION
__global__ void myKernel(int *a){
}
myKernel<<<1,1>>>(d_arr);
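A small sketch of a complete launch, assuming the kernel should do something visible. The body of `myKernel` (writing 42) and the wrapper `runMyKernel` are placeholders for illustration only; the slide's kernel is empty.

```cuda
#include <cuda_runtime.h>

// Runs on the DEVICE; __global__ marks it as launchable from the HOST.
__global__ void myKernel(int *a) {
    *a = 42;   // arbitrary value, chosen only so the host can observe an effect
}

// Launches myKernel with 1 block of 1 thread and returns the value it wrote,
// or -1 if any CUDA call fails.
int runMyKernel() {
    int result = 0;
    int *d_arr = NULL;
    if (cudaMalloc(&d_arr, sizeof(int)) != cudaSuccess) return -1;

    myKernel<<<1, 1>>>(d_arr);                       // <<<blocks, threads per block>>>
    bool ok = (cudaGetLastError() == cudaSuccess);   // reports launch failures

    // This copy is ordered after the kernel, so it sees the kernel's write.
    ok = ok && cudaMemcpy(&result, d_arr, sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_arr);
    return ok ? result : -1;
}
```

The launch itself returns immediately on the host; it is the copy back that waits for the result.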
Let’s do an example
  a b c d
+ e f g h
= i j k l
threadIdx.x: 0 1 2 3
int index = threadIdx.x;
c[index] = a[index] + b[index];
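Putting the pieces together, the addition example might look like the sketch below. The names `addKernel` and `addOnDevice` are hypothetical, and the single-block launch is an assumption that limits n to the per-block thread limit (1024 on current hardware).

```cuda
#include <cuda_runtime.h>

// One thread per element: thread i computes c[i] = a[i] + b[i].
__global__ void addKernel(const int *a, const int *b, int *c) {
    int index = threadIdx.x;
    c[index] = a[index] + b[index];
}

// Adds two host arrays of n ints on the device, with 1 <= n <= 1024 so that
// everything fits in a single block. Returns true on success.
bool addOnDevice(const int *a, const int *b, int *c, int n) {
    size_t size = n * sizeof(int);
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;

    bool ok = cudaMalloc(&d_a, size) == cudaSuccess
           && cudaMalloc(&d_b, size) == cudaSuccess
           && cudaMalloc(&d_c, size) == cudaSuccess;

    // 1: HOST -> DEVICE
    ok = ok && cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice) == cudaSuccess;

    // 2: EXECUTE with 1 block of n threads
    if (ok) {
        addKernel<<<1, n>>>(d_a, d_b, d_c);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // 3: DEVICE -> HOST
    ok = ok && cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);   // cudaFree(NULL) does nothing
    return ok;
}
```

With four elements, threads 0 to 3 each handle one column of the picture above.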
Let’s invent an ALGORITHM
K-Means Clustering
CODE
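The code shown in the talk is not part of this transcript. As a stand-in, here is a hedged sketch of the assignment step of k-means, where one thread per point picks the nearest centroid. Everything here is an assumption made for illustration: 2-D points stored as separate x and y arrays, the names `assignClusters` and `assignOnDevice`, 256 threads per block, and the use of `blockIdx` and `blockDim` to cover more points than one block holds. The centroid update step is left out.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// Assignment step: thread i finds the centroid closest to point i.
__global__ void assignClusters(const float *px, const float *py, int numPoints,
                               const float *cx, const float *cy, int k,
                               int *label) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPoints) return;            // the last block may be partly unused

    float best = FLT_MAX;
    int bestCluster = 0;
    for (int c = 0; c < k; ++c) {
        float dx = px[i] - cx[c];
        float dy = py[i] - cy[c];
        float dist = dx * dx + dy * dy;    // squared distance is enough to compare
        if (dist < best) { best = dist; bestCluster = c; }
    }
    label[i] = bestCluster;
}

// Copies points and centroids to the device, runs the assignment step,
// and copies the labels back. Returns true on success.
bool assignOnDevice(const float *px, const float *py, int numPoints,
                    const float *cx, const float *cy, int k, int *label) {
    size_t pBytes = numPoints * sizeof(float);
    size_t cBytes = k * sizeof(float);
    size_t lBytes = numPoints * sizeof(int);
    float *d_px = NULL, *d_py = NULL, *d_cx = NULL, *d_cy = NULL;
    int *d_label = NULL;

    bool ok = cudaMalloc(&d_px, pBytes) == cudaSuccess
           && cudaMalloc(&d_py, pBytes) == cudaSuccess
           && cudaMalloc(&d_cx, cBytes) == cudaSuccess
           && cudaMalloc(&d_cy, cBytes) == cudaSuccess
           && cudaMalloc(&d_label, lBytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_px, px, pBytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(d_py, py, pBytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(d_cx, cx, cBytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(d_cy, cy, cBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (numPoints + threads - 1) / threads;   // round up
        assignClusters<<<blocks, threads>>>(d_px, d_py, numPoints,
                                            d_cx, d_cy, k, d_label);
        ok = cudaGetLastError() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(label, d_label, lBytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_px); cudaFree(d_py); cudaFree(d_cx); cudaFree(d_cy); cudaFree(d_label);
    return ok;
}
```

A full k-means loop would alternate this step with recomputing each centroid as the mean of its points. Keeping the points on the device between iterations avoids repeating the host-to-device copy.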
Shared Memory
• ~48 KB of shared memory per block
• Multiple GB of device memory (about 100x higher latency)
• Access memory in order:
  1 2 3
  4 5 6
  7 8 9
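One way to picture these points is a block-level sum, sketched below. It is an assumed example, not the talk's code: the names `blockSum` and `sumOnDevice`, the block size of 256, and the choice to finish the sum on the host are all illustrative. Each thread reads one element in order from device memory into a `__shared__` array, and the rest of the work happens in shared memory.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK 256   // threads per block; a power of two, as the loop below requires

// Each block sums up to BLOCK consecutive ints and writes one partial sum.
__global__ void blockSum(const int *in, int n, int *partial) {
    __shared__ int tile[BLOCK];            // on-chip memory shared by the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Neighbouring threads read neighbouring addresses ("in order").
    tile[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction carried out entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}

// Sums n ints (n >= 1) from host memory into *total. Returns true on success.
bool sumOnDevice(const int *data, int n, long long *total) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    int *partial = (int *)malloc(blocks * sizeof(int));
    int *d_in = NULL, *d_partial = NULL;

    bool ok = partial != NULL
           && cudaMalloc(&d_in, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_partial, blocks * sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_in, data, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<blocks, BLOCK>>>(d_in, n, d_partial);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(partial, d_partial, blocks * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        *total = 0;
        for (int b = 0; b < blocks; ++b) *total += partial[b];   // finish on the host
    }
    cudaFree(d_in); cudaFree(d_partial); free(partial);
    return ok;
}
```

Each input element is read from device memory exactly once; every later access goes to the shared array.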
Considerations
• Transistors are allocated to arithmetic, not memory. It is sometimes better to recompute a value than to cache it.
• Copying to and from the host is slow. Sequential operations can sometimes stay on the GPU instead of round-tripping through the host.
• Avoid serialization (shared memory bank conflicts).
• Consider asynchronous memory operations.
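The last point can be sketched with a stream and page-locked host memory. This is an assumed illustration: the names `doubleAll` and `doubleAsync` are hypothetical, and a single stream is used only to keep the example short. `cudaMemcpyAsync`, `cudaMallocHost`, and `cudaStreamSynchronize` are CUDA runtime calls.

```cuda
#include <cuda_runtime.h>

__global__ void doubleAll(int *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2;
}

// Doubles n ints. Both copies and the kernel are queued on one stream, and
// the host blocks only at the final synchronize. Returns true on success.
bool doubleAsync(const int *src, int *dst, int n) {
    size_t size = n * sizeof(int);
    int *pinned = NULL, *d_v = NULL;
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    // Page-locked ("pinned") host memory lets the copies run asynchronously.
    bool ok = cudaMallocHost(&pinned, size) == cudaSuccess
           && cudaMalloc(&d_v, size) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) pinned[i] = src[i];
        ok = cudaMemcpyAsync(d_v, pinned, size,
                             cudaMemcpyHostToDevice, stream) == cudaSuccess;
    }
    if (ok) {
        doubleAll<<<(n + 255) / 256, 256, 0, stream>>>(d_v, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpyAsync(pinned, d_v, size,
                             cudaMemcpyDeviceToHost, stream) == cudaSuccess;
    }
    // The host could do unrelated work here before waiting for the stream.
    ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) dst[i] = pinned[i];
    }

    cudaFree(d_v);
    if (pinned != NULL) cudaFreeHost(pinned);
    cudaStreamDestroy(stream);
    return ok;
}
```

With several streams, a copy for one chunk of data can overlap a kernel working on another chunk, which is where the asynchronous calls start to pay off.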