Boost.Compute
Kyle Lutz
A C++ library for GPU computing
“STL for Parallel Devices”
GPUs
(NVIDIA, AMD, Intel)
Multi-core CPUs
(Intel, AMD)
FPGAs
(Altera, Xilinx)
Accelerators
(Xeon Phi, Adapteva Epiphany)
Algorithms
accumulate()
adjacent_difference()
adjacent_find()
all_of()
any_of()
binary_search()
copy()
copy_if()
copy_n()
count()
count_if()
equal()
equal_range()
exclusive_scan()
fill()
fill_n()
find()
find_end()
find_if()
find_if_not()
for_each()
gather()
generate()
generate_n()
includes()
inclusive_scan()
inner_product()
inplace_merge()
iota()
is_partitioned()
is_permutation()
is_sorted()
lower_bound()
lexicographical_compare()
max_element()
merge()
min_element()
minmax_element()
mismatch()
next_permutation()
none_of()
nth_element()
partial_sum()
partition()
partition_copy()
partition_point()
prev_permutation()
random_shuffle()
reduce()
remove()
remove_if()
replace()
replace_copy()
reverse()
reverse_copy()
rotate()
rotate_copy()
scatter()
search()
search_n()
set_difference()
set_intersection()
set_symmetric_difference()
set_union()
sort()
sort_by_key()
stable_partition()
stable_sort()
swap_ranges()
transform()
transform_reduce()
unique()
unique_copy()
upper_bound()
Iterators
buffer_iterator<T>
constant_buffer_iterator<T>
constant_iterator<T>
counting_iterator<T>
discard_iterator
function_input_iterator<Function>
permutation_iterator<Elem, Index>
transform_iterator<Iter, Function>
zip_iterator<IterTuple>
Containers
array<T, N>
dynamic_bitset<T>
flat_map<Key, T>
flat_set<T>
stack<T>
string
valarray<T>
vector<T>
Random Number Generators
bernoulli_distribution
default_random_engine
discrete_distribution
linear_congruential_engine
mersenne_twister_engine
normal_distribution
uniform_int_distribution
uniform_real_distribution
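Library Architecture
[Diagram: layered stack. GPU, CPU, and FPGA devices sit under OpenCL; the Boost.Compute core sits on OpenCL; the STL-like API, lambda expressions, RNGs, and interoperability layers sit on the core.]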
Why OpenCL?
(or why not CUDA/Thrust/Bolt/SYCL/OpenACC/OpenMP/C++AMP?)
• Standard C++ (no special compiler or compiler extensions)
• Library-based solution (no special build-system integration)
• Vendor-neutral open standard
Low-level API
• Provides classes to wrap OpenCL objects such as buffer, context,
program, and command_queue.
• Takes care of reference counting and error checking
• Also provides utility functions for handling error codes or setting up
the default device
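The reference counting and error checking mentioned above can be pictured as an RAII handle: copies retain the underlying object, destruction releases it, and non-zero status codes become exceptions. The sketch below uses a mock C-style API (mock_object, mock_retain, mock_release, check, and handle are all hypothetical names) in place of real OpenCL calls; it illustrates the pattern and is not Boost.Compute's actual implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a C handle API such as OpenCL's (not real OpenCL calls).
struct mock_object { int refs; };
inline int mock_retain(mock_object* o)  { if (!o) return -1; ++o->refs; return 0; }
inline int mock_release(mock_object* o) { if (!o) return -1; --o->refs; return 0; }

// Turn a non-zero status code into an exception, as an error-checking wrapper might.
inline void check(int status, const char* what) {
    if (status != 0) throw std::runtime_error(std::string(what) + " failed");
}

// RAII wrapper: construction and copies retain the handle, destruction releases it.
class handle {
public:
    explicit handle(mock_object* o) : obj_(o) { check(mock_retain(obj_), "retain"); }
    handle(const handle& other) : obj_(other.obj_) { check(mock_retain(obj_), "retain"); }
    handle& operator=(handle other) { std::swap(obj_, other.obj_); return *this; }
    ~handle() { if (obj_) mock_release(obj_); }
    int use_count() const { return obj_->refs; }
private:
    mock_object* obj_;
};
```

With this shape, user code never calls retain or release directly, and a failed call surfaces as a C++ exception rather than an ignored return value.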
#include <iostream>
#include <boost/compute/core.hpp>
// lookup default compute device
auto gpu = boost::compute::system::default_device();
// create opencl context for the device
auto ctx = boost::compute::context(gpu);
// create command queue for the device
auto queue = boost::compute::command_queue(ctx, gpu);
// print device name
std::cout << "device = " << gpu.name() << std::endl;
Boost.Compute GTC 2015
High-level API
Sort Host Data
#include <vector>
#include <algorithm>
std::vector<int> vec = { ... };
std::sort(vec.begin(), vec.end());
Sort Host Data
#include <vector>
#include <boost/compute/algorithm/sort.hpp>
std::vector<int> vec = { ... };
// same call shape as std::sort, plus the command queue to run on
boost::compute::sort(vec.begin(), vec.end(), queue);
[Chart comparing STL and Boost.Compute for input sizes of 1M, 10M, and 100M elements; vertical axis from 0 to 8000, units not shown.]
Parallel Reduction
#include <iostream>
#include <boost/compute/algorithm/reduce.hpp>
#include <boost/compute/container/vector.hpp>
// data lives in device memory
boost::compute::vector<int> data = { ... };
int sum = 0;
// reduce on the device and write the result to the host variable
boost::compute::reduce(
    data.begin(), data.end(), &sum, queue
);
std::cout << "sum = " << sum << std::endl;
Algorithm Internals
• Fundamentally, STL-like algorithms produce OpenCL kernel objects which
are executed on a compute device.
[Figure panels: C++, OpenCL]
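One way to picture the step from a C++ algorithm call to an OpenCL kernel is plain string generation: the library assembles OpenCL C source for the requested operation, then compiles and launches it. The sketch below covers only the source-generation half for an element-wise transform. The helper make_transform_kernel_source and its parameters are hypothetical and are not part of Boost.Compute.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build OpenCL C source for an element-wise transform kernel.
// `type` is the element type and `expr` is an expression written in terms of `x`.
inline std::string make_transform_kernel_source(const std::string& name,
                                                const std::string& type,
                                                const std::string& expr)
{
    std::string src;
    src += "__kernel void " + name + "(__global const " + type + "* in,\n";
    src += "                     __global " + type + "* out)\n";
    src += "{\n";
    src += "    const uint i = get_global_id(0);\n";  // one work-item per element
    src += "    const " + type + " x = in[i];\n";
    src += "    out[i] = " + expr + ";\n";
    src += "}\n";
    return src;
}
```

Calling make_transform_kernel_source("plus_two", "int", "x + 2") yields a kernel whose body assigns x + 2 to each output element; a real library would then hand that string to the OpenCL runtime for compilation.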
Custom Functions
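#include <boost/compute/function.hpp>
#include <boost/compute/algorithm/transform.hpp>
// define a function that runs on the device
BOOST_COMPUTE_FUNCTION(int, plus_two, (int x),
{
    return x + 2;
});
// apply plus_two to every element of v in place
// (v is assumed to be a device container such as boost::compute::vector<int>)
boost::compute::transform(
    v.begin(), v.end(), v.begin(), plus_two, queue
);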
Lambda Expressions
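#include <boost/compute/lambda.hpp>
#include <boost/compute/algorithm/transform.hpp>
using boost::compute::lambda::_1;
// _1 stands for the current element; add 2 to each element of v in place
boost::compute::transform(
    v.begin(), v.end(), v.begin(), _1 + 2, queue
);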
• Offers a concise syntax for specifying custom operations
• Fully type-checked by the C++ compiler
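A common way to get both properties is an expression template: the placeholder and operators build a typed C++ object describing the expression, and that object can later be rendered as device source. The miniature below is a sketch of that idea only; the sketch namespace and the types placeholder, int_literal, and plus_expr are hypothetical and much simpler than the library's real lambda machinery.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical placeholder for "the current element".
struct placeholder {
    std::string source() const { return "_1"; }
};

// Integer constant appearing in an expression.
struct int_literal {
    int value;
    std::string source() const { return std::to_string(value); }
};

// Node representing `lhs + rhs`; the operand types are part of the C++ type.
template<class L, class R>
struct plus_expr {
    L lhs;
    R rhs;
    std::string source() const {
        return "(" + lhs.source() + " + " + rhs.source() + ")";
    }
};

// Only `placeholder + int` is defined here, so `_1 + "text"` fails to compile.
inline plus_expr<placeholder, int_literal> operator+(placeholder p, int v) {
    return {p, int_literal{v}};
}

const placeholder _1{};

} // namespace sketch
```

Here sketch::_1 + 2 produces a plus_expr<placeholder, int_literal> whose source() renders the expression as text, while an ill-typed expression is rejected by the C++ compiler before any kernel is generated.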
Additional Features
OpenGL Interop
• OpenCL provides mechanisms for synchronizing with OpenGL to implement
direct rendering on the GPU
• Boost.Compute provides easy-to-use functions for interacting with OpenGL
in a portable manner
[Figure labels: OpenGL, OpenCL]
Program Caching
• Helps mitigate run-time kernel compilation costs
• Frequently used kernels are stored in, and retrieved from, a global cache
• Offline cache reduces this to one compilation per system
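The in-memory part of this idea can be sketched as a map from kernel source to compiled program: the first request compiles, later requests reuse the stored result. The types compiled_program and program_cache below are hypothetical, and the compile step is injected as a callback so the sketch stays independent of OpenCL. A real cache would likely also key on the device and build options, and the offline variant would persist binaries to disk.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled program; a real one would wrap an OpenCL program.
struct compiled_program {
    std::string source;
};

// Cache keyed on kernel source: compile on first request, reuse afterwards.
class program_cache {
public:
    using compiler = std::function<compiled_program(const std::string&)>;

    explicit program_cache(compiler c) : compile_(std::move(c)), compilations_(0) {}

    const compiled_program& get(const std::string& source) {
        auto it = cache_.find(source);
        if (it == cache_.end()) {
            ++compilations_;  // cache miss: pay the compilation cost once
            it = cache_.emplace(source, compile_(source)).first;
        }
        return it->second;
    }

    int compilations() const { return compilations_; }

private:
    compiler compile_;
    std::unordered_map<std::string, compiled_program> cache_;
    int compilations_;
};
```

Requesting the same source twice triggers a single call to the compiler callback, which is the saving the slide describes.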
Auto-tuning
• OpenCL supports a wide variety of hardware with diverse execution
characteristics
• Algorithms support different execution parameters, such as work-group size
and the amount of work to execute serially
• These parameters are tunable and their results are measurable
• Boost.Compute includes benchmarks and tuning utilities to find the optimal
parameters for a given device
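In its simplest form, tuning is an exhaustive search: run the benchmark once per candidate parameter value and keep the cheapest. The sketch below assumes that shape; pick_best, time_seconds, and the measure callback are hypothetical names and do not describe Boost.Compute's actual tuning utilities.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Return the candidate with the lowest measured cost (0 if there are no candidates).
// `measure` is a hypothetical callback that runs a benchmark for one parameter value.
inline std::size_t pick_best(const std::vector<std::size_t>& candidates,
                             const std::function<double(std::size_t)>& measure)
{
    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t c : candidates) {
        const double cost = measure(c);
        if (cost < best_cost) { best_cost = cost; best = c; }
    }
    return best;
}

// Wall-clock helper that a `measure` callback could use to time a workload.
inline double time_seconds(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A caller would pass candidate work-group sizes such as 32, 64, 128, and 256 together with a callback that times the algorithm at each size, then store the winner for that device.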
Recent News
Coming soon to Boost
• Went through Boost peer-review in December 2014
• Accepted as an official Boost library in January 2015
• Should be packaged in a Boost release this year (1.59)
Boost in GSoC
• Boost is an accepted organization for the Google Summer of Code 2015
• Last year Boost.Compute mentored a student who implemented many new
algorithms and features
• Open to mentoring another student this year
• See: https://svn.boost.org/trac/boost/wiki/SoC2015
Thank You
Source
http://github.com/kylelutz/compute
Documentation
http://kylelutz.github.io/compute
