High Performance GPU Computing with Ruby
Prasun Anand
About me
● SciRuby Contributor
● Google Summer of Code 2016, 2017
● Genenetwork project
● Ruby Grant 2017
● Projects:
○ JRuby port of NMatrix
○ ArrayFire gem
○ RbCUDA
Scientific Computing
Arrays / Matrices
BLAS and LAPACK
GPU computing is not easy!
CUDA and OpenCL
High performance GPU computing with Ruby  RubyConf 2017
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
[2] pry(main)> b = a + a
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
2.0000 6.0000
4.0000 8.0000
=> #<ArrayFire::Af_Array:0x000000020625c8>
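The printed descriptor above (dimensions [2 2 1 1], strides [1 2 4 4]) indicates that the flat input [1,2,3,4] is stored column by column. A minimal plain-Ruby sketch of that layout follows; it does not use the ArrayFire gem, and the helper names column_major_strides and rows_from_column_major are made up for illustration.

```ruby
# Column-major strides for a shape, padded to four dimensions
# to match the descriptor printed in the pry session.
def column_major_strides(dims)
  padded = dims + [1] * (4 - dims.size)
  strides = []
  step = 1
  padded.each do |extent|
    strides << step
    step *= extent
  end
  strides
end

# Rebuild the rows of a 2-D array from its flat column-major buffer.
def rows_from_column_major(flat, dims)
  n_rows, n_cols = dims
  row_stride, col_stride = column_major_strides(dims)
  (0...n_rows).map do |r|
    (0...n_cols).map { |c| flat[r * row_stride + c * col_stride] }
  end
end

dims = [2, 2]
flat = [1, 2, 3, 4]
p column_major_strides(dims)                            # => [1, 2, 4, 4]
p rows_from_column_major(flat, dims)                    # => [[1, 3], [2, 4]]
p rows_from_column_major(flat.map { |x| x + x }, dims)  # => [[2, 6], [4, 8]]
```

The last line mirrors b = a + a: element-wise addition acts on the flat buffer, so the layout is unchanged.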
[1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10]
No Name Array
[3 3 1 1]
1.0000 4.0000 -5.0000
4.0000 11.0000 8.0000
6.0000 2.0000 10.0000
=> #<ArrayFire::Af_Array:0x000000014e56c8>
[2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8]
No Name Array
[3 2 1 1]
1.0000 10.0000
0.0000 -11.0000
8.0000 8.0000
=> #<ArrayFire::Af_Array:0x00000001591db0>
[3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
No Name Array
[3 2 1 1]
-39.0000 -74.0000
68.0000 -17.0000
86.0000 118.0000
=> #<ArrayFire::Af_Array:0x000000016136f8>
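The matmul result above can be cross-checked on the CPU. This is a plain-Ruby sketch, not the gem's implementation; rows_from_flat and matmul are hypothetical helpers, and the flat inputs are read as column-major, as the printed arrays suggest.

```ruby
# Turn a flat column-major buffer into an array of rows.
def rows_from_flat(flat, n_rows, n_cols)
  (0...n_rows).map { |r| (0...n_cols).map { |c| flat[c * n_rows + r] } }
end

# Textbook matrix product on arrays of rows.
def matmul(a, b)
  raise ArgumentError, "inner dimensions differ" unless a.first.size == b.size
  b_cols = b.transpose
  a.map { |row| b_cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

left  = rows_from_flat([1, 4, 6, 4, 11, 2, -5, 8, 10], 3, 3)
right = rows_from_flat([1, 0, 8, 10, -11, 8], 3, 2)
p matmul(left, right) # => [[-39, -74], [68, -17], [86, 118]]
```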
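VALUE arf_init(int argc, VALUE* argv, VALUE self)
{
  /* Expected arguments: ndims, [dimensions], [elements]. */
  if (argc != 3) {
    rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 3)", argc);
  }

  afstruct* afarray;
  Data_Get_Struct(self, afstruct, afarray);

  /* Unpack the dimensions and count the total number of elements. */
  dim_t ndims = (dim_t)NUM2LONG(argv[0]);
  dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
  dim_t count = 1;
  for (dim_t index = 0; index < ndims; index++) {
    dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
    count *= dimensions[index];
  }

  /* Copy the Ruby numbers into a host buffer of doubles. */
  double* host_array = (double*)malloc(count * sizeof(double));
  for (dim_t index = 0; index < count; index++) {
    host_array[index] = NUM2DBL(RARRAY_AREF(argv[2], index));
  }

  /* af_create_array copies the host data to the device, so the
     temporary host buffers can be released afterwards. */
  af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);
  free(host_array);
  free(dimensions);

  return self;
}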
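The C binding above reads ndims, the dimensions and the elements without checking that they agree. A minimal Ruby-side sketch of such a check is shown here; checked_af_args is a hypothetical helper, not part of the ArrayFire gem, and the limit of four dimensions is an assumption based on the four-entry descriptors printed earlier.

```ruby
# Hypothetical pre-check for the three constructor arguments.
# Returns the arguments with the elements converted to Float.
def checked_af_args(ndims, dims, elements)
  unless ndims.is_a?(Integer) && ndims.between?(1, 4)
    raise ArgumentError, "ndims must be an Integer from 1 to 4"
  end
  unless dims.size == ndims
    raise ArgumentError, "expected #{ndims} dimensions, got #{dims.size}"
  end

  # Same running product the C loop computes in `count`.
  count = dims.reduce(1, :*)
  unless elements.size == count
    raise ArgumentError, "expected #{count} elements, got #{elements.size}"
  end

  [ndims, dims, elements.map { |x| Float(x) }]
end

p checked_af_args(2, [2, 2], [1, 2, 3, 4])
# => [2, [2, 2], [1.0, 2.0, 3.0, 4.0]]
```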
BLAS functionalities
● Matmul
● Transpose
LAPACK functionalities
● Det
● Inverse
● Norm
● Qr
● Cholesky
● Svd
● Lu
Statistics
● Mean
● Median
● Covariance
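For reference, the three statistics listed above can be written in a few lines of plain Ruby, which is handy for checking GPU results on small inputs. This is a sketch and not the gem's API: the function names are made up, and covariance here is assumed to be the sample covariance (dividing by n - 1).

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Sample covariance of two equally sized samples (assumption: n - 1 divisor).
def covariance(xs, ys)
  unless xs.size == ys.size && xs.size > 1
    raise ArgumentError, "need two samples of equal size with n >= 2"
  end
  mx = mean(xs)
  my = mean(ys)
  xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } / (xs.size - 1)
end

p mean([1, 2, 3, 4])                 # => 2.5
p median([4, 1, 3, 2])               # => 2.5
p covariance([1, 2, 3], [2, 4, 6])   # => 2.0
```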
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
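A CPU baseline for comparisons like these can be timed with a small harness. The sketch below is an assumption about how one might measure a pure-Ruby double-precision matrix product; it is not the benchmark script behind the slides, and time_block, random_matrix and naive_matmul are hypothetical names.

```ruby
# Run the block several times and return the best wall-clock time in seconds.
def time_block(repeats = 3)
  best = Float::INFINITY
  repeats.times do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    best = elapsed if elapsed < best
  end
  best
end

# Square matrix of random doubles, stored as an array of rows.
def random_matrix(n, rng)
  Array.new(n) { Array.new(n) { rng.rand } }
end

def naive_matmul(a, b)
  b_cols = b.transpose
  a.map { |row| b_cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

rng = Random.new(42)
a = random_matrix(50, rng)
b = random_matrix(50, rng)
seconds = time_block { naive_matmul(a, b) }
puts format("50x50 double matmul, best of 3: %.6f s", seconds)
```

Taking the best of several runs reduces noise from garbage collection and other processes; a GPU measurement would also need to account for host-to-device transfer time.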
10 X
Faster than NMatrix-Ruby-Lapack
10,000 X
Faster than NMatrix-Ruby
100,000 X
Faster than NMatrix-Ruby-BLAS
RbCUDA
GPU Array
● Generic pointer used to handle an array of elements on the GPU.
● Memory copying from CPU to GPU and vice-versa.
● Interfaced with NMatrix and NArray
vadd_kernel_src = <<-EOS
  extern "C" {
    __global__ void matSum(int *a, int *b, int *c)
    {
      int tid = blockIdx.x;
      if (tid < 100)
        c[tid] = a[tid] + b[tid];
    }
  }
EOS
f = compile(vadd_kernel_src)
RbCUDA::Driver.run_kernel(f.path)
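What the kernel above computes can be simulated on the CPU: each block handles the single index tid = blockIdx.x, and the guard skips blocks past the end of the data. The sketch below is plain Ruby and does not call RbCUDA; simulate_mat_sum and its parameters are hypothetical, and the inputs are assumed to hold at least `limit` elements.

```ruby
# Simulate a one-dimensional grid launch: one "block" per loop iteration.
def simulate_mat_sum(a, b, grid_dim, limit = 100)
  c = Array.new(a.size, 0)
  grid_dim.times do |block_idx|
    tid = block_idx                            # int tid = blockIdx.x;
    c[tid] = a[tid] + b[tid] if tid < limit    # bounds guard from the kernel
  end
  c
end

a = (0...100).to_a
b = a.map { |x| x * 2 }
c = simulate_mat_sum(a, b, 128)  # 128 blocks launched, only 100 do work
p c.first(5)                     # => [0, 3, 6, 9, 12]
```

Launching more blocks than elements is harmless here only because of the guard; without it, blocks 100 to 127 would read and write out of bounds.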
● CuBLAS
● CuSolver
● CuRand
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
1,000,000 X
Faster than NMatrix-Ruby-BLAS
Future Work
● Image Processing APIs and Indexers
● Multiple dtypes
● RbCUDA is under active development.
● RbCUDA is funded by the Ruby Association (Ruby Grant 2017).
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/rbcuda
Contributions are Welcome!
Acknowledgements
1. Pjotr Prins
2. Pradeep Garigipati
Thanks!
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com
