Achieving Improved Performance In Multi-threaded Programming With GPU Computing
CSE 4120: Technical Writing & Seminar
Md. Mesbah Uddin Khan [mesbahuk@gmail.com]
July 13, 2011
Things we need to know
  • CPU
  • GPU
  • Threads
  • Multi-thread Programming
  • Parallel Programming
  • OpenCL
  • GPU Computing
Introduction
Using graphics hardware to enhance CPU-based standard desktop applications is a question not only of programming models but also of the critical optimizations required to achieve true performance improvements.
Hardware Trends
Two major hardware trends make parallel programming a crucial issue for all software engineers today:
  • The rise of many-core CPU architectures
  • The inclusion of powerful graphics processing units (GPUs)
Central Processing Unit (CPU)
  • CPUs support parallel programming: threads are assigned to different tasks and their activities are coordinated.
  • Newer programming models also consider the non-uniform memory architecture (NUMA) of modern desktop systems.
  • These models rely on the underlying concept of parallel threads.
  • CPU hardware is optimized for such coarse-grained, task-parallel programming with synchronized shared memory.
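As a minimal sketch of the coarse-grained, task-parallel style described on this slide, the following Python example assigns two unrelated tasks to separate threads and coordinates them through futures. The task functions `count_words` and `sum_squares` and the wrapper `run_tasks` are hypothetical names chosen only for illustration; they do not come from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Hypothetical task 1: count whitespace-separated words."""
    return len(text.split())


def sum_squares(numbers):
    """Hypothetical task 2: add up the squares of the given numbers."""
    return sum(n * n for n in numbers)


def run_tasks(text, numbers):
    """Run both tasks on their own threads and collect the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each submit() hands one whole task to a worker thread.
        words = pool.submit(count_words, text)
        squares = pool.submit(sum_squares, numbers)
        # result() blocks until the corresponding task has finished.
        return words.result(), squares.result()


if __name__ == "__main__":
    print(run_tasks("threads share one address space", [1, 2, 3]))
```

In standard CPython the global interpreter lock keeps CPU-bound Python code from running truly in parallel, so this illustrates the programming model rather than a speedup.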
Graphics Processing Unit (GPU)
  • GPUs are mainly designed for fine-grained, data-parallel computation.
  • Graphics processing is an embarrassingly parallel problem.
  • GPU hardware is optimized for heavy loads.
  • It aims at combining a maximum number of simple parallel processing elements, each having only a small amount of local memory.
  • For example, the Nvidia GeForce GTX 480 graphics card supports up to 1,536 GPU threads on each of its 15 compute units. At full operational capacity, it can therefore run 23,040 parallel execution streams.
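The thread count quoted for the GTX 480 is simply the per-unit limit multiplied by the number of compute units. The short Python check below reproduces that arithmetic; the constant and function names are illustrative and not part of any vendor API.

```python
# Figures quoted on the slide for the Nvidia GeForce GTX 480.
THREADS_PER_COMPUTE_UNIT = 1536
COMPUTE_UNITS = 15


def max_parallel_streams(threads_per_unit, units):
    """Upper bound on concurrently resident GPU threads."""
    return threads_per_unit * units


if __name__ == "__main__":
    print(max_parallel_streams(THREADS_PER_COMPUTE_UNIT, COMPUTE_UNITS))  # 23040
```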
From CPU to GPU
  • Applications running on a computer can access GPU resources with the help of a control API implemented in user-mode libraries and the graphics card driver.
  • Leading companies with an interest in GPUs defined the Open Computing Language (OpenCL), a vendor-neutral way of accessing compute resources.
Example Application – Sudoku (1/2)
A Sudoku field typically consists of 3x3 subfields, each having 3x3 places.
Three facts make this problem a representative example of algorithms appropriate for GPU execution:
  • Data validation is the primary application task.
  • The computational effort grows with the game field's size (i.e., the problem size).
  • The workload belongs to the GPU-friendly class of embarrassingly parallel problems that have only a very small serial execution portion.
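The slides do not show the validation code, so the following Python sketch is only one plausible way to structure it, under the assumption that a field is valid when every row, column, and subfield contains each number exactly once. Every helper name here (`unit_cells`, `unit_is_valid`, `field_is_valid`, `example_field`) is hypothetical. The point is that each unit can be checked independently of all others, which is what makes the workload embarrassingly parallel.

```python
def unit_cells(unit_index, n=3):
    """Return the (row, col) cells of one unit of an (n*n) x (n*n) field.

    Units 0..size-1 are rows, size..2*size-1 are columns, and the
    remaining ones are the n x n subfields.
    """
    size = n * n
    kind, k = divmod(unit_index, size)
    if kind == 0:
        return [(k, c) for c in range(size)]
    if kind == 1:
        return [(r, k) for r in range(size)]
    top, left = (k // n) * n, (k % n) * n
    return [(top + r, left + c) for r in range(n) for c in range(n)]


def unit_is_valid(field, unit_index, n=3):
    """Check one unit: it must contain each of 1..n*n exactly once."""
    values = [field[r][c] for r, c in unit_cells(unit_index, n)]
    return sorted(values) == list(range(1, n * n + 1))


def field_is_valid(field, n=3):
    """Validate the whole field by checking all 3 * n*n units."""
    # No unit check depends on another, so on a GPU each index could be
    # handled by its own thread; here they simply run one after another.
    return all(unit_is_valid(field, i, n) for i in range(3 * n * n))


def example_field(n=3):
    """Build a known-valid field from a simple shifting pattern."""
    size = n * n
    return [[(n * (r % n) + r // n + c) % size + 1 for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    print(field_is_valid(example_field()))
```

Larger fields only add more independent unit checks, which matches the observation that the computational effort grows with the problem size.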
Example Application – Sudoku (2/2)
Fig: Execution time of the Sudoku validation on different compute devices for (a) problem sizes of 10,000 to 50,000 possible Sudoku places and (b) problem sizes of 100,000 to 700,000 Sudoku places.
Best CPU-GPU Practices
Pushing the performance of GPU-enabled desktop applications even further requires fine-grained tuning of data placement and parallel activities on the GPU card. The areas to consider are:
  • Algorithm Design
  • Memory Transfer
  • Control Flow
  • Memory Types
  • Memory Access
  • Sizing
  • Instructions
  • Precision
Developer Support
  • Vendors such as Nvidia and AMD offer software development kits with different C compilers for Windows- and Linux-based systems.
  • Developers can also use special libraries, such as AMD's Core Math Library and Nvidia's libraries for basic linear algebra subroutines and fast Fourier transforms.
  • Nvidia and AMD also provide extensive knowledge bases on their websites, with tutorials, examples, articles, use cases, and developer forums.
Concluding Remarks
  • The GPU market continues to evolve quickly.
  • Nvidia has already started distinguishing between GPU computing on normal graphics cards and GPU computing as a sole-purpose activity on processors such as its Tesla series.
  • Higher-level languages like Java and C# can benefit from GPU computing by using GPU-based libraries.
References
  • Frank Feinbube, Peter Tröger, and Andreas Polze, "Joint Forces: From Multithreaded Programming to GPU Computing," IEEE Software, Jan/Feb 2011.
  • Advanced Micro Devices, ATI Stream Computing OpenCL Programming Guide, June 2010.
  • Nvidia, OpenCL Best Practices Guide, Version 2.3, August 2009.
Thank you all…  
