Achieving Improved Performance In Multi-threaded Programming With GPU Computing

CSE 4120: Technical Writing & Seminar
MD. Mesbah Uddin Khan [mesbahuk@gmail.com]
Dated July 13, 2011

Things we need to know
- CPU
- GPU
- Threads
- Multi-thread Programming
- Parallel Programming
- OpenCL
- GPU Computing

Introduction
Using graphics hardware to enhance CPU-based standard desktop applications is a question not only of programming models but also of the critical optimizations required to achieve true performance improvements.

Hardware Trends
Two major hardware trends make parallel programming a crucial issue for all software engineers today:
- The rise of many-core CPU architectures
- The inclusion of powerful graphics processing units (GPUs)

Central Processing Unit (CPU)
- Parallel programming on CPUs assigns threads to different tasks and coordinates their activities.
- Newer programming models also consider the non-uniform memory architecture (NUMA) of modern desktop systems.
- These models still rely on the underlying concept of parallel threads.
- CPU hardware is optimized for such coarse-grained, task-parallel programming with synchronized shared memory.
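
The task-parallel model described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task functions `count_words` and `total_length`, the helper `run_task`, and the sample string are hypothetical names that do not come from the presentation. Each thread runs a different task, and a lock synchronizes access to the shared results.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Two different, coarse-grained tasks (hypothetical examples).
def count_words(text):
    return len(text.split())

def total_length(text):
    return len(text)

# Memory shared by all threads, guarded by a lock.
shared_results = {}
results_lock = threading.Lock()

def run_task(name, task, data):
    """Run one task in a worker thread and store its result in shared memory."""
    value = task(data)
    with results_lock:  # synchronize access to the shared dictionary
        shared_results[name] = value

document = "threads coordinate through shared memory"
tasks = {"words": count_words, "chars": total_length}

# Assign one thread to each task; leaving the block waits for all of them.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    for name, task in tasks.items():
        pool.submit(run_task, name, task, document)

print(shared_results)
```

Note that CPython threads mainly illustrate the programming model here; for CPU-bound work they do not necessarily run in parallel on multiple cores.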

Graphics Processing Unit (GPU)
- GPUs are mainly designed for fine-grained, data-parallel computation.
- Graphics processing is an embarrassingly parallel problem.
- GPU hardware is optimized for heavy loads: it aims at combining a maximum number of simple parallel processing elements, each having only a small amount of local memory.
- For example, the Nvidia GeForce GTX 480 graphics card supports up to 1,536 GPU threads on each of its 15 compute units, so at full operational capacity it can run 1,536 × 15 = 23,040 parallel execution streams.
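
The capacity figure quoted on this slide is a simple product, which the following sketch reproduces. The helper `passes_needed` is a hypothetical illustration and not part of the presentation: it assumes one work item per GPU thread and ignores how a real GPU schedules work, so it gives only a rough feel for how a large problem maps onto the available execution streams.

```python
import math

THREADS_PER_COMPUTE_UNIT = 1536   # figure quoted on the slide
COMPUTE_UNITS = 15                # figure quoted on the slide

# Parallel execution streams at full operational capacity.
max_parallel_streams = THREADS_PER_COMPUTE_UNIT * COMPUTE_UNITS
print(max_parallel_streams)  # 23040

def passes_needed(work_items, capacity=max_parallel_streams):
    """Hypothetical helper: full-capacity passes needed, one thread per work item."""
    return math.ceil(work_items / capacity)

# Example: 700,000 independent work items (a simplifying assumption).
print(passes_needed(700_000))  # 31
```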

From CPU to GPU
- Applications running on a computer can access GPU resources with the help of a control API implemented in user-mode libraries and the graphics card driver.
- Leading companies with an interest in GPU computing defined the Open Computing Language (OpenCL), a vendor-neutral way of accessing compute resources.

Example Application – Sudoku (1/2)
A Sudoku field typically consists of 3x3 subfields, each having 3x3 places. Three facts make this problem a representative example of algorithms appropriate for GPU execution:
- Data validation is the primary application task.
- The computational effort grows with the game field's size (i.e., the problem size).
- The workload belongs to the GPU-friendly class of embarrassingly parallel problems that have only a very small serial execution portion.
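
A minimal sketch of why Sudoku validation is embarrassingly parallel, written in plain Python rather than OpenCL. It is not the implementation measured in the presentation; the function names (`unit_cells`, `validate_unit`, `validate_sudoku`) and the grid-generation formula are assumptions made for illustration. Every row, column, and subfield is checked by an independent call to `validate_unit`, which is the part that could be mapped to one GPU work item each; only the final combination of results is serial.

```python
def unit_cells(n, unit_id):
    """Return the (row, col) cells of one row, column, or subfield."""
    size = n * n
    kind, index = divmod(unit_id, size)
    if kind == 0:   # unit ids 0 .. size-1 are rows
        return [(index, c) for c in range(size)]
    if kind == 1:   # the next 'size' ids are columns
        return [(r, index) for r in range(size)]
    # the remaining ids are n x n subfields
    top, left = (index // n) * n, (index % n) * n
    return [(top + r, left + c) for r in range(n) for c in range(n)]

def validate_unit(grid, n, unit_id):
    """Independent check: the unit holds each value 1..n*n exactly once."""
    size = n * n
    values = [grid[r][c] for r, c in unit_cells(n, unit_id)]
    return sorted(values) == list(range(1, size + 1))

def validate_sudoku(grid, n=3):
    """Check all 3 * n*n units; no call depends on another call's result."""
    size = n * n
    results = [validate_unit(grid, n, u) for u in range(3 * size)]
    return all(results)   # the only serial step: combining the results

# A valid field built from a shifting pattern (illustrative test data).
n = 3
size = n * n
valid_grid = [[(n * (r % n) + r // n + c) % size + 1 for c in range(size)]
              for r in range(size)]

# Copy the field and duplicate one value to make it invalid.
broken_grid = [row[:] for row in valid_grid]
broken_grid[0][0] = broken_grid[0][1]

print(validate_sudoku(valid_grid))
print(validate_sudoku(broken_grid))
```

The parameter `n` scales the field (n = 3 gives the usual 9x9 game), so the number of independent checks grows with the problem size.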

Example Application – Sudoku (2/2)
Fig: Execution time of the Sudoku validation on different compute devices for (a) a problem size of 10,000 to 50,000 possible Sudoku places and (b) a problem size of 100,000 to 700,000 Sudoku places.

Best CPU-GPU Practices
Pushing the performance of GPU-enabled desktop applications even further requires fine-grained tuning of data placement and parallel activities on the GPU card. Areas to consider:
- Algorithm Design
- Memory Transfer
- Control Flow
- Memory Types
- Memory Access
- Sizing
- Instructions
- Precision

Developer Support
- Vendors such as Nvidia and AMD offer software development kits with different C compilers for Windows- and Linux-based systems.
- Developers can also use special libraries, such as AMD’s Core Math Library and Nvidia’s libraries for basic linear algebra subroutines and fast Fourier transforms.
- Nvidia and AMD also provide extensive knowledge bases with tutorials, examples, articles, use cases, and developer forums on their websites.

Concluding Remarks
- The GPU market continues to evolve quickly.
- Nvidia has already started distinguishing between GPU computing on normal graphics cards and GPU computing as a sole-purpose activity on processors such as its Tesla series.
- Higher-level languages such as Java and C# can benefit from GPU computing by using GPU-based libraries.

References
- Frank Feinbube, Peter Tröger, and Andreas Polze, “Joint Forces: From Multithreaded Programming to GPU Computing,” IEEE Software, Jan/Feb 2011.
- Pseudorandomness
- Advanced Micro Devices, ATI Stream Computing OpenCL Programming Guide, June 2010.
- Nvidia, OpenCL Best Practices Guide, Version 2.3, August 2009.
