An Introduction to GPU
          3D Games to HPC
           Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
Agenda

 3D Graphics
    The Big Picture
    Quick Overview
    Programming Model
    Importance of 3D


 High Performance Parallel Computing
    Why GPUs for HPPC?
    Available APIs
    GPU Computing architecture


 Q&A
The Big Picture – Movies




Capture      Models   Scene   Rendering   Post
                      API                 Processing


Creation
           Creation
The Big Picture - Games




Capture      Models   Scene   Rendering   Post
                      API                 Processing
                              GPUs
                              Drivers
Creation              HLSL,
           Creation   Cg
Models end up in World Space
     World space includes everything!
     Position and orientation for all
     items are needed to accurately calculate
     transformations into screen space.
     Figure: World Coordinate Space with X, Y and Z axes, showing the
     Light Source, the Screen, and the View Point or Camera
View Transformation: world ends up on Screen




   Screen Coordinate Space
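The two slides above describe moving a point from world space to screen space. As a rough illustration only, the following Python sketch does this on the CPU for a single point: a view transform into the camera's axes, a perspective divide, then a viewport mapping to pixels. The function name `world_to_screen`, the look-at style camera parameters, and the field-of-view convention are assumptions made for this example and are not taken from the deck.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def world_to_screen(point, eye, target, up, fov_y_deg, width, height):
    """Map a world-space point to pixel coordinates (None if behind the camera)."""
    # View transform: express the point along the camera's own axes.
    forward = normalize(sub(target, eye))
    right = normalize(cross(forward, up))
    cam_up = cross(right, forward)
    rel = sub(point, eye)
    x, y, depth = dot(rel, right), dot(rel, cam_up), dot(rel, forward)
    if depth <= 0:
        return None
    # Perspective projection: scale by focal length, divide by depth.
    focal = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    aspect = width / height
    ndc_x = (focal / aspect) * x / depth
    ndc_y = focal * y / depth
    # Viewport transform: [-1, 1] to pixels, with y pointing down on screen.
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

# A point at the world origin, seen by a camera on the +Z axis looking at it,
# lands in the middle of a 640x480 screen.
center = world_to_screen([0, 0, 0], [0, 0, 5], [0, 0, 0], [0, 1, 0], 90, 640, 480)
```

On a GPU the same arithmetic is normally expressed as matrix multiplications applied to every vertex; the scalar form here is only meant to make the individual steps visible.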
Simple Interactive 3D Graphics App

 A simple example
     Static scene geometry, moving viewer
 Repeat this loop:
     CPU takes user input from joystick or mouse
     CPU re-calculates viewer position, view direction,
     and light positions in 3-D world space
     GPU clears memory and draws the complete scene
     geometry with the new viewer and light positions
     Repeat forever

 Figure: GPU blocks (Vertex Engine, Setup, Raster, Z Cull,
 Fragment Engine, Texture, Raster Ops)
 Figure: loop of Read Joystick Position, Update Viewer Position
 and Light Direction, Draw all Scene Objects
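The loop on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `read_input`, `update_viewer` and `draw_scene` are hypothetical stand-ins for joystick polling, the CPU update, and the GPU draw call, and the loop runs for a fixed number of frames so the example terminates.

```python
import math

def read_input(frame_index):
    """Hypothetical stand-in for polling a joystick or mouse."""
    return {"turn": 0.1, "forward": 0.5}

def update_viewer(viewer, controls):
    """CPU step: recompute viewer position and view direction in world space."""
    heading = viewer["heading"] + controls["turn"]
    return {
        "x": viewer["x"] + controls["forward"] * math.cos(heading),
        "z": viewer["z"] + controls["forward"] * math.sin(heading),
        "heading": heading,
    }

def draw_scene(scene, viewer, light):
    """Stand-in for the GPU step: clear, then redraw every object."""
    frame = []  # "clear memory"
    for name, (x, z) in scene.items():
        # Static geometry, expressed relative to the moving viewer.
        frame.append((name, x - viewer["x"], z - viewer["z"]))
    frame.append(("light", light[0] - viewer["x"], light[1] - viewer["z"]))
    return frame

def run(scene, light, num_frames):
    viewer = {"x": 0.0, "z": 0.0, "heading": 0.0}
    frames = []
    for frame_index in range(num_frames):  # a real application would loop forever
        controls = read_input(frame_index)
        viewer = update_viewer(viewer, controls)
        frames.append(draw_scene(scene, viewer, light))
    return viewer, frames

final_viewer, rendered = run({"cube": (1.0, 2.0)}, (0.0, 5.0), num_frames=3)
```

The scene dictionary never changes between frames; only the viewer moves, which matches the "static scene geometry, moving viewer" case on the slide.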
Adding Programmability to the
    Graphics Pipeline
   3D Application or Game
      -> 3D API commands ->
   3D API: OpenGL or Direct3D
      ---------------- CPU – GPU Boundary ----------------
      -> GPU command & data stream ->
   GPU Front End
      -> vertex index stream ->
   Primitive Assembly
      -> assembled polygons, lines, and points ->
   Rasterization & Interpolation
      -> pixel location stream ->
   Raster Operations
      -> pixel updates ->
   Framebuffer

   Programmable Vertex Processor
      GPU Front End -> pre-transformed vertices -> transformed vertices -> Primitive Assembly
   Programmable Fragment Processor
      Rasterization & Interpolation -> rasterized pre-transformed fragments -> transformed fragments -> Raster Operations
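To make the stages of this pipeline concrete, here is a small software imitation in Python. It is a sketch, not how any GPU or driver implements the pipeline: the vertex stage is a plain 2D translation, the fragment stage just clamps a brightness value, and rasterization uses edge functions with barycentric interpolation. All function names are illustrative.

```python
def vertex_processor(vertex, offset):
    """Programmable vertex stage: here, a simple 2D translation."""
    x, y, brightness = vertex
    return (x + offset[0], y + offset[1], brightness)

def assemble_triangles(vertices, indices):
    """Primitive assembly: group an index stream into triangles."""
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices) - 2, 3)]

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Rasterization and interpolation: yield (x, y, brightness) fragments."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
            if all(wi * area >= 0 for wi in w):  # inside or on an edge
                yield x, y, sum(wi / area * v[2] for wi, v in zip(w, tri))

def fragment_processor(brightness):
    """Programmable fragment stage: here, clamp brightness to [0, 1]."""
    return max(0.0, min(1.0, brightness))

def render(vertices, indices, width, height, offset=(0, 0)):
    framebuffer = [[0.0] * width for _ in range(height)]
    transformed = [vertex_processor(v, offset) for v in vertices]
    for tri in assemble_triangles(transformed, indices):
        for x, y, brightness in rasterize(tri, width, height):
            framebuffer[y][x] = fragment_processor(brightness)  # raster operation
    return framebuffer

triangle = [(0, 0, 1.0), (4, 0, 1.0), (0, 4, 1.0)]
image = render(triangle, [0, 1, 2], width=4, height=4)
```

The two "programmable" functions are the only parts an application would replace; the assembly, rasterization and framebuffer write stay fixed, which mirrors the split shown in the diagram.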
A History of Innovation




    Year        GPU               Transistors
    1995        NV1               1 Million
    1999        GeForce 256       22 Million
    2002        GeForce4          63 Million
    2003        GeForce FX        130 Million
    2004        GeForce 6         222 Million
    2005        GeForce 7         302 Million
    2006-2007   GeForce 8         754 Million
    2008        GeForce GTX 200   1.4 Billion



                 …. but what do all these extra transistors do?

GPU continues to offload CPU work
    1996   CPU: Geom Gather, Geom Proc, Triangle Proc
           GPU: Pixel Proc, Z / Blend

    2000   CPU: Geom Gather
           GPU: Geom Proc, Triangle Proc, Pixel Proc, Z / Blend

    2004   CPU: Scene Mgmt, Physics and AI
           GPU: Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend

    2008   CPU: Scene Mgmt
           GPU: Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend
Programming Model
 API: Set of functions, procedures or classes
 that an OS, library or service provides to
 support requests made by computer
 programs
 DirectX: Collection of APIs to handle
 multimedia, especially game programming and
 video tasks, on Microsoft platforms.
 OpenGL (Open Graphics Library) is a
 standard specification defining a cross-
 language, cross-platform API for writing
 applications that produce 2D and 3D
 computer graphics.
Why is 3D Graphics important?
More than just Fun and Games....




Tokyo, Japan                       California Coastline
3D Consumer Applications
  Vista      Office        PDFs




  Music       Photos       Maps
GPUS IN HPC
Evolution of Processors


 Chart: vertical axis runs from Instruction Level Parallelism
 up to Massive Data Parallelism; horizontal axis runs from
 Data Fits in Cache to Huge Data Sets
GPU Processing Power
CPU, meet your new partner!

                 CPU                 GPU
                 Intel Core i7 965   NVIDIA GTX 285
                 4 cores             240 cores
                 102 GFLOPS          1.04 TFLOPS
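The CPU figure on this slide can be reproduced from the usual peak-throughput formula: cores times clock times floating-point operations per core per cycle. The sketch below assumes a 3.2 GHz clock and 8 single-precision FLOPs per core per cycle for the Core i7 965 (a 4-wide SSE add plus a 4-wide SSE multiply). Neither value appears on the slide, so treat them as assumptions. The GPU figure is taken from the slide as given.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle

# Assumed, not on the slide: 3.2 GHz clock, 8 single-precision FLOPs/cycle/core.
cpu_peak = peak_gflops(cores=4, clock_ghz=3.2, flops_per_cycle=8)

# Figures quoted on the slide.
cpu_slide_gflops = 102.0
gpu_slide_gflops = 1040.0  # 1.04 TFLOPS

gpu_to_cpu_ratio = gpu_slide_gflops / cpu_slide_gflops

print(f"CPU peak from formula: {cpu_peak:.1f} GFLOPS")
print(f"GPU / CPU peak ratio:  {gpu_to_cpu_ratio:.1f}x")
```

These are peak numbers; sustained throughput on real workloads depends on memory bandwidth and on how well the code maps to each architecture.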
Beyond Graphics

 With floating-point math and textures, graphics
 processors can be used for more than just graphics
    GPGPU = “General Purpose Computing on GPUs”


 Lots of ongoing research mapping algorithms and
 problems onto programmable GPUs
    Solving Linear Equations
    Black-Scholes Options Pricing
    Rigid- and Soft-Body Dynamics


 Middleware layers being developed to accelerate
 “eye candy” game physics on GPUs (HavokFX)
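Black-Scholes pricing, listed above, is a convenient example of why such problems map well to GPUs: every option is priced independently of the others. The sketch below prices European call options in plain Python on the CPU; the function names are illustrative, and on a GPU each list element would typically be handled by its own thread.

```python
import math

def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, years):
    """Price of a European call under the Black-Scholes model."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def price_portfolio(options):
    """No option depends on another, so this loop is embarrassingly parallel."""
    return [black_scholes_call(*option) for option in options]

# (spot, strike, risk-free rate, volatility, years to expiry)
prices = price_portfolio([
    (100.0, 100.0, 0.05, 0.2, 1.0),
    (100.0, 110.0, 0.05, 0.2, 1.0),
])
```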
What is GPGPU?
  General Purpose computation using GPU
  in applications other than 3D graphics
     GPU accelerates critical path of application
  Data parallel algorithms leverage GPU attributes
     Large data arrays, streaming throughput
     Fine-grain SIMD parallelism
     Floating point (FP) computation
  Great for “embarrassingly parallel” algorithms

  Applications – see GPGPU.org
     Game effects (FX) physics, image processing
     Physical modeling, computational engineering, matrix
     algebra, convolution, correlation, sorting
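As an illustration of fine-grain data parallelism, the sketch below writes SAXPY (out = a*x + y) as a per-thread kernel plus a launcher. The launcher is a sequential Python stand-in for a parallel launch, and the block and thread indexing is modeled loosely on the way GPU compute APIs number their threads; the names `saxpy_kernel` and `launch` are made up for this example.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Work done by one thread: a single element, independent of all others."""
    if thread_id < len(x):  # guard: the grid may be larger than the data
        out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, num_blocks, threads_per_block, *args):
    """Sequential stand-in for a parallel launch: one kernel call per thread."""
    for block in range(num_blocks):
        for thread in range(threads_per_block):
            kernel(block * threads_per_block + thread, *args)

def saxpy(a, x, y, threads_per_block=4):
    out = [0.0] * len(x)
    # Round up so every element is covered by some thread.
    num_blocks = (len(x) + threads_per_block - 1) // threads_per_block
    launch(saxpy_kernel, num_blocks, threads_per_block, a, x, y, out)
    return out

result = saxpy(2.0, [1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
```

Because no kernel invocation reads what another one writes, the two loops in `launch` could run in any order or all at once, which is the property the slide calls "embarrassingly parallel".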
Why Computation on the GPU?
  A quiet buildup of potential
       Calculation Throughput and Memory Bandwidth: 10X
       Equivalent performance at fraction of power & cost
       GPU in every PC – pervasive presence and massive impact
  GPUs have always been parallel “multi-core”
  Natively designed to handle massive threading
  Every pixel is a thread
  Increased precision (fp32), programmability, flexibility
  GPUs are a mass-market parallel processor
       Economies of scale
  Peak floating point performance is much higher than comparable
  CPUs

    ATI X1900 XT                        Intel Core 2 Duo E6600
    $400 (video card)                   $400 (processor only)
    250 GFLOPS (SP float)               40 GFLOPS (SP float)
    46 GB/s main memory bandwidth       8.5 GB/s main memory bandwidth
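The comparison at the end of this slide reduces to a few ratios. The short calculation below uses only the figures quoted on the slide and reads the bandwidth numbers as GB per second, which is an assumption about the intended unit.

```python
gpu = {"price_usd": 400, "gflops": 250.0, "bandwidth_gb_s": 46.0}
cpu = {"price_usd": 400, "gflops": 40.0, "bandwidth_gb_s": 8.5}

# Ratios at equal price: how much more the GPU offers.
flops_ratio = gpu["gflops"] / cpu["gflops"]
bandwidth_ratio = gpu["bandwidth_gb_s"] / cpu["bandwidth_gb_s"]

# Peak single-precision throughput per dollar.
gpu_gflops_per_dollar = gpu["gflops"] / gpu["price_usd"]
cpu_gflops_per_dollar = cpu["gflops"] / cpu["price_usd"]

print(f"Peak FLOPS ratio:      {flops_ratio:.2f}x")
print(f"Memory bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"GFLOPS per dollar:      {gpu_gflops_per_dollar:.3f} vs {cpu_gflops_per_dollar:.3f}")
```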
Why Computation on the GPU?
  Supercomputing Performance
     Inherently Parallel Architecture
     1000+ cores, massively parallel processing
     250x the compute performance of a PC
  Personal
     “One Researcher, One Supercomputer”
     Supercomputer in a desktop system
     Plugs into standard power strip
  Accessible
     Program in C, C++, Fortran for Windows or Linux
     Available from OEMs and resellers worldwide and priced
     like a workstation
Compute Applications
  Computational Fluid Dynamics
  Computer Aided Engineering
  Digital Content Creation
  Electronic Design Automation
  Finance
  Game Physics
  Graphics
  Imaging and Computer Vision
  Medical Imaging
  Numerics
  Bio-Informatics and Life Sciences
  Computational Chemistry
  Computational Electromagnetics & Electrodynamics
  Data Mining, Analytics & Databases
  MATLAB Acceleration
  Molecular Dynamics
  Weather, Atmospheric, Ocean Modeling, and Space Sciences
  Libraries
  Oil & Gas
  Programming Tools
  Ray Tracing
  Signal Processing
  Video & Audio
Heterogeneous Computing




   Multi-Core   Parallel-Core
     CPU            GPU
APIS FOR HETEROGENEOUS COMPUTING
APIs for Heterogeneous Computing
 CUDA (Compute Unified Device Architecture) is a
 parallel computing architecture developed by NVIDIA.
 Programmers use 'C for CUDA' (C with NVIDIA
 extensions), compiled through a PathScale Open64 C
 compiler, to code algorithms for execution on the
 GPU. Both low-level and high-level APIs are provided.
 OpenCL (Open Computing Language) is a framework
 for writing programs that execute across
 heterogeneous platforms consisting of CPUs, GPUs,
 and other processors.
 Microsoft DirectCompute is an API that supports
 general-purpose computing on GPUs on Microsoft
 Windows Vista or Windows 7. DirectCompute is part of the
 Microsoft DirectX collection of APIs.
OpenCL
OpenCL: Platform Model & Program Structure

   One Host plus one or more Compute Devices
      Each Compute Device is composed of one or more
      Compute Units
      Each Compute Unit is further divided into one or more
      Processing Elements
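The hierarchy described on this slide can be written down as a small data structure. The following Python sketch models only the containment relationship (host, compute devices, compute units, processing elements). It is not the OpenCL API, and the class names, device name and counts are illustrative; the example device uses 30 compute units of 8 processing elements each so that the total equals the 240 cores quoted for the GPU earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeUnit:
    processing_elements: int  # one or more per compute unit

@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(unit.processing_elements for unit in self.compute_units)

@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(dev.total_processing_elements() for dev in self.devices)

# Illustrative devices only; the names and counts are assumptions.
example_gpu = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(30)])
example_cpu = ComputeDevice("example-cpu", [ComputeUnit(1) for _ in range(4)])
host = Host([example_gpu, example_cpu])
```

In a real OpenCL program the host would query the platform for its devices at run time rather than constructing them by hand.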
CUDA Parallel Computing Architecture


ISA and hardware
compute engine

Includes a C-compiler
plus support for
OpenCL and
DX11 Compute

Architected to natively
support all
computational
interfaces
(standard languages
and APIs)
Option 1
OpenCL and C for CUDA


   C for CUDA: entry point for developers who prefer high-level C
   OpenCL: entry point for developers who want a low-level API
   PTX: shared back-end compiler and optimization technology
   GPU
CUDA Success—Science & Computation
Speedups are not 2x or 3x, but roughly 20x to 150x

    Speedup   Application            Organization
    146X      Medical Imaging        U of Utah
    36X       Molecular Dynamics     U of Illinois, Urbana
    18X       Video Transcoding      Elemental Tech
    50X       Matlab Computing       AccelerEyes
    100X      Astrophysics           RIKEN
    149X      Financial simulation   Oxford
    47X       Linear Algebra         Universidad Jaime
    20X       3D Ultrasound          Techniscan
    130X      Quantum Chemistry      U of Illinois, Urbana
    30X       Gene Sequencing        U of Maryland
Tesla Personal Supercomputer
   250x faster
   100x more affordable
   20x less power consumption

 Chart: Performance (1x to 250x) plotted against Accessibility
 ($100K - $1M versus < $10K), comparing a Supercomputing Cluster,
 Today’s Workstations, and the Tesla Personal Supercomputer
Solving the World’s Most Complex Challenges

 Film
 Science
 Auto Design
 Oil & Gas
 Medicine
 Broadcast
 Space Exploration
Grand Computing Challenges




 Renewable Energy
 Personalized Medicine
 Mathematics for Scientific Discovery
 Information Data Mining
 Machines That Think
 Natural Human Machine Interaction
 Predict Environmental Changes
 Economic Analysis
Final Thoughts

 GPU and heterogeneous parallel
 architecture will revolutionize computing

 Parallel computing is needed to solve some of
 the most interesting and important human
 challenges ahead

 Learning parallel programming is imperative
 for students in computing and sciences
From Virtua Fighter to Tsubame


                        1995 – NV1          2008 – GT200
        Transistors     0.8M transistors    1,200M transistors
        Clock           50MHz               1.3GHz
        Memory          1M Bytes            4G Bytes
        Throughput      0 GFLOPS            1 TFLOPS



   Another 1000x in 15 years?
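The closing question can be turned into a growth rate. The arithmetic below asks what a 1000x improvement over 15 years implies per year, and how often performance would have to double. It uses only the "1000x in 15 years" figure from the slide, and the helper names are illustrative.

```python
import math

def doubling_time_years(growth_factor, years):
    """Years per doubling needed to reach growth_factor in the given time."""
    return years / math.log2(growth_factor)

def annual_growth(growth_factor, years):
    """Constant yearly multiplier that compounds to growth_factor."""
    return growth_factor ** (1.0 / years)

doubling = doubling_time_years(1000, 15)
yearly = annual_growth(1000, 15)

print(f"Doubling time: {doubling:.2f} years")
print(f"Annual growth: {(yearly - 1.0) * 100:.0f}% per year")
```

In other words, 1000x in 15 years corresponds to doubling roughly every year and a half, sustained without interruption for the whole period.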
BACKUP
Graphics API History
OpenGL
1992: OpenGL 1.0
1997: OpenGL 1.1 (Vertex Arrays, Improved Texturing)
1998: OpenGL 1.2 (3D Textures, BGRA pixel format)
1998: OpenGL 1.2.1 (Multi-Texture)
2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures)
2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation)
2003: OpenGL 1.5 (Vertex Attr from Vid Mem)
2004: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non-Power-of-2 Textures)
2006: OpenGL 2.1 (GLSL1.2, sRGB Textures)
2008: OpenGL 3.0 (GLSL1.3, 32b FP Textures)
2009: OpenGL 3.1 (March 2009, GLSL1.4, Perf, CopyBufferAPI)
2009: OpenGL 3.2 (Aug 2009, GLSL1.5, Geom Shaders)
OpenGL ES

 Designed for hand-held and embedded devices
    Goal is smaller footprint to support OpenGL
    PlayStation 3 and cell phone industry adopting ES
 OpenGL ES 1.1
    Strips out anything deemed extra in OpenGL
    Keeps conventional fixed-function vertex and fragment
    processing
 OpenGL ES 2.0
    Adds programmable vertex and fragment shaders
    Shaders specified in binary format
    Drops support for fixed-function vertex and fragment
    processing
OpenGL ES – Cont


 OpenGL ES 1.0 : Symbian OS, Android Platform
 OpenGL ES 1.0+ : PlayStation 3
 OpenGL ES 1.1 : iPhone SDK, BlackBerry (Some Models)
 OpenGL ES 2.0 : iPhone 3GS, iPod touch
DirectX

GDI: legacy Windows graphics API ~1985
DirectX 1.0 – 1995/6 (No 3D support, DirectDraw, DirectSound, DirectInput)
DirectX 3.0 – 1996 (Rasterization-only 3D Support, Awkward prog. Model, Not successful)
DirectX 5.0 – 1997 (Draw Primitives, DirectX vs OpenGL War)
DirectX 6.0 – 1998 (Multitexture, OGL/Glide features, Texture Compression)
DirectX 7.0 – 1999 (Geometry HW acceleration and Blending, Cube mapping)
DirectX 8.0 – 2000/1 (Programmable VS/PS Shaders, XBOX)
DirectX 9.0 – 2002-2003 (More programmability, Branching, FP pixel prog.)
DirectX 9.0c – 2004 (ShaderModel 3.0)
DirectX 10.0 – 2006 (SM4.0, WinVista, Geometry Shaders, Stream Output)
DirectX 10.1 – 2008 (SM4.1, Better Image Quality)
DirectX 11.0 – 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)

More Related Content

PDF
3 d to _hpc
PDF
Nvidia Cuda Apps Jun27 11
PDF
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
PDF
CG simple openGL point & line-course 2
PDF
Creating next-gen VR and MR experiences using Varjo VR-1 and XR-1 - Unite Cop...
PDF
Java me 08-mobile3d
PPTX
Getting Space Pirate Trainer* to Perform on Intel® Graphics
PPSX
TressFX The Fast and The Furry by Nicolas Thibieroz
3 d to _hpc
Nvidia Cuda Apps Jun27 11
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
CG simple openGL point & line-course 2
Creating next-gen VR and MR experiences using Varjo VR-1 and XR-1 - Unite Cop...
Java me 08-mobile3d
Getting Space Pirate Trainer* to Perform on Intel® Graphics
TressFX The Fast and The Furry by Nicolas Thibieroz

What's hot (16)

PPTX
Getting started with Ray Tracing in Unity 2019.3 - Unite Copenhagen 2019
PPTX
Siggraph 2016 - Vulkan and nvidia : the essentials
PDF
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
PDF
Compute API –Past & Future
PDF
[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...
PDF
Poser pro reference manual
PPTX
Optimizing Total War*: WARHAMMER II
PPTX
iMinds The Conference: Jan Lemeire
KEY
Gtug
PDF
Scalability for All: Unreal Engine* 4 with Intel
PDF
Hardware Accelerated 2D Rendering for Android
PPTX
Media SDK Webinar 2014
PPSX
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
PDF
Mini Robot Fighter
PPTX
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
PPTX
OpenGL ES EGL Spec&APIs
Getting started with Ray Tracing in Unity 2019.3 - Unite Copenhagen 2019
Siggraph 2016 - Vulkan and nvidia : the essentials
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Compute API –Past & Future
[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...
Poser pro reference manual
Optimizing Total War*: WARHAMMER II
iMinds The Conference: Jan Lemeire
Gtug
Scalability for All: Unreal Engine* 4 with Intel
Hardware Accelerated 2D Rendering for Android
Media SDK Webinar 2014
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
Mini Robot Fighter
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
OpenGL ES EGL Spec&APIs
Ad

Viewers also liked (6)

PPTX
How Productive are your Employees
PDF
2001 bj-apdiots
PPTX
Top 20 Business Buzzwords: What They Really Mean
 
PPTX
Questionnaire results
PPT
Escassez de água no brasil
PDF
How to Become a Thought Leader in Your Niche
How Productive are your Employees
2001 bj-apdiots
Top 20 Business Buzzwords: What They Really Mean
 
Questionnaire results
Escassez de água no brasil
How to Become a Thought Leader in Your Niche
Ad

Similar to 2D Games to HPC (20)

PDF
Verification of Graphics ASICs (Part II)
PDF
Yang greenstein part_2
PDF
GeForce 8800 OpenGL Extensions
PPT
CS 354 GPU Architecture
PDF
Modern Graphics Pipeline Overview
PDF
Realizing OpenGL
PDF
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
KEY
Why Graphics Is Fast, and What It Can Teach Us About Parallel Programming
PDF
A0280105
PPT
NVIDIA Graphics, Cg, and Transparency
PPT
SIGGRAPH 2012: NVIDIA OpenGL for 2012
PDF
GPU Virtualization on VMware's Hosted I/O Architecture
PPT
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
PDF
Casing3d opengl
PPT
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
PDF
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
PPT
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
PDF
GPU - how can we use it?
PPTX
Making a game with Molehill: Zombie Tycoon
ODP
Alternativa3D_en
Verification of Graphics ASICs (Part II)
Yang greenstein part_2
GeForce 8800 OpenGL Extensions
CS 354 GPU Architecture
Modern Graphics Pipeline Overview
Realizing OpenGL
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Why Graphics Is Fast, and What It Can Teach Us About Parallel Programming
A0280105
NVIDIA Graphics, Cg, and Transparency
SIGGRAPH 2012: NVIDIA OpenGL for 2012
GPU Virtualization on VMware's Hosted I/O Architecture
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
Casing3d opengl
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
[03 1][gpu용 개발자 도구 - parallel nsight 및 axe] miller axe
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
GPU - how can we use it?
Making a game with Molehill: Zombie Tycoon
Alternativa3D_en

More from DVClub (20)

PDF
IP Reuse Impact on Design Verification Management Across the Enterprise
PDF
Cisco Base Environment Overview
PDF
Intel Xeon Pre-Silicon Validation: Introduction and Challenges
PDF
Verification of Graphics ASICs (Part I)
PDF
Stop Writing Assertions! Efficient Verification Methodology
PPT
Validating Next Generation CPUs
PPT
Verification Automation Using IPXACT
PDF
Validation and Design in a Small Team Environment
PDF
Trends in Mixed Signal Validation
PDF
Verification In A Global Design Community
PDF
Design Verification Using SystemC
PDF
Verification Strategy for PCI-Express
PDF
SystemVerilog Assertions (SVA) in the Design/Verification Process
PDF
Efficiency Through Methodology
PDF
Pre-Si Verification for Post-Si Validation
PDF
OpenSPARC T1 Processor
PDF
Intel Atom Processor Pre-Silicon Verification Experience
PDF
Using Assertions in AMS Verification
PDF
Low-Power Design and Verification
PDF
UVM Update: Register Package
IP Reuse Impact on Design Verification Management Across the Enterprise
Cisco Base Environment Overview
Intel Xeon Pre-Silicon Validation: Introduction and Challenges
Verification of Graphics ASICs (Part I)
Stop Writing Assertions! Efficient Verification Methodology
Validating Next Generation CPUs
Verification Automation Using IPXACT
Validation and Design in a Small Team Environment
Trends in Mixed Signal Validation
Verification In A Global Design Community
Design Verification Using SystemC
Verification Strategy for PCI-Express
SystemVerilog Assertions (SVA) in the Design/Verification Process
Efficiency Through Methodology
Pre-Si Verification for Post-Si Validation
OpenSPARC T1 Processor
Intel Atom Processor Pre-Silicon Verification Experience
Using Assertions in AMS Verification
Low-Power Design and Verification
UVM Update: Register Package

Recently uploaded (20)

PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Electronic commerce courselecture one. Pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
A Presentation on Artificial Intelligence
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
KodekX | Application Modernization Development
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Modernizing your data center with Dell and AMD
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPT
Teaching material agriculture food technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Cloud computing and distributed systems.
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Electronic commerce courselecture one. Pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
Agricultural_Statistics_at_a_Glance_2022_0.pdf
cuic standard and advanced reporting.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
A Presentation on Artificial Intelligence
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
KodekX | Application Modernization Development
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Review of recent advances in non-invasive hemoglobin estimation
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Modernizing your data center with Dell and AMD
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Teaching material agriculture food technology
Unlocking AI with Model Context Protocol (MCP)
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Cloud computing and distributed systems.
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf

2D Games to HPC

  • 1. An Introduction to GPU 3D Games to HPC Krishnaraj Rao Presented at Bangalore DV Club, 03/12/2010
  • 2. Agenda 3D Graphics The Big Picture Quick Overview Programming Model Importance of 3D High Performance Parallel Computing Why GPUs for HPPC? Available APIs GPU Computing architecture Q&A
  • 3. The Big Picture – Movies Capture Models Scene Rendering Post API Processing Creation Creation
  • 4. The Big Picture - Games Capture Models Scene Rendering Post API Processing GPU’s Drivers Creation HLSL, Creation Cg
  • 5. Models end up in World Space Worldspace includes everything! Position and orientation for all items is needed to accurately calculate transformations into screen space. Light Source Y Z View Point or Camera Screen X World Coordinate Space
  • 6. View Transformation world ends up on Screen Screen Coordinate Space
  • 7. Simple Interactive 3D Graphics App A simple example Static scene geometry, Vertex Setup Raster Fragment Raster Engine Engine Ops moving viewer Repeat this loop: Z Cull Texture CPU takes user input from joystick or mouse CPU re-calculates viewer position, view direction, and light positions in 3-D world space GPU clears memory and Update Viewer Read Draw all draws the complete scene Joystick Position and Light Scene geometry with the new Position Direction Objects viewer and light positions Repeat forever
  • 8. Adding Programmability to the Graphics Pipeline 3D Application or Game 3D API Commands 3D API: OpenGL or Direct3D CPU – GPU Boundary GPU Assembled Command & Polygons, Pixel Data Stream Vertex Index Lines, and Location Pixel Stream Points Stream Updates GPU Primitive Rasterization & Raster Front Framebuffer Assembly Interpolation Operations End Rasterized Pre-transformed Transformed Pre-transformed Transformed Vertices Vertices Fragments Fragments Programmable Programmable Vertex Fragment Processor Processor
  • 9. A History of Innovation 1995 1999 2002 2003 2004 2005 2006-2007 NV1 GeForce 256 GeForce4 GeForce FX GeForce 6 GeForce 7 GeForce 8 1 Million 22 Million 63 Million 130 Million 222 Million 302 Million 754 Million Transistors Transistors Transistors Transistors Transistors Transistors Transistors 2008 GeForce GTX 200 1.4 Billion Transistors …. but what do all these extra transistors do? NVIDIA Confidential
  • 10. GPU continues to offload CPU work Geom Geom Triangle Pixel Z / Blend Gather Proc Proc Proc 1996 CPU GPU Geom Geom Triangle Pixel Z / Blend Gather Proc Proc Proc 2000 CPU GPU Scene Physics Geom Geom Triangle Pixel Z / Blend Mgmt and AI Gather Proc Proc Proc 2004 CPU GPU Scene Physics Geom Geom Triangle Pixel Z / Blend Mgmt and AI Gather Proc Proc Proc 2008 CPU GPU
  • 11. Programming Model API: Set of functions, procedures or classes that an OS, library or service provides to support requests made by computer programs DirectX: Collection of APIs to handle multimedia, esp. game programming and video tasks, on MS platforms. OpenGL (Open Graphics Library) is a standard specification defining a cross- language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
  • 12. Why is 3D Graphics important? More than just Fun and Games.... Tokyo, Japan California Coastline
  • 13. 3D Consumer Applications Vista Office PDFs Music Photos Maps
  • 15. Evolution of Processors Massive Data Parallelism Instruction Level Parallelism Data Fits in Cache Huge Data Sets
  • 16. GPU Processing Power CPU, meet your new partner! GPU CPU GPU Intel Core i7 965 NVIDIA GTX 285 4 cores 240 cores 102 GFLOPS 1.04 TFLOPS CPU
  • 17. Beyond Graphics With floating-point math and textures, graphics processors can be used for more than just graphics GPGPU = “General Purpose Computing on GPUs” Lots of ongoing research mapping algorithms and problems onto programmable GPUs Solving Linear Equations Black-Scholes Options Pricing Rigid- and Soft-Body Dynamics Middleware layers being developed to accelerate “eye candy” game physics on GPUs (HavokFX)
  • 18. What is GPGPU ? General Purpose computation using GPU in applications other than 3D graphics GPU accelerates critical path of application Data parallel algorithms leverage GPU attributes Large data arrays, streaming throughput Fine-grain SIMD parallelism Floating point (FP) computation Great for “embarrassingly parallel” algorithms Applications – see //GPGPU.org Game effects (FX) physics, image processing Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
  • 19. Why Computation on the GPU? A quiet buildup of potential Calculation Throughput and Memory Bandwidth: 10X Equivalent performance at fraction of power & cost GPU in every PC – pervasive presence and massive impact GPUs have always been parallel “multi-core” Natively designed to handle massive threading Every pixel is a thread Increased precision (fp32), programmability, flexibility GPUs are a mass-market parallel processor Economies of scale Peak floating point performance is much higher than comparable CPUs ATI x1900XT Intel Core 2 Duo E6600 $400 (video card) $400 (processor only) 250 GFLOPs (SP Float) 40 GFLOPS (SP Float) 46 GB main memory BW 8.5 GB main memory BW
  • 20. Why Computation on the GPU? Supercomputing Performance Inherently Parallel Architecture 1000+ cores, massively parallel processing 250x the compute performance of a PC Personal “One Researcher, One Supercomputer” Supercomputer in a desktop system Plugs into standard power strip Accessible Program in C, C++, Fortran for Windows or Linux Available from OEMs and resellers worldwide and priced like a workstation
  • 21. Compute Applications Computational Fluid Dynamics Data Mining, Analytics & Computer Aided Engineering Databases Digital Content Creation MATLAB Acceleration Electronic Design Automation Molecular Dynamics Finance Weather, Atmospheric, Ocean Game Physics Modeling, and Space Sciences Graphics Libraries Imaging and Computer Vision Oil & Gas Medical Imaging Programming Tools Numerics Ray Tracing Bio-Informatics and Life Signal Processing Sciences Video & Audio Computational Chemistry Computational Electromagnetics & Electrodynamics
  • 22. Heterogeneous Computing: multi-core CPU plus parallel-core GPU.
  • 24. APIs for Heterogeneous Computing. CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low-level and high-level APIs are provided. OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Windows Vista and Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
  • 26. OpenCL: Platform Model & Program Structure. One host plus one or more compute devices. Each compute device is composed of one or more compute units. Each compute unit is further divided into one or more processing elements.
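The hierarchy described on this slide can be sketched as plain Python data classes. This is a model of the structure only, not the OpenCL API: the class names, the `total_processing_elements` helper, and the example device with 4 compute units of 8 processing elements each are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    # Number of processing elements inside this compute unit.
    processing_elements: int


@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        """Sum the processing elements over every compute unit of the device."""
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Host:
    # One host drives one or more compute devices.
    devices: List[ComputeDevice] = field(default_factory=list)


# Hypothetical device: 4 compute units with 8 processing elements each.
example_device = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(4)])
host = Host([example_device])
total = host.devices[0].total_processing_elements()
print(total)
```

In a real OpenCL program the host queries this hierarchy at run time instead of declaring it, then submits work to a chosen device.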
  • 27. CUDA Parallel Computing Architecture. ISA and hardware compute engine. Includes a C compiler plus support for OpenCL and DX11 Compute. Architected to natively support all computational interfaces (standard languages and APIs).
  • 28. Option 1: OpenCL and C for CUDA. C for CUDA: entry point for developers who prefer high-level C. OpenCL: entry point for developers who want a low-level API. Both share back-end compiler and optimization technology: PTX, then GPU.
  • 29. CUDA Success: Science & Computation. Speedups are not 2x or 3x, but 20x to 150x. 146X: Medical Imaging (U of Utah). 36X: Molecular Dynamics (U of Illinois, Urbana). 18X: Video Transcoding (Elemental Tech). 50X: Matlab Computing (AccelerEyes). 100X: Astrophysics (RIKEN). 149X: Financial simulation (Oxford). 47X: Linear Algebra (Universidad Jaime). 20X: 3D Ultrasound (Techniscan). 130X: Quantum Chemistry (U of Illinois, Urbana). 30X: Gene Sequencing (U of Maryland).
  • 30. [Chart: Performance vs. Accessibility] Today’s workstations: 1x performance. Supercomputing cluster: $100K - $1M. Tesla Personal Supercomputer: 250x faster, < $10K, 100x more affordable, 20x less power consumption.
  • 31. Solving the World’s Most Complex Challenges: Film, Science, Auto Design, Oil & Gas, Medicine, Broadcast, Space Exploration.
  • 32. Grand Computing Challenges: Personalized Medicine; Mathematics for Scientific Discovery; Information Data Mining; Renewable Energy; Machines That Think; Natural Human Machine Interaction; Predict Environmental Changes; Economic Analysis.
  • 33. Final Thoughts. GPUs and heterogeneous parallel architectures will revolutionize computing. Parallel computing is needed to solve some of the most interesting and important human challenges ahead. Learning parallel programming is imperative for students in computing and the sciences.
  • 34. From Virtua Fighter to Tsubame. 1995 – NV1: 0.8M transistors, 50 MHz, 1 MB, 0 GFLOPS. 2008 – GT200: 1,200M transistors, 1.3 GHz, 4 GB, 1 TFLOPS. Another 1000x in 15 years?
  • 37. OpenGL. 1992: OpenGL 1.0; 1997: OpenGL 1.1 (Vertex Arrays, Improved Texturing); 1998: OpenGL 1.2 (3D Textures, BGRA pixel format); 1998: OpenGL 1.2.1 (Multi-Texture); 2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures); 2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation); 2003: OpenGL 1.5 (Vertex Attr from Vid Mem); 2004: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non P-of-2 Tex); 2006: OpenGL 2.1 (GLSL 1.2, sRGB Textures); 2008: OpenGL 3.0 (GLSL 1.3, 32b FP Textures); 2009: OpenGL 3.1 (March 2009, GLSL 1.4, Perf, CopyBuffer API); 2009: OpenGL 3.2 (Aug 2009, GLSL 1.5, Geom Shaders).
  • 38. OpenGL ES. Designed for hand-held and embedded devices. Goal is a smaller footprint to support OpenGL. PlayStation 3 and the cell phone industry are adopting ES. OpenGL ES 1.1: strips out anything deemed extra in OpenGL; keeps conventional fixed-function vertex and fragment processing. OpenGL ES 2.0: adds programmable vertex and fragment shaders; shaders specified in binary format; drops support for fixed-function vertex and fragment processing.
  • 39. OpenGL ES – Cont. OpenGL ES 1.0: Symbian OS, Android Platform. OpenGL ES 1.0+: PlayStation 3. OpenGL ES 1.1: iPhone SDK, BlackBerry (some models). OpenGL ES 2.0: iPhone 3GS, iPod touch.
  • 40. DirectX. GDI: legacy Windows graphics API, ~1985. DirectX 1.0 – 1995/6 (No 3D support; DirectDraw, DirectSound, DirectInput). DirectX 3.0 – 1996 (Rasterization-only 3D support, awkward programming model, not successful). DirectX 5.0 – 1997 (Draw Primitives, DirectX vs OpenGL war). DirectX 6.0 – 1998 (Multitexture, OGL/Glide features, Texture Compression). DirectX 7.0 – 1999 (Geometry HW acceleration and Blending, Cube mapping). DirectX 8.0 – 2000/1 (Programmable VS/PS Shaders, Xbox). DirectX 9.0 – 2002-2003 (More programmability, Branching, FP pixel prog.). DirectX 9.0c – 2004 (Shader Model 3.0). DirectX 10.0 – 2006 (SM4.0, WinVista, Geometry Shaders, Stream Output). DirectX 10.1 – 2008 (SM4.1, Better Image Quality). DirectX 11.0 – 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7).