HETEROGENEOUS SYSTEM ARCHITECTURE AND THE HSA FOUNDATION
INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)



HSA is a purpose-designed architecture that enables the software ecosystem
to combine and exploit the complementary capabilities of sequential
processing elements (CPUs) and parallel processing elements (such as GPUs),
delivering new capabilities to users that go beyond traditional usage
scenarios.

AMD is making HSA an open standard to jumpstart the
ecosystem



2 | Heterogeneous System Architecture   | June 2012
EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA


[Diagram: APP-accelerated software applications dispatch graphics workloads, data-parallel workloads, and serial and task-parallel workloads to the Accelerated Processing Unit.]
AMD HSA FEATURE ROADMAP


Physical Integration
  – Integrate CPU & GPU in silicon
  – Unified Memory Controller
  – Common Manufacturing Technology

Optimized Platforms
  – GPU Compute C++ support
  – HSA Memory Management Unit
  – Bi-Directional Power Mgmt between CPU and GPU

Architectural Integration
  – Unified Address Space for CPU and GPU
  – GPU uses pageable system memory via CPU pointers
  – Fully coherent memory between CPU & GPU

System Integration
  – GPU compute context switch
  – GPU graphics pre-emption
  – Quality of service
HSA COMPLIANT FEATURES



Optimized Platforms

GPU Compute C++ support — Supports OpenCL C++ directions and Microsoft’s upcoming C++ AMP language. This eases programming of the CPU and GPU working together on parallel workloads such as computer vision and video encoding/transcoding.

HSA Memory Management Unit — The CPU and GPU can share system memory: all system memory is accessible by either processor, depending on need. Today, only a subset of system memory can be used by the GPU.

Bi-Directional Power Mgmt between CPU and GPU — Enables “power sloshing,” where the CPU and GPU dynamically lower or raise their power and performance depending on the activity and on which processor is better suited to the task at hand.



HSA COMPLIANT FEATURES



Architectural Integration

Unified Address Space for CPU and GPU — The unified address space makes it easier for developers to create applications. On HSA platforms, a pointer is really a pointer: separate memory pointers for the CPU and GPU are not required.

GPU uses pageable system memory via CPU pointers — The GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked before use.

Fully coherent memory between CPU & GPU — Allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries before CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.
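The copy-versus-share contrast above can be sketched in plain, single-threaded C++ (the function names are hypothetical illustrations, not HSA APIs, and a doubling loop stands in for a GPU kernel):

```cpp
#include <vector>

// Legacy model: the device has its own address space, so data is copied to
// a device buffer, processed there, and copied back.
std::vector<float> offload_with_copies(const std::vector<float>& host) {
    std::vector<float> device_copy(host);    // host -> device copy
    for (float& x : device_copy) x *= 2.0f;  // "kernel" runs on the copy
    return device_copy;                      // device -> host copy
}

// HSA-style model: CPU and GPU share one virtual address space, so the
// "kernel" works in place through the same pointer -- no copies, no
// page-locking, no separate device buffer.
void offload_shared(std::vector<float>& data) {
    for (float& x : data) x *= 2.0f;
}
```

The first function touches the data three times (copy in, compute, copy out); the second touches it once, which is the bandwidth and latency saving the shared address space is meant to unlock.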


FULL HSA FEATURES


System Integration

GPU compute context switch — GPU tasks can be context-switched, making the GPU a multi-tasker. Context switching means faster interoperation between application, graphics, and compute work; users get a snappier, more interactive experience.

GPU graphics pre-emption — As more applications use the performance and features of the GPU, the system must remain interactive. This means low-latency access to the GPU from any process.

Quality of service — With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Hardware access for multiple users or multiple applications is either prioritized or equalized.




UNLEASHING DEVELOPER INNOVATION
PROBLEM: GPU/HW blocks are hard to program, and not all workloads accelerate.
SOLUTION: HSA + SDKs = productivity and performance with low power.

[Chart: developer return (differentiation in performance, power, features, time-to-market) vs. developer investment (effort, time, new skills). Developers have historically programmed CPUs: ~30M+ CPU coders and ~4M+ apps deliver good user experiences at low investment, while ~100K GPU coders and ~200+ apps deliver significant niche value at high investment. HSA targets a few million coders and a few thousand apps delivering a wide range of differentiated experiences.]

HSA SOLUTION STACK

How we deliver the HSA value proposition

Overall Vision:
  – Make the GPU easily accessible
      • Support mainstream languages
      • Expandable to domain-specific languages
  – Make compute offload efficient
      • Direct path to the GPU (avoid graphics overhead)
      • Eliminate memory copies
      • Low-latency dispatch
  – Make it ubiquitous
      • Drive HSA as a standard through the HSA Foundation
      • Open-source key components

[Diagram: the HSA solution stack. SW developers target the Application layer, domain-specific libraries (Bolt, OpenCV, …), and standard SW runtimes (OpenCL, DirectX, other), which sit on the HSA Runtime and HSAIL; legacy user-mode drivers sit alongside. HW vendors supply the Finalizer, custom drivers, and the GPU ISA for differentiated hardware: CPU(s), GPU(s), and other accelerators.]

HSA INTERMEDIATE LAYER - HSAIL

• HSAIL is a virtual ISA for parallel programs
    – Finalized to the native ISA by a JIT compiler, the “Finalizer”
• Allows rapid innovation in native GPU architectures
    – HSAIL remains constant across implementations
• Explicitly parallel
    – Designed for data-parallel programming
• Support for exceptions, virtual functions, and other high-level language features
• Syscall methods
    – GPU code can call directly into system services: I/O, printf, etc.
• Debugging support


C++ AMP

• C++ AMP: a data-parallel programming model initiated by Microsoft for accelerators
    – First announced at AFDS 2011
• A C++-based, higher-level programming model with advanced C++11 features
• A single-source model that cleanly integrates host and device programming
• An implicit programming model that is “future-proofed” to enable HSA features, e.g. avoiding host-to-device copies
• A C++ AMP implementation is available as a beta release in the Microsoft Visual Studio 11 suite
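To show the shape of the single-source model without requiring Visual Studio, here is a portable C++ sketch: `dispatch` is a hypothetical stand-in for a `parallel_for_each`-style call (it runs the kernel serially on the CPU), but the calling code keeps the same one-file, host-plus-kernel structure that C++ AMP uses:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator dispatch: real C++ AMP would run
// the kernel lambda on the GPU; here it simply runs serially on the CPU.
template <typename Kernel>
void dispatch(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Host code and the "device" kernel (the lambda) live in one source file:
// no separate kernel language, no separate compilation unit.
std::vector<int> saxpy(int a, const std::vector<int>& x,
                       const std::vector<int>& y) {
    std::vector<int> out(x.size());
    dispatch(x.size(), [&](std::size_t i) { out[i] = a * x[i] + y[i]; });
    return out;
}
```

The point of the single-source model is exactly this ergonomic shape: the kernel captures host-side variables directly, which is what makes the HSA shared-address-space features (next slide) a drop-in performance win rather than a rewrite.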




C++ AMP AND HSA

• A compute-focused, efficient HSA implementation replaces the graphics-centric implementation underlying C++ AMP
    – e.g. low-latency dispatch, HSAIL-enabled
• Shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs, without any source changes
• Additional advanced C++ features on the GPU, e.g.:
    – More data types
    – Function calls
    – Virtual functions
    – Arbitrary control flow
    – Exception handling
    – Device and platform atomics
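Virtual functions are worth a concrete illustration of why they depend on the architectural features above. Vtable dispatch works through pointers stored inside objects, so it only makes sense on a device that sees the same addresses the objects were built with. This portable C++ sketch (ordinary host code, not an HSA API) shows the kind of polymorphic loop that full HSA would allow to be offloaded:

```cpp
#include <memory>
#include <vector>

// Pointer-rich, polymorphic data: each object carries a hidden vtable
// pointer, and each element of the vector is itself a pointer.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};
struct Square : Shape {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};
struct Circle : Shape {
    float radius;
    explicit Circle(float r) : radius(r) {}
    float area() const override { return 3.14159f * radius * radius; }
};

// Under full HSA, a loop like this could run as a GPU kernel, because
// shared virtual memory keeps object and vtable pointers valid on both
// the CPU and the GPU; without it, the objects would have to be
// flattened into a pointer-free form first.
float total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}
```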


OPENCL™ AND HSA

• HSA is an optimized platform architecture for OpenCL™
    – Not an alternative to OpenCL™
• OpenCL™ on HSA will benefit from:
    – Avoidance of wasteful copies
    – Low-latency dispatch
    – An improved memory model
    – Pointers shared between the CPU and GPU
• HSA also exposes a lower-level programming interface for those who want the ultimate in control and performance
    – Optimized libraries may choose the lower-level interface




HSA TAKING PLATFORM TO PROGRAMMERS

• Balance between the CPU and GPU for performance and power efficiency
• Make GPUs accessible to a wider audience of programmers
    – Programming models close to today’s CPU programming models
    – More advanced language features enabled on the GPU
    – Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.), and hence more applications on the GPU
    – A kernel can enqueue work to any other device in the system (e.g. GPU→GPU, GPU→CPU)
        • Enables task-graph-style algorithms, ray tracing, etc.
• A clearly defined HSA memory model enables effective reasoning about parallel programs
• HSA provides a compatible architecture across a wide range of programming models and HW implementations
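The pointer-containing-data-structure point can be made concrete with a linked list, the classic structure that pre-HSA GPUs could not consume directly (this is plain host C++; the offload claim is the slide's, not the code's):

```cpp
#include <cstddef>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Pointer-chasing traversal. Without shared virtual memory the GPU cannot
// follow host pointers, so a list like this would first have to be
// flattened into an array and copied to the device; with HSA's shared
// virtual memory the same loop could be offloaded as-is.
int sum_list(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) sum += n->value;
    return sum;
}

// Helper: build a chained list whose nodes are owned by the returned
// vector (moving the vector preserves the node addresses).
std::vector<Node> make_list(const std::vector<int>& values) {
    std::vector<Node> nodes(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        nodes[i].value = values[i];
        nodes[i].next = (i + 1 < values.size()) ? &nodes[i + 1] : nullptr;
    }
    return nodes;
}
```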


THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM



• An open standardization body to bring about broad industry support for heterogeneous computing across the full value chain, from silicon IP to ISVs
• Makes GPU computing a first-class co-processor to the CPU through architecture definition
• Provides architectural support for special-purpose hardware accelerators (rasterizers, security processors, DSPs, etc.)
• Owns and evolves the specifications and conformance suite
• Brings to market strong development solutions to drive innovative, advanced content and applications
• Cultivates programming talent via HSA developer training and academic programs




THANK YOU




Disclaimer & Attribution
            The information presented in this document is for informational purposes only and may contain technical inaccuracies,
            omissions and typographical errors.

            The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
            limited to product and roadmap changes, component and motherboard version changes, new model and/or product
            releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
            like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise
            this information and to make changes from time to time to the content hereof without obligation to notify any person of such
            revisions or changes.

            NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
            RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
            INFORMATION.

            ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE
            EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT,
            INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
            CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

            AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names
            used in this presentation are for informational purposes only and may be trademarks of their respective owners.

            OpenCL is a trademark of Apple Inc. used by permission by Khronos.

            © 2012 Advanced Micro Devices, Inc.


