ArrayFire Webinar:
OpenCL and CUDA Trade-Offs and Comparisons
GPU Software Features
• Programmability
• Portability
• Scalability
• Performance
• Community
Leading GPU Computing Platforms
• Programmability, Portability, Scalability, Performance, Community (same diagram as the previous slide)
Performance
• Both CUDA and OpenCL are fast
• Both can fully utilize the hardware
• Devil in the details:
  – Hardware type, algorithm type, code quality
Performance
• ArrayFire results at end of webinar
Scalability
• Laptops --> Single GPU Machine
  – Both CUDA and OpenCL scale, no code change
• Single GPU Machine --> Multi-GPU Machine
  – User managed, low-level synchronization
• Multi-GPU Machine --> Cluster
  – MPI
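
The "user managed" step from one GPU to several usually starts with deciding which slice of the data each device owns. The plain-Python sketch below shows only that bookkeeping; `partition` is a hypothetical helper, not part of CUDA, OpenCL, or ArrayFire, and the per-device launches and synchronization are left out.

```python
def partition(n_items, n_devices):
    """Split the index range [0, n_items) into n_devices contiguous chunks.

    Hypothetical helper for illustration: chunk sizes differ by at most one
    element, and the first chunks absorb the remainder.
    """
    if n_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(n_items, n_devices)
    ranges = []
    start = 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Each (start, stop) pair is the slice one GPU would process.
    for device, (start, stop) in enumerate(partition(10, 3)):
        print(f"device {device}: items {start}..{stop - 1}")
```

In real code, each range would be copied to its device, processed there, and the results gathered after an explicit synchronization point.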
Scalability
Interesting developments:
• Memory Transfer Optimizations
  – CUDA GPUDirect technology
• Mobile GPU Computing
  – OpenCL available on ARM, Imgtec, Freescale, …
Scalability
• Laptops --> Single GPU Machine
   – ArrayFire’s JIT optimizes for GPU type
• Single GPU Machine --> Multi-GPU Machine
   – ArrayFire’s deviceset() function is super easy
• Multi-GPU Machine --> Cluster
   – MPI
Portability
• CUDA is NVIDIA-only
  – NVIDIA has announced plans to open-source the CUDA compiler
  – Does not provide CPU fallback
• OpenCL is the open industry standard
  – Runs on AMD, Intel, and NVIDIA
  – Provides CPU fallback
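
CPU fallback amounts to "prefer a GPU, otherwise take a CPU". In OpenCL this is commonly done by querying `clGetDeviceIDs` with `CL_DEVICE_TYPE_GPU` first and `CL_DEVICE_TYPE_CPU` second. The sketch below mimics only that selection logic in plain Python; the device records and the `pick_device` function are hypothetical stand-ins, not a real OpenCL binding.

```python
def pick_device(devices):
    """Return the first GPU if one is present, otherwise the first CPU.

    `devices` is a list of dicts with "name" and "type" keys (an assumed,
    illustrative format, not what any OpenCL runtime returns).
    """
    for wanted in ("gpu", "cpu"):
        for dev in devices:
            if dev["type"] == wanted:
                return dev
    raise RuntimeError("no usable compute device found")


if __name__ == "__main__":
    with_gpu = [{"name": "cpu0", "type": "cpu"}, {"name": "gpu0", "type": "gpu"}]
    cpu_only = [{"name": "cpu0", "type": "cpu"}]
    print(pick_device(with_gpu)["name"])  # the GPU is preferred
    print(pick_device(cpu_only)["name"])  # falls back to the CPU
```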
Portability
• ArrayFire is fully portable
  – Same ArrayFire code runs on CUDA or OpenCL
  – Simply select the right library
Community
• NVIDIA CUDA Forums – 26,893 topics
• AMD OpenCL Forums – 4,038 topics

• Stack Overflow CUDA tag – 1,709 questions
• Stack Overflow OpenCL tag – 564 questions
Community
• AccelerEyes GPU Forums – 1,435 topics
  – Largest GPU forums by a software company
• Next largest
  – PGI GPU Forums – 485 topics
Programmability
• Both CUDA and OpenCL are low-level
  – Time consuming kernel development
  – Data-parallel algorithm design
• Focus on programmability interfaces
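
As an illustration of what "data-parallel algorithm design" involves, the sketch below rewrites a simple sum as a tree reduction. It is plain Python that simulates the access pattern, not a CUDA or OpenCL kernel: within each pass, every iteration of the inner loop touches different elements, so on a GPU each one could be handled by a separate thread.

```python
def tree_reduce_sum(values):
    """Sum `values` with a pairwise (tree) reduction.

    Sequential simulation for illustration: the outer loop runs about
    log2(n) passes, and the additions inside one pass are independent.
    """
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    stride = 1
    while stride < n:
        # One pass: element i absorbs its partner at i + stride.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum(range(1, 9)))
```

A real kernel would add a barrier between passes and keep the working data in fast on-chip memory, which is part of why kernel development is time-consuming.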
Programmability
[Chart, built up over four slides: speed (Slower to Faster) versus ease of use (Time-consuming to Easy-to-use)]
• Writing Kernels: faster, time-consuming
• Using Libraries: faster, easy-to-use
• SSE or AVX: slower, time-consuming
• Compiler Directives: slower, easy-to-use
Libraries Make the Difference
Raw math libraries in NVIDIA CUDA
• CUBLAS, CUFFT, CULA, MAGMA
  – Provide all BLAS, LAPACK, and FFT routines
    necessary for most dense matrix operations
• CUSPARSE
  – A good start for sparse linear algebra
Libraries Make the Difference
Raw math libraries in AMD OpenCL
• clAmdBlas, clAmdFft
  – Provide the most important BLAS routines
  – Provide radix 2, 3, and 5 FFT routines
  – No LAPACK support
  – No sparse data support
  – Run on any OpenCL-compliant device
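
A practical consequence of the "radix 2, 3, and 5" bullet is that transform lengths may need padding. The sketch below assumes the bullet means lengths of the form 2^a · 3^b · 5^c; that reading is an assumption, and both function names are hypothetical helpers rather than part of clAmdFft.

```python
def is_supported_length(n, radices=(2, 3, 5)):
    """True if n factors completely into the given radices (assumed rule)."""
    if n < 1:
        return False
    for r in radices:
        while n % r == 0:
            n //= r
    return n == 1


def next_supported_length(n, radices=(2, 3, 5)):
    """Smallest length >= n that factors into the given radices."""
    m = max(n, 1)
    while not is_supported_length(m, radices):
        m += 1
    return m


if __name__ == "__main__":
    for length in (360, 7, 97):
        print(length, is_supported_length(length), next_supported_length(length))
```

Padding a signal up to the next supported length is one common workaround; whether it is acceptable depends on the application.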
ArrayFire Code and Benchmarks
