Introducing PgOpenCL
        A New PostgreSQL
       Procedural Language
Unlocking the Power of the GPU!
                By
             Tim Child
Bio

Tim Child
• 35 years of experience in software development
• Formerly
  •   VP Oracle Corporation
  •   VP BEA Systems Inc.
  •   VP Informix
  •   Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years of experience in 3D, CAD, GIS and DBMS
Terminology
Term                  Description
Procedural Language   Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU                   Graphics Processing Unit (highly specialized processor for graphics)
GPGPU                 General Purpose GPU (non-graphics programming on a GPU)
CUDA                  Nvidia’s GPU programming environment
APU                   Accelerated Processing Unit (AMD’s hybrid CPU & GPU chip)
ISO C99               Modern standard version of the C language
OpenCL                Open Computing Language
OpenMP                Open Multi-Processing (parallelizing compilers)
SIMD                  Single Instruction Multiple Data (vector instructions)
SSE                   x86, x64 (Intel, AMD) Streaming SIMD Extensions
xPU                   Any Processing Unit device (CPU, GPU, APU)
Kernel                Function that executes on an OpenCL device
Work Item             Instance of a Kernel
Workgroup             A group of Work Items
FLOP                  Floating Point Operation (single precision = SQL real type)
MIC                   Many Integrated Core (Intel’s 50+ x86 core chip architecture)
Some Technology Trends
            Impacting DBMS
• Solid State Storage
    – Reduced Access Time, Lower Power, Increasing in capacity
• Virtualization
    – Server consolidation, Specialized VM’s, lowers direct costs
• Cloud Computing
    – EC2, Azure, … lowers capital requirements
• Multi-Core
    – 2, 4, 6, 8, 12, … cores. Lots of benefits to multi-threaded applications
• xPU (GPU/APU)
    – GPU > 1000 cores
    – > 1 TFLOP/s @ €2500
    – APU = CPU + GPU chip hybrids, due in mid 2011
    – 2 TFLOP/s for $2.10 per hour (AWS EC2)
    – Intel MIC “Knights Corner” > 50 x86 cores
Compute Intensive
    xPU Database Applications
•   Bioinformatics

•   Signal/Audio/Image Processing/Video

•   Data Mining & Analytics

•   Searching

•   Sorting

•   Spatial Selections and Joins

•   Map/Reduce

•   Scientific Computing

•   Many Others …
GPU vs CPU
Vendor                       NVidia          ATI Radeon      Intel
Architecture                 Fermi           Evergreen       Nehalem
Cores                        448 (simple)    1600 (simple)   4 (complex)
Transistors                  3.1 B           2.15 B          731 M
Clock                        1.5 GHz         851 MHz         3 GHz
Peak Float Performance       1500 GFLOP/s    2720 GFLOP/s    96 GFLOP/s
Peak Double Performance      750 GFLOP/s     544 GFLOP/s     48 GFLOP/s
Memory Bandwidth             ~190 GB/s       ~153 GB/s       ~30 GB/s
Power Consumption            250 W           > 250 W         80 W
SIMD / Vector Instructions   Many            Many            SSE4+
Multi-Core Performance
[Chart not reproduced. Source: NVidia]
Future (Mid 2011) APU Based PC
APU (Accelerated Processing Unit)
[Block diagram not reproduced. Source: AMD]
Labels recoverable from the diagram:
• APU chip: CPU cores plus an embedded GPU (the APU adds an embedded GPU)
• North Bridge: ~20 GB/s
• System RAM: ~20 GB/s
• PCIe: ~12 GB/s
• Discrete GPU, graphics RAM: 150 GB/s
Scalar vs. SIMD
Scalar Instruction
      C = A + B
      1 + 2 = 3

SIMD Instruction
      Vector C = Vector A + Vector B
      Vector A     1   3   5    7
                +
      Vector B     2   4   6    8
                =
      Vector C     3   7   11   15

OpenCL
      Vector lengths 2, 4, 8, 16 for char, short, int, float, double
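The scalar and vector additions on this slide can be sketched in plain C. This is only an illustration: `float4_t`, `add_scalar` and `add_float4` are hypothetical names, and the struct merely stands in for OpenCL's built-in `float4` type, where the same addition is written as `c = a + b`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's built-in float4 vector type. */
typedef struct { float s[4]; } float4_t;

/* Scalar form: one addition per call, C = A + B. */
static float add_scalar(float a, float b) {
    return a + b;
}

/* Vector form: one call adds all four lanes. The lanes are looped over
   explicitly here; a compiler may or may not turn this loop into a single
   SIMD instruction. */
static float4_t add_float4(float4_t a, float4_t b) {
    float4_t c;
    for (size_t lane = 0; lane < 4; ++lane) {
        c.s[lane] = a.s[lane] + b.s[lane];
    }
    return c;
}
```

Called with the slide's values, `add_float4` on (1, 3, 5, 7) and (2, 4, 6, 8) yields (3, 7, 11, 15).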
Summarizing xPU
            Trends
• Many more xPU Cores in our Future
• Compute Environment becoming Hybrid
  – CPU and GPU’s
  – Need CPU to give access to GPU power
• GPU Capabilities
  – Lots of cores
  – Vector/SIMD Instructions
  – Fast Memory
• GPU Futures
  – Virtual Memory
  – Multi-tasking / Pre-emption
Scaling PostgreSQL Queries on xPU’s
[Diagram not reproduced: a Postgres process with a few PgOpenCL threads on a Multi-Core CPU and many more PgOpenCL threads on a Many Core GPU]
Using More Transistors
Parallel
      Programming Systems
Category             CUDA     OpenMP       OpenCL
Language               C      C, Fortran     C
Cross Platform         X          √           √
Standard             Vendor   OpenMP       Khronos
CPU                    X          √           √
GPU                    √          X           √
Clusters               X          √           X

Compilation / Link   Static     Static     Dynamic
What is OpenCL?
• OpenCL - Open Computing Language
  –   Based on a subset of ISO C99
  –   Open Specification
  –   Proposed by Apple
  –   Many Companies Collaborated on the Specification
  –   Portable, Device Agnostic
  –   Specification maintained by Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL Procedural Language
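System Overview
[Diagram not reproduced.] Elements shown:
• Web Browser, HTTP, Web Server, App Server
• App Server and PostgreSQL Client connect to the DBMS Server over TCP/IP
• DBMS Server: PostgreSQL runs SQL Statements and PgOpenCL SQL Procedures
• PostgreSQL, Disk I/O, Tables
• PostgreSQL, PCIe X2 Bus, GPGPU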
OpenCL
                       Language
• A subset of ISO C99
   – Without some C99 features, such as the standard C99 headers, function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
   – Work-items and Workgroups
   – Vector types
   – Synchronization
   – Address space qualifiers
• Also includes a large set of built-in functions
   – Image manipulation
   – Work-item manipulation
   – Specialized math routines, etc.
PgOpenCL
             Components
• New PostgreSQL Procedural Language
  – Language handler
     • Maps arguments
     • Calls function
     • Returns results
  – Language validator
     • Creates Function with parameter & syntax checking
     • Compiles Function to a Binary format
• New data types
  – cl_double4, cl_double8, ….
• System Admin Pseudo-Tables
  – Platform, Device, Run-Time, …
PgOpenCL
 Admin
PGOpenCL
                        Function Declaration
CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
AS $BODY$

#pragma PGOPENCL Platform : ATI Stream
#pragma PGOPENCL Device : CPU

/* Each work-item adds one pair of elements; work-groups are fixed at 64 x 1 x 1. */
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void VectorAdd(__global const float *a, __global const float *b, __global float *c)
{
    /* Global index of this work-item in dimension 0. */
    int i = get_global_id(0);

    c[i] = a[i] + b[i];
}

$BODY$
LANGUAGE PgOpenCL;
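To make the work-item model concrete, here is a minimal sketch in plain C that emulates what the VectorAdd kernel computes. It is not how PgOpenCL launches kernels: `vector_add_work_item` and `vector_add_launch` are hypothetical names, the `gid` parameter stands in for `get_global_id(0)`, and the serial loop replaces the parallel execution a real device would provide.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the VectorAdd kernel for one work-item. The gid parameter is a
   stand-in for get_global_id(0), which only exists on an OpenCL device. */
static void vector_add_work_item(size_t gid, const float *a, const float *b,
                                 float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Hypothetical serial "launch": runs one work-item per array element.
   A real device would run these work-items in parallel. */
static void vector_add_launch(size_t global_size, const float *a,
                              const float *b, float *c) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        vector_add_work_item(gid, a, b, c);
    }
}
```

One point the emulation hides: with `reqd_work_group_size(64, 1, 1)`, OpenCL 1.x requires the global work size to be a multiple of 64. The slides do not say how PgOpenCL sizes or pads its arrays to meet that requirement.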
PgOpenCL Execution Model
[Diagram not reproduced.] Steps shown, in order:
• Select the table columns A and B into arrays
• Copy the arrays to the xPU
• VectorAdd(A, B) runs as 100’s - 1000’s of Threads (Kernels) and returns C (A + B = C)
• Copy C back
• Unnest the array C to a table
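Using Re-Shaped Tables
[Diagram not reproduced.] Flow shown, in order:
• Input: a table of arrays, each row holding arrays A and B
• Copy to the xPU
• VectorAdd(A, B) runs as 100’s - 1000’s of Threads (Kernels) and returns C (A + B = C)
• Copy back
• Output: a table of arrays C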
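Re-shaping a column of scalar rows into rows of fixed-length arrays can be sketched as below. This is a sketch under assumptions, not PgOpenCL's implementation: `chunk_count` and `fill_chunk` are hypothetical helpers, and padding the last array with zeros is an assumed policy the slides do not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-length arrays needed to hold row_count scalar rows. */
static size_t chunk_count(size_t row_count, size_t chunk_len) {
    assert(chunk_len > 0);
    return (row_count + chunk_len - 1) / chunk_len;
}

/* Copies chunk number `chunk` of a column into out[0 .. chunk_len - 1].
   Slots past the end of the column are padded with 0 (assumed policy).
   Returns the number of real values copied. */
static size_t fill_chunk(const float *column, size_t row_count,
                         size_t chunk_len, size_t chunk, float *out) {
    size_t start = chunk * chunk_len;
    size_t copied = 0;
    for (size_t i = 0; i < chunk_len; ++i) {
        if (start + i < row_count) {
            out[i] = column[start + i];
            ++copied;
        } else {
            out[i] = 0.0f;
        }
    }
    return copied;
}
```

With a chunk length that matches the kernel's work-group size, each re-shaped row becomes one batch that can be copied to the device in a single transfer.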
Today’s GPGPU
              Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to discrete GPGPU
   – 1 – 8 GB/s over PCIe Bus
• Hard to Program
   – New Parallel Algorithms and constructs
   – “New” C language dialect
• Immature Tools
   – Compilers, IDE, Debuggers, Profilers - early years
• Data organization really matters
   – Types, Structure, and Alignment
   – SQL needs to Shape the Data
• Profiling and Debugging are not easy

Solves Well for Problem Sets with the Right Shape!
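The bandwidth limit above can be turned into a quick back-of-the-envelope check. The sketch below is illustrative only: `transfer_seconds` and `compute_dominates` are hypothetical helpers, and the bandwidth and timing figures passed to them would have to come from measurement on the target system.

```c
#include <assert.h>

/* Seconds needed to move `bytes` over a link with the given bandwidth
   in bytes per second. */
static double transfer_seconds(double bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return bytes / bytes_per_second;
}

/* Returns 1 when the kernel time is at least `factor` times the total copy
   time (input to the device plus output back), otherwise 0. */
static int compute_dominates(double kernel_seconds, double bytes_in,
                             double bytes_out, double bytes_per_second,
                             double factor) {
    double copy = transfer_seconds(bytes_in + bytes_out, bytes_per_second);
    return kernel_seconds >= factor * copy;
}
```

For a sense of scale, a single-precision VectorAdd moves 12 bytes (two inputs and one output) for every addition it performs, so on a discrete GPU it is likely to be bound by the bus rather than by arithmetic.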
Making a Problem
                           Work for You
        • Determine % Parallelism Possible
            for ( i = 0; i < ∞; i++ )
                for ( j = 0; j < ∞; j++ )
                    for ( k = 0; k < ∞; k++ )

        • Arrange data to fit available GPU RAM
        • Ensure calculation time >> I/O transfer overhead
        • Learn about Parallel Algorithms and the OpenCL language
        • Learn new tools
        • Carefully choose Data Types, Organization and Alignments
        • Profile and Measure at Every Stage
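One common way to reason about the "% parallelism possible" is Amdahl's law. The slides do not name it, so treat it here as a swapped-in illustration; `amdahl_speedup` is a hypothetical helper and the fraction passed to it has to come from profiling.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when a fraction p of the run time
   (0 <= p <= 1) is spread perfectly over n parallel units and the rest
   stays serial: speedup = 1 / ((1 - p) + p / n). */
static double amdahl_speedup(double parallel_fraction, double units) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(units >= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / units);
}
```

Even with 1000 units, a workload that is 90% parallel cannot speed up by more than 10x, which is why measuring the serial portion (including data transfer) matters before moving work to a GPU.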
PgOpenCL
     System Requirements
• PostgreSQL 9.x
• For GPU’s
   – AMD ATI OpenCL Stream SDK 2.x
   – NVidia CUDA 3.x SDK
   – Recent Macs with Mac OS X 10.6 (Snow Leopard)
• For CPU’s (Pentium M or more recent)
   – AMD ATI OpenCL Stream SDK 2.x
   – Intel OpenCL SDK Alpha Release (x86)
   – Recent Macs with Mac OS X 10.6 (Snow Leopard)
PGOpenCL
                                   Status
    Today (2010)     Prototype
    1Q 2011          Beta

• Wish List
       • Beta Testers
              – Existing OpenCL App?
              – Have a GPU App?
       • Contributors
              – Code server side functions?
       • Sponsors & Supporters
           – AMD Fusion Fund?
           – Khronos?
PgOpenCL
               Future Plans
• Increase Platform Support
• Scatter/Gather Functions
• Additional Type Support
   – Image Types
   – Sparse Matrices
• Run-Time
   –   Asynchronous
   –   Events
   –   Profiling
   –   Debugging
Using the
                                Whole Brain
You can’t be in a parallel universe with a single brain!
[Diagram not reproduced: an APU Chip with Postgres and PgOpenCL on the CPU cores, PgOpenCL on the Embedded GPU, and a North Bridge at ~20 GB/s]
• Heterogeneous Compute Environments
     • CPU’s, GPU’s, APU’s
     • Expect 100’s – 1000’s of cores
The Future Is Parallel: What's a Programmer to Do?
Summarizing
              PgOpenCL
• Supports Heterogeneous Parallel Compute Environments
    • CPU’s, GPU’s, APU’s

• OpenCL
    • Portable and high-performance framework
        – Ideal for computationally intensive algorithms
        – Access to all compute resources (CPU, APU, GPU)
        – Well-defined computation/memory model
    • Efficient parallel programming language
        – C99 with extensions for task and data parallelism
        – Rich set of built-in functions
    • Open standard for heterogeneous parallel computing
• PgOpenCL
    • Integrates PostgreSQL with OpenCL
    • Provides Easy SQL Access to xPU’s
        • APU, CPU, GPGPU
    • Integrates OpenCL
        • SQL + Web Apps (PHP, Ruby, …)
More
                    Information
•   PGOpenCL
        • Twitter @3DMashUp

•   OpenCL
        • www.khronos.org/opencl/
        • www.amd.com/us/products/technologies/stream-technology/opencl/
        • http://software.intel.com/en-us/articles/intel-opencl-sdk
        • http://www.nvidia.com/object/cuda_opencl_new.html
        • http://developer.apple.com/technologies/mac/snowleopard/opencl.html
Q&A

• Using Parallel Applications?
• Benefits of OpenCL / PgOpenCL?
• Want to Collaborate on PgOpenCL?
