Introduction to OpenMP


Presenter: Vengada Karthik Rangaraju

           Fall 2012 Term

       September 13th, 2012
What is openMP?

•   Open Standard for Shared Memory Multiprocessing
•   Goal: Exploit multicore hardware with shared memory
•   Programmer’s view: The openMP API
•   Structure: Three primary API components:
    – Compiler directives,
    – Runtime Library routines and
    – Environment Variables
Shared Memory Architecture in a Multi-Core Environment
The key components of the API and its functions

• Compiler Directives
   - Spawning parallel regions (threads)
   - Synchronizing
   - Dividing blocks of code among threads
   - Distributing loop iterations
The key components of the API and its functions

• Runtime Library Routines
   - Setting & querying no. of threads
   - Nested parallelism
   - Control over locks
   - Thread information
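
A minimal sketch of these routines in use, assuming an openMP-enabled compiler (lock control comes from omp_init_lock(), omp_set_lock() and friends, omitted here):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);             /* setting no. of threads       */
    omp_set_nested(1);                  /* enabling nested parallelism  */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();  /* thread information           */
        int n  = omp_get_num_threads(); /* querying no. of threads      */
        printf("Thread %d of %d\n", id, n);
    }
    return 0;
}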
The key components of the API and its functions

• Environment Variables
   - Setting no. of threads
   - Specifying how loop iterations are divided
   - Thread processor binding
   - Enabling/Disabling dynamic threads
   - Nested parallelism
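
Each item above maps to a standard environment variable. A sketch for a bash-style shell, with illustrative values:

export OMP_NUM_THREADS=8         # setting no. of threads
export OMP_SCHEDULE="dynamic,4"  # how loop iterations are divided
export OMP_PROC_BIND=true        # thread processor binding
export OMP_DYNAMIC=false         # enabling/disabling dynamic threads
export OMP_NESTED=true           # nested parallelism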
Goals
• Standardization
• Ease of Use
• Portability
Paradigm for using openMP

        Write sequential program
                  ↓
   Find parallelizable portions of program
                  ↓
 Insert directives/pragmas into existing code
   + insert calls to runtime library routines
 and modify environment variables, if desired
                  ↓
 Use openMP's extended compiler  ← What happens here?
                  ↓
          Compile and run !
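
The "extended compiler" step is usually just a flag. With GCC, for instance (other compilers have equivalent switches):

gcc -fopenmp myprogram.c -o myprogram
./myprogram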
Compiler translation


#pragma omp <directive-type> <directive-clauses>
{
    ... // block of code executed as the directive instructs !
}
Basic Example in C
{
    ... // sequential
}
#pragma omp parallel // fork
{
    printf("Hello from thread %d.\n", omp_get_thread_num());
} // join
{
    ... // sequential
}
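
With an openMP-enabled compiler (and #include <omp.h> for omp_get_thread_num()), each thread in the team prints one line. The interleaving of the lines is nondeterministic and can change from run to run.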
What exactly happens when lines of code are executed in parallel?


• A team of threads is created
• Each thread can have its own set of private
  variables
• All threads can have shared variables
• Original thread : Master Thread
• Fork-Join Model
• Nested Parallelism
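
A minimal sketch putting these ideas together; the thread count and variable names are illustrative:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int total = 0;                        /* shared: one copy for all threads */
    #pragma omp parallel num_threads(4)   /* fork: master creates a team      */
    {
        int my_id = omp_get_thread_num(); /* private: one copy per thread     */
        #pragma omp critical
        total += my_id;                   /* shared data updated safely       */
    }                                     /* join: only the master continues  */
    printf("total = %d\n", total);
    return 0;
}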
openMP LifeCycle – Petrinet model
Compiler directives – The Multi Core Magic Spells !

  <directive type>   Description
  parallel           Each thread performs the same computation as
                     the others (replicated computation).
  for / sections     These are called workshare directives. Portions
                     of the overall work are divided among the
                     threads (different computations). They don't
                     create threads, so they must be enclosed inside
                     a parallel directive for threads to take over
                     the divided work.
Compiler directives – The Multi Core Magic Spells !

• Types of workshare directives

   for          Countable iterations [static]
   sections     One or more sequential sections of code, each
                executed by a single thread
   single       Serializes a section of code
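
A sketch of single inside a parallel region; work() and report() are hypothetical placeholders:

#pragma omp parallel
{
    work();           /* executed by every thread in the team   */
    #pragma omp single
    {
        report();     /* executed by exactly one thread         */
    }                 /* implicit barrier at the end of single  */
    work();           /* resumed by every thread                */
}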
Compiler directives – The Multi Core Magic Spells !
• Clauses associated with each directive


    <directive type>       <directive clause>
    parallel               if(expression)
                           private(var1,var2,…)
                           firstprivate(var1,var2,…)
                           shared(var1,var2,…)
                           num_threads(integer value)
Compiler directives – The Multi Core Magic Spells !
• Clauses associated with each directive

   <directive type>       <directive clause>
   for                    schedule(type, chunk)
                          private(var1,var2,…)
                          firstprivate(var1,var2,…)
                          lastprivate(var1,var2,…)
                          collapse(n)
                          nowait
                          reduction(operator:list)
Compiler directives – The Multi Core Magic Spells !
• Clauses associated with each directive



   <directive type>       <directive clause>
   sections               private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          reduction(operator:list)
                          nowait
Matrix Multiplication using loop directive
#pragma omp parallel private(i,j,k)
{
    #pragma omp for
    for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
            for (j = 0; j < M; j++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
}
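
Note that i, j and k must all be private: i is each thread's workshared loop index, and j and k are its inner-loop counters. If j or k were shared, threads working on different rows concurrently would corrupt each other's counters.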
Scheduling Parallel Loops
•   Static
•   Dynamic
•   Guided
•   Automatic
•   Runtime
Scheduling Parallel Loops
• Static - amount of work per iteration is the same
         - iterations are handed out as contiguous chunks in RR
           (round-robin) fashion
         - 1 chunk = x iterations
Scheduling Parallel Loops
• Dynamic - amount of work per iteration varies
          - each thread grabs a chunk of iterations and returns to
            grab another chunk when it has executed them
• Guided - same as dynamic; the only difference is that chunk sizes
           shrink in proportion to the iterations remaining, so the
           remaining work is shared well among the threads
Scheduling Parallel Loops
• Runtime - schedule is determined at run time using an environment
            variable (OMP_SCHEDULE). A library routine is provided
            too !
• Automatic - implementation chooses any schedule
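
A sketch of the runtime option; process(), i and N are illustrative, and the schedule can come either from the OMP_SCHEDULE environment variable or from the omp_set_schedule() library routine:

/* e.g. export OMP_SCHEDULE="guided,8" before running, or: */
omp_set_schedule(omp_sched_dynamic, 4);

#pragma omp parallel for schedule(runtime)
for (i = 0; i < N; i++)
    process(i);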
Matrix Multiplication using loop directive – with a schedule
#pragma omp parallel private(i,j,k)
{
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
            for (j = 0; j < M; j++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
}
openMP workshare directive – sections
int g;
void foo(int m, int n)   /* assumes an enclosing parallel region */
{
    int p, i;
    /* p and i are made private so the two concurrently
       executing sections don't race on them               */
    #pragma omp sections private(p,i) firstprivate(g) nowait
    {
        #pragma omp section
        {
            p = f1(g);
            for (i = 0; i < m; i++)
                do_stuff;
        }
        #pragma omp section
        {
            p = f2(g);
            for (i = 0; i < n; i++)
                do_other_stuff;
        }
    }
    return;
}
Parallelizing when the no. of iterations is unknown [dynamic] !


• openMP has a directive called task
Explicit Tasks
void processList(Node* list)
{
    #pragma omp parallel
    #pragma omp single
    {
        Node *currentNode = list;
        while (currentNode)
        {
            #pragma omp task firstprivate(currentNode)
            doWork(currentNode);
            currentNode = currentNode->next;
        }
    }
}
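
Here parallel creates the team, single ensures exactly one thread walks the list generating tasks while the whole team executes them, and firstprivate(currentNode) gives each task its own snapshot of the pointer. The implicit barrier at the end of the parallel region waits for all outstanding tasks.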
Explicit Tasks – Petrinet Model
Synchronization
•   Barrier
•   Critical
•   Atomic
•   Flush
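
A sketch of three of these in one fragment; it assumes a shared variable total declared earlier, and compute_part() and log_result() are hypothetical placeholders:

#pragma omp parallel
{
    int mine = compute_part();   /* per-thread work                     */

    #pragma omp atomic
    total += mine;               /* one memory update: atomic suffices  */

    #pragma omp critical
    {
        log_result(mine);        /* arbitrary block: critical required  */
    }

    #pragma omp barrier          /* no thread passes until all arrive   */
}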
Performing Reductions
• A loop containing a reduction looks inherently sequential, since
  each iteration folds its result into the value produced by the
  previous iterations.
• openMP allows such loops to be parallelized anyway, as long as the
  developer says the loop contains a reduction and indicates the
  variable and the kind of reduction via "clauses".
Without using reduction
#pragma omp parallel shared(array,sum) firstprivate(local_sum)
{
    #pragma omp for private(i,j)
    for (i = 0; i < max_i; i++)
    {
        for (j = 0; j < max_j; ++j)
            local_sum += array[i][j];
    }
    #pragma omp critical
    sum += local_sum;
}
Using Reductions in openMP
sum = 0;
#pragma omp parallel shared(array)
{
    #pragma omp for reduction(+:sum) private(i,j)
    for (i = 0; i < max_i; i++)
    {
        for (j = 0; j < max_j; ++j)
            sum += array[i][j];
    }
}
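
Under the hood, reduction(+:sum) gives each thread a private copy of sum initialized to 0 (the identity for +), and the private copies are combined into the shared sum at the end of the loop. This is the same pattern as the previous slide, with the critical section generated for you.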
Programming for performance
• Use of IF clause before creating parallel regions (see the sketch
  after this list)
• Understanding Cache Coherence
• Judicious use of parallel and flush
• Critical and atomic - know the difference !
• Avoid unnecessary computations in critical
  region
• Use of barrier - a starvation alert !
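
For the first tip, a sketch of the IF clause; the threshold and names are illustrative:

/* parallelism has a startup cost: fork the team only when the
   problem size makes it worthwhile                              */
#pragma omp parallel if (n > 10000)
{
    heavy_work();   /* hypothetical workload */
}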
References
• NUMA UMA

   http://vvirtual.wordpress.com/2011/06/13/what-is-numa/

   http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/

• openMP basics

   https://computing.llnl.gov/tutorials/openMP/

• Workshop on openMP SMP, by Tim Mattson from Intel (video)

  http://www.youtube.com/watch?v=TzERa9GA6vY
Interesting links

• openMP official page

   http://openmp.org/wp/

• 32 openMP Traps for C++ Developers

   http://www.viva64.com/en/a/0054/#ID0EMULM

  • 34. Interesting links • openMP official page http://openmp.org/wp/ • 32 openMP Traps for C++ Developers http://www.viva64.com/en/a/0054/#ID0EMULM