Chapter 8 Shared Memory Multiprocessors
A program consists of a collection of executable sub-program units. These units, which we refer to as  tasks,  are also sometimes called  programming grains.  They must be defined, scheduled, and coordinated by hardware and software before or during program execution.
Basic Issues
Multiprocessors are usually designed for one of two reasons: fault tolerance or program speedup.
These basic issues are as follows:
1. Partitioning. This is the process whereby the original program is decomposed into basic sub-program units or tasks, each of which can be assigned to a separate processor. Partitioning is performed either by programmer directives in the original source program or by the compiler at compile time.
2. Scheduling of tasks. Associated with each program is a flow of control among the sub-program units or tasks. Certain tasks must be completed before others can be initiated (i.e., one is dependent on the other). Other tasks represent functions that can be executed independently of the main program execution. The scheduler's run-time function is to arrange the task order of execution in such a way as to minimize overall program execution time.
3. Communication and synchronization. It does the system no good to merely schedule the initiation of various tasks in the proper order, unless the data that the tasks require is made available in an efficient way. Thus, communication time has to be minimized and the receiver task must be aware of the synchronization protocol being used. An issue associated with communications is memory  coherency.  This property ensures that the transmitting and receiving elements have the same, or a coherent, picture of the contents of memory, at least for data which is communicated between the two tasks.
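The sketch below ties these three issues together in a minimal shared-memory form: a summation program is partitioned into two tasks, each task is scheduled onto its own thread, and the dependent combining step synchronizes on their completion. The two-task decomposition, the array size, and the use of POSIX threads are illustrative assumptions, not details from the text.

```c
/* Minimal illustration of partitioning, scheduling, and synchronization:
 * two tasks (partial sums over halves of an array) are each scheduled
 * onto a thread; the combining step synchronizes with pthread_join.
 * The decomposition and sizes are illustrative assumptions only. */
#include <pthread.h>
#include <stdio.h>

#define N 1000

static double a[N];

struct task {            /* one sub-program unit (a "grain") */
    int lo, hi;          /* half-open range of elements to sum */
    double partial;      /* result communicated through shared memory */
};

static void *run_task(void *arg)
{
    struct task *t = arg;
    t->partial = 0.0;
    for (int i = t->lo; i < t->hi; i++)
        t->partial += a[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Partitioning: two tasks, one per half of the data. */
    struct task t1 = { 0, N / 2, 0.0 };
    struct task t2 = { N / 2, N, 0.0 };

    /* Scheduling: each task is assigned to its own processor (thread). */
    pthread_t p1, p2;
    pthread_create(&p1, NULL, run_task, &t1);
    pthread_create(&p2, NULL, run_task, &t2);

    /* Synchronization: the dependent combining step may not start
     * until both producer tasks have completed. */
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);

    printf("sum = %f\n", t1.partial + t2.partial);
    return 0;
}
```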
Suppose a program p is converted into a parallel form, pp. This conversion consists of partitioning pp into a set of tasks, Ti.
 
Partitioning
Partitioning is the process of dividing a program into tasks, each of which can be assigned to an individual processor for execution at run time. Each task is represented as a node. Partitioning occurs at compile time, well before execution. Program overhead (o) is the added time a task takes to be loaded into a processor prior to beginning execution.
Overhead affects speedup
For each task Ti, there is an associated number of overhead operations oi, so that if Ti takes Oi operations without overhead, then the total operation count of the parallel program is Op = Σi (Oi + oi).
In order to achieve speedup over a uniprocessor, a multiprocessor system must achieve the maximum degree of parallelism among executing subtasks or control nodes. On the other hand, if we increase the amount of parallelism by using finer and finer-grain task sizes, we necessarily increase the amount of overhead. This trade-off defines the well-known "U"-shaped curve for grain size.
Figure: The effects of grain size.
If uniprocessor program P1 executes O1 operations, then the parallel version of P1 executes Op operations, where Op ≥ O1. For each task Ti, there is an associated number of overhead operations oi, so that if Ti takes Oi operations without overhead, then: Op = Σi (Oi + oi).
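As a worked illustration of this relation and of the "U"-shaped grain-size curve above, the sketch below assumes a fixed per-task overhead o, splits the O1 operations into n equal tasks, and estimates execution time as the total operations divided by the number of busy processors. Both the cost model and the numbers are illustrative assumptions rather than figures from the text.

```c
/* Illustrative cost model (assumed, not from the text):
 *   Op(n) = O1 + n*o            total operations with n tasks
 *   T(n)  = Op(n) / min(n, p)   time on p processors, perfect balance
 * Finer grains (larger n) raise parallelism until n = p, after which
 * only the overhead term keeps growing -- the "U"-shaped behaviour. */
#include <stdio.h>

int main(void)
{
    const double O1 = 1e6;   /* uniprocessor operation count (assumed) */
    const double o  = 500.0; /* overhead operations per task (assumed) */
    const int    p  = 16;    /* number of processors (assumed)         */

    for (int n = 1; n <= 4096; n *= 2) {
        double Op   = O1 + n * o;        /* sum of (Oi + oi)  */
        int    busy = n < p ? n : p;     /* processors in use */
        double T    = Op / busy;         /* estimated time    */
        printf("n=%5d  Op=%10.0f  T=%10.1f  speedup=%5.2f\n",
               n, Op, T, O1 / T);
    }
    return 0;
}
```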
Clustering
Clustering is the grouping together of sub-tasks into a single assignable task. Clustering is usually performed both at partitioning time and during run-time scheduling.
The reasons for clustering at partitioning time might include:
Moreover, the overhead time is:
1. Configuration dependent. Different shared-memory multiprocessors may have significantly different task overheads associated with them, depending on cache size, organization, and the way caches are shared.
2. Overhead may be significantly different depending on how tasks are actually assigned (scheduled) at run time.
The detection of parallelism in the program is achieved by one of three methods:
1. Explicit statement of concurrency in the higher-level language, as in the use of languages such as CSP (communicating sequential processes) [131] or Occam [75], which allow programmers to delineate the boundaries among tasks that can be executed in parallel, and to specify communications between such tasks.
2. The use of programmer's hints in the source statement, which the compiler may choose to use or ignore.
Dependency Task List

        T1   T2   T3
  T1     0    -    -
  T2     1    0    -
  T3     0    1    0

Dependency matrix. A 'one' entry indicates a dependency; e.g., in this figure T2 depends on T1 and T3 depends on T2.
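The sketch below encodes the dependency matrix of the figure and drives a simple list scheduler that launches each task once its predecessors have completed. Only the dependency pattern (T2 on T1, T3 on T2) comes from the figure; the scheduling loop and the placeholder "execute" step are illustrative assumptions.

```c
/* A dependency matrix like the one above, plus a simple list scheduler
 * that repeatedly launches every task whose predecessors have finished.
 * The sequential "execute" step is a placeholder assumption. */
#include <stdio.h>
#include <stdbool.h>

#define NTASKS 3

/* dep[i][j] == 1 means task i depends on task j */
static const int dep[NTASKS][NTASKS] = {
    { 0, 0, 0 },   /* T1 depends on nothing */
    { 1, 0, 0 },   /* T2 depends on T1      */
    { 0, 1, 0 },   /* T3 depends on T2      */
};

int main(void)
{
    bool done[NTASKS] = { false };
    int  finished = 0;

    while (finished < NTASKS) {
        for (int i = 0; i < NTASKS; i++) {
            if (done[i])
                continue;
            bool ready = true;
            for (int j = 0; j < NTASKS; j++)
                if (dep[i][j] && !done[j])
                    ready = false;
            if (ready) {                 /* all predecessors complete  */
                printf("executing T%d\n", i + 1);
                done[i] = true;          /* placeholder for real work  */
                finished++;
            }
        }
    }
    return 0;
}
```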
8.3 Scheduling
Scheduling can be done either statically (at compile time) or dynamically (at run time); usually, it is performed at both times. Static scheduling information can be derived on the basis of the probable critical paths. This alone is insufficient to ensure optimum speedup or even fault tolerance.
Run-time scheduling
Run-time scheduling can be performed in a number of different ways. The scheduler itself may run on a particular processor, or it may run on any processor.
Typical run-time information includes information about the dynamic state of the program and the state of the system. The program state may include details provided by the compiler, such as information about the control structure and identification of critical paths or dependencies. Dynamic information includes information about resource availability and work load distribution. Program information must be generated by the program itself, and then gathered by a run-time routine to centralize this information. The major run-time overheads in run-time scheduling include:
1. Information gathering.
2. Scheduling.
3. Dynamic execution control.
4. Dynamic data management.
Table 8.2: Scheduling.

When: Scheduling can be performed at:
  Compile time
    (+) Advantage: Less run-time overhead
    (-) Disadvantage: Compiler lacks stall information; may not be fault tolerant
  Run time
    (+) Advantage: More efficient execution
    (-) Disadvantage: Higher overhead
How: Scheduling can be performed by:
  Arrangement                    Comment
  Designated single processor   Simplest, least effort
  Any single processor          ↓
  Multiple processors           Most complex, potentially most difficult
Dynamic execution control is a provision for dynamic clustering or process creation at run time. Dynamic data management provides for the assignment of tasks to processors in such a way as to minimize memory overhead and the delay in accessing data.
The overhead during scheduling is primarily a function of two specific program characteristics:
1. Program dynamicity.
2. Granularity.
8.4 Synchronization and Coherency
In practice, a program obeys the synchronization model if and only if:
1. All synchronization operations are performed before any subsequent memory operation can be performed.
2. All pending memory operations are performed before any synchronization operation is performed.
3. Synchronization operations are sequentially consistent.
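A minimal sketch of these ordering rules, assuming a producer/consumer pair and C11 release/acquire atomics: the producer's data write is a pending memory operation that completes before the synchronization operation (the release store of a flag), and the consumer performs the synchronization operation (the acquire load) before any subsequent access to the data. The producer/consumer framing and the flag variable are assumptions for illustration.

```c
/* Sketch of the ordering rules above with C11 release/acquire atomics:
 * pending memory operations complete before the synchronization
 * operation (release store), and the synchronization operation
 * (acquire load) completes before subsequent memory operations. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int        data  = 0;
static atomic_int ready = 0;

static void *producer(void *arg)
{
    (void)arg;
    data = 42;                                               /* pending memory op   */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* synchronization op  */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                    /* synchronization op first */
    printf("data = %d\n", data);                             /* then the memory op       */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```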
8.5 The Effects of Partitioning and Scheduling Overhead
When a program is partitioned into tasks, the maximum number of concurrent tasks can be determined. This is simply the maximum number of tasks that can be executed at any one time. It is sometimes called the degree of parallelism that exists in the program. Even if a program has a high degree of parallelism, a corresponding degree of speedup may not be achieved. Recall the definition of speedup:
Sp = T1 / Tp
where T1 represents the time required for a uniprocessor to execute the program using the best uniprocessor algorithm, and Tp is the time it takes for p processors to execute the parallel version of the program.
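A small worked example of this definition follows; the uniprocessor time, the processor counts, and the fixed scheduling/communication overhead added to each parallel run are all assumed figures for illustration, not values from the text.

```c
/* Worked example of the speedup definition Sp = T1 / Tp.
 * All timing figures are assumed for illustration only. */
#include <stdio.h>

int main(void)
{
    const double T1 = 100.0;            /* best uniprocessor time (assumed) */
    const int    p_values[] = { 2, 4, 8, 16 };

    for (int k = 0; k < 4; k++) {
        int    p  = p_values[k];
        double Tp = T1 / p + 5.0;       /* parallel time plus a fixed 5-unit
                                           scheduling/communication overhead
                                           (assumed model)                  */
        printf("p=%2d  Tp=%6.2f  Sp=%5.2f  efficiency=%4.2f\n",
               p, Tp, T1 / Tp, (T1 / Tp) / p);
    }
    return 0;
}
```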