By Gaurav Dalvi
Roll No: 9103
ME (IT)
Multiprocessors (performance and synchronization issues)
Symmetric shared memory architecture
• UMA (uniform memory access).
Figure 1: UMA architecture [1].
• NUMA (non-uniform memory access).
Figure 2: NUMA architecture [1].
• Distributed shared memory architecture.
Figure 3: Distributed shared memory architecture [2].
Performance Issues.
• Cache performance depends on:
  • The behavior of uniprocessor cache miss traffic.
  • The traffic caused by communication.
• Factors affecting these two components of the miss rate:
  • Processor (CPU) count.
  • Cache size.
  • Block size.
• The misses that arise from interprocessor communication, often called coherence misses, come from two separate sources:
  • True sharing misses.
  • False sharing misses [3].
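As a rough illustration of how block size interacts with false sharing, the sketch below checks whether two distinct variables land in the same cache block. The byte addresses, the 64-byte block size used in the note that follows, and the function names are assumptions made for this example; they do not come from the slides.

```python
def block_index(address: int, block_size: int) -> int:
    """Return the index of the cache block that holds a byte address."""
    return address // block_size


def may_false_share(addr_a: int, addr_b: int, block_size: int) -> bool:
    """True when two different addresses fall in the same cache block.

    If two processors each write only their own variable, every write still
    invalidates the other processor's copy of the whole block, so the
    resulting coherence misses are false sharing misses.
    """
    if addr_a == addr_b:
        return False  # same variable: any sharing here is true sharing
    return block_index(addr_a, block_size) == block_index(addr_b, block_size)


def padded_address(slot: int, block_size: int) -> int:
    """Place each per-processor variable at the start of its own block."""
    return slot * block_size
```

With an assumed 64-byte block, two counters at byte addresses 0 and 8 share a block, so `may_false_share(0, 8, 64)` holds. Padding them with `padded_address` puts each counter in its own block, and a smaller block size has a similar effect at the cost of more blocks.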
Synchronization Issues.
• Synchronization mechanisms are typically built with user-level software routines that rely on hardware-supplied synchronization instructions.
• For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value.
• In larger-scale multiprocessors or high-contention situations, synchronization can become a performance bottleneck [4].
Types of Synchronization.
• Mutual exclusion.
  • Synchronizes entry into critical sections.
  • Normally done with locks.
• Point-to-point synchronization.
  • Tells a set of processors (normally the set has cardinality one) that they can proceed.
  • Normally done with flags.
• Global synchronization.
  • Brings every processor into sync.
  • Each processor waits at a point until everyone is there.
  • Normally done with barriers [4].
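The three kinds of synchronization above can be sketched with threads from Python's standard `threading` module, where `Lock`, `Event`, and `Barrier` play the roles of lock, flag, and barrier. This is only an illustration with software threads standing in for processors. The function name `run_demo` and the worker counts are assumptions for the example, and the flag here releases several workers at once rather than the usual single receiver.

```python
import threading


def run_demo(num_workers: int = 4, increments: int = 1000):
    """Exercise a lock, a flag, and a barrier with a few worker threads."""
    counter = 0
    lock = threading.Lock()                   # mutual exclusion
    go = threading.Event()                    # point-to-point flag
    barrier = threading.Barrier(num_workers)  # global synchronization
    seen_after_barrier = []

    def worker() -> None:
        nonlocal counter
        go.wait()                    # proceed only once the flag is set
        for _ in range(increments):
            with lock:               # critical section: one thread at a time
                counter += 1
        barrier.wait()               # wait here until every worker arrives
        # All increments are finished once the barrier has released us.
        seen_after_barrier.append(counter)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    go.set()                         # tell the waiting workers to proceed
    for t in threads:
        t.join()
    return counter, seen_after_barrier
```

Because each worker reads the counter only after the barrier, every worker should observe the same final total.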
Basic Hardware Primitives.
• Atomic exchange.
        addi  register, r0, 0x1       /* r0 is hardwired to 0 */
  Lock: xchg  register, lock_addr     /* an atomic load and store */
        bnez  register, Lock          /* retry if the lock was already held */
  Unlock remains unchanged (an ordinary store of 0 to lock_addr).
  • Various processors support this type of instruction.
  • Intel x86 has xchg; Sun UltraSPARC has ldstub (load-store unsigned byte) and also swap.
  • Normally easy to implement on bus-based systems: whoever wins the bus for xchg can lock the bus.
  • Difficult to support in distributed memory systems [4].
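The spin loop in the assembly above can be mirrored in a small sketch. Python exposes no atomic exchange instruction, so the `AtomicWord` class below emulates one with an internal `threading.Lock`. That inner lock only stands in for the atomicity the hardware would provide and is an assumption of the example, as are all class and function names.

```python
import threading
import time


class AtomicWord:
    """Memory word with an emulated atomic exchange.

    The internal threading.Lock stands in for the atomicity that hardware
    gives an instruction such as xchg; it is not part of the algorithm.
    """

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._atomic = threading.Lock()

    def exchange(self, new_value: int) -> int:
        """Atomically store new_value and return the previous contents."""
        with self._atomic:
            old = self._value
            self._value = new_value
            return old

    def store(self, new_value: int) -> None:
        with self._atomic:
            self._value = new_value


class SpinLock:
    """The lock word is 0 when free and 1 when held."""

    def __init__(self) -> None:
        self._word = AtomicWord(0)

    def acquire(self) -> None:
        # Swap in 1 and retry for as long as the old value was already 1.
        while self._word.exchange(1) != 0:
            time.sleep(0)  # yield so the current holder can run

    def release(self) -> None:
        self._word.store(0)  # releasing is a plain store of 0


def count_with_spinlock(num_threads: int = 4, increments: int = 200) -> int:
    """Increment a shared counter from several threads under the spin lock."""
    lock = SpinLock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            lock.acquire()
            total += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Every failed attempt still writes 1 to the lock word, which is harmless for correctness but hints at why contention on such a lock generates extra memory traffic.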
• Test-and-set.
  • Tests a value and sets it if the value passes the test.
  • For example, we could define an operation that tests for 0 and sets the value to 1, which can be used in a fashion similar to atomic exchange [4].
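A minimal sketch of the operation described above, which tests for one value and sets another only when the test passes. As with any user-level Python code, atomicity is emulated with an internal `threading.Lock`. The class name `FlagWord`, the boolean return convention, and the default values 0 and 1 are assumptions chosen for the example.

```python
import threading


class FlagWord:
    """Memory word with an emulated atomic test-and-set."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def test_and_set(self, expected: int = 0, new_value: int = 1) -> bool:
        """Set the word to new_value only if it currently equals expected.

        Returns True when the test passed and the word was set.
        """
        with self._atomic:
            if self._value == expected:
                self._value = new_value
                return True
            return False

    def clear(self) -> None:
        """Reset the word to 0, which releases a lock built on it."""
        with self._atomic:
            self._value = 0
```

A caller that receives `True` from `test_and_set()` has acquired the lock; any other caller gets `False` until the holder calls `clear()`.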
• Fetch-and-increment.
Algorithm:
  << atomic >>
  function FetchAndAdd(address location, int inc) {
      int value := *location
      *location := value + inc
      return value
  }
• To implement a mutual exclusion lock, we define the operation FetchAndIncrement, which is equivalent to FetchAndAdd with inc=1. With this operation, a mutual exclusion lock can be implemented using the ticket lock algorithm.
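The slide's own listing of the ticket lock is not preserved in this text, so the following is a hedged sketch of the usual formulation rather than the author's code. Each thread takes a ticket with fetch-and-increment and waits until a second counter reaches that ticket number. `AtomicCounter` emulates the atomic FetchAndAdd with an internal `threading.Lock`, and every name here is an assumption of the example.

```python
import threading
import time


class AtomicCounter:
    """Integer with an emulated atomic fetch-and-add."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._atomic = threading.Lock()  # stands in for hardware atomicity

    def fetch_and_add(self, inc: int) -> int:
        """Atomically add inc and return the value held before the add."""
        with self._atomic:
            old = self._value
            self._value = old + inc
            return old

    def load(self) -> int:
        with self._atomic:
            return self._value


class TicketLock:
    def __init__(self) -> None:
        self.next_ticket = AtomicCounter(0)  # next ticket to hand out
        self.now_serving = AtomicCounter(0)  # ticket allowed to enter

    def acquire(self) -> int:
        my_ticket = self.next_ticket.fetch_and_add(1)  # FetchAndIncrement
        while self.now_serving.load() != my_ticket:
            time.sleep(0)  # yield while waiting for our turn
        return my_ticket

    def release(self) -> None:
        self.now_serving.fetch_and_add(1)  # admit the next ticket holder


def run_ticket_demo(num_threads: int = 4, rounds: int = 25) -> list:
    """Record the order in which tickets enter the critical section."""
    lock = TicketLock()
    served = []

    def worker() -> None:
        for _ in range(rounds):
            ticket = lock.acquire()
            served.append(ticket)  # executed inside the critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return served
```

Because entry is granted strictly by ticket number, the recorded order is first come, first served, which is the main property that distinguishes a ticket lock from a plain exchange-based spin lock.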
• Load linked / store conditional.
  • This pair of instructions consists of a special load called a load linked (or load locked) and a special store called a store conditional.
  • The instructions are used in sequence: if the contents of the memory location specified by the load linked are changed before the store conditional to the same address occurs, then the store conditional fails.
  • The store conditional is defined to return 1 if it was successful and 0 otherwise [4].
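The behavior of the load linked / store conditional pair can be modeled with a small single-threaded simulation. The `LinkedWord` class, its per-CPU reservation set, and the integer CPU identifiers are assumptions made for illustration; real hardware tracks the link in a register and may use different rules for when it is cleared.

```python
class LinkedWord:
    """Simulated memory word supporting load linked / store conditional."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._linked = set()  # CPUs holding a valid link to this word

    def load_linked(self, cpu: int) -> int:
        """Read the word and remember that this CPU has linked to it."""
        self._linked.add(cpu)
        return self.value

    def store(self, new_value: int) -> None:
        """Ordinary store: changes the word and breaks every link."""
        self.value = new_value
        self._linked.clear()

    def store_conditional(self, cpu: int, new_value: int) -> int:
        """Store only if the link is still intact; return 1 on success, else 0."""
        if cpu not in self._linked:
            return 0
        self.value = new_value
        self._linked.clear()  # a successful store breaks all other links too
        return 1


def fetch_and_increment(word: LinkedWord, cpu: int) -> int:
    """Build an atomic increment from the pair by retrying until the store succeeds."""
    while True:
        old = word.load_linked(cpu)
        if word.store_conditional(cpu, old + 1) == 1:
            return old
```

If another processor writes the word between the two instructions, the store conditional returns 0 and the loop in `fetch_and_increment` simply tries again, which is how this pair can be used to build the other primitives.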
References
1. David E. Ott, “Optimizing Software Applications for NUMA”. Internet: http://www.drdobbs.com/go-parallel/article/print?articleId=218401502, July 10, 2009 [Jan 29, 2015].
2. Prof. H.P. Oscer, “Technical Design Issues”. Internet: http://www.oser.org/~hp/ds/node15.html, June 08, 2001 [Jan 29, 2015].
3. John L. Hennessy, David A. Patterson, “Multiprocessors and Thread Level Parallelism” in “Computer Architecture: A Quantitative Approach”, 4th edition, Morgan Kaufmann Publishers: San Francisco, 2007, pp. 218-219.
4. Prof. Rajat Moona, Dr. Mainak Chaudhuri, Prof. Sanjeev K. Aggarwal, “Program Optimization for Multi-core Architectures”. Internet: http://nptel.ac.in/courses/106104025/13 [Jan 29, 2015].

Editor's Notes

  • #3: UMA gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Note that access time is also independent of data location within memory. That is, access time remains the same regardless of which shared memory module contains the data to be retrieved.
  • #4: In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly and with a distinct performance advantage. At the same time, it can also access any memory module belonging to another processor using a shared bus (or some other type of interconnect), as shown in Figure 2. What gives NUMA its name is that memory access time varies with the location of the data to be accessed. If data resides in local memory, access is fast. If data resides in remote memory, access is slower. The advantage of the NUMA architecture as a hierarchical shared memory scheme is its potential to improve average case access time through the introduction of fast, local memory.
  • #5: In computer architecture, distributed shared memory (DSM) is a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Here, the term shared does not mean that there is a single centralized memory; it means that the address space is shared (the same physical address on two processors refers to the same location in memory). Distributed Global Address Space (DGAS) is a similar term for a wide class of software and hardware implementations, in which each node of a cluster has access to shared memory in addition to each node's non-shared private memory.
  • #7: The first source is true sharing misses, which arise from the communication of data through the cache coherence mechanism. They arise directly from the sharing of data among processors. The second source, called false sharing, arises from the use of an invalidation-based coherence algorithm with a single valid bit per cache block.
  • #8: Synchronization mechanisms are typically built with user-level software routines that rely on hardware-supplied synchronization instructions. For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value. In larger-scale multiprocessors or high-contention situations, synchronization can become a performance bottleneck because contention introduces additional delays and because latency is potentially greater in such a multiprocessor.
  • #10: The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers.