This document discusses performance and synchronization issues in multiprocessor systems. It describes shared-memory architectures such as UMA, NUMA, and distributed shared memory, and the factors that affect cache performance, including CPU count, cache size, and block size. It then covers the synchronization mechanisms used to coordinate access to shared resources (locks, flags, and barriers) and the hardware primitives they are built on: atomic exchange, test-and-set, and load-linked/store-conditional instructions.
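As a rough illustration of how a lock can be built on the atomic exchange primitive mentioned above, here is a minimal C++ sketch. It is not taken from the document: the names `SpinLock` and `count_with_spinlock` are hypothetical, and `std::atomic<int>::exchange` stands in for the hardware instruction (on machines that offer load-linked/store-conditional instead, the compiler would typically emit an LL/SC retry loop for the same call). The inner read-only loop is the common "test and test-and-set" refinement, added here as an assumption about what a reasonable implementation might do.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Spin lock built on atomic exchange: 0 means free, 1 means held.
class SpinLock {
public:
    void lock() {
        // exchange() atomically writes 1 and returns the previous value.
        // A previous value of 0 means this thread acquired the lock;
        // a previous value of 1 means another thread already holds it.
        while (state_.exchange(1, std::memory_order_acquire) == 1) {
            // Wait using plain loads so a spinning thread reads its cached
            // copy instead of issuing a write on every iteration.
            while (state_.load(std::memory_order_relaxed) == 1) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Release ordering makes writes done inside the critical section
        // visible to the next thread that acquires the lock.
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> state_{0};
};

// Hypothetical demo helper: several threads increment one shared counter,
// each increment protected by the spin lock. Returns the final count.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    SpinLock lock;
    long counter = 0;  // shared data, only touched while holding the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // all workers finish before the locals go out of scope
    }
    return counter;
}
```

Calling `count_with_spinlock(4, 10000)` should return 40000 if the lock provides mutual exclusion; without the lock, the unsynchronized increments would be a data race and updates could be lost.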