This document summarizes the results of experiments measuring the performance impact of cache line contention on large NUMA systems with multiple sockets and cores. The experiments show that:
1) Performance under cache line contention degrades smoothly as more cores within a single socket participate in the contention.
2) Once cores from more than one socket contend for the same cache line, performance drops sharply, because resolving contention across sockets has higher latency than resolving it within a socket.
3) Confining cache line contention to as few sockets as possible gives the best performance; the best case is when all contending cores are in a single socket.
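
As an illustration of finding 3, the following is a minimal sketch of how a set of contending threads might be placed so that they span as few sockets as possible. It is not taken from the experiments. The `topology` mapping (socket id to CPU ids) and the helper names `pick_cpus` and `sockets_spanned` are hypothetical; on a real system the mapping would have to be read from the platform.

```python
from typing import Dict, List


def pick_cpus(topology: Dict[int, List[int]], n_threads: int) -> List[int]:
    """Choose n_threads CPU ids that span as few sockets as possible.

    `topology` is a hypothetical mapping of socket id -> CPU ids on that socket.
    """
    total = sum(len(cpus) for cpus in topology.values())
    if n_threads < 0 or n_threads > total:
        raise ValueError("n_threads must be between 0 and the number of CPUs")

    chosen: List[int] = []
    # Fill the largest sockets first; taking whole sockets in descending
    # size order keeps the number of sockets touched to a minimum.
    for socket_id in sorted(topology, key=lambda s: (-len(topology[s]), s)):
        remaining = n_threads - len(chosen)
        if remaining == 0:
            break
        chosen.extend(sorted(topology[socket_id])[:remaining])
    return chosen


def sockets_spanned(topology: Dict[int, List[int]], cpus: List[int]) -> int:
    """Count how many sockets contain at least one of the given CPUs."""
    wanted = set(cpus)
    return sum(1 for socket_cpus in topology.values() if wanted & set(socket_cpus))
```

With an assumed two-socket layout of four CPUs each, `pick_cpus({0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}, 3)` keeps all three threads on socket 0, and a second socket is used only once the first is full. On Linux, the chosen CPU ids could then be applied with a call such as `os.sched_setaffinity`.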
Related topics: