1) Pipelining is a CPU design technique that improves throughput by letting each instruction begin executing before the previous instructions have finished. The document uses a laundry analogy to show how overlapping the stages of successive loads cuts the total time for multiple loads from 6 hours to 3.5 hours.
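The laundry figures can be reproduced with a small simulation. This is only a sketch: the summary does not give the stage durations or the number of loads, so the values below (a 30-minute wash, 40-minute dry, 20-minute fold, and four loads) are assumptions chosen because they yield the quoted 6 hours and 3.5 hours. The function names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Total time when loads overlap and each stage handles one load at a time.

    Assumes a load can wait between stages for as long as needed.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each stage next becomes free
    for _ in range(loads):
        ready = 0  # when this load leaves the previous stage
        for i, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[i])  # wait for the load and the stage
            stage_free_at[i] = start + minutes
            ready = stage_free_at[i]
    return stage_free_at[-1]


# Assumed values, not taken from the summary: wash, dry, fold in minutes; 4 loads.
STAGES = [30, 40, 20]
LOADS = 4

if __name__ == "__main__":
    print(sequential_minutes(STAGES, LOADS) / 60)  # hours without pipelining
    print(pipelined_minutes(STAGES, LOADS) / 60)   # hours with pipelining
```

Under these assumptions the slowest stage (drying) sets the pace once the pipeline is full, which is why the pipelined total is roughly the number of loads times the longest stage rather than the sum of all stages.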
2) Superscalar, multi-core, and many-core architectures aim to improve CPU performance by executing multiple instructions simultaneously. However, fundamental limits such as Amdahl's Law cap the speedup from parallelism: the part of a workload that cannot be parallelized bounds the overall gain, no matter how many processors are added.
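As an illustration of that limit, Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the workload and n is the number of processors. The sketch below evaluates it; the function name and the sample fraction of 0.9 are assumptions for illustration, not figures from the document.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical workload in which 90% of the work can run in parallel.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% of the work parallelizable, the speedup approaches but never reaches 10x, because the remaining 10% still runs serially.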
3) GPUs and Intel's Xeon Phi coprocessor employ even more parallelism through massively multith