Below is a roughly 3000-word summary of Lecture 16 (13th March 2025) on ML Accelerators with In-Memory Computing (IMC) from the “E0 294: Systems for Machine Learning” course. This lecture delves into the architecture and design of emerging ML accelerators that leverage in-memory computing to overcome data movement bottlenecks, reduce latency, and improve energy efficiency. The discussion covers the fundamentals of IMC, detailed case studies of architectures such as ISAAC, PipeLayer, and AtomLayer, and the challenges associated with designing and implementing these systems.
──────────────────────────────
Overview and Motivation
In modern machine learning systems, especially those deployed on mobile platforms and edge devices, energy efficiency and low latency are critical. Traditional computing architectures suffer from the “memory wall” problem: the energy and time cost associated with moving data between memory and processing units. In-memory computing (IMC) is presented as an innovative solution that combines storage and computation in a single unit, reducing or even eliminating costly data transfers.
This lecture emphasizes that the drive toward IMC is motivated by the need to develop accelerators that not only perform inference efficiently but also support training. With emerging technologies like resistive random access memory (ReRAM) and memristors, IMC architectures promise to bring analog computation into the mainstream, offering significant improvements in power efficiency and performance for deep neural network (DNN) applications.
──────────────────────────────
Fundamentals of In-Memory Computing (IMC)
The lecture begins with an introduction to the concept of in-memory computing. Traditional digital systems separate memory and compute, incurring energy penalties during data transfers. IMC, in contrast, integrates computation directly into the memory array, enabling operations to be performed in situ. Two primary forms of analog computation are discussed:
1. **Resistive (Current-Based) Computing:** Here, the conductance of memory cells (often ReRAM devices) represents the weights. Multiply-accumulate operations are performed via Ohm's law: each cell draws a current equal to the product of its conductance (the weight) and the applied voltage (the input).
2. **Capacitive (Charge-Based) Computing:** In this model, stored charge is used to perform operations, though the lecture primarily focuses on resistive methods.
A key advantage of IMC is that the weights are stored as conductance values (G = 1/R) in the memory cells, so the same physical device performs both storage and computation: applying input voltages to the rows and summing the resulting column currents (Kirchhoff's current law) evaluates an entire matrix-vector product in a single read operation. This results in dramatic energy savings because data does not have to shuttle back and forth between separate memory and processing units.
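To make this concrete, here is a minimal NumPy sketch (not from the lecture) of how a ReRAM crossbar evaluates a matrix-vector product in one step. The differential conductance pair, the conductance range, and the additive noise model are illustrative assumptions, not any specific device's parameters.

```python
import numpy as np

def weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto a differential pair of conductance
    arrays (G+, G-); real cells only hold positive conductance, so
    the effective weight is proportional to (G+ - G-). The mapping
    and the conductance range here are illustrative assumptions."""
    w_max = np.max(np.abs(W)) + 1e-12
    scale = (g_max - g_min) / w_max
    g_pos = g_min + scale * np.clip(W, 0, None)   # positive weight parts
    g_neg = g_min + scale * np.clip(-W, 0, None)  # negative weight parts
    return g_pos, g_neg, scale

def crossbar_matvec(W, x, noise_std=0.0, seed=0):
    """One analog matrix-vector product: inputs applied as row
    voltages, outputs read as column currents, i.e.
    I_j = sum_i V_i * G_ij by Ohm's and Kirchhoff's laws."""
    rng = np.random.default_rng(seed)
    g_pos, g_neg, scale = weights_to_conductances(W)
    # Additive Gaussian conductance noise stands in for programming
    # variation and read noise; real device models are more complex.
    g_pos = g_pos + rng.normal(0.0, noise_std, g_pos.shape)
    g_neg = g_neg + rng.normal(0.0, noise_std, g_neg.shape)
    i_pos = x @ g_pos                 # column currents from G+ array
    i_neg = x @ g_neg                 # column currents from G- array
    return (i_pos - i_neg) / scale    # rescale back to weight domain

W = np.random.default_rng(1).normal(0.0, 0.5, (4, 3))
x = np.random.default_rng(2).normal(0.0, 1.0, 4)
print("digital:", x @ W)
print("analog :", crossbar_matvec(W, x, noise_std=1e-7))
```

Note that the entire 4×3 weight matrix is read out as three column currents at once; the cost of the operation does not grow with the number of rows, which is the source of the efficiency claims above.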
The lecture underscores that although analog computing introduces challenges, such as noise and limited precision, the potential energy and performance benefits make it an attractive avenue for accelerating DNN workloads.
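As a rough illustration of that sensitivity, the loop below reuses `W`, `x`, and `crossbar_matvec` from the sketch above and sweeps the (assumed) device-noise level; the specific values are arbitrary and only meant to show how analog error grows with noise.

```python
# Reuses W, x, and crossbar_matvec from the previous sketch.
exact = x @ W
for noise_std in (0.0, 1e-7, 1e-6, 1e-5):
    approx = crossbar_matvec(W, x, noise_std=noise_std)
    err = np.max(np.abs(approx - exact))
    print(f"noise_std={noise_std:.0e}  max abs error={err:.4f}")
```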