Imec Memory to Improve GPUs
Much less well known than Nvidia, Imec is a Belgium-based chip R&D firm that is bringing interesting improvements to how GPU chips use memory. Founded in 1984, Imec has broad expertise in nanoelectronics and chip fabrication. This article discusses a new type of GPU memory that Imec has proposed.
Imec proposes a 3D charge-coupled device (CCD) with an IGZO channel as buffer memory for data-intensive compute applications. The new Compute Express Link (CXL) type-3 buffer memory promises to vastly surpass DRAM in bit density and cost efficiency.
Memory operation has been demonstrated on a planar proof-of-concept CCD structure that can store 142 bits. Using an oxide-semiconductor channel material (such as IGZO) ensures a sufficiently long retention time and enables 3D integration in a cost-efficient, 3D-NAND-like architecture. Imec expects the 3D CCD memory density to scale far beyond the DRAM limit.
The recent introduction of the Compute Express Link (CXL) memory interface creates opportunities for new memories to complement DRAM in data-intensive compute applications such as AI and ML. One example is the CXL type-3 buffer memory, envisioned as an off-chip pool of memory that 'feeds' the various processor cores with large data blocks via a high-bandwidth CXL switch. This class of memories has different requirements than byte-addressable DRAM, which increasingly struggles to stay on the cost-per-bit scaling line.
At IEDM 2024, imec proposed a charge-coupled device (CCD) with an IGZO channel, integrated into a 3D NAND-Flash string architecture, as a promising candidate for CXL type-3 buffer memory, one that achieves the required characteristics of block addressability, unlimited endurance, low fabrication cost, and sufficient data retention.
As a first step towards real implementations, imec demonstrated memory operation of the IGZO CCD on a 2D proof-of-concept. This planar CCD structure consists of an input stage, 142 stages (each consisting of four phase gates) that can each store one bit, and a two-transistor read-out stage. The CCD register is written by injecting charges through the input stage and sequentially transferring them through all 142 stages by switching the voltages of the phase gates. The CCD offers more than 200 s of retention, an endurance of >10^10 cycles without degradation, and a charge-transfer speed exceeding 6 MHz. Multilevel storage capability of the CCD register was also demonstrated, contributing to a higher bit density. Widely adopted in the image-sensor market, charge-based CCD technology is well known, reliable, and can operate at low voltages, which benefits power consumption.
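The serial write-and-shift operation described above can be illustrated with a small toy model. This is only a conceptual sketch: the 142 stages, the four phase gates per stage, and the multilevel packets come from the article, while the class name, the `inject`/`clock` API, and the specific charge levels are hypothetical, and real phase-gate clocking is an analog process that this digital shift abstracts away.

```python
NUM_STAGES = 142      # each stage stores one charge packet (one "bit")
PHASES_PER_STAGE = 4  # four phase gates per stage, cycled to move charge

class CCDRegister:
    """Toy serial shift-register model of the planar CCD proof-of-concept."""

    def __init__(self, num_stages=NUM_STAGES):
        self.stages = [None] * num_stages  # None = empty potential well
        self.pending = None                # packet waiting at the input stage

    def inject(self, level):
        # Input stage: stage a charge packet (e.g. 0..3 for 2-bit multilevel).
        self.pending = level

    def clock(self):
        # One full cycle of the four phase-gate voltages: every packet
        # advances one stage; the last stage reaches the read-out stage.
        out = self.stages[-1]
        self.stages[1:] = self.stages[:-1]
        self.stages[0] = self.pending
        self.pending = None
        return out

reg = CCDRegister()
data = [3, 1, 0, 2]  # four multilevel charge packets to store
for level in data:
    reg.inject(level)
    reg.clock()

# Shifting the entire register out recovers the packets in
# first-in-first-out order.
stream = [reg.clock() for _ in range(NUM_STAGES)]
readout = [p for p in stream if p is not None]
print(readout)
```

The first-injected packet sits deepest in the register, so it is the first to reach the read-out stage, reflecting the strictly serial, block-oriented access pattern that makes a CCD register a fit for block-addressable buffer memory rather than byte-addressable DRAM.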
The real value of the proposed buffer memory lies in its ability to be integrated in 3D NAND fashion, with IGZO-based CCD registers built into vertically aligned plugs. Given what is possible with NAND Flash today (i.e., the capability to process 230 layers), imec estimates that the 3D buffer memory can already provide five times more bit density than what (2D) DRAM is expected to offer in 2030. Imec is currently investigating real 3D implementations with a limited number of word lines.