News

DeepSeek introduces Engram module for memory offloading to address DRAM shortages

Monday, January 12, 2026 at 09:12 PM

DeepSeek has introduced Engram, a module designed to offload model memory from GPUs to host system RAM, a move that could alleviate DRAM supply constraints and complement Nvidia's upcoming Vera Rubin architecture.

Context

AI lab DeepSeek has introduced Engram, a new research module designed to break the "DRAM wall" by offloading massive embedding tables from GPU memory to host system RAM. This "conditional memory" approach uses a deterministic O(1) lookup mechanism, allowing next-generation large language models to scale parameter counts without being strictly limited by expensive and scarce High Bandwidth Memory (HBM).

The technology is highly strategic for Nvidia and its upcoming Vera Rubin architecture, scheduled for release in 2026. Vera Rubin is designed to address the increasing demand for long-context inference and will feature the Vera CPU with 1.5 TB of memory alongside HBM4-equipped GPUs. By utilizing Engram, developers can leverage this massive system memory to run ultra-large models that would otherwise exceed GPU capacity, effectively lowering the hardware cost barrier for frontier-level AI.

The release follows DeepSeek's success in training its 671-billion-parameter V3 model on only 2,048 Nvidia H800 GPUs. By optimizing memory lookup and computation, Engram provides a scalable software solution to global semiconductor shortages, ensuring that future hardware like Nvidia's Rubin series can maximize throughput even as model complexity outpaces physical memory growth.
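The core idea described above can be illustrated with a minimal sketch: the full embedding table lives in host DRAM, and only the rows a batch actually needs are gathered by direct index (a deterministic O(1) lookup per token) for transfer to the accelerator. This is an assumption-laden illustration, not DeepSeek's actual API; the class and method names here are hypothetical.

```python
import numpy as np

class HostOffloadedEmbedding:
    """Hypothetical sketch of 'conditional memory': keep a huge
    embedding table in host RAM and fetch only the rows a batch
    needs, so GPU HBM holds O(batch) rows instead of O(vocab)."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # The full table resides in (cheap, abundant) host DRAM.
        self.table = rng.standard_normal((vocab_size, dim)).astype(np.float32)

    def gather(self, token_ids):
        # Deterministic O(1) lookup per token: plain row indexing,
        # no search or hashing. Only these rows would be copied to
        # the accelerator for the current batch.
        return self.table[np.asarray(token_ids)]

emb = HostOffloadedEmbedding(vocab_size=1_000_000, dim=64)
batch = emb.gather([3, 17, 42])
print(batch.shape)  # (3, 64)
```

The point of the sketch is the asymmetry it exploits: host DRAM capacity is far larger and cheaper than HBM, while a per-token row gather is small enough to overlap with compute.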

Related Companies

Nvidia
NVDA
US