News

Nvidia reveals context memory and zero-downtime maintenance for rack-scale systems

Tuesday, January 6, 2026 at 11:59 PM

Nvidia has introduced a context memory storage platform and zero-downtime maintenance features for rack-scale systems, aimed at improving AI inference performance and infrastructure reliability.

Context

Nvidia CEO Jensen Huang unveiled the Vera Rubin platform at CES 2026, introducing a specialized "Inference Context Memory Storage Platform" and "zero-downtime" maintenance for NVL72 rack-scale systems. This new storage tier, powered by BlueField-4 DPUs and Spectrum-X networking, offloads KV cache to improve inference throughput by up to 5x while cutting token costs by 10x compared to the previous Blackwell generation. The technology specifically addresses scaling bottlenecks found in long-context and agentic AI applications.

The architecture also features a modular, cable-free design that allows for 18x faster assembly and servicing. Crucially, the "zero-downtime" maintenance feature lets operators swap NVLink switch trays without taking the entire rack offline, ensuring continuous operation for massive AI clusters. Delivering 3.6 EFLOPS of inference compute and utilizing HBM4 memory, the Vera Rubin systems are slated for volume production and partner availability starting in the second half of 2026.

Related Companies

Nvidia
NVDA
US