Rumor

Nvidia Rubin GPU memory capacity expected to reach 288 GB of HBM4 as memory hierarchy constraints persist

Tuesday, March 17, 2026 at 07:46 AM

A discussion of memory architecture constraints for AI accelerators highlights that even the large on-chip SRAM of the Groq LPU 3 (approximately 500 MB) cannot replace the massive capacity provided by HBM4 in next-generation GPUs such as Nvidia's Rubin, which is expected to feature 288 GB. At roughly 576x, nearly three orders of magnitude, the gap between on-chip SRAM and off-chip HBM remains wide enough to make deep memory hierarchies a necessity in high-performance AI infrastructure.
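As a quick sanity check on that gap, the sketch below computes the capacity ratio directly from the two figures quoted above (both approximate):

```python
import math

groq_lpu3_sram_gb = 0.5   # ~500 MB of on-chip SRAM, per the text above
rubin_hbm4_gb = 288.0     # expected HBM4 capacity per Rubin GPU

ratio = rubin_hbm4_gb / groq_lpu3_sram_gb
print(f"HBM4 / SRAM capacity ratio: {ratio:.0f}x")      # 576x
print(f"Orders of magnitude: {math.log10(ratio):.2f}")  # ~2.76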

Context

As of March 29, 2026, Nvidia has finalized the architectural specifications for its next-generation Vera Rubin platform, which is scheduled for initial delivery in H2 2026. The flagship Rubin GPU will feature 288 GB of HBM4 memory, targeting the persistent memory bottleneck in large-scale AI training. While this capacity matches that of the previous Blackwell Ultra, the shift to HBM4 delivers a massive bandwidth increase to 22 TB/s, allowing the chip to reach up to 50 PFLOPS of FP4 compute performance, a 3.33x improvement over the B300.

To address the inherent limitations of the memory hierarchy, Nvidia has officially integrated the Groq 3 LPU as the seventh chip in the Vera Rubin ecosystem. By pairing the Rubin GPU’s high-capacity HBM4 with the Groq 3’s ultra-fast 500 MB of on-chip SRAM, the platform splits the inference workload: GPUs handle the memory-intensive prefill phase, while LPUs manage high-speed token decoding. This hybrid architecture aims for a 10x reduction in inference token costs compared to the Blackwell generation.
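To make the rationale for that split concrete, the back-of-the-envelope roofline sketch below uses only the 50 PFLOPS and 22 TB/s figures quoted above; the per-token arithmetic intensity of decode is a standard textbook approximation (one multiply-add per weight per generated token), not a figure from Nvidia or Groq, and the 8k prompt length is purely hypothetical:

```python
PEAK_FLOPS = 50e15   # Rubin FP4 peak, from the article
HBM4_BW = 22e12      # HBM4 bandwidth in bytes/s, from the article

# Machine balance: FLOPs the chip can perform per byte fetched from HBM4.
balance = PEAK_FLOPS / HBM4_BW
print(f"machine balance:   {balance:7.0f} FLOPs/byte")   # ~2273

# Decode: each generated token touches every weight once. At FP4
# (0.5 byte/weight) and 2 FLOPs per weight (one multiply-add), that is
# ~4 FLOPs per byte, far below the balance point: bandwidth-bound.
decode_intensity = 2 / 0.5
print(f"decode intensity:  {decode_intensity:7.0f} FLOPs/byte")

# Prefill reuses each weight across every prompt token, so its intensity
# scales with prompt length (hypothetical 8k-token prompt): compute-bound.
prefill_intensity = decode_intensity * 8192
print(f"prefill intensity: {prefill_intensity:7.0f} FLOPs/byte")
```

On these numbers, single-token decoding sits far below the GPU's balance point while long-prompt prefill sits far above it, which is exactly the regime the article describes: prefill stays on HBM4-backed GPUs, decoding moves to SRAM-resident LPUs.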

Related Companies

Nvidia (NVDA, US)