Rumor
Nvidia Rubin GPU memory capacity expected to reach 288 GB of HBM4 as memory hierarchy constraints persist
Tuesday, March 17, 2026 at 07:46 AM
A discussion of memory-architecture constraints for AI accelerators highlights that even the large on-chip SRAM of the Groq LPU 3 (approximately 500 MB) cannot replace the massive capacity provided by HBM4 in next-generation GPUs such as Nvidia's Rubin, which is expected to feature 288 GB. The gap between on-chip SRAM and off-chip HBM remains several orders of magnitude, reinforcing the necessity of deep memory hierarchies in high-performance AI infrastructure.
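The "several orders of magnitude" claim can be checked with the figures quoted above. A quick back-of-the-envelope calculation (using the 500 MB and 288 GB numbers from this article; actual parts vary by SKU):

```python
import math

# Capacities as quoted in the article above.
sram_bytes = 500 * 1024**2   # Groq LPU 3 on-chip SRAM, ~500 MB
hbm_bytes = 288 * 1024**3    # Rubin GPU HBM4 capacity, 288 GB

ratio = hbm_bytes / sram_bytes
orders = math.log10(ratio)

print(f"capacity ratio: {ratio:.0f}x")        # capacity ratio: 590x
print(f"orders of magnitude: {orders:.2f}")   # orders of magnitude: 2.77
```

Roughly a 590x gap, i.e. close to three orders of magnitude, which is why on-chip SRAM alone cannot hold the weights and KV caches of frontier-scale models.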
Context
As of March 29, 2026, Nvidia has finalized the architectural specifications for its next-generation Vera Rubin platform, which is scheduled for initial delivery in H2 2026. The flagship Rubin GPU will feature 288 GB of HBM4 memory, a significant leap that targets the persistent memory bottleneck in large-scale AI training. While this matches the capacity of the previous Blackwell Ultra, the shift to HBM4 provides a massive bandwidth increase to 22 TB/s, allowing the chip to deliver up to 50 PFLOPS of FP4 compute performance—a 3.33x improvement over the B300.
To address the inherent limitations of the memory hierarchy, Nvidia has officially integrated the Groq 3 LPU as the seventh chip in the Vera Rubin ecosystem. By pairing the Rubin GPU’s high-capacity HBM4 with the Groq 3’s ultra-fast 500 MB of on-chip SRAM, the platform splits the inference workload: GPUs handle the memory-intensive prefill phase, while LPUs manage high-speed token decoding. This hybrid architecture aims for a 10x reduction in inference token costs compared to the Blackwell generation.
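The phase split described above can be sketched as a simple routing rule. This is an illustrative assumption of how such a dispatcher might look; the device names and routing logic are hypothetical, not an actual Nvidia or Groq API:

```python
# Hypothetical sketch of the GPU/LPU inference split described above.
# Device names and routing rules are illustrative assumptions only.

def route_phase(phase: str) -> str:
    """Map each inference phase to the device class suited to it."""
    if phase == "prefill":
        # Prefill streams the entire prompt and builds the KV cache:
        # capacity- and bandwidth-bound, so it targets the HBM4 GPU.
        return "rubin_gpu"
    if phase == "decode":
        # Decode emits one token at a time: latency-bound, so it
        # targets the SRAM-based LPU.
        return "groq_lpu"
    raise ValueError(f"unknown phase: {phase}")

plan = {phase: route_phase(phase) for phase in ("prefill", "decode")}
print(plan)  # {'prefill': 'rubin_gpu', 'decode': 'groq_lpu'}
```

The design intuition is that prefill throughput scales with memory bandwidth and capacity, while decode latency scales with how quickly weights and cache can be re-read each step, which favors on-chip SRAM.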
Sources (9)
Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog
Rack-Scale Agentic AI Supercomputer | NVIDIA Vera Rubin NVL72
Speed over scale: Samsung pulls ahead with Nvidia’s HBM4 - The Korea Herald
Vera Rubin – Extreme Co-Design: An Evolution from Grace Blackwell Oberon
How Nvidia's $20 billion Groq 3 LPU deal reshapes the Nvidia Vera Rubin Platform — Samsung 4nm process serves as bedrock for SRAM-based AI accelerator chip | Tom's Hardware
NVIDIA Rubin R100: Specs, Architecture, and GPU Cloud Availability | Spheron Blog
From AI Data Centres to Your Next Smartphone: The Memory Bottleneck Is Everyone’s Problem - Centre for International Governance Innovation
GTC 2026 Preview | Implications of Nvidia's SRAM-Decode Hardware on the Inference Market
Related Companies
Nvidia
NVDA