
News
Nvidia reveals Rubin and LPX heterogeneous architecture to challenge AI accelerator startups
Sunday, March 22, 2026 at 04:51 AM
Nvidia has introduced a new heterogeneous inference architecture featuring the Rubin GPU and LPX (LPU) accelerator to compete with the specialized architectures of AI inference startups. The design offloads the MLP (feed-forward) phase of model decoding to the LPX's 128GB of SRAM while keeping the memory-intensive Attention phase (KV cache) on the GPU's HBM. This configuration enables peak throughput exceeding 1,000 tokens per second, a 2x improvement over the Blackwell-based NVL72. The shift aims to close the latency and speed gaps previously exploited by startups such as Cerebras, d-Matrix, and MatX. Additionally, Nvidia confirmed a speculative decoding workflow in which the LPX generates draft tokens and Rubin GPUs verify them.
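The article does not detail how the draft-and-verify loop works. As a rough illustration of the general speculative-decoding pattern it describes (not Nvidia's actual implementation), the toy sketch below uses hypothetical `draft_model` and `target_model` functions that each greedily pick the next token for a sequence; a fast drafter proposes several tokens and a slower verifier accepts the longest matching prefix:

```python
def speculative_step(draft_model, target_model, seq, k=4):
    """One round of greedy draft-and-verify speculative decoding.

    The draft model (the LPX role in Nvidia's description) proposes k
    tokens; the target model (the Rubin GPU role) accepts the longest
    prefix it agrees with, replacing the first mismatch with its own
    choice, or appending one bonus token if every draft is accepted.
    Illustrative only: real systems verify all k positions in a single
    batched forward pass and often use probabilistic acceptance.
    """
    # Phase 1: cheap drafter autoregressively proposes k candidate tokens.
    drafts, s = [], list(seq)
    for _ in range(k):
        t = draft_model(s)
        drafts.append(t)
        s.append(t)

    # Phase 2: expensive verifier checks each drafted position in order.
    accepted, s = [], list(seq)
    for t in drafts:
        expected = target_model(s)       # verifier's greedy choice here
        if t == expected:
            accepted.append(t)           # draft confirmed, keep going
            s.append(t)
        else:
            accepted.append(expected)    # first mismatch: take verifier's token, stop
            break
    else:
        accepted.append(target_model(s)) # all drafts accepted: one free bonus token

    return list(seq) + accepted
```

When drafter and verifier agree, each round emits k + 1 tokens for one verification pass instead of one token per pass, which is the source of the latency win.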
Context
At GTC 2026, Nvidia officially integrated the Groq 3 LPU into its Vera Rubin platform, signaling a major shift toward heterogeneous AI architectures. The new LPX inference accelerator, acquired via a $20 billion deal, is designed to work alongside Rubin GPUs to maximize efficiency. By offloading the Feed-Forward Network (FFN) phase of decoding to the LPU's 128GB of SRAM while keeping the Attention phase on the GPU's HBM4, Nvidia claims a 35x improvement in inference throughput per megawatt. This strategy effectively closes the low-latency speed gap previously exploited by startups like Cerebras and d-Matrix.
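The rationale for moving FFN weights into on-chip SRAM follows a standard roofline observation (a general argument, not taken from Nvidia's materials): at small batch sizes, decode reads every weight once per generated token but performs only about two FLOPs per weight, so the workload is bandwidth-bound and throughput scales with memory speed. A back-of-envelope sketch:

```python
def decode_arithmetic_intensity(params, bytes_per_param=2, batch=1):
    """FLOPs per byte of weight traffic for one decode step.

    Each weight participates in ~2 FLOPs (multiply + add) per sequence
    in the batch, and must be streamed from memory once per step, so
    intensity ~= 2 * batch / bytes_per_param. Illustrative numbers
    only; real kernels also move activations and KV-cache data.
    """
    flops = 2 * params * batch
    bytes_moved = params * bytes_per_param
    return flops / bytes_moved

# Batch-1 fp16 decode: ~1 FLOP per byte, versus the hundreds of
# FLOPs/byte accelerators need to be compute-bound, so faster memory
# (SRAM over HBM) translates almost directly into faster decoding.
intensity = decode_arithmetic_intensity(params=1_000_000_000)
```

Under these assumptions the intensity is roughly 1 FLOP/byte, which is why swapping HBM for much higher-bandwidth SRAM on the FFN weights can raise per-token throughput even without more compute.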
During the keynote, CEO Jensen Huang described the launch as a landmark event, stating: "Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI." This integrated system is expected to ship in H2 2026, delivering up to 10x more revenue opportunity for trillion-parameter models. Notably, the previously announced Rubin CPX was absent from the roadmap, suggesting the Groq-based LPX has replaced it as the primary solution for scaling agentic AI and long-context reasoning.
Sources (13)
NVIDIA Vera Rubin Opens Agentic AI Frontier | NVIDIA Newsroom
Inside NVIDIA Groq 3 LPX: The Low-Latency Inference Accelerator for the NVIDIA Vera Rubin Platform | NVIDIA Technical Blog
NVIDIA STX - AI-Native Data Platform Architecture
NVIDIA Corporation - NVIDIA Vera Rubin Opens Agentic AI Frontier
Nvidia GTC 2026: CEO Jensen Huang keynote Blackwell Vera Rubin
NVIDIA Groq 3 LPX: Everything we know - StorageReview.com
LLM Inference Unveiled: Survey and Roofline Model Insights
Vera Rubin – Extreme Co-Design: An Evolution from Grace Blackwell Oberon
Related Companies
Nvidia
NVDA