News

Nvidia and Groq LPU systems enable heterogeneous inference for AI infrastructure

Sunday, March 22, 2026 at 01:20 PM

Discussion of implementing heterogeneous inference with Nvidia hardware and Groq LPUs, covering the technical integration and the large number of physical connectors such infrastructure setups require.

Context

At GTC 2026, Nvidia officially integrated Groq's technology into its flagship roadmap, unveiling the Groq 3 LPX rack-scale inference accelerator. The follow-up to Nvidia's $20 billion licensing and "acquihire" deal of December 2025 marks a strategic shift toward heterogeneous AI infrastructure. The LPX system is designed to complement the Vera Rubin NVL72 platform, offloading latency-sensitive "decode" tasks to Groq's Linear Processor Units (LPUs) while Rubin GPUs handle heavy prefill and training workloads. This architecture aims to deliver up to 35x higher inference throughput per megawatt and is optimized for the 1,000-tokens-per-second speeds required by next-generation agentic AI. The LPX racks are fully liquid-cooled and built on Nvidia's MGX modular infrastructure, with a planned rollout in the second half of 2026. By incorporating Groq's deterministic silicon, Nvidia is moving to dominate the high-growth inference market and neutralize emerging competitors in the low-latency compute space.
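The split described above, compute-heavy prefill on one kind of silicon and latency-sensitive decode on another, can be sketched in a few lines. This is a minimal, hypothetical illustration of disaggregated prefill/decode routing; every name in it (`GpuPrefillBackend`, `LpuDecodeBackend`, `HeterogeneousRouter`) is made up for the example and is not a real Nvidia or Groq API.

```python
# Hypothetical sketch of prefill/decode disaggregation: one backend stands in
# for a Rubin-class GPU (prefill), another for a Groq LPU (decode). All class
# and method names are illustrative assumptions, not a vendor API.
from dataclasses import dataclass


@dataclass
class KVCache:
    """Opaque handle to the key/value state produced by the prefill phase."""
    prompt_tokens: list


class GpuPrefillBackend:
    """Stand-in for the GPU side: processes the whole prompt in one pass."""
    def prefill(self, prompt: str) -> KVCache:
        return KVCache(prompt_tokens=prompt.split())


class LpuDecodeBackend:
    """Stand-in for the LPU side: emits tokens one at a time, low latency."""
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        # Toy "model": echo the last prompt token with an index suffix.
        last = cache.prompt_tokens[-1] if cache.prompt_tokens else "<bos>"
        return [f"{last}_{i}" for i in range(max_new_tokens)]


class HeterogeneousRouter:
    """Routes each request's two phases to different accelerators."""
    def __init__(self, prefill_backend, decode_backend):
        self.prefill_backend = prefill_backend
        self.decode_backend = decode_backend

    def generate(self, prompt: str, max_new_tokens: int = 4) -> list:
        cache = self.prefill_backend.prefill(prompt)            # GPU phase
        return self.decode_backend.decode(cache, max_new_tokens)  # LPU phase


router = HeterogeneousRouter(GpuPrefillBackend(), LpuDecodeBackend())
tokens = router.generate("hello world", max_new_tokens=3)
print(tokens)
```

In a real deployment the router would also have to ship the KV cache between devices over the interconnect, which is where the article's point about the high volume of physical connectors comes in: the prefill and decode pools need substantial fabric bandwidth between them.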

Related Companies

Nvidia
NVDA
US