
News
Software optimization improves SRAM utilization to reduce HBM requirements for Nvidia and Google hardware
Wednesday, March 25, 2026 at 01:28 AM
A new software optimization technique improves on-chip SRAM utilization, potentially reducing the reliance on high bandwidth memory (HBM) for AI workloads. The technology is reported to be hardware-agnostic, supporting both Google TPUs and Nvidia GPUs.
Context
A new software optimization layer has demonstrated the ability to significantly improve SRAM utilization across diverse hardware architectures, including Nvidia GPUs and Google TPUs. By maximizing on-chip memory efficiency, the software reduces reliance on external High Bandwidth Memory (HBM), which remains one of the most expensive and supply-constrained components in AI data centers. Because the approach is hardware-agnostic, developers can achieve higher throughput without requiring the large HBM capacities found in flagship accelerators such as Nvidia's H100 GPU or Google's Trillium TPU.
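The article does not disclose how the optimization layer works, but the standard software technique for raising on-chip SRAM utilization is loop tiling (blocking): operands are partitioned into tiles small enough to stay resident in SRAM, so each tile is fetched from HBM once and reused many times. The sketch below is an assumed, illustrative traffic model for an n-by-n matrix multiply, not the vendor's actual method; the function names and the tile size are hypothetical.

```python
# Illustrative model only: counts HBM element reads for a naive vs. a
# tiled n x n matrix multiply, to show why tiling cuts off-chip traffic.

def hbm_reads_naive(n: int) -> int:
    # Without blocking, each of the n*n output elements re-streams a
    # full row of A and a full column of B from HBM: 2n reads each.
    return 2 * n * n * n

def hbm_reads_tiled(n: int, t: int) -> int:
    # With t x t tiles held in SRAM, each (A-tile, B-tile) pair is
    # loaded once per k-step: (n/t)^2 output tiles * (n/t) k-steps,
    # each loading 2 * t*t elements.
    assert n % t == 0, "tile size must divide the matrix dimension"
    tiles = n // t
    return tiles * tiles * tiles * 2 * t * t

if __name__ == "__main__":
    n, t = 1024, 64  # hypothetical problem and tile sizes
    naive = hbm_reads_naive(n)
    tiled = hbm_reads_tiled(n, t)
    # Traffic drops by a factor equal to the tile size t.
    print(f"naive: {naive:,} reads, tiled: {tiled:,} reads "
          f"(x{naive // tiled} less HBM traffic)")
```

In this model, HBM traffic falls by a factor equal to the tile dimension t, which is why larger usable SRAM (or better utilization of existing SRAM) directly reduces the bandwidth demanded of HBM.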
This development is particularly timely as the industry faces a "memory wall," with HBM reported to account for over 20% of an accelerator's total bill of materials. While Nvidia CEO Jensen Huang recently emphasized that HBM offers essential flexibility for evolving workloads, these software-driven gains in SRAM efficiency could shift the competitive landscape for inference. By alleviating memory bottlenecks through code rather than hardware, companies may extend the lifecycle of current silicon and reduce the premium paid for next-generation HBM4-equipped accelerators.
Sources (8)
H100 GPU | NVIDIA
Nvidia CEO Jensen Huang explains why SRAM isn't here to eat HBM's lunch — high bandwidth memory offers more flexibility in AI deployments across a range of workloads | Tom's Hardware
Scaling the Memory Wall: The Rise and Roadmap of HBM
SRAM In AI: The Future Of Memory - Semiconductor Engineering
A Case Study in the Role of Circuit-Microarchitecture Co-Design in ...
Nvidia Stock to See New Growth Catalyst; 35X Faster AI with Groq 3 LPX
Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device | Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture
How do the NVIDIA A100 and H100 GPUs compare to Google's TPU v4 and TPU v5 in terms of performance and efficiency for large language models? - Massed Compute
Related Companies
Nvidia
NVDA
Google
GOOGL