AMD details architectural improvements for Instinct MI355X throughput efficiency

Thursday, February 12, 2026 at 04:15 PM

During ISSCC 2026, AMD detailed architectural improvements in the Instinct MI355X accelerator that allow it to double throughput per compute unit. This efficiency gain enables higher performance even with a reduced number of total compute units compared to previous generations.

Context

AMD presented architectural changes to its Instinct MI355X at ISSCC 2026, detailing how it doubled per-compute-unit (CU) throughput despite a lower total CU count. By redesigning the matrix execution hardware, the company increased FP8 throughput to 8,192 FLOPs per clock per CU. The shift to a "power-of-two" structure with 32 CUs per compute die, down from 38 in the MI300X, simplifies AI kernel partitioning and workload tiling: work sized in powers of two divides evenly across the CUs, which significantly reduces the performance penalty caused by uneven distribution of work.

Built on the CDNA 4 architecture and TSMC's 3nm process, the MI355X delivers 5 petaflops of FP8 compute. A key competitive advantage is its 288 GB of HBM3E memory with 8 TB/s of bandwidth; the capacity exceeds that of the NVIDIA B200 and lets larger models run without complex distribution across multiple GPUs. Peak board power has risen to 1,400 W, but the efficiency gains and simplified memory management position AMD as a primary high-density alternative for data centers scaling frontier AI models through 2026.
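
The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration, not AMD's own model. It reads the 8,192 figure as FLOPs per clock per CU, and it assumes a die count of 8 and a 2.4 GHz clock, neither of which is given in the article. The function names are hypothetical. The utilization function uses a deliberately simple model in which each CU takes one tile per wave, which ignores the several workgroups a real CU can hold at once.

```python
import math

# Figures quoted in the article.
FP8_FLOPS_PER_CLOCK_PER_CU = 8192
CUS_PER_DIE_MI355X = 32
CUS_PER_DIE_MI300X = 38

# Assumptions for illustration only; neither value appears in the article.
ASSUMED_COMPUTE_DIES = 8
ASSUMED_CLOCK_HZ = 2.4e9


def peak_fp8_flops(cus_per_die, dies, flops_per_clock_per_cu, clock_hz):
    """Theoretical peak: total CUs x FLOPs per clock per CU x clock rate."""
    return cus_per_die * dies * flops_per_clock_per_cu * clock_hz


def wave_utilization(num_tiles, num_cus):
    """Fraction of CU slots doing useful work when tiles are issued in
    waves of num_cus tiles (simplified: one tile per CU per wave)."""
    waves = math.ceil(num_tiles / num_cus)
    return num_tiles / (waves * num_cus)


if __name__ == "__main__":
    peak = peak_fp8_flops(
        CUS_PER_DIE_MI355X,
        ASSUMED_COMPUTE_DIES,
        FP8_FLOPS_PER_CLOCK_PER_CU,
        ASSUMED_CLOCK_HZ,
    )
    print(f"Peak FP8 under these assumptions: {peak / 1e15:.2f} PFLOPS")

    # Power-of-two tile counts are common for matrix kernels.
    for tiles in (64, 128, 256, 512):
        u32 = wave_utilization(tiles, CUS_PER_DIE_MI355X)
        u38 = wave_utilization(tiles, CUS_PER_DIE_MI300X)
        print(f"{tiles:4d} tiles: 32 CUs -> {u32:.1%}, 38 CUs -> {u38:.1%}")
```

Under these assumptions the peak comes out close to the 5 petaflops quoted above. With power-of-two tile counts, the 32-CU layout fills every wave completely, while the 38-CU layout leaves part of the last wave idle, which is the tiling penalty the article describes.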

Sources (1)

tomshardware.com