
Google adopts Clos network architecture for TPU infrastructure

Tuesday, December 30, 2025 at 08:02 AM

Google is implementing a Clos network architecture for its Tensor Processing Unit (TPU) clusters. The infrastructure shift aims to improve scalability and accessibility for external cloud customers who use Google's AI hardware.

Context

Google is moving its proprietary TPU infrastructure from a 3D-torus topology to a Clos network architecture, a change intended to simplify access for external cloud customers. The shift, integrated into the latest TPU v5p deployments, replaces rigid interconnect constraints with a more flexible, non-blocking fabric. Because a Clos fabric is an industry-standard networking design, developers no longer need to map AI workloads to specific physical grid shapes, which lowers the barrier to entry for third-party enterprises.

The pivot is a strategic attempt to compete more effectively with NVIDIA by making Google Cloud infrastructure more versatile for large-scale generative AI training. The Clos-based clusters scale to 8,960 chips per pod and offer high bisection bandwidth and improved reliability. By prioritizing ease of use alongside raw performance, Google aims to capture more of the merchant silicon market and to give a scaling path to developers who previously found proprietary TPU configurations too complex for standard production environments.
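To make the bisection claim concrete, the sketch below counts the links that cross the midpoint of a 3D torus and of an idealized non-blocking Clos fabric with the same number of chips. It rests on assumptions not stated in the report: the 16 x 20 x 28 torus shape is a hypothetical factorization of 8,960 chips, the Clos model gives each chip a single uplink, and the function names are illustrative. It counts links only, not bandwidth, so it should be read as a rough comparison, not a description of Google's actual fabric.

```python
from math import prod


def torus_bisection_links(dims):
    """Links cut when a 3D torus is split in half across its longest dimension."""
    if len(dims) != 3 or any(d < 3 for d in dims):
        raise ValueError("expected three dimensions, each at least 3")
    longest = max(dims)
    if longest % 2:
        raise ValueError("longest dimension must be even for an equal split")
    # Number of chips in one plane perpendicular to the longest dimension.
    cross_section = prod(dims) // longest
    # Wraparound links mean the cut crosses each ring in two places.
    return 2 * cross_section


def clos_bisection_links(endpoints, uplinks_per_endpoint=1):
    """Links crossing the midpoint of an idealized non-blocking Clos fabric.

    Assumes full bisection: every endpoint in one half can reach the
    other half at line rate, so the cut carries half of all uplinks.
    """
    if endpoints % 2:
        raise ValueError("endpoint count must be even for an equal split")
    return (endpoints // 2) * uplinks_per_endpoint


if __name__ == "__main__":
    dims = (16, 20, 28)  # hypothetical shape; 16 * 20 * 28 = 8,960 chips
    chips = prod(dims)
    print("chips:", chips)
    print("torus bisection links:", torus_bisection_links(dims))
    print("clos bisection links:", clos_bisection_links(chips))
```

Under these assumptions the torus cut depends on the chosen grid shape, while the Clos figure depends only on the chip count, which is one way to read the report's point about no longer mapping workloads to physical grid shapes.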

Related Companies

Google
GOOGL
US