News
Amazon utilizes Elastic Fabric Adapter networking to link data center compute instances
Saturday, March 28, 2026 at 12:54 AM
Amazon Web Services uses its proprietary Elastic Fabric Adapter (EFA) networking interface to connect compute instances, providing the high-bandwidth, low-latency communication needed to scale AI training and HPC workloads across its data center infrastructure.
Context
As of March 2026, Amazon continues to scale its Elastic Fabric Adapter (EFA), a specialized network interface that bypasses the operating system kernel to accelerate high-performance computing (HPC) and AI workloads. Using the Scalable Reliable Datagram (SRD) protocol, which sprays packets across multiple network paths and tolerates out-of-order delivery, Amazon Web Services (AWS) gives its EC2 instances consistently low latency and high throughput. This infrastructure underpins EC2 UltraClusters, which can scale to 20,000 NVIDIA H100 GPUs for distributed machine learning training.
This development is significant because it delivers on-premises-cluster performance inside a flexible cloud environment. The EFA networking stack recently added support for the NVIDIA Inference Xfer Library (NIXL), optimizing disaggregated LLM inference by reducing communication overhead between compute nodes. By pairing EFA with NVIDIA hardware and its own AWS Trainium chips, Amazon is positioning its proprietary networking as a foundational layer for the next generation of generative AI infrastructure.
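To make the mechanics concrete, below is a minimal sketch of how an EFA-attached instance is typically requested through the AWS SDK for Python (boto3): EFA is enabled by setting `InterfaceType` to `"efa"` on the network interface, and instances are kept on a low-latency network spine with a cluster placement group. The instance type, AMI ID, subnet ID, and placement group name are illustrative placeholders, not values from this article.

```python
# Sketch: building the parameters for an EFA-enabled EC2 launch with boto3.
# All resource identifiers below are illustrative placeholders.

def build_efa_launch_request(ami_id: str, subnet_id: str,
                             instance_type: str = "p5.48xlarge") -> dict:
    """Return keyword arguments for ec2_client.run_instances().

    EFA is requested by setting InterfaceType to "efa" on the network
    interface; a cluster placement group co-locates instances for
    low-latency communication.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # must be an EFA-capable type
        "MinCount": 1,
        "MaxCount": 1,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "InterfaceType": "efa",   # enables the Elastic Fabric Adapter
            "Groups": [],             # security group IDs would go here
        }],
        "Placement": {"GroupName": "my-cluster-pg"},  # hypothetical placement group
    }

if __name__ == "__main__":
    params = build_efa_launch_request("ami-0123456789abcdef0", "subnet-0abc123")
    print(params["NetworkInterfaces"][0]["InterfaceType"])  # efa
    # To launch for real (requires AWS credentials and EFA-capable capacity):
    # import boto3
    # ec2 = boto3.client("ec2")
    # ec2.run_instances(**params)
```

Applications then reach EFA through the libfabric API (or MPI/NCCL builds layered on it) rather than the kernel TCP/IP stack, which is what yields the OS-bypass latency advantage described above.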
Sources (5)
AWS and NVIDIA Collaborate on Next-Generation Infrastructure for Training Large Machine Learning Models and Building Generative AI Applications | NVIDIA Newsroom
Elastic Fabric Adapter for AI/ML and HPC workloads on Amazon EC2 - Amazon Elastic Compute Cloud
Elastic Fabric Adapter — Amazon Web Services
AWS SRD (Scalable Reliable Datagram) - Ernest Chiang
[PDF] A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC
Related Companies
Amazon
AMZN