News

SoftBank and Ampere partner to optimize small AI model efficiency using CPUs

Tuesday, February 17, 2026 at 06:40 AM

SoftBank and Ampere Computing are collaborating to validate that small-scale AI models can run efficiently on cloud-native CPUs. The initiative aims to optimize AI infrastructure for the widespread adoption of AI agents by using CPU-based inference rather than relying solely on high-cost GPUs.

Context

SoftBank Group and its subsidiary Ampere Computing have launched a joint validation project to optimize the efficiency of small language models (SLMs) and Mixture of Experts (MoE) architectures on ARM-based CPUs. By pairing SoftBank’s proprietary "Orchestrator" software with Ampere’s Cloud Native Processors, the partners confirmed that CPUs can serve as a high-efficiency alternative to GPUs for distributed AI inference. The initiative specifically targets the deployment of "always-on" AI agents and automated network controls that require low-latency performance at a lower operational cost.

The validation used a customized "Ampere optimized llama.cpp" framework, which significantly increased concurrent workload capacity while reducing power consumption compared with standard GPU configurations. Key performance benefits included drastically shorter model loading times, enabling the rapid model switching needed for real-time AI tasks. Announced in February 2026, the project builds on SoftBank’s $6.5 billion acquisition of Ampere to create vertically integrated, cost-effective infrastructure designed to scale next-generation AI services globally.
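The announcement does not include technical details of the "Ampere optimized llama.cpp" build, but the general pattern it describes, running a quantized small language model entirely on CPU cores for short, low-latency requests, can be sketched with the standard open-source llama-cpp-python bindings. The model file, thread count, and prompt below are hypothetical placeholders, not details from the validation.

```python
# Minimal sketch of CPU-only small-model inference using the open-source
# llama-cpp-python bindings; this is NOT Ampere's customized build, whose
# internals are not publicly described in the announcement.
from llama_cpp import Llama

# Load a quantized SLM entirely on the CPU (n_gpu_layers=0). On a many-core
# Arm server, n_threads would typically be tuned to the physical cores
# available to this worker. The model path is a hypothetical placeholder.
llm = Llama(
    model_path="models/slm-q4_k_m.gguf",  # hypothetical quantized SLM file
    n_ctx=2048,        # context window
    n_threads=32,      # CPU threads used for token generation
    n_gpu_layers=0,    # keep every layer on the CPU
)

# Run a single short, low-latency completion, the kind of request an
# "always-on" agent or automated network-control task would issue often.
result = llm(
    "Summarize the current network alarm in one sentence:",
    max_tokens=64,
    temperature=0.2,
)
print(result["choices"][0]["text"])
```

Swapping models in this pattern amounts to loading a different quantized weights file, which is where the shorter model loading times reported in the validation would matter most for rapid model switching.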

Related Companies

SoftBank Group
SFTBY