Alphabet reduces Gemini serving costs by 78 percent through hardware utilization and model optimization

Wednesday, February 4, 2026 at 09:51 PM

Alphabet CEO Sundar Pichai reported that Gemini serving unit costs fell 78% over the course of 2025, driven by model optimizations and better hardware utilization.

Context

Alphabet cut Gemini serving unit costs by 78% over the course of 2025, a result attributed to deep hardware-software integration. On the Q4 2025 earnings call, CEO Sundar Pichai credited better utilization of the company's custom Tensor Processing Units (TPUs) and model optimizations such as distillation and quantization. Lower inference costs let Google scale generative AI across its multibillion-user ecosystem without the margin erosion typically associated with compute-heavy workloads.

The savings matter because Alphabet's capital expenditure reached a record $91 billion in 2025. A lower cost per query supports the company's case that its full-stack approach, owning both the silicon and the models, delivers significant operational leverage. The cost structure also supports the broad deployment of AI Overviews and the launch of Gemini 3, helping Google maintain pricing power against competitors while sustaining its aggressive infrastructure investment cycle.
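
To put the 78% figure in concrete terms, the short Python sketch below works through the arithmetic. Only the 78% reduction comes from the reporting; the normalized baseline cost of 1.00 and the function names are illustrative assumptions, not Alphabet's figures or methodology.

```python
def remaining_cost_fraction(reduction: float) -> float:
    """Fraction of the original unit cost left after a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 - reduction


def queries_per_dollar_multiplier(reduction: float) -> float:
    """How many times more queries the same spend covers after the reduction."""
    return 1.0 / remaining_cost_fraction(reduction)


reduction = 0.78           # reported year-over-year cut in serving unit cost
baseline_unit_cost = 1.00  # hypothetical normalized baseline, not a real price

new_unit_cost = baseline_unit_cost * remaining_cost_fraction(reduction)
multiplier = queries_per_dollar_multiplier(reduction)

print(f"Unit cost after reduction: {new_unit_cost:.2f} of baseline")
print(f"Queries served per dollar: {multiplier:.2f}x baseline")
```

Under these assumptions, each query costs about 22% of what it did before, so the same serving budget covers roughly 4.5 times as many queries.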

Related Companies

Google
GOOGL
US