News
Analysis of LLM API usage shows massive inference supply growth and high model turnover in 2025
Wednesday, January 7, 2026 at 04:53 PM
A new analysis of LLM API usage data across platforms like OpenRouter and Azure reveals a significant expansion in inference supply, with the number of models and providers more than doubling in 2025. Despite massive price deflation for state-of-the-art models since 2023, average spend per token remains flat as users opt for higher-quality intelligence. The report highlights extreme churn in model leadership and identifies integration friction, rather than cost, as the primary constraint on compute demand growth.
Context
In 2025, the LLM inference market witnessed a massive supply surge: the total number of available models jumped from 253 to 651, and the number of inference providers more than tripled, from 27 to 90. For Microsoft, these findings underscore a hyper-competitive ecosystem in which Azure must manage extreme model turnover and rapid commoditization. While state-of-the-art pricing has seen 1000x deflation since 2023, average spend per token has remained flat as users reinvest savings into higher intelligence rather than simply increasing token volume.
The analysis also reveals significant non-price moats: open-source models remain 90% cheaper than proprietary counterparts yet capture less than 30% of the market. Churn is at an all-time high; none of today's top 10 models existed 10 months ago, and they held only a 20% combined market share just 4 months ago. Critically, with price elasticity at 1.1, price cuts alone are failing to trigger a Jevons paradox of explosive compute demand. Instead, enterprise adoption is currently bottlenecked by integration friction rather than by the cost of silicon or inference.
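The elasticity figure explains the muted demand response. Under a constant-elasticity demand curve (an assumed functional form for illustration; the report does not specify its model), an elasticity of 1.1 means demand scales with price as p^(-1.1), so total spend scales as p^(-0.1): even the 1000x price deflation cited above would raise total spend only about 2x, far from a Jevons-style explosion. A minimal sketch:

```python
# Sketch of a constant-elasticity demand model (an illustrative assumption,
# not the report's methodology). Demand q = A * p^(-eps); spend = p * q.

def demand(price: float, elasticity: float, a: float = 1.0) -> float:
    """Token demand at a given price under constant price elasticity."""
    return a * price ** (-elasticity)

def spend(price: float, elasticity: float, a: float = 1.0) -> float:
    """Total spend = price * demand; scales as p^(1 - elasticity)."""
    return price * demand(price, elasticity, a)

eps = 1.1  # elasticity reported in the analysis

# Effect of a 1000x price cut (the deflation cited since 2023):
ratio_demand = demand(1 / 1000, eps) / demand(1.0, eps)  # demand multiplier
ratio_spend = spend(1 / 1000, eps) / spend(1.0, eps)     # spend multiplier

print(f"demand grows {ratio_demand:.0f}x, spend grows only {ratio_spend:.1f}x")
```

With elasticity just above 1, price cuts barely move total spend, which is consistent with the report's finding that integration friction, not price, is the binding constraint on compute demand.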
Related Companies
Microsoft
MSFT