Xpert.Digital

CFOs are sounding the alarm: The uncontrollable costs of new AI agents

No more token counters: Why companies should only pay for genuine AI results from now on

Generative AI is in a fundamental crisis – not because the technology is failing, but because its commercial architecture is collapsing.

Tech giants like Microsoft, Uber, and GitHub are already taking drastic action: annual budgets for AI tools are being exhausted within months by autonomous agents, while the anticipated productivity gains often cannot be measured. The culprit is the industry-wide shift to token-based billing models. Under the guise of "pay for what you use," providers are shifting the full financial risk onto their enterprise customers, charging only for the computing power itself, regardless of whether the AI correctly solves a task or delivers genuine economic value. This article analyzes the hidden risks of the current AI pricing transformation, explains the fatal tension between budget control and AI adoption, and demonstrates why outcome-based pricing is the only sustainable solution for the future of enterprise AI.

Who pays when AI delivers nothing? The reckoning of an industry that didn't understand its own value creation

The business model of generative AI is in a fundamental crisis. Not because the technology itself is failing, but because the way it is billed turns economic logic on its head: companies bear the entire financial risk, while the provider collects regardless of the outcome. In May 2026, Microsoft canceled internal Claude Code licenses for thousands of employees in its Experiences & Devices division. Uber exhausted its entire 2026 AI budget in four months because 5,000 engineers were working intensively with Claude Code, generating monthly costs of $500 to $2,000 per person. GitHub, the world's largest developer platform and a Microsoft subsidiary, abolished flat-rate pricing on June 1, 2026, and switched to a token-based credit system. These three events, all within a few weeks, are no coincidence – they are symptoms of a structural flaw deeply embedded in the pricing architecture of the AI industry.

The end of the subsidy era: When the market discovers price

The first phase of generative AI was largely subsidized. Providers like Anthropic, OpenAI, and Microsoft offered their services significantly below the actual infrastructure costs in order to gain market share, understand user behavior, and build developer ecosystems. Flat fees for coding assistants, unlimited chat sessions for single-digit monthly amounts, and generous enterprise testing at the provider's expense—all this was possible because venture capital financed the price difference and because the true costs of using agent-based workflows were not yet known.

This phase has now demonstrably ended. GitHub explicitly justified its switch to token-based billing by stating that agent-based usage has become the norm and that the previous flat-rate models can no longer cover the associated computing costs. The company put it bluntly: a short chat question and a multi-hour autonomous coding session previously cost the same, and this was unsustainable. Developers who had previously been able to work agent-based without limits for $10 to $39 per month saw their monthly costs rise to anywhere between $50 and more than $3,000 after the switch. The community thread announcing the change garnered almost 900 dissenting votes.

Gartner forecasts global AI spending of $2.52 trillion in 2026, a 44 percent increase year-over-year. With global expenditures of this magnitude, the question of who bears the costs and who reaps the benefits is no longer an academic discussion, but a fundamental question of corporate governance. AI infrastructure spending alone is projected to climb to $1.37 trillion in 2026. At the same time, according to an MIT study from July 2025, approximately 95 percent of enterprise-wide GenAI pilot projects failed to deliver a measurable P&L effect. This contradiction—rising expenditures, lack of return—is the core of the problem.

Five risk classes that token pricing models shift onto the company

Behind the innocuous phrase "pay for what you use" lies a systematic shift of five different risk classes from the provider to the corporate customer. Anyone who understands this mechanism recognizes why token billing is not a neutral billing method, but rather a structural disadvantage for the buyer.

Budget risk: The supplier controls the unit, not the buyer

With a token-based pricing model, the company commits to an annual budget for a cost unit whose price the provider can change at any time and whose consumption grows non-linearly with increasing usage. For example, in May 2026, Anthropic announced that subscribers for agent tools and third-party integrations would receive separate monthly allowances billed at standard API rates. This is a unilateral price adjustment that immediately devalues an existing budget. Uber experienced this firsthand: a budget calculated for twelve months ran out in four. Adoption wasn't the problem; it was actually a sign of success. The problem was that token consumption grows far faster than linearly as soon as agent-based workflows are implemented, while the budget was planned linearly.
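The gap between a linearly planned budget and faster-growing consumption can be made concrete with a small calculation. The following is a minimal sketch, not a model of any real company's spending: the function name, the budget of 1,200 units, and the growth rates are all illustrative assumptions. It only shows that a constant run rate of three times the plan is what exhausting a twelve-month budget in four months implies arithmetically.

```python
from typing import Optional


def month_budget_is_exhausted(
    annual_budget: float,
    first_month_cost: float,
    monthly_growth: float = 0.0,
) -> Optional[int]:
    """Return the month (1-12) in which cumulative spend reaches the budget.

    Returns None if the budget is not reached within twelve months.
    All inputs are hypothetical planning figures.
    """
    spent = 0.0
    cost = first_month_cost
    for month in range(1, 13):
        spent += cost
        if spent >= annual_budget:
            return month
        cost *= 1.0 + monthly_growth  # compound growth in monthly usage
    return None


# Assumed plan: 1,200 units per year, i.e. 100 units per month.
ANNUAL_BUDGET = 1200.0

on_plan = month_budget_is_exhausted(ANNUAL_BUDGET, 100.0)            # spend as planned
triple_rate = month_budget_is_exhausted(ANNUAL_BUDGET, 300.0)        # 3x the planned run rate
growing = month_budget_is_exhausted(ANNUAL_BUDGET, 100.0, 0.5)       # usage grows 50% per month

print(on_plan, triple_rate, growing)
```

Under these assumptions, spending on plan lasts the full year, a tripled run rate ends in month four, and usage that starts on plan but grows by half each month ends in month five.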

Adoption risk: Use and value creation are decoupled

A token-based system bills for computing power, not results. A model that uses 100,000 tokens and delivers an incorrect answer costs exactly the same as a model that uses 100,000 tokens and delivers a correct answer. This decoupling of costs and benefits is the fundamental economic problem. It means that a company can build a workflow around a token-based system, operate that workflow, and pay for it—without ever seeing any measurable added value. The fact that 42 percent of companies abandoned the majority of their AI initiatives in 2025, a dramatic increase from 17 percent the previous year, is, in this light, less a technological problem than a pricing problem. The flawed incentive architecture leads to misinvestments that only become apparent after months of operation.
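The decoupling of cost and benefit can be illustrated with two runs that consume the same 100,000 tokens, one correct and one incorrect. This is a minimal sketch: the `TaskRun` type, the `outcome_verified` flag, and both price constants are hypothetical and chosen only to make the contrast visible.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    tokens_used: int
    outcome_verified: bool  # hypothetical flag: did the run pass an agreed acceptance check?


def token_cost(run: TaskRun, price_per_million_tokens: float) -> float:
    """Token billing: the charge depends only on compute consumed."""
    return run.tokens_used * price_per_million_tokens / 1_000_000


def outcome_cost(run: TaskRun, price_per_verified_task: float) -> float:
    """Outcome billing: a charge is incurred only for a verified result."""
    return price_per_verified_task if run.outcome_verified else 0.0


# Assumed prices, for illustration only.
PRICE_PER_MILLION_TOKENS = 10.0
PRICE_PER_VERIFIED_TASK = 5.0

correct = TaskRun(tokens_used=100_000, outcome_verified=True)
wrong = TaskRun(tokens_used=100_000, outcome_verified=False)

for label, run in (("correct", correct), ("wrong", wrong)):
    print(
        label,
        token_cost(run, PRICE_PER_MILLION_TOKENS),
        outcome_cost(run, PRICE_PER_VERIFIED_TASK),
    )
```

Under token billing both runs are charged identically; under the outcome rule only the verified run is charged at all.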

Forecast risk: Uncontrollable variability in cost planning

For CFOs, token billing is a category of expense that behaves like a failed currency hedge: it is fundamentally hard to model because too many external variables influence the bill. Every new use case, every new internal user, every change in model behavior, and every increase in the context window size pushes the bill in an unpredictable direction. Added to this is the so-called agent sprawl: when companies roll out agent-based workflows across different departments, the unpredictability multiplies. Each new agent adds another entry to the token ledger, without any guarantee of return. With Claude Opus 4.7, Anthropic introduced a version jump that, due to extended reasoning chains, consumes around 30 percent more tokens than its predecessor. That is a 30 percent cost increase overnight, without a single new transaction or customer order to justify it.

Governance risk: Data protection and compliance scale with consumption

In regulated industries—financial services, healthcare, insurance—every token call has a governance dimension: corporate data is routed through third-party inference infrastructure with every API call. This means that the more tokens are consumed, the more data leaves the internal security perimeter. In an environment regulated by GDPR, SOC 2, HIPAA, and the EU AI Act, this generates compliance costs, audit exposure, and liability risks that increase with usage intensity. Token billing and data sovereignty are thus in structural tension: those who use more AI automatically assume more regulatory risk—an incentive problem that hinders secure and scalable AI use.

Outcome risk: The silence of AI providers regarding impact

The least discussed risk is the most consequential. Token pricing models measure consumption, not value creation. The provider receives payment regardless of whether the company's AI program has a measurable P&L impact or joins the long list of corporate GenAI pilots that have failed to generate a measurable return. An MIT study puts this figure at 95 percent. In other words, in the vast majority of cases, the company pays without receiving any verifiable economic value – and the provider has no business model-related incentive to change that.

The industry's pricing logic: A market that didn't know its own value

The root cause of the current price crisis lies in the origins of the GenAI market. The industry marketed its products before understanding their true cost of use in productive enterprise environments. Flat rates and token-based pricing models were conceived as market entry strategies, not as sustainable commercial structures. GitHub itself admitted that the existing flat-rate models absorbed the actual inference costs and that this mechanism is not sustainable for providers in the long run.

This created a paradoxical situation: the more successful the adoption, the higher the risk of loss for the provider, and the higher the budget risk for the company. Uber is the most vivid example: Claude Code adoption increased from 32 to 84 percent of developers, 70 percent of committed code was AI-generated, and the productivity gains were real and measurable. And yet, Uber CTO Praveen Neppalli Naga described the situation as follows: "I'm back to the drawing board because the budget I thought was necessary has already been used up." The technology worked. The pricing model didn't.

This also explains why Microsoft decided to cancel the Claude Code licenses for its Experiences & Devices division and migrate developers to the GitHub Copilot CLI. The official reason given is "toolchain unification"; internally, it was a financial decision. Thousands of engineers developing Windows, Microsoft 365, Teams, Outlook, and Surface had been using Claude Code heavily since its pilot launch in December 2025, and the token costs had exhausted the annual budget well before the end of the year. Microsoft, the company that has invested $13 billion in OpenAI and operates the cloud on which most frontier AI labs run, looked at the numbers and decided based on cost, not perceived value.

Results-oriented pricing models: A different commercial architecture, not a discount

The term outcome-based pricing is often misunderstood in the market. It's not about cheaper token prices, discount packages, or deferred payment. It's a fundamentally different commercial architecture: The provider is paid per task completed – if and only if a defined business outcome is verified on a defined workflow. Not for the computational effort incurred along the way.

For decades, enterprise software has operated on a system-and-SLA principle: The vendor is responsible for unit economics and ensures that the solution delivers the promised results. ERP systems, CRM platforms, accounting software – none of these categories have ever billed based on database accesses, API calls, or computation cycles. They bill based on users, modules, or performance outcomes. AI pricing must adhere to the same standard.

However, the outcome-based pricing model is only economically viable if the provider can absorb the variance itself – that is, if it has built a platform efficiency that allows it to internalize the risk. Most providers cannot do this. Their production costs are the same token counter that the company bears – and they simply pass the counter on. Outcome-based pricing requires the provider to link its own income to the outcome. This is a substantially different risk profile – and explains why this pricing model is still rare in the market.
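Why the provider must be able to absorb variance can be sketched with a simple expected-value calculation. The model below is an assumption made for illustration, not a description of any provider's economics: it supposes that each attempt costs the provider a fixed amount of compute, that attempts succeed independently with a constant probability, and that the provider retries until the outcome is verified. Function and parameter names are hypothetical.

```python
def expected_margin_per_outcome(
    price_per_outcome: float,
    compute_cost_per_attempt: float,
    success_rate: float,
) -> float:
    """Expected provider margin per verified outcome.

    Assumes independent attempts with a constant success probability, so the
    expected number of attempts until the first success is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1.0 / success_rate
    return price_per_outcome - compute_cost_per_attempt * expected_attempts


# Assumed figures: 10 per verified outcome, 2 in compute per attempt.
efficient = expected_margin_per_outcome(10.0, 2.0, success_rate=0.5)
inefficient = expected_margin_per_outcome(10.0, 2.0, success_rate=0.125)

print(efficient, inefficient)
```

With these assumed figures, a platform that succeeds on every second attempt keeps a positive margin, while one that needs eight attempts on average loses money on every outcome it sells. Under token billing the same failed attempts would simply appear on the customer's invoice.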

 

 

Data sovereignty vs. hyperscalers: Who will win the AI infrastructure battle?

Practical model: How results-oriented AI delivery works

Platforms that consistently implement the outcome-based principle follow a different engagement logic. Instead of renting out infrastructure and simply running the meter, they first identify the highest-value-generating workflow for the company's use case—that is, the process that can deliver measurable impact most quickly. A production-ready solution is then deployed within the company's infrastructure: in the enterprise cloud, on-premises, in a private cloud, or as a fully managed SaaS offering, with the data never leaving the company's perimeter. Payment only begins once the result is available and the customer is satisfied.

This model has far-reaching implications for risk sharing. It forces the provider to focus its resources on genuinely value-creating use cases rather than those that consume many tokens. It creates a direct alignment of interests between provider and customer: both profit when the AI actually works; neither profits at the other's expense when it doesn't. For regulated industries, the premise that data does not leave the company perimeter also provides a compliance architecture compatible with GDPR, SOC 2, HIPAA, and the EU AI Act.

A key advantage of well-implemented, results-oriented platforms is their cumulative knowledge structure: Every successfully completed workflow builds on a shared internal knowledge base that becomes more valuable with each subsequent task. This stands in direct contrast to token-based deployments, which, while accumulating costs, do not anchor institutional knowledge within the company.

The CFO's perspective: Token billing as a categorical budget problem

For finance professionals, token billing represents a categorically new type of operating expense for which no established governance structures exist. Cloud costs—compute, storage, network—have been professionalized over the past fifteen years. FinOps as a discipline has spawned methods, tools, and organizational units that make cloud spending predictable and controllable. A full equivalent for AI agent runtime costs is still lacking.

Token consumption doesn't scale with the number of users, but rather with the ambition of the prompts, the length of the context windows, the number of concurrently running agents, and the complexity of the reasoning chains. This means that a company transitioning 100 engineers from simple autocomplete to agent-based workflows can multiply its monthly AI spend by a factor of five to twenty without adding a single new user. Standard planning assumptions based on user numbers or session volumes are structurally flawed in this context.
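A rough consumption model shows how the bill can multiply while headcount stays constant. The sketch below is illustrative only: the function, the request counts, and the tokens per run are assumed values, picked so that the result lands inside the five-to-twenty range mentioned above.

```python
def monthly_tokens(
    engineers: int,
    runs_per_engineer_per_day: int,
    tokens_per_run: int,
    working_days: int = 21,
) -> int:
    """Total tokens consumed per month under a simple multiplicative model."""
    return engineers * runs_per_engineer_per_day * tokens_per_run * working_days


ENGINEERS = 100  # headcount stays the same in both scenarios

# Assumed usage profiles (hypothetical numbers):
autocomplete = monthly_tokens(ENGINEERS, runs_per_engineer_per_day=200, tokens_per_run=2_000)
agentic = monthly_tokens(ENGINEERS, runs_per_engineer_per_day=20, tokens_per_run=200_000)

cost_multiplier = agentic / autocomplete
print(autocomplete, agentic, cost_multiplier)
```

In this example the engineers start ten times fewer runs, yet each agentic run consumes a hundred times more tokens, so the monthly volume rises tenfold. A forecast keyed to the number of users would show no change at all.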

This has concrete consequences for budget planning. The spending structure requires control mechanisms similar to those for energy: real-time measurement, threshold alerts, team quotas, and hard limits at the agent level. Companies that don't implement these before adoption begins will face the consequences when the budget is already exhausted, as Uber did. The company had no per-team limits, no centralized tracking, and no real-time visibility into consumption until the CTO reported that the annual budget had been exhausted ahead of schedule.

Market dynamics: Who holds the power in this price transformation

The current price transformation is not symmetrical. Large hyperscalers like Microsoft, Google, and Amazon have structural leverage that differentiates them from smaller providers: they control distribution channels, enterprise contracts, cloud infrastructure, and developer tools. Microsoft didn't drop Claude Code because Copilot is better; internal surveys showed that developers preferred Claude Code. The company dropped it because it controls distribution and can neither control nor strategically leverage the token costs of a competing product.

This dynamic is significant for interpreting the price transformation as a whole. For hyperscalers, the move away from flat rates and the introduction of token billing is not a price reform – it's revenue optimization. Those who control the infrastructure on which the models run, who operate the billing systems, and who hold the enterprise contracts structurally benefit from consumption-based billing. The opposing model – results-oriented pricing – jeopardizes these revenue positions because it forces the provider to bear the risk instead of passing it on.

For medium-sized businesses and corporations that are not among the hyperscalers, this becomes a question of bargaining power at the next contract renewal. According to an analysis by JP Morgan, the stress on AI infrastructure could create economic friction before the promised returns materialize. Those who do not actively negotiate the risk distribution in the next AI contract will accept a standard position that is structurally unfavorable to them.

The message from investment economics: If efficiency is not a goal, it becomes a problem

There is a counterargument to the cost criticism of token-based billing that must be taken seriously. At Uber, AI generated 70 percent of the committed code and 11 percent of all live backend updates. An engineer in San Francisco costs a company significantly more per year than $2,000 per month in token costs. If AI-powered coding increases productivity by even a single-digit percentage of the company's most expensive resource, the return on investment could outweigh the costs.

The argument isn't wrong; it's incomplete. First, it only holds true if the productivity gains are actually quantifiable and causally attributable to the toolset, which is rarely measured systematically in most companies. Second, it presupposes that the saved engineering time translates into realized cost savings or directly attributable additional revenue and does not, as in many organizations, simply lead to more work, which in turn consumes more tokens from the AI system. Third, the comparison is only valid if the result of the AI's work is validated: code that is generated but not used productively is not equivalent to the value of senior engineering work.
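The counterargument and its caveats can be reduced to a break-even calculation. The sketch below is a simplification under stated assumptions: the $2,000 monthly token cost is the upper figure cited for Uber, while the fully loaded engineer cost of $25,000 per month and the `realized_share` parameter (the fraction of the claimed gain that is validated and actually turns into savings or revenue) are hypothetical.

```python
def break_even_productivity_gain(
    token_cost_per_month: float,
    engineer_cost_per_month: float,
    realized_share: float = 1.0,
) -> float:
    """Productivity gain (as a fraction) at which token spend pays for itself.

    realized_share discounts the gain for output that is never validated or
    never converted into savings or revenue.
    """
    if engineer_cost_per_month <= 0:
        raise ValueError("engineer_cost_per_month must be positive")
    if not 0.0 < realized_share <= 1.0:
        raise ValueError("realized_share must be in (0, 1]")
    return token_cost_per_month / (engineer_cost_per_month * realized_share)


TOKEN_COST = 2_000.0      # upper end of the per-person range cited above
ENGINEER_COST = 25_000.0  # assumed fully loaded monthly cost, not from the source

fully_realized = break_even_productivity_gain(TOKEN_COST, ENGINEER_COST)
half_realized = break_even_productivity_gain(TOKEN_COST, ENGINEER_COST, realized_share=0.5)

print(f"{fully_realized:.0%} {half_realized:.0%}")
```

If every hour saved is realized, a single-digit gain is enough under these assumptions. If only half of the gain is validated and converted, the required gain doubles, which is why the measurement questions above decide whether the counterargument holds.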

The fundamental argument for results-oriented pricing therefore remains valid: If the return is real, the provider can contractually substantiate it and link their income to it. If they cannot or will not do so, there are structural reasons for this, which work to the detriment of the buyer.

Strategic consequences for corporate management

The events of the first half of 2026 give company management clear operational conclusions.

First, controlling AI spend requires a dedicated FinOps discipline, which must be structured similarly to cloud FinOps but requires its own methodologies. Token consumption is non-linear, agent-specific, and model version-dependent. Dashboards are insufficient; what's needed are real-time budget caps at the team and agent levels, automatic kill mechanisms when thresholds are exceeded, and audit logs at the single-run level.
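What caps at team and agent level, a kill mechanism, and a per-run audit log might look like together can be sketched in a few lines. This is a minimal in-memory illustration, not a production design or any vendor's API: `BudgetGuard`, `BudgetExceeded`, the team and agent names, and the cap values are all hypothetical, and a real system would need persistence, concurrency control, and reconciliation against actual invoices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class BudgetExceeded(RuntimeError):
    """Raised when a run would push a team or agent past its hard cap."""


@dataclass
class BudgetGuard:
    team_caps: Dict[str, float]
    agent_caps: Dict[str, float]
    team_spend: Dict[str, float] = field(default_factory=dict)
    agent_spend: Dict[str, float] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def authorize_run(self, run_id: str, team: str, agent: str, estimated_cost: float) -> None:
        """Check both caps before a run starts; block it if either would be exceeded."""
        new_team_total = self.team_spend.get(team, 0.0) + estimated_cost
        new_agent_total = self.agent_spend.get(agent, 0.0) + estimated_cost
        allowed = (
            new_team_total <= self.team_caps[team]
            and new_agent_total <= self.agent_caps[agent]
        )
        # Every decision is logged per run, whether allowed or blocked.
        self.audit_log.append(
            {"run_id": run_id, "team": team, "agent": agent,
             "estimated_cost": estimated_cost, "allowed": allowed}
        )
        if not allowed:
            # The "kill mechanism": the run never starts.
            raise BudgetExceeded(f"run {run_id} blocked: cap reached for {team}/{agent}")
        self.team_spend[team] = new_team_total
        self.agent_spend[agent] = new_agent_total


guard = BudgetGuard(team_caps={"payments": 100.0}, agent_caps={"refactor-bot": 60.0})
guard.authorize_run("run-1", "payments", "refactor-bot", 40.0)
try:
    guard.authorize_run("run-2", "payments", "refactor-bot", 30.0)
except BudgetExceeded as exc:
    print(exc)
```

The second run is blocked because it would take the agent past its cap even though the team still has headroom, and the blocked attempt remains visible in the audit log.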

Second, pilot projects under token billing do not provide reliable forecasts of production costs. A pilot project costing €1,000 per month can scale to 100 times its original usage in a production environment, which at unchanged token prices means roughly €100,000 per month and can exceed the budgeted resources. AI spending planning must be based on production assumptions, not pilot usage.

Third, every contract renewal with AI providers has a strategic negotiation dimension that is currently underutilized. The question every company should ask its AI provider in the next meeting is simple and precise: What will I pay if it doesn't work? A provider unwilling to share the downside risk has a conflict of interest with the buyer that cannot be ignored in a serious procurement process.

Fourth, data sovereignty is a distinct cost and risk variable, not just a compliance issue. Companies in regulated industries that use token-based services in the public cloud accumulate compliance effort, audit exposure, and potential liability risks with every unit of use. Sovereign AI, meaning AI infrastructure operated within the company's own perimeter, has come close to technological parity with cloud-based frontier models in 2026: according to the Stanford HAI 2026 AI Index, the performance gap between the best open-weight models and the most advanced proprietary systems has narrowed to an average of three months.

Outlook: What the price transformation means for 2027

The market is in flux. The shift away from flat rates and towards token billing is a short-term victory for providers – revenues increase with usage. In the medium term, however, it is a catalyst for three parallel developments that will fundamentally alter the price structure.

First, competitive pressure will increase due to open-source models. If proprietary token costs for enterprise-wide agentic deployments reach six figures per year, and open-weight models deliver comparable performance on on-premises hardware, the total cost of ownership calculation will tip in favor of on-premises infrastructure – especially for European companies that prioritize GDPR compliance and data sovereignty.

Second, results-oriented pricing models will gain ground in the market because they give enterprise customers a negotiating position that token billing, by definition, does not offer. Even though only a few providers currently have the platform efficiency to offer this model profitably, competition will force imitation.

Third, AI governance—including measuring AI ROI, tracking value creation contributions, and contractually defining success metrics—will become a distinct business area, comparable to data protection or cybersecurity. Gartner expects global AI spending to reach $3.34 trillion by 2027. At this scale, corporate executives will no longer accept AI as a budget category without verifiable success metrics.

The crucial question is not whether token-based billing will be replaced by results-oriented models – economic logic suggests that it will. The question is whether companies will actively shape this transition or passively let ever-increasing bills force it upon them. Those who adapt the contract architecture of their AI investments now will be negotiating from the stronger position.

 

Konrad Wolfenstein
