Understanding AI Pricing Models

Quick answer

AI is priced in three main ways, and understanding them is the basis of predicting and controlling cost. Token-based pricing charges for the amount of text processed, measured in tokens for both input and output, and dominates API access to models. Seat-based pricing charges a fixed fee per user for AI-enabled products. Compute-based pricing applies when you self-host models, charging for the infrastructure they run on. Many real deployments combine these. This matters because each model scales differently: token pricing rises with usage, seat pricing with headcount, and compute pricing with infrastructure. Confusing them is how organisations end up with costs they did not foresee.

What this means

Each pricing model answers "what are you paying for?" differently. With tokens, you pay for the work done — every word in and out. With seats, you pay for access — a flat amount per person regardless of how much they use it. With compute, you pay for capacity — the servers running a model you host yourself.

Knowing which model applies to each part of your AI stack is what lets you forecast cost. A token-priced API and a seat-priced product scale on entirely different variables, and a budget that treats them the same will be wrong.

Why it matters for business

The dominant model — token pricing — is also the one most likely to surprise, because cost scales with usage and prompt size, both of which tend to grow over time. A use case that is inexpensive in a pilot can become costly as adoption spreads and prompts and context expand. Gartner predicts that inaccurate AI cost calculations will push most large enterprises toward FinOps practices for AI, precisely because token-based spend is easy to underestimate.

For Australian organisations, understanding pricing models is the foundation of an AI budget that holds. It allows cost to be forecast before commitment, monitored during operation, and optimised deliberately — rather than discovered in an unexpected invoice.

How it works technically

The three models compared:

Model	You pay for	Scales with	Typical use
Per token	Text processed (input + output)	Usage volume and prompt size	API access to models
Per seat	User access	Number of users	Packaged AI products
Per compute	Infrastructure	Capacity provisioned	Self-hosted open-weight models

For token pricing, the cost of a request is roughly (input tokens × input price) + (output tokens × output price). Input and output are often priced differently, with output tokens commonly the more expensive; where a single rate applies, this reduces to (input tokens + output tokens) × price per token. Prompt size, retrieved context and output length are therefore direct cost drivers, and they are the same variables that cost optimisation targets. Seat pricing is simpler to forecast but can be inefficient if many seats are lightly used. Compute pricing trades per-use cost for fixed infrastructure cost, which can favour high, steady volume.
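The per-request arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not any provider's billing logic: the `TokenPrice` class, the `request_cost` function and the dollar figures are all hypothetical, chosen only to show how extra context in the prompt moves the cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price card, in dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Cost of one request when input and output are priced separately."""
    input_cost = input_tokens * price.input_per_million / 1_000_000
    output_cost = output_tokens * price.output_per_million / 1_000_000
    return input_cost + output_cost


# Illustrative prices only; real prices vary by provider and model.
example_price = TokenPrice(input_per_million=3.00, output_per_million=15.00)

# Same 300-token answer, but the second prompt carries more retrieved context.
short_prompt = request_cost(500, 300, example_price)
long_prompt = request_cost(4_000, 300, example_price)

print(f"short prompt: ${short_prompt:.4f}  long prompt: ${long_prompt:.4f}")
```

Under these assumed prices, the longer prompt costs between two and three times as much for an identical answer, which is why prompt and context size deserve as much attention as request volume.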

Practical implementation considerations

Forecast cost before deploying by estimating volume, tokens per request and price — and then monitor actual usage early, because real consumption frequently exceeds estimates as prompts grow and adoption spreads. Cost visibility per use case is what turns pricing knowledge into cost control.
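As a rough sketch of that forecast-then-monitor loop, the Python below multiplies volume by average cost per request and then compares a later month against the original estimate. The function names, request counts, token counts and prices are all assumptions made up for illustration.

```python
def monthly_token_cost(requests, avg_input_tokens, avg_output_tokens,
                       input_price_per_million, output_price_per_million):
    """Requests x average cost per request, with input and output priced separately."""
    per_request = (
        avg_input_tokens * input_price_per_million
        + avg_output_tokens * output_price_per_million
    ) / 1_000_000
    return requests * per_request


def overrun_ratio(actual, forecast):
    """How many times larger actual spend is than forecast (1.0 means on budget)."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return actual / forecast


# Hypothetical pilot estimate: 100,000 requests a month, 1,000 tokens in, 400 out.
forecast = monthly_token_cost(100_000, 1_000, 400, 3.00, 15.00)

# Hypothetical later month: adoption up 50% and prompts grown to 2,500 tokens.
actual = monthly_token_cost(150_000, 2_500, 400, 3.00, 15.00)

print(f"forecast ${forecast:,.2f}, actual ${actual:,.2f}, "
      f"ratio {overrun_ratio(actual, forecast):.2f}x")
```

With these illustrative inputs the forecast is about $900 a month and the later month about $2,025, an overrun of roughly 2.25 times. Neither change looks dramatic on its own, but volume growth and prompt growth multiply together.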

Edison AI's implementation work models expected AI cost during design and instruments spend monitoring from launch, so pricing is understood and controlled rather than discovered. Where a use case is high-volume and steady, the team also assesses whether compute-based self-hosting would be cheaper than per-token API pricing — a calculation that depends entirely on volume.
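One way to frame that calculation is as a break-even volume: the number of requests per month at which a fixed infrastructure bill equals the per-token API bill. The sketch below is a simplified model under stated assumptions, not Edison AI's actual method. It assumes the provisioned capacity can serve the whole volume, it ignores engineering and operations time, and every figure in it is hypothetical.

```python
import math


def break_even_requests(fixed_monthly_cost: float,
                        api_cost_per_request: float,
                        hosted_marginal_cost_per_request: float = 0.0) -> float:
    """Monthly request volume at which self-hosting and API pricing cost the same."""
    saving_per_request = api_cost_per_request - hosted_marginal_cost_per_request
    if saving_per_request <= 0:
        return math.inf  # self-hosting never catches up at any volume
    return fixed_monthly_cost / saving_per_request


def cheaper_option(requests_per_month: int,
                   fixed_monthly_cost: float,
                   api_cost_per_request: float) -> str:
    """Compare total monthly cost, treating self-hosting as a purely fixed cost."""
    api_total = requests_per_month * api_cost_per_request
    return "self-hosted" if fixed_monthly_cost < api_total else "api"


# Hypothetical inputs: $4,500 a month of infrastructure, $0.009 per API request.
threshold = break_even_requests(4_500, 0.009)
print(f"break-even at about {threshold:,.0f} requests per month")
print(cheaper_option(100_000, 4_500, 0.009))    # well below the threshold
print(cheaper_option(1_000_000, 4_500, 0.009))  # well above the threshold
```

Under these assumptions the threshold sits at about 500,000 requests a month. Below it the API is cheaper; above it, self-hosting starts to win, provided the volume is steady enough to keep the servers busy.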

Match the pricing model to the usage pattern: token pricing suits variable or low volume, seat pricing suits broad light use, and compute pricing can suit high steady volume.

Common mistakes

Not forecasting cost before deploying. Token-based spend is easy to underestimate and scales with usage.
Ignoring prompt and output size. These directly drive token cost on every request.
Over-buying seats. Seat pricing is wasteful when many users are inactive.
Assuming self-hosting is cheaper. Compute pricing only beats token pricing at sufficient, steady volume.
No usage monitoring. Without it, the first sign of a cost problem is the bill.
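The seat over-buying mistake above is easy to make visible once active users are counted. The following Python is a minimal sketch; the `seat_utilisation` function, the seat count, the fee and the definition of an "active" user are all assumptions for illustration.

```python
def seat_utilisation(seats_purchased: int, active_users: int, fee_per_seat: float) -> dict:
    """Summarise how much of a seat-based subscription is actually being used."""
    if seats_purchased <= 0:
        raise ValueError("seats_purchased must be positive")
    if not 0 <= active_users <= seats_purchased:
        raise ValueError("active_users must be between 0 and seats_purchased")

    total_spend = seats_purchased * fee_per_seat
    idle_spend = (seats_purchased - active_users) * fee_per_seat
    # The effective price per person who actually uses the product.
    cost_per_active = total_spend / active_users if active_users else float("inf")

    return {
        "utilisation": active_users / seats_purchased,
        "total_spend": total_spend,
        "idle_spend": idle_spend,
        "cost_per_active_user": cost_per_active,
    }


# Hypothetical month: 200 seats at $30 each, 80 people used the product.
report = seat_utilisation(seats_purchased=200, active_users=80, fee_per_seat=30.0)
for name, value in report.items():
    print(f"{name}: {value:,.2f}")
```

In this made-up case the nominal $30 seat works out to $75 per active user, with $3,600 a month paid for seats nobody used.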

What leaders should do next

Understand which pricing model applies to each part of your AI stack and forecast cost accordingly — volume and token size for APIs, headcount for seats, infrastructure for self-hosting. Monitor actual usage from launch, since estimates tend to understate real consumption. Match pricing models to usage patterns, and reassess high-volume token-priced workloads against self-hosting. Treat AI cost as something modelled and monitored from the outset, so your AI budget reflects reality and scales predictably as usage grows.

Frequently asked

Questions, answered.

How is AI priced?
AI is priced in three main ways: per token (for API access to models, based on text processed), per seat (a fixed fee per user for AI products), and per compute (for self-hosted models, based on infrastructure used). Many deployments combine these.
What is token-based pricing?
Token-based pricing charges for the amount of text processed, measured in tokens, for both input and output. It is the dominant model for API access, and means cost scales directly with usage volume and the size of prompts and responses.
How do you predict AI costs?
Estimate the volume of requests, the average tokens per request (input plus output), and the model's price per token, then multiply. For seat-based products, multiply users by the per-seat fee. Monitoring actual usage early is essential because estimates often understate real consumption.
