How Large Language Models Actually Work: A Business Leader's Technical Primer
A concise technical explanation of how large language models function — from training data and transformer architecture to why they produce the outputs they do.
A clear explanation of tokens and context windows, and why these two technical limits shape cost, accuracy and feasibility in enterprise AI projects.
Tokens are the unit of currency in every large language model: they determine what the model can process, how long it takes, and how much it costs. A context window is the hard ceiling on how much text the model can consider in a single interaction. Together, these two concepts constrain the design of virtually every enterprise AI workflow. Understanding them shifts AI decision-making from vendor-led to evidence-led.
A token is not a word. It is a subword unit, typically three to four characters of English text, produced by the tokeniser that pre-processes text before it enters the model. The word "implementation" might become two tokens; "AI" is typically one. Code, numbers and non-English text often tokenise less efficiently, so they consume more tokens relative to their information content.
A context window is the maximum number of tokens a model can handle in a single request, with input and output counted together. Modern frontier models have context windows ranging from roughly 128,000 tokens (approximately 100,000 words) to over one million tokens in some configurations. Every token fed into the model, including system instructions, retrieved documents, conversation history and the current query, counts against that limit.
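The budgeting logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor API: the `ContextBudget` class, the 128,000-token window, the 4,000 tokens reserved for output and the individual token counts are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical helper for checking whether a request fits a context window."""

    window_tokens: int           # model's context window (assumed value)
    reserved_output_tokens: int  # room kept free for the model's answer

    def remaining(self, *input_token_counts: int) -> int:
        """Tokens left once every input part and the reserved output are counted."""
        used = sum(input_token_counts) + self.reserved_output_tokens
        return self.window_tokens - used

    def fits(self, *input_token_counts: int) -> bool:
        """True when the inputs plus reserved output stay within the window."""
        return self.remaining(*input_token_counts) >= 0


budget = ContextBudget(window_tokens=128_000, reserved_output_tokens=4_000)

# Illustrative counts: system instructions, retrieved documents,
# conversation history, current query.
parts = (1_500, 90_000, 20_000, 500)

print(budget.remaining(*parts))  # 12000
print(budget.fits(*parts))       # True
```

The point of reserving output tokens up front is that a request which fills the window with input leaves the model no room to answer.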
Token economics directly determine the cost and feasibility of enterprise AI tasks. Most API-based AI pricing charges separately for input tokens and output tokens, with output typically more expensive. A workflow that processes a 200-page policy document once per query consumes far more tokens — and incurs far higher cost — than one that uses retrieval to inject only the three most relevant paragraphs.
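A rough cost comparison makes the difference concrete. The sketch below is illustrative only: the per-million-token prices are hypothetical placeholders rather than any vendor's actual rates, the 200-page document is assumed to be about 133,000 tokens (roughly 667 tokens per page), and the three retrieved paragraphs are assumed to total 600 tokens.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices per million tokens (not real vendor rates).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

FULL_DOC_TOKENS = 133_000  # assumed size of a 200-page policy document
RETRIEVED_TOKENS = 600     # assumed size of three relevant paragraphs
ANSWER_TOKENS = 400        # assumed length of the model's answer

full_doc_cost = request_cost(FULL_DOC_TOKENS, ANSWER_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
retrieval_cost = request_cost(RETRIEVED_TOKENS, ANSWER_TOKENS, INPUT_PRICE, OUTPUT_PRICE)

print(f"Full document per query: {full_doc_cost:.4f}")   # 0.4050
print(f"Retrieval per query:     {retrieval_cost:.4f}")  # 0.0078
```

Under these assumptions the full-document approach costs roughly 50 times more per query, and the gap scales linearly with query volume.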
Context window size also affects accuracy. Attention quality across a very long context is not uniform: research on long-context models has shown that they perform less reliably when the information critical to a query is buried in the middle of a very long context, rather than near the start or end. This effect, sometimes called the "lost in the middle" problem, means that a larger context window does not automatically produce better answers. How context is structured matters as much as how much of it you provide.
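One way to act on this when assembling a prompt is to place the most relevant material at the edges of the context and the weakest in the middle. The function below is a minimal sketch of that idea, assuming the chunks have already been ranked by relevance by some retrieval step. The name `place_relevant_at_edges` is hypothetical, and whether this ordering helps in practice depends on the model and the task.

```python
def place_relevant_at_edges(chunks_by_relevance: list[str]) -> list[str]:
    """Reorder chunks so the strongest sit at the start and end of the context.

    Input must be sorted most relevant first. Chunks are dealt alternately
    to the front and the back, so the least relevant land in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for index, chunk in enumerate(chunks_by_relevance):
        if index % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-most-relevant chunk ends up last.
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # "A" is most relevant, "E" least
print(place_relevant_at_edges(ranked))  # ['A', 'C', 'E', 'D', 'B']
```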
Gartner predicts that by 2027, inaccurate AI cost and budget calculations will drive approximately 60% of large enterprises to adopt FinOps practices specifically for AI. Token consumption modelling is one of the primary inputs to those calculations.
When a request is sent to an LLM:
A practical approximation for scoping work: one million tokens is roughly equivalent to 750,000 words, or about 1,500 pages of dense business text.
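That rule of thumb translates into a simple scoping calculation. The sketch below derives its two constants directly from the approximation above (750,000 words per million tokens, and 1,500 pages per 750,000 words); the function names are illustrative, and real token counts will vary with the tokeniser and the type of content.

```python
WORDS_PER_TOKEN = 0.75  # 750,000 words per 1,000,000 tokens
WORDS_PER_PAGE = 500    # 750,000 words across 1,500 pages


def tokens_for_words(words: int) -> int:
    """Approximate token count for a given number of English words."""
    return round(words / WORDS_PER_TOKEN)


def tokens_for_pages(pages: int) -> int:
    """Approximate token count for pages of dense business text."""
    return tokens_for_words(pages * WORDS_PER_PAGE)


print(tokens_for_pages(1_500))  # 1000000
print(tokens_for_pages(200))    # 133333
```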
Token and context window management becomes operationally significant at scale. Key decisions include:
Edison AI's AI training programmes include hands-on context window budgeting exercises for teams building or evaluating enterprise AI systems, ensuring cost projections reflect real-world token volumes rather than vendor demo conditions.
A token is a chunk of text — roughly four characters or three-quarters of a word — that a language model reads and generates. Models price and limit work by tokens, not words.
A context window is the maximum amount of text, measured in tokens, a model can consider at once. Anything beyond it has to be truncated, summarised or retrieved separately.
You pay per token for input and output. Larger contexts can improve grounding but raise cost and latency, so right-sizing context is an operational decision, not just a technical one.
Article: What Tokens and Context Windows Mean for Enterprise AI Decisions