What this means
"Data leakage" in an AI context means sensitive information leaving the boundary where it should stay. Unlike a dramatic breach, AI leakage is often quiet and procedural: an employee pastes a client contract into a consumer chatbot to summarise it, and that text is now outside the organisation's control. Multiply that across a workforce and the exposure is significant.
The four channels — prompts, logs, training and third-party tools — each leak differently, so each needs its own control. Treating leakage as one undifferentiated risk leads to incomplete protection.
Why it matters for business
The commercial and regulatory stakes are high. Confidential information — client data, intellectual property, strategic plans — that leaks into an external system can damage competitive position, breach contracts and trigger notifiable data breach obligations under Australian law.
The risk is amplified by how widely staff already use AI. PwC's workforce research found meaningful daily use of generative AI among employees, much of it through tools the organisation has not sanctioned. Gartner has predicted that a substantial share of AI-related breaches will arise from improper use of generative AI, frequently across organisational or national boundaries. Leakage is therefore not a hypothetical edge case; it is a present operational reality that governance must address.
How it works technically
Each channel has a corresponding control:
| Leakage channel | How it happens | Control |
|---|---|---|
| Prompts | Sensitive data sent to external model APIs | Enterprise tooling with no-retention terms; redaction; on-prem options |
| Logs and traces | Sensitive data captured in observability systems | Log scrubbing, retention limits, access control on logs |
| Training | Data used to improve a provider's model | Contractual opt-out; enterprise agreements that exclude training |
| Third-party tools | Unsanctioned tools with weak data handling | Approved tool lists; blocking; policy and education |
The foundational technical decision is which model service to use. Enterprise model offerings typically guarantee by contract that prompts are neither retained nor used for training, which closes the two most significant channels, prompts and training, at once. Consumer tools often make no such guarantee.
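One way to make that decision enforceable is to check every outbound model call against a register of approved services. The sketch below is illustrative only: the hostnames, the `ServiceTerms` fields and the `is_safe_endpoint` helper are assumptions introduced for this example, not part of any real product or provider API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ServiceTerms:
    """Contractual data-handling terms recorded for one model service."""
    no_retention: bool
    no_training: bool


# Hypothetical register: hostnames and terms are placeholders, not real services.
SERVICE_REGISTER = {
    "llm.enterprise.example.com": ServiceTerms(no_retention=True, no_training=True),
    "chat.consumer.example.com": ServiceTerms(no_retention=False, no_training=False),
}


def is_safe_endpoint(url: str) -> bool:
    """Return True only if the host is registered and both guarantees hold."""
    host = urlparse(url).hostname
    terms = SERVICE_REGISTER.get(host)
    return terms is not None and terms.no_retention and terms.no_training
```

A gateway or client wrapper could call `is_safe_endpoint` before sending a prompt and refuse anything that returns `False`. Unknown hosts are rejected by default, so a new tool stays blocked until someone records its terms.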
Practical implementation considerations
The most effective approach pairs sanctioned tooling with clear policy. Provide staff with an approved enterprise AI tool that carries data-handling guarantees, so they have a safe option, and set explicit policy on what may and may not be entered into any AI system. Convenience drives behaviour — if the sanctioned tool is good, staff use it instead of risky alternatives.
Edison AI's AI readiness audit maps where organisational data currently flows through AI tools, including unsanctioned ones, and identifies the leakage channels that are open. Most organisations are surprised by how much sensitive data already passes through consumer tools.
Technical controls — redaction of sensitive fields before data reaches a model, disciplined logging with scrubbing, and access controls on traces — close the residual channels that policy alone cannot.
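As a minimal sketch of two of these controls, the example below redacts sensitive values before text reaches a model and applies the same redaction to log messages. It assumes regular expressions are good enough to spot the sensitive fields, which is rarely true in full; the two patterns, the `redact` function and the `ScrubFilter` class are illustrative names, and a real deployment would need patterns tuned to its own data.

```python
import logging
import re

# Illustrative patterns only (assumption): email addresses and 13-16 digit card numbers.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> str:
    """Replace each match of a sensitive pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


class ScrubFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message with its arguments first, then redact the result.
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True  # keep the record; only its content changes
```

Calling `redact(prompt)` before a prompt is sent covers the prompt channel, and attaching the filter with `handler.addFilter(ScrubFilter())` covers logs written through that handler. Access control on traces still has to be configured in the observability system itself.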
Common mistakes
- Relying on policy without providing a safe alternative. If staff have no good sanctioned tool, they will use risky ones regardless of policy.
- Using consumer AI tools for confidential work. Without enterprise terms, prompt data may be retained or used for training.
- Forgetting logs. Sensitive data captured in observability and trace systems is a frequently overlooked leakage channel.
- No redaction. Sending raw sensitive data to models when redacted data would suffice increases exposure unnecessarily.
- Ignoring shadow AI. Unsanctioned tools are where much leakage occurs and are invisible without active management.
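Shadow AI becomes visible once outbound traffic is compared against the approved tool list. The sketch below counts requests to AI domains that are not sanctioned. It assumes a simplified egress log where each line is `timestamp user domain`; that format, the domain sets and the `shadow_ai_usage` function are all hypothetical, and real proxy logs would need their own parser.

```python
from collections import Counter

# Hypothetical domain lists (assumption): hostnames are placeholders.
KNOWN_AI_DOMAINS = {
    "chat.consumer.example.com",
    "llm.enterprise.example.com",
    "summarise.example.net",
}
SANCTIONED_DOMAINS = {"llm.enterprise.example.com"}


def shadow_ai_usage(log_lines):
    """Count requests per (user, domain) to AI domains that are not sanctioned.

    Assumes each line holds exactly three whitespace-separated fields:
    timestamp, user, domain.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess their meaning
        _, user, domain = parts
        if domain in KNOWN_AI_DOMAINS and domain not in SANCTIONED_DOMAINS:
            counts[(user, domain)] += 1
    return counts
```

The output shows which unsanctioned tools are in use and by whom, so it is best treated as a starting point for education and better sanctioned tooling. It does not reveal what data was sent.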
What leaders should do next
- Provide a sanctioned enterprise AI tool with contractual no-retention and no-training guarantees, so staff have a safe default.
- Set explicit, well-communicated policy on what may be entered into AI systems.
- Audit current data flows to find where leakage channels are open, including through shadow AI.
- Add technical controls (redaction, log scrubbing, access scoping) to close residual channels.
- Treat leakage prevention as an ongoing governance function, because tools and behaviours change continuously.
Start with an AI readiness audit to map your data, access and governance gaps before you scale.