Quick take
AI ROI isn’t a spreadsheet trick. Pick one workflow with a clear baseline. Capture all costs – engineering, evals, governance, change management – not just API bills. Tie benefits to outcomes the business already measures. Report a range with assumptions, not one magic number. If your ROI case only works under best-case assumptions, it doesn’t work.
I’ve sat in a lot of budget reviews over the years – telecoms, fintech, logistics. The AI ROI presentations I see fall into two categories: honest assessments that lead to good decisions, and fiction that leads to funded projects that get quietly killed six months later.
The difference isn’t sophistication. It’s honesty about costs and rigor about baselines.
The Full Cost Picture
The first lie in most AI ROI calculations is the cost side. Teams report API costs and maybe some engineering time. They leave out everything else.
Here’s what AI actually costs:
| Cost Category | What Teams Report | What It Actually Includes |
|---|---|---|
| Infrastructure | API usage fees | API fees + local compute + storage + networking + monitoring |
| Engineering | Initial build time | Build + integration + prompt engineering + ongoing maintenance |
| Evaluation | Nothing | Eval set creation + human review + quality monitoring tooling |
| Data | Nothing | Data preparation + cleaning + annotation + ongoing curation |
| Governance | Nothing | Compliance review + privacy controls + audit tooling + vendor management |
| Change Management | Nothing | Training + process redesign + user support + documentation |
| Opportunity Cost | Nothing | What else the team could have built with the same time |
When I push teams to fill in the “What It Actually Includes” column, the cost estimate typically doubles or triples. That isn’t an argument against AI. It’s an argument for honest accounting so you can make the right investment decisions.
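The doubling effect is easy to see when you tally both columns side by side. A minimal sketch, with every figure a placeholder rather than a benchmark – substitute your own numbers:

```python
# Sketch: fully loaded monthly cost for one AI workflow.
# All dollar figures are illustrative assumptions, not benchmarks.

reported_costs = {
    "api_usage": 4_000,            # what most teams report
    "initial_engineering": 6_000,  # amortized build time
}

hidden_costs = {
    "compute_storage_monitoring": 1_500,
    "integration_and_maintenance": 5_000,
    "eval_sets_and_human_review": 3_000,
    "data_prep_and_curation": 2_500,
    "governance_and_compliance": 2_000,
    "training_and_change_mgmt": 1_800,
}

reported = sum(reported_costs.values())
full = reported + sum(hidden_costs.values())

print(f"Reported cost:     ${reported:,}/month")
print(f"Fully loaded cost: ${full:,}/month ({full / reported:.1f}x reported)")
```

Even with these made-up numbers, the fully loaded figure lands at roughly two and a half times the reported one – exactly the multiplier the table above predicts.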
The Baseline Problem
You can’t measure improvement without a baseline. Sounds obvious. You’d be amazed how many teams skip it.
Before you deploy AI in a workflow, measure the current state:
| Metric | How to Capture | Why It Matters |
|---|---|---|
| Throughput | Tasks completed per person per day | Direct productivity comparison |
| Error rate | Errors caught in QA or by customers | Quality comparison |
| Cycle time | Time from task start to completion | Speed comparison |
| Cost per task | Fully loaded labor cost / tasks completed | Economic comparison |
| Customer satisfaction | CSAT or NPS for the specific workflow | Outcome comparison |
Measure for at least four weeks before deployment. Document any other changes that happened during the same period – new hires, process changes, seasonal variation. Those confounders matter when you try to attribute improvements to AI.
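Computing these baselines from a task log is a few lines of arithmetic. A sketch, assuming a simple per-task log and an invented $52/hour fully loaded labor rate:

```python
# Sketch: baseline metrics from a task log.
# The log entries and the $52/hour loaded rate are assumptions for illustration.

tasks = [
    # (minutes_to_complete, had_error)
    (38, False), (45, True), (29, False), (52, False), (41, True), (33, False),
]

loaded_rate_per_hour = 52  # fully loaded labor cost, assumed

n = len(tasks)
total_minutes = sum(minutes for minutes, _ in tasks)
error_rate = sum(1 for _, had_error in tasks if had_error) / n
avg_cycle_minutes = total_minutes / n
cost_per_task = (total_minutes / 60) * loaded_rate_per_hour / n

print(f"Tasks measured: {n}")
print(f"Error rate:     {error_rate:.0%}")
print(f"Avg cycle time: {avg_cycle_minutes:.0f} min")
print(f"Cost per task:  ${cost_per_task:.2f}")
```

Run the same computation after deployment, over the same window length, and the comparison writes itself.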
Mapping Benefits to Outcomes
The second lie in most AI ROI cases is the benefit side. “Time saved” isn’t a business outcome. It’s a proxy. What did the team do with the saved time?
Map every claimed benefit to something the business already tracks and trusts:
| AI Capability | Claimed Benefit | Business Outcome to Measure |
|---|---|---|
| Automated triage | Faster ticket routing | Resolution time, first-response time |
| Document extraction | Less manual data entry | Throughput per person, error rate |
| Content generation | Faster content creation | Time to publish, content volume |
| Code assistance | Faster development | Cycle time, defect rate, deploy frequency |
| Customer support | Reduced support load | Tickets per agent, CSAT, escalation rate |
If you can’t connect an AI capability to a number the business already watches, the benefit is speculative. Label it that way. Don’t pretend it’s measured.
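One cheap way to enforce that rule is to label every claimed benefit mechanically by whether it maps to a metric the business already tracks. A sketch, with metric names invented for illustration:

```python
# Sketch: flag claimed benefits as measured vs. speculative.
# The metric names and benefit list are hypothetical examples.

tracked_metrics = {"resolution_time", "throughput_per_person", "error_rate", "csat"}

claimed_benefits = {
    "faster ticket routing": "resolution_time",
    "less manual data entry": "throughput_per_person",
    "better brainstorming": None,  # no tracked metric behind the claim
}

for benefit, metric in claimed_benefits.items():
    status = "measured" if metric in tracked_metrics else "SPECULATIVE"
    print(f"{benefit}: {status}")
```

Anything that comes back SPECULATIVE can still go in the report – just not in the financial model.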
The Three Traps
Cherry-picking the easy wins. Measuring ROI only on the tasks that were already easiest to automate. The impressive numbers don’t represent the full deployment. Report the aggregate, not just the highlights.
Ignoring the learning curve. The first month after deployment is usually worse than the baseline. People are adjusting. Workflows are changing. Measure too early and you capture either the novelty bump or the learning-curve dip. Neither is representative.
Qualitative benefits as hard numbers. “Developers feel more productive” isn’t the same as “throughput increased 20%.” Both are worth reporting. Only one belongs in a financial model. In my work, I insist on separating measured outcomes from perceived benefits in every report. Leadership respects the honesty.
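The cherry-picking trap in particular is worth demonstrating with arithmetic. A sketch, where the segment shares and savings rates are invented for illustration:

```python
# Sketch: highlight-reel vs. aggregate time savings across task segments.
# Segment shares and savings percentages are illustrative assumptions.

segments = {
    "easy_to_automate": {"share": 0.3, "time_saved": 0.50},
    "moderate":         {"share": 0.5, "time_saved": 0.10},
    "hard":             {"share": 0.2, "time_saved": -0.05},  # net drag
}

highlight = segments["easy_to_automate"]["time_saved"]
aggregate = sum(s["share"] * s["time_saved"] for s in segments.values())

print(f"Highlight reel: {highlight:.0%} time saved")
print(f"Aggregate:      {aggregate:.0%} time saved")
```

With these made-up numbers, the highlight reel shows 50% savings while the honest aggregate is under 20%. The gap is the lie.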
The Report Format That Works
Keep the ROI report to one page. Seriously. If it needs more than one page, you’re either overcomplicating or overclaiming.
Decision context. What question does this measurement answer? “Should we expand AI-assisted triage to all support channels” is specific. “Is AI valuable” isn’t.
Assumptions. List every assumption explicitly. Volume of tasks, cost rates, attribution model, measurement window. When assumptions change, the conclusion changes. Make that visible.
Results as a range. Don’t report a single ROI number. Report a range: conservative estimate under pessimistic assumptions, expected estimate under likely assumptions, optimistic estimate under best-case assumptions. If the conservative estimate is still positive, you have a strong case. If only the optimistic estimate is positive, you have a gamble.
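The range format is a few lines of arithmetic once the assumptions are named. A sketch, where every figure is a labeled assumption to swap for your measured values:

```python
# Sketch: ROI as a range under explicit assumptions, not one magic number.
# Every figure below is an illustrative assumption.

monthly_cost = 25_000  # fully loaded, from the cost section

scenarios = {
    #             (tasks/month, value per task, attribution to AI)
    "conservative": (8_000, 4.00, 0.5),
    "expected":    (10_000, 5.00, 0.7),
    "optimistic":  (12_000, 6.00, 0.9),
}

for name, (volume, value_per_task, attribution) in scenarios.items():
    benefit = volume * value_per_task * attribution
    roi = (benefit - monthly_cost) / monthly_cost
    print(f"{name:>12}: benefit ${benefit:>9,.0f}/mo, ROI {roi:+.0%}")
```

With these particular assumptions the conservative case is negative while only the expected and optimistic cases clear break-even – which, by the rule above, means this hypothetical project is a gamble, not a strong case.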
Next measurement. State when you’ll re-measure and what would cause you to change course. This turns the report from a sales pitch into a decision tool.
What matters
AI ROI measurement isn’t about proving AI works. It’s about making good investment decisions. Capture the full cost, not just the API bill. Establish a real baseline before deploying. Map benefits to outcomes the business already tracks. Report honestly, with ranges and assumptions.
The teams that do this get funded reliably because leadership trusts their numbers. The teams that overclaim get one round of funding and then spend a year explaining why the projections didn’t materialize.
Discipline over heroics. Even in spreadsheets.