Quick take
Stop counting lines of code and story points. Measure deployment frequency, lead time, restore time, and change failure rate. Add a few developer experience signals. Connect them to business outcomes. Kill anything that drives bad behavior.
I’ve watched teams drown in dashboards. At one company I walked into a weekly review where the VP proudly scrolled through forty-seven metrics. Nobody in the room could tell me which three mattered most. That isn’t measurement. That’s decoration.
The goal is simple: a small set of numbers that tell you whether the system – the whole system of people, code, and process – is getting healthier or sicker. Everything else is noise.
Why Most Metrics Backfire
Metrics fail when they measure motion instead of progress. Lines of code reward verbosity. Story points inflate the moment managers tie them to performance reviews. Ticket counts incentivize splitting work into trivial pieces. Time-in-seat confuses presence with output.
Goodhart’s Law – when a measure becomes a target, it ceases to be a good measure – isn’t a thought experiment. I watched it play out at a company where a “PRs merged per sprint” target led to engineers opening tiny, low-value PRs just to hit the number. The charts looked great. The product didn’t move.
If a metric can be gamed without improving outcomes, it will be gamed. That isn’t cynicism. It’s human nature meeting bad incentive design.
The DORA Four: Still the Best Starting Point
The four DORA metrics have held up across every team shape and stack I’ve worked with. They connect speed to stability without prescribing a specific process.
- Deployment frequency – how often you ship to production.
- Lead time for changes – from commit to running in prod.
- Time to restore service – how fast you recover from incidents.
- Change failure rate – the percentage of deployments that cause a failure in production requiring remediation.
These work because they measure the path from idea to production and what happens when that path breaks. Collect them at the team or service level. Never use them to rank individuals. The moment you turn DORA into a leaderboard, the numbers become meaningless.
At a large consumer platform, we tracked deployment frequency per service as a health signal, not a target. When a team’s frequency dropped, it was usually a sign of flaky tests or a painful merge process – system problems, not people problems.
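The four metrics fall out of two simple event streams: deployments (with their originating commit times) and incidents. A minimal sketch, assuming illustrative record shapes rather than any real tool's schema:

```python
# Sketch: computing the four DORA metrics from simple event records.
# The tuples below are illustrative assumptions, not a real pipeline's data.
from datetime import datetime

deploys = [
    # (commit_time, deploy_time, caused_failure)
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 0), False),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 3, 11, 0), True),
    (datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 9, 30), False),
    (datetime(2024, 5, 8, 13, 0), datetime(2024, 5, 8, 16, 0), False),
]
incidents = [
    # (start, restored)
    (datetime(2024, 5, 3, 12, 0), datetime(2024, 5, 3, 12, 45)),
]

window_days = 7

# Deployment frequency: deploys per day over the window.
deployment_frequency = len(deploys) / window_days

# Lead time for changes: commit to running in prod (median, in hours).
lead_times_h = sorted((d - c).total_seconds() / 3600 for c, d, _ in deploys)
median_lead_time_h = lead_times_h[len(lead_times_h) // 2]

# Change failure rate: share of deploys that caused a problem.
change_failure_rate = sum(f for _, _, f in deploys) / len(deploys)

# Time to restore: mean minutes from incident start to recovery.
restore_m = [(r - s).total_seconds() / 60 for s, r in incidents]
mean_restore_m = sum(restore_m) / len(restore_m)
```

Collect these per team or service, as the text says; the aggregation level is the point, not the arithmetic.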
Reliability: Measure What Users Feel
Availability percentages are fine, but they hide a lot. A service can be “99.9% available” and still have p99 latency so bad that the checkout flow feels broken.
Track what customers actually experience:
- Latency at a meaningful percentile (p95 or p99, not averages)
- Error rates on critical paths
- Throughput trends
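The difference between an average and a tail percentile is easy to show with synthetic numbers (the latencies below are illustrative, not real traffic):

```python
# Sketch: why averages hide tail latency.
def percentile(values, p):
    """Nearest-rank percentile: the value at the p-th rank of the sorted sample."""
    ranked = sorted(values)
    k = max(0, int(round(p / 100 * len(ranked))) - 1)
    return ranked[k]

# 95 fast requests plus a slow tail.
latencies_ms = [50] * 95 + [2000] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 147.5 ms: looks healthy
p99_ms = percentile(latencies_ms, 99)            # 2000 ms: the real experience
```

The mean says the service is fine; the p99 says one in a hundred checkouts takes two seconds. That is the request a customer remembers.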
Error budgets are the best policy tool I’ve seen for balancing reliability with feature velocity. Agree on how much unreliability is acceptable for a service in a given period. When the budget burns too fast, slow down feature work and fix the foundations. When there’s budget left, ship faster. Simple.
The key is treating error budgets as a decision framework, not a spreadsheet exercise. If nobody changes behavior when the budget burns, you don’t have an error budget. You have a dashboard.
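The decision framework can be made concrete as a burn-rate check. A minimal sketch, assuming a 99.9% availability target over a 30-day window (the SLO and downtime figures are invented for illustration):

```python
# Sketch: error-budget burn rate for a 99.9% SLO over 30 days.
slo = 0.999
window_minutes = 30 * 24 * 60                 # 43,200 minutes
budget_minutes = window_minutes * (1 - slo)   # 43.2 minutes of allowed downtime

downtime_so_far = 30.0  # minutes of downtime, 10 days into the window
days_elapsed = 10

# Burn rate > 1 means the budget runs out before the window ends.
burn_rate = (downtime_so_far / budget_minutes) / (days_elapsed / 30)

if burn_rate > 1:
    decision = "slow feature work, fix the foundations"
else:
    decision = "budget remaining, keep shipping"
```

The `if` is the whole point: the number exists to trigger that behavior change. Without it, as the text says, you have a dashboard, not a budget.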
Developer Experience Is a Leading Indicator
By the time delivery metrics go red, the developer experience problems that caused them are months old. Slow feedback loops, painful CI, confusing onboarding – these drag on quality and morale long before they show up in DORA numbers.
A few signals that capture friction without turning into surveillance:
- Time to first meaningful change for a new engineer. If onboarding takes three weeks before someone ships anything real, that’s a system problem.
- End-to-end build and test duration. If CI takes forty minutes, developers batch changes and context-switch. Both hurt quality.
- Time to first review on a PR. Review latency is one of the most underrated bottlenecks. A PR sitting for two days is two days of context decay.
- Time from merge to production. If this is hours instead of minutes, your deployment pipeline needs work.
These are system health checks. Improvements here compound into better delivery and reliability outcomes downstream.
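Review latency, for example, needs nothing more than two timestamps per PR. A minimal sketch, assuming an illustrative record shape rather than any real code-review API:

```python
# Sketch: time to first review from PR timestamps (illustrative data).
from datetime import datetime

prs = [
    # (opened, first_review)
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 0)),    # 2 h
    (datetime(2024, 5, 1, 14, 0), datetime(2024, 5, 3, 14, 0)),   # 48 h
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 10, 30)),  # 0.5 h
]

hours = sorted((r - o).total_seconds() / 3600 for o, r in prs)
median_h = hours[len(hours) // 2]
worst_h = hours[-1]
# Trend the median and the worst case separately: one 48-hour outlier
# per week is a different problem than a 48-hour median.
```

Aggregate per team, never per reviewer, for the same reason DORA metrics should never become a leaderboard.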
Connect to Business Outcomes
Engineering metrics should relate to things the business cares about. Not everything needs a revenue number attached, but there should be a clear line between what you measure and why it matters.
Three signals that bridge the gap:
- Feature adoption or usage change after launch
- Customer-reported issues (trending down is good; zero is suspicious)
- Time from idea approval to customer availability
At a real-time messaging company, connecting deployment lead time to customer feature requests was eye-opening. We could show exactly how much faster a customer got value when the pipeline was healthy versus when it was degraded. That made infrastructure investment easy to justify.
How to Start Without Creating Overhead
Pick four to six metrics. Automate collection. Review monthly with a trend line, not a snapshot. That’s it.
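"Trend line, not a snapshot" can itself be automated. A sketch using a least-squares slope over monthly values (the lead-time numbers are invented for illustration):

```python
# Sketch: reviewing a metric as a trend, not a snapshot.
def slope(values):
    """Least-squares slope of values against month index 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

lead_time_hours = [30, 28, 26, 27, 24, 22]  # six monthly medians
monthly_change = slope(lead_time_hours)     # negative means improving
```

The slope is what the monthly review discusses; a single month's value is the inkblot test the next section warns about.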
The pitfalls are predictable:
- Using metrics to punish. This kills psychological safety and data quality simultaneously.
- Expanding the list. Every quarter someone wants to add “just one more.” Resist. If something new goes in, something old comes out.
- Reporting numbers without narrative. A chart without context is an inkblot test. Everyone sees what they want to see.
If a metric drives unhelpful behavior, change it or kill it. Metrics exist to guide decisions, not to prove a point. The best set is small enough to remember, stable enough to trend, and connected enough to tell a coherent story about how your engineering system is actually performing.