I’ve now seen cloud bills at three different companies: a fintech startup processing market data, a mobility startup called Dropbyke, and now Decloud, where cloud infrastructure is literally the product. The pattern is always the same: nobody looks at the bill until it’s painful, and by then the waste is structural.
FinOps has become the trendy label for fixing this. I think the label is fine. What bothers me is that most FinOps advice reads like a procurement playbook: tag everything, set budgets, buy reserved instances. That’s table stakes. The real leverage is in understanding that your cloud bill is a mirror of your architecture decisions. Every spike, every creeping line item, every mysterious data transfer charge – they’re all telling you something about how your system actually behaves versus how you think it behaves.
The Bill Nobody Reads
At the fintech startup, we had a service that polled financial data APIs on a fixed interval. Straightforward. Except someone had configured the polling interval in milliseconds instead of seconds. The service was hammering APIs a thousand times faster than intended, burning through compute and egress. The bill caught it before monitoring did.
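A minimal reconstruction of that failure mode (names and values are illustrative, not the original code): the config field expects milliseconds, someone supplies the number they had in their head in seconds, and the loop runs a thousand times too fast.

```python
import time

# Illustrative sketch of the unit mix-up: the field expects milliseconds,
# the operator entered "5" thinking seconds, so each sleep is 5 ms, not 5 s.
poll_interval_ms = 5  # intended: poll every 5 seconds

def poll_forever(fetch):
    while True:
        fetch()
        time.sleep(poll_interval_ms / 1000)  # sleeps 0.005 s, 1000x too often

# One cheap defence: put the unit in the name and range-check at startup,
# so a unit mix-up fails loudly instead of silently burning egress.
POLL_INTERVAL_SECONDS = 5.0
assert 0.1 <= POLL_INTERVAL_SECONDS <= 3600, "polling interval out of sane range"
```

The range check is crude, but it turns a silent cost leak into a startup failure, which is the same feedback the bill eventually gave us, just a month earlier.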
That’s not a budgeting problem. That’s a feedback problem. The bill was screaming useful information and nobody was listening.
Most teams I’ve talked to have a version of this story. The numbers are just different. A dev environment left running over the weekend. An EBS volume attached to a terminated instance. Cross-region replication for a service that doesn’t need it. These aren’t failures of discipline. They’re failures of visibility.
Tagging Is Necessary but Not Sufficient
Every FinOps guide starts with tagging, and they’re right. You need to know who owns the spend.
owner: team-payments
service: api
environment: production
cost-center: CC-12345
But tagging alone just tells you where money goes. It doesn’t tell you why. The second step – the one most teams skip – is connecting cost to workload behavior. Cost per API call. Cost per user session. Cost per data pipeline run.
When I was at Dropbyke, we tracked cost per ride. That single metric changed how we thought about infrastructure. It stopped being “we spent $X on EC2 this month” and became “it costs us $0.03 in infrastructure to serve a ride.” Suddenly, scaling decisions had a business unit attached to them. Product managers could reason about infrastructure tradeoffs without needing to read a CloudWatch dashboard.
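The metric itself is a one-liner. A sketch, with made-up numbers rather than Dropbyke’s actual figures:

```python
def unit_cost(monthly_infra_spend: float, monthly_units: float) -> float:
    """Infrastructure cost per business unit (ride, order, API call...)."""
    if monthly_units <= 0:
        raise ValueError("need a positive unit count")
    return monthly_infra_spend / monthly_units

# e.g. $45,000/month of attributable infrastructure over 1.5M rides:
cost_per_ride = unit_cost(45_000, 1_500_000)
print(f"${cost_per_ride:.2f} per ride")  # $0.03 per ride
```

The hard part isn’t the division, it’s the attribution: deciding which line items count as “serving a ride”, which is exactly what the tagging buys you.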
Where the Real Money Hides
The big savings aren’t in right-sizing your t3.medium down to a t3.small. Right-sizing helps, sure. But the structural wins come from three places:
Data transfer. This is the charge most teams underestimate. Cross-AZ traffic, NAT gateway throughput, CloudFront to origin – these add up silently. At Decloud, we’ve seen teams paying more for data transfer than for the compute processing that data. Consolidating services into fewer AZs or redesigning API chattiness often saves more than any instance resizing.
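A back-of-envelope for why this sneaks up on people. The rate below is an assumption, so check your region’s pricing; many AWS regions bill inter-AZ traffic at $0.01/GB in each direction, roughly $0.02 per GB that crosses the boundary:

```python
# Assumed rate: ~$0.02/GB total for traffic crossing an AZ boundary
# ($0.01/GB each direction in many AWS regions; verify for yours).
CROSS_AZ_RATE_PER_GB = 0.02

def monthly_cross_az_cost(gb_per_day: float) -> float:
    """Rough monthly cost of a steady cross-AZ traffic flow."""
    return gb_per_day * 30 * CROSS_AZ_RATE_PER_GB

# A chatty pair of services exchanging 2 TB/day across AZs:
print(f"${monthly_cross_az_cost(2048):,.0f}/month")  # $1,229/month
```

None of that shows up on a compute dashboard, which is why it hides so well.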
Storage lifecycle. Logs, snapshots, old EBS volumes – they accumulate like sediment. Nobody deletes a snapshot because nobody knows if something depends on it. Set lifecycle policies early and enforce them. S3 Intelligent-Tiering exists for a reason. Use it.
Idle resources. Dev and staging environments running 24/7 when they’re used 8 hours a day. That’s 70% waste right there. Schedule them. Shut them down. If your team complains about spin-up time, fix the spin-up time. Don’t pay for an always-on environment nobody uses at 2am.
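The arithmetic is worth doing explicitly, because once you count weekends the waste is even worse than 70% (hours below are illustrative):

```python
# An always-on dev environment used only during business hours, weekdays.
HOURS_PER_WEEK = 168
USED_HOURS = 5 * 8  # weekdays, 8 hours/day

idle_fraction = 1 - USED_HOURS / HOURS_PER_WEEK
print(f"{idle_fraction:.0%} of what you pay for sits idle")  # 76%
```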
The Monthly Review That Actually Works
I’ve seen teams build elaborate dashboards that nobody checks. The fix is simpler than you think: a 30-minute monthly meeting with five questions.
- What did we spend versus what we expected?
- What are the top five services by cost, and do they make sense?
- What changed the most since last month, and why?
- What are we going to do about it, and who owns each action?
- What does next month look like?
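Question three, “what changed the most”, is the one that finds problems, and it genuinely fits in a spreadsheet or a dozen lines of script. The figures here are invented:

```python
# Rank services by absolute month-over-month change (numbers invented).
last_month = {"EC2": 18_400, "S3": 3_100, "NAT Gateway": 2_900, "RDS": 7_200}
this_month = {"EC2": 18_900, "S3": 3_200, "NAT Gateway": 6_100, "RDS": 7_100}

deltas = sorted(
    ((svc, this_month[svc] - last_month.get(svc, 0)) for svc in this_month),
    key=lambda kv: abs(kv[1]),
    reverse=True,
)
for svc, delta in deltas:
    print(f"{svc:12s} {delta:+8,.0f}")
```

A NAT Gateway line that more than doubles is exactly the anomaly the meeting exists to surface; it would vanish inside the monthly total.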
That’s it. No dashboard required. A spreadsheet works. The point is the conversation, not the tooling. When engineers sit in a room and explain their spend, they start making different decisions. Not because anyone forced them to – because the cost became real.
Reserved Instances and the Commitment Trap
Everyone tells you to buy Reserved Instances. The math looks obvious: commit for a year, save 30-40%. But I’ve watched teams overcommit based on three months of data and end up paying for capacity their architecture no longer needed after a restructure.
My rule: don’t commit until you have six months of stable utilization data for a workload you’re confident won’t change architecturally. Use Savings Plans over traditional RIs where you can – they’re more flexible. And never reserve more than 70% of your steady-state. Leave room for the architecture to evolve.
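One way to operationalise that rule, treating the lowest of the six months as the steady-state floor (that interpretation is mine, and the numbers are illustrative):

```python
# Six months of on-demand spend for a stable workload (illustrative).
monthly_usage_usd = [41_000, 39_500, 40_200, 42_000, 38_900, 40_800]

steady_state = min(monthly_usage_usd)  # commit against the floor, not the mean
max_commitment = 0.70 * steady_state
print(f"commit to at most ${max_commitment:,.0f}/month equivalent")  # $27,230
```

Committing against the floor rather than the average is what leaves the headroom for the architecture to evolve.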
Spot instances, on the other hand, are underused. If your workload can tolerate interruption – batch jobs, CI/CD, data processing – you should be running it on spot. We run all our CI on spot at Decloud. The interruption rate is low enough that the occasional retry is far cheaper than on-demand.
Cost as an Engineering Metric
The shift I’m arguing for is cultural, not procedural. Treat cost the same way you treat latency or error rate. Make it visible in dashboards. Talk about it in standups when something is off. Include it in architecture decision records.
This doesn’t mean engineers should be afraid to spend money. The goal isn’t a lower bill – it’s a bill you can explain. There’s a difference between $50k/month that you understand and $30k/month that you can’t account for. The first is fine. The second will eventually hurt you.
At EF this year, I’ve been talking to a lot of early-stage founders about infrastructure. The ones who track unit economics from day one make fundamentally better scaling decisions later. Not because they’re cheap, but because they understand the relationship between their product and their infrastructure. That understanding compounds.
What I’d Do Tomorrow
If you’re starting from zero, here’s the sequence:
- Turn on AWS Cost Explorer (or your cloud’s equivalent). It’s free and it’s already collecting data.
- Tag your top 10 resources by owner and service. Don’t boil the ocean – start with what’s expensive.
- Calculate one unit cost metric that maps to your business. Cost per user, cost per transaction, cost per request – pick one.
- Schedule your dev/staging environments to shut down outside business hours.
- Set a budget alert at 80% of last month’s spend. Not to punish – to notice.
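The alert rule in the last step is deliberately relative, not absolute, so the threshold tracks your real baseline as it moves. A sketch of the arithmetic (AWS Budgets can express alerts like this natively; the helper here is just to make the rule explicit):

```python
# Alert threshold: 80% of last month's actual spend, recomputed monthly.
def budget_alert_threshold(last_month_spend: float, ratio: float = 0.80) -> float:
    return round(last_month_spend * ratio, 2)

print(f"alert at ${budget_alert_threshold(52_340):,.2f}")  # alert at $41,872.00
```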
That’s a week of work. Maybe two. And it’ll save you more than any FinOps platform you could buy.
The cloud bill isn’t a finance document. It’s a design document. Read it like one.