Quick take
Reserved instances aren’t a strategy. Visibility is. Most teams burn money on idle resources they can’t even find, then negotiate a discount on the waste. Fix the waste first.
I’ve managed cloud spend at three startups: a fintech where every penny of margin mattered; Dropbyke, a mobility platform in Seoul, where we ran real-time services on a shoestring; and now Decloud, where – ironically – the whole product is about making cloud infrastructure sane.
I keep seeing the same pattern. Someone gets a scary AWS bill. Leadership asks engineering to “optimize cloud costs.” Engineering buys reserved instances, saves 20%, and calls it done. Meanwhile, half the staging environments are running 24/7, there are EBS volumes attached to instances that were terminated months ago, and nobody knows which team owns the NAT gateway that’s costing $400/month.
Reserved instances are a pricing mechanism, not a strategy. Here’s what actually works, ranked by effort vs. impact.
The Real Comparison
| Strategy | Effort | Typical Savings | When It Helps | When It’s a Trap |
|---|---|---|---|---|
| Kill idle resources | Low | 15-30% | Always. Literally always. | Never a trap. Just do it. |
| Right-size instances | Medium | 10-25% | After you have utilization data | Before you have data – you’ll guess wrong |
| Storage lifecycle policies | Low | 5-15% | When you have >1TB of objects | When access patterns are unpredictable |
| Reserved instances / savings plans | Low (to buy) | 20-40% off on-demand | Steady, predictable workloads | Fast-changing services, early-stage products |
| Spot / preemptible instances | High | 60-90% off on-demand | CI, batch jobs, stateless workers | Anything stateful or latency-sensitive |
| Architecture changes | Very high | 10-50% | When compute patterns are fundamentally wasteful | When the org isn’t ready for the migration cost |
That table tells a story most FinOps consultants won’t: the highest-ROI work is the boring stuff at the top.
Start With Visibility, Not Discounts
At the fintech startup, I inherited an AWS account with zero tagging discipline. Costs were allocated to “engineering” as a single line item. Completely useless. My first week, I enforced four tags on every resource:
- Environment: production | staging | dev
- Team: platform | data | product
- Service: api | worker | web | ml-pipeline
- Owner: <person who provisioned it>
Within two weeks, we found three full staging environments nobody was using. One had been running for five months. That single discovery saved more than the next quarter’s reserved instance commitment would have.
Tags are free. Use them.
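If you want to enforce that discipline rather than just recommend it, a small gate in provisioning or CI does the job. Here's a minimal sketch: the tag keys and allowed values mirror the four above; everything else (function name, wiring it to actual AWS resources) is illustrative, not a specific tool.

```python
# Sketch of a tag-policy gate: reject any resource whose tags are missing
# a required key or use a value outside the allowed set. Keys and values
# mirror the four-tag scheme above; feeding it real resources (e.g. from
# a boto3 describe_instances sweep) is left out for brevity.
REQUIRED_TAGS = {
    "Environment": {"production", "staging", "dev"},
    "Team": {"platform", "data", "product"},
    "Service": {"api", "worker", "web", "ml-pipeline"},
    "Owner": None,  # any non-empty value: the person who provisioned it
}

def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return human-readable violations for one resource's tags."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"bad value for {key}: {value!r}")
    return problems

print(tag_violations({"Environment": "staging", "Team": "platform",
                      "Service": "api", "Owner": "jane"}))  # []
print(tag_violations({"Environment": "prod", "Owner": "jane"}))
```

Run it as a pre-merge check on Terraform plans or as a nightly sweep; either way, a resource with no owner never makes it to month two unnoticed.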
Right-Sizing: The Unsexy Middle Ground
Most instances are oversized. This isn’t controversial. Every cloud provider’s own tooling will tell you this. The question is what to do about it.
My approach, applied at every place I’ve worked:
- Collect two weeks of utilization data. Not one day. Not peak hour. Two weeks across normal traffic patterns.
- Start with non-production. Dev and staging environments are almost always 2-4x oversized because someone copied the production config.
- Target p95 CPU below 60% after resizing. You want headroom. This isn’t about running hot.
| Instance Type | Monthly Cost | Avg CPU | p95 CPU | Action |
|---|---|---|---|---|
| m5.2xlarge | $280 | 8% | 15% | Downsize to m5.large ($70) |
| r5.xlarge | $183 | 45% | 72% | Keep. Good fit. |
| c5.4xlarge | $496 | 12% | 20% | Downsize to c5.xlarge ($124) |
| t3.medium | $30 | 3% | 5% | Ask if this service is still needed |
That last column is the one that matters most. Sometimes the right size is zero.
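That Action column reduces to a crude rule of thumb. This sketch encodes my thresholds, not any official AWS guidance; tune the cutoffs to your own risk tolerance.

```python
def rightsizing_action(p95_cpu: float) -> str:
    """Crude right-sizing triage from two weeks of CPU data (percent).

    Thresholds are rules of thumb, not AWS guidance:
    - near-zero p95 suggests the service may not be needed at all;
    - low p95 with lots of headroom suggests a smaller instance;
    - anything well above that is already a reasonable fit.
    """
    if p95_cpu < 10:
        return "question whether this service is still needed"
    if p95_cpu < 30:
        return "downsize; re-check that p95 stays under 60% after"
    return "keep"

# Replaying the table above:
for name, p95 in [("m5.2xlarge", 15), ("r5.xlarge", 72),
                  ("c5.4xlarge", 20), ("t3.medium", 5)]:
    print(name, "->", rightsizing_action(p95))
```

The point of automating the triage isn't precision; it's that the "is this even needed?" question gets asked every month instead of never.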
Spot Instances: High Reward, High Effort
I’m a fan of spot capacity for the right workloads. At Decloud, we run all CI builds on spot instances. Saves us roughly 70% on build infrastructure. But I wouldn’t touch spot for anything user-facing or stateful.
The honest comparison:
| Workload | Spot-safe? | Why / Why Not |
|---|---|---|
| CI/CD pipelines | Yes | Builds are idempotent. Retry is cheap. |
| Batch ETL jobs | Yes | Checkpointable. Interruption adds minutes, not risk. |
| Dev environments | Yes | Nobody cares if dev goes down for 2 minutes. |
| Stateless API workers | Maybe | Only if you have enough replicas and proper drain handling. |
| Databases | No | Just… no. |
| Single-instance anything | No | Spot + single instance = eventual outage |
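The table condenses into a short checklist. This is a sketch of my heuristics, not a law; the attribute names are made up for illustration.

```python
def spot_safe(stateful: bool, latency_sensitive: bool,
              replicas: int, drain_handling: bool) -> str:
    """Condense the spot-safety table into a checklist. Personal
    heuristics, not a guarantee: returns 'yes', 'maybe', or 'no'."""
    if stateful:
        return "no"     # databases, single-writer state: just... no
    if replicas < 2:
        return "no"     # spot + single instance = eventual outage
    if latency_sensitive and not drain_handling:
        return "no"     # interruptions will surface to users
    if latency_sensitive:
        return "maybe"  # OK only with enough replicas + proper draining
    return "yes"        # CI, batch, dev: retry is cheap

# A CI runner fleet: stateless, retryable, plenty of replicas.
print(spot_safe(stateful=False, latency_sensitive=False,
                replicas=10, drain_handling=True))  # yes
```

Note the ordering: statefulness and replica count veto everything else, which matches how the failures actually play out in production.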
The Thing Nobody Talks About: Data Transfer
Cross-AZ traffic on AWS is $0.01/GB each way. Doesn’t sound like much until you’re pushing 10TB/month between services in different availability zones. That’s $200/month just for services talking to each other inside your own VPC.
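The arithmetic is worth making explicit, because both sides of the transfer pay. A tiny calculator, assuming the $0.01/GB-each-way rate above (check current pricing for your region):

```python
CROSS_AZ_RATE_PER_GB = 0.01  # USD, charged in EACH direction

def cross_az_monthly_cost(tb_per_month: float) -> float:
    """Monthly cost of cross-AZ chatter. Both the sending and the
    receiving side are billed, hence the factor of 2."""
    gb = tb_per_month * 1000  # decimal TB -> GB, matching AWS billing
    return gb * CROSS_AZ_RATE_PER_GB * 2

print(cross_az_monthly_cost(10))  # 10 TB/month -> 200.0
```

The factor of 2 is the part teams miss: the line item looks like a $0.01 rate but behaves like $0.02.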
Three fixes that cost nothing but attention:
- Keep chatty services in the same AZ. Schedule pods to co-locate when latency and bandwidth matter.
- Use VPC endpoints for AWS services. S3 traffic through a NAT gateway is money on fire.
- Cache aggressively at the edge. A CDN in front of your API responses (where appropriate) cuts both latency and egress.
At Dropbyke, data transfer was our third-highest line item after compute and RDS. Moving our real-time location services into a single AZ and adding a VPC endpoint for S3 cut that bill by 40% in one sprint.
What I Actually Do Every Month
I don’t have a FinOps team. At startup scale, cost discipline is an engineering habit, not a department. Here’s my actual routine:
Weekly: Glance at the cost dashboard. Look for anything that jumped more than 20% week-over-week. If something spiked, find it before the month closes.
Monthly: Review top 10 services by cost. Check for idle resources – unattached volumes, orphaned load balancers, forgotten test clusters. Five minutes of cleanup saves hundreds.
Before any commitment: Right-size first. Always. Buying a reserved instance for an oversized instance just locks in the waste at a discount.
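The weekly check is mechanical enough to script. A sketch of the week-over-week spike flag; the service names and numbers are made up, and in practice the input comes from a Cost Explorer export or your dashboard's API.

```python
def wow_spikes(costs: dict[str, list[float]],
               threshold: float = 0.20) -> list[str]:
    """Flag services whose latest weekly cost jumped more than
    `threshold` over the prior week. Input maps service name to
    weekly cost history, oldest first. Illustrative data only."""
    flagged = []
    for service, history in costs.items():
        if len(history) < 2 or history[-2] <= 0:
            continue  # not enough history to compare
        change = (history[-1] - history[-2]) / history[-2]
        if change > threshold:
            flagged.append(f"{service}: +{change:.0%} week-over-week")
    return flagged

print(wow_spikes({"api": [410, 420, 415],
                  "ml-pipeline": [300, 310, 520]}))
```

Anything this flags gets investigated before month close, while the person who caused the spike still remembers what they deployed.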
Cloud cost optimization isn’t a project with a finish line. It’s hygiene. Like writing tests or reviewing pull requests – you either build the habit or you pay the tax forever.
The teams that manage cloud spend well aren’t the ones with the best FinOps tooling. They’re the ones where every engineer can see what their service costs and cares enough to keep it reasonable.