Quick take
Most Kubernetes clusters run at 20-40% actual utilization. The rest is wasted money. Right-size your resource requests using real data, stop guessing at CPU limits, use spot instances for stateless workloads, and review your spend monthly. I’ve helped teams cut 30-50% off their K8s bills without a single reliability regression.
Last month I looked at a company spending $47k/month on their Kubernetes clusters. Actual utilization across the board? 23%. They were paying for ghost capacity that nobody asked for and nobody noticed.
This isn’t unusual. I see it at almost every enterprise engagement. Kubernetes makes scaling trivially easy, which makes over-provisioning trivially easy too. Some developer copy-pasted resource requests from a Stack Overflow answer two years ago. Nobody questioned it. The cluster autoscaler dutifully spun up nodes to satisfy those fictional requests. The bill grew.
Why Your Bill Keeps Growing
Kubernetes schedules based on resource requests, not actual usage. Your cloud bill pays for the nodes backing those requests. If every pod requests 2 CPU cores but uses 0.3, you’re paying for 2 cores of node capacity per pod. Multiply that across hundreds of pods. Yeah.
The usual culprits:
- Copy-pasted resource blocks that become tribal defaults
- Memory requests inflated to “never get OOM-killed” levels, without anyone measuring actual usage
- CPU limits set defensively and never revisited
- Zero cost visibility per team or per service
Measure First. Seriously.
I can’t stress this enough. Before you touch a single resource request, get visibility into what you’re actually using.
Build a dashboard (Grafana, Kubecost, whatever) that shows:
- CPU requested vs CPU used at p95
- Memory requested vs peak working set
- Utilization broken down by namespace
- Nodes that can’t scale down because of inflated requests
The first time an engineering lead sees their team requesting 64 cores and using 11, the conversation changes fast.
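If you run the Prometheus Operator with kube-state-metrics and cAdvisor metrics (the usual Grafana setup), the requested-vs-used numbers behind that dashboard can be captured as recording rules. This is a sketch under those assumptions; the rule names and the `utilization-visibility` object are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: utilization-visibility   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: cost-visibility
      rules:
        # CPU cores requested, per namespace
        - record: namespace:cpu_requested:sum
          expr: sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
        # CPU cores actually used, per namespace (5m rate)
        - record: namespace:cpu_used:sum
          expr: sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
        # Memory requested vs actual working set, per namespace
        - record: namespace:memory_requested:sum
          expr: sum by (namespace) (kube_pod_container_resource_requests{resource="memory"})
        - record: namespace:memory_working_set:sum
          expr: sum by (namespace) (container_memory_working_set_bytes{container!=""})
```

Graph `cpu_used` against `cpu_requested` per namespace and read the p95 off the panel. The gap between the two lines is the money.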
The Cost Comparison
Here is what I typically see before and after right-sizing:
| Category | Before | After | Savings |
|---|---|---|---|
| CPU requests vs actual | 4x over-provisioned | 1.3x buffer | ~60% node reduction |
| Memory requests | 2-3x peak usage | Peak + 20% buffer | ~40% node reduction |
| Node types | Single large instance type | Mixed instance pools | 15-25% better bin packing |
| Spot usage | 0% | 40-60% of stateless workloads | 60-70% on those nodes |
| Dev/staging environments | Same specs as prod | Right-sized, spot-heavy | 50-70% reduction |
| Typical monthly bill | $45-50k | $20-25k | 45-55% |
These are real numbers from a mid-sized company running about 200 microservices. Your mileage will vary, but the pattern holds.
Right-Size Resource Requests
Stop guessing. Use p95 CPU as your request baseline. Use peak memory working set plus a 15-20% buffer for memory requests. Set memory limits to protect against runaway processes. Drop CPU limits entirely for most services – they cause throttling that hurts latency more than it helps anything.
```yaml
resources:
  requests:
    cpu: 250m        # based on p95 actual usage
    memory: 1Gi      # based on peak working set + buffer
  limits:
    memory: 2Gi      # hard cap for runaway protection
    # no CPU limit -- let it burst
```
Run VPA in recommendation mode first. It watches actual usage and suggests request values. Don’t let it auto-apply yet. Review the recommendations manually, make sure they make sense, then update your deployments. Build trust before you automate.
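Recommendation mode is one field on the VPA object. A minimal sketch, assuming the VPA controller is installed and targeting a hypothetical `payments-api` Deployment:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api   # hypothetical workload
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or mutate pods
```

`kubectl describe vpa payments-api` then shows the recommended target plus lower and upper bounds in the status. That's your review queue.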
Fix the Node Layer
Right-sized pods on wrong-sized nodes still waste money. If you have a bunch of pods requesting 500m CPU and 512Mi memory, shoving them onto m5.4xlarge instances is terrible bin packing.
Mix your instance types. Use smaller instances for smaller workloads. Create separate node pools for workloads with different profiles (CPU-heavy vs memory-heavy vs general). Enable Cluster Autoscaler and let it remove nodes that are actually idle.
Autoscaling only works when requests are honest. Inflated requests mean the autoscaler sees “full” nodes everywhere and keeps adding capacity you don’t need.
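If you provision nodes with Karpenter (v1 API), a mixed pool is one NodePool with several allowed instance types; Karpenter then picks the cheapest shape that fits the pending pods and consolidates underused nodes. A sketch — the pool name and instance list are illustrative, not a recommendation:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general   # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # several sizes and families -> better bin packing than one big type
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m6i.large", "m6i.xlarge", "c5.xlarge"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # remove/replace idle nodes
```

With the classic Cluster Autoscaler the equivalent is multiple node groups of different sizes; the principle is the same either way.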
Spot Instances: Free Money (Almost)
Spot instances are 60-70% cheaper than on-demand. The catch: they can be reclaimed with two minutes' notice.
Good candidates for spot:
- Stateless services with 3+ replicas
- Batch jobs and queue processors
- Dev and staging environments (all of it, honestly)
- Anything that handles graceful shutdown
Bad candidates: single-replica stateful services, databases, anything where losing a node means losing data.
Use taints and tolerations to pin critical workloads to on-demand nodes. Spread spot across multiple instance types and AZs so reclamation doesn’t take out your entire fleet at once.
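Assuming your spot node pools are created with a taint like `capacity=spot:NoSchedule` (the key and value here are illustrative), only workloads that explicitly tolerate it can land on spot — critical services without the toleration stay on on-demand by default:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-processor   # hypothetical spot-friendly workload
spec:
  replicas: 3
  selector:
    matchLabels: {app: queue-processor}
  template:
    metadata:
      labels: {app: queue-processor}
    spec:
      tolerations:
        - key: capacity
          operator: Equal
          value: spot
          effect: NoSchedule
      # spread replicas across zones so one reclamation wave
      # can't take out the whole fleet
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels: {app: queue-processor}
```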
Guardrails That Stick
Optimization without guardrails is a one-time win. Things drift back within months.
Set up:
- ResourceQuota per namespace so no team can accidentally claim half the cluster
- LimitRange to enforce minimum and maximum resource requests
- Labels for team, environment, and service – you need these for cost allocation
- Monthly review cadence – look at the top 10 over-provisioned workloads, adjust, repeat
That last one matters most. Cost optimization isn’t a project. It’s a habit. A 30-minute monthly review catches drift before it becomes a $10k/month problem.
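The quota and limit-range guardrails are plain Kubernetes objects. A sketch for a hypothetical `team-checkout` namespace — the numbers are placeholders you'd derive from your own measurements:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-checkout-quota
  namespace: team-checkout   # hypothetical team namespace
spec:
  hard:
    requests.cpu: "64"       # team-wide ceiling on claimed CPU
    requests.memory: 128Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-checkout-limits
  namespace: team-checkout
spec:
  limits:
    - type: Container
      min:
        cpu: 10m             # no unschedulably tiny requests
        memory: 32Mi
      max:
        cpu: "4"             # no single container claims half a node
        memory: 8Gi
      defaultRequest:        # applied when a pod declares nothing
        cpu: 100m
        memory: 256Mi
```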
The Uncomfortable Truth
Most Kubernetes cost problems aren’t technical problems. They’re ownership problems. Nobody owns the bill. Nobody sees the bill broken down by team. Nobody gets asked “why are you requesting 8 cores for a service that peaks at 0.5?”
Fix the visibility and the accountability first. The technical optimization follows naturally.
I’ve watched teams cut their bills in half within two months just by making resource usage visible to the teams that own the workloads. No fancy tooling. No replatforming. Just data, ownership, and a monthly conversation.