Stop Guessing Your Kubernetes Resource Limits

| 6 min read |
kubernetes devops infrastructure

Most K8s clusters I audit are either wildly overprovisioned or one bad deploy away from eviction storms. Here's how I set requests, limits, and guardrails.

Quick take

Your requests decide scheduling. Your limits decide who dies under pressure. Get them wrong and you’re either burning money or getting paged at 3am. Measure real usage, set requests around the p75 of that usage, give limits real headroom, and use LimitRanges so nobody deploys a BestEffort pod to production.


Between running infrastructure at Decloud and working with other teams, there’s one pattern I see in almost every cluster I audit: resource settings are either copy-pasted from a tutorial, set to absurdly high values “just to be safe,” or missing entirely.

The result is always the same. Either you’re paying for 3x the compute you actually need, or your pods are getting evicted during the one traffic spike that matters.

Let me walk through how I actually think about this.

Requests vs limits: two different jobs

People treat these as the same thing. They’re not.

Requests are what the scheduler sees. When you set requests.cpu: 200m, you’re telling Kubernetes “this pod needs at least 200 millicores to run, find me a node that has that available.” The scheduler uses requests to bin-pack pods onto nodes.

Limits are runtime caps. The kernel enforces these. Hit your CPU limit and you get throttled. Hit your memory limit and you get OOM-killed. Very different consequences.

resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"

One thing that trips people up: if you set a limit but skip the request, Kubernetes sets the request equal to the limit. If you set neither, congratulations, you’ve got a BestEffort pod. First to die when the node gets tight.
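You can see that defaulting rule in action with a spec like this: only limits are set, so Kubernetes copies them into the requests, and the pod comes out Guaranteed.

```yaml
# Only limits are set; Kubernetes copies them into requests,
# so this container's pod lands in the Guaranteed QoS class.
resources:
  limits:
    cpu: "500m"
    memory: "256Mi"
```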

CPU throttling vs OOM kills

This is the part that bites people.

CPU is compressible. Your container hits its CPU limit? It gets throttled. Slower responses, higher latency, but it stays alive. Annoying but survivable.

Memory isn’t compressible. Your container exceeds its memory limit? The kernel kills it. No negotiation. The pod restarts (if your restart policy allows it), your users see errors, and you get an alert.

I’ve seen teams set memory limits too close to actual usage, then wonder why they’re getting random OOM kills during garbage collection spikes. Leave real headroom on memory. I’d rather waste 50Mi per pod than deal with cascading restarts.
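If you want to watch throttling happen, the cgroup accounting inside the container shows it directly. A sketch, assuming cgroup v2 (the path differs on v1):

```shell
# Read CFS throttling counters from inside a container.
# cgroup v2 path; on v1 it's /sys/fs/cgroup/cpu,cpuacct/cpu.stat
cat /sys/fs/cgroup/cpu.stat
# nr_periods:     scheduling periods seen
# nr_throttled:   periods where the container hit its CPU quota
# throttled_usec: total time spent throttled
```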

QoS classes matter more than you think

Kubernetes assigns every pod a Quality of Service class based on its resource settings. This isn’t just a label – it determines eviction order when a node runs out of resources.

Guaranteed – requests equal limits for both CPU and memory. These pods get evicted last. Use this for anything you actually care about in production.

Burstable – at least one request or limit is set, but the pod doesn’t qualify as Guaranteed (typically requests lower than limits). Most pods in real clusters end up here. Fine for most workloads, but know that you’re in the middle of the eviction queue.

BestEffort – no requests, no limits, no guarantees. First to get evicted. I only use this for batch jobs I genuinely don’t care about.

The mistake I see constantly: teams running critical APIs as BestEffort because nobody set resource fields in the deployment spec. Then a memory spike on the node evicts those pods before the logging sidecar that has Guaranteed QoS. Backwards.
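Auditing this takes one command. Kubernetes records the computed class on each pod, so a quick listing (assuming a production namespace) shows who’s actually BestEffort:

```shell
kubectl get pods -n production \
  -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass
```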

How I actually size resources

I don’t guess. I measure, then set values based on real data.

Start with kubectl top to get a quick snapshot:

kubectl top pods -n production
kubectl top pods -n production --containers

But a snapshot isn’t enough. You need percentile data over days or weeks. Pull p50, p75, p95, and p99 from whatever metrics system you’re running – Prometheus, Datadog, whatever.
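With Prometheus scraping cAdvisor, those percentiles are one subquery each. A sketch; metric and label names assume the standard cAdvisor exporter:

```promql
# p95 memory working set per container over the last 7 days
quantile_over_time(0.95,
  container_memory_working_set_bytes{namespace="production", container!=""}[7d])

# p75 of the 5m CPU rate over 7 days (subquery syntax: [7d:5m])
quantile_over_time(0.75,
  rate(container_cpu_usage_seconds_total{namespace="production", container!=""}[5m])[7d:5m])
```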

Then I set values like this:

  • CPU request: p75 usage plus a small buffer. This is your “normal operating range.”
  • Memory request: p95 usage plus a buffer. Memory spikes are less forgiving.
  • CPU limit: 2-3x the request. Let it burst for short periods.
  • Memory limit: request plus enough headroom for GC spikes and transient allocations. But not so much that a leak runs unchecked.

resources:
  requests:
    cpu: "200m"      # p75 + headroom
    memory: "300Mi"  # p95 + headroom
  limits:
    cpu: "500m"      # room for bursts
    memory: "450Mi"  # room for GC, not for leaks
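The heuristics above are easy to mechanize. A toy sketch, assuming you’ve already exported raw usage samples; the percentile choices and multipliers are my rules of thumb from this post, not an official formula:

```python
from statistics import quantiles

def suggest_resources(cpu_millicores, mem_mib, buffer=1.1):
    """Turn raw usage samples into request/limit suggestions (millicores / MiB)."""
    cpu_p = quantiles(cpu_millicores, n=100)  # cpu_p[74] is ~p75
    mem_p = quantiles(mem_mib, n=100)         # mem_p[94] is ~p95
    cpu_req = cpu_p[74] * buffer
    mem_req = mem_p[94] * buffer
    return {
        "cpu_request_m": round(cpu_req),
        "mem_request_mi": round(mem_req),
        "cpu_limit_m": round(cpu_req * 2.5),   # 2-3x the request
        "mem_limit_mi": round(mem_req * 1.5),  # GC headroom, not leak room
    }
```

Feed it a week of per-container samples and compare the output against what’s actually in the deployment spec.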

Review these after any significant traffic change. Resource settings should evolve with your workload, not stay frozen from the day someone first deployed the service.

Autoscaling helps but isn’t magic

HPA (Horizontal Pod Autoscaler) scales replica count based on observed CPU utilization relative to your requests. (The autoscaling/v1 API only does CPU; autoscaling/v2 adds memory and custom metrics.) If your requests are wrong, your scaling behavior is wrong. Garbage in, garbage out.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

VPA (Vertical Pod Autoscaler) is interesting, but it ships separately from core Kubernetes and I still treat it as beta. I run it in recommendation mode only – let it tell me what it thinks the requests should be, then I decide whether to apply that.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"
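With updateMode "Off", you read the recommendations back with a describe (assuming the VPA object above):

```shell
kubectl describe vpa api-vpa
# Look under Status -> Recommendation -> Container Recommendations:
# target, lowerBound, and upperBound per container
```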

I don’t trust VPA to mutate running pods in production yet. Maybe next year.

LimitRanges and ResourceQuotas: the guardrails you need

Every namespace in a shared cluster should have a LimitRange. Full stop. Without one, any developer can deploy a pod with no resource settings, and now you’ve got a BestEffort pod sitting in production waiting to get evicted.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      default:
        cpu: "500m"
        memory: "256Mi"

This gives every container sane defaults even if the developer forgot to set them. It’s a safety net, not a replacement for proper resource specs.
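A LimitRange can also enforce hard per-container bounds, not just defaults, so nobody quietly requests half a node. A sketch; the numbers are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-bounds
spec:
  limits:
    - type: Container
      max:
        cpu: "2"
        memory: "2Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
```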

ResourceQuotas cap total consumption per namespace. Essential for multi-tenant clusters where you don’t want one team eating the whole cluster.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

What to actually monitor

“Set it and forget it” doesn’t work here. Watch these:

  • CPU throttling rate – if it’s consistently high, your CPU limit is too low
  • OOM kill count – any non-zero number means your memory limit needs attention
  • Request-to-usage ratio – if you’re requesting 4x what you use, you’re wasting cluster capacity
  • Allocatable vs requested – tells you how much room the cluster actually has
  • Eviction events – if pods are getting evicted, something is wrong with your resource math
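With cAdvisor and kube-state-metrics scraped by Prometheus, the first two are one query each. A sketch; label names may differ in your setup:

```promql
# fraction of CFS periods in which each container was throttled
rate(container_cpu_cfs_throttled_periods_total{namespace="production"}[5m])
  / rate(container_cpu_cfs_periods_total{namespace="production"}[5m])

# containers whose last termination was an OOM kill (kube-state-metrics)
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
```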

I check these weekly for the clusters I manage. It takes ten minutes and prevents most resource-related incidents before they happen.

The bottom line: resource management isn’t glamorous. Nobody writes conference talks about setting CPU requests correctly. But getting this right is the difference between a cluster that hums along quietly and one that wakes you up at 3am because a BestEffort pod got evicted and took your API gateway with it.