Comparing Infrastructure Testing Approaches: What Actually Catches Bugs

6 min read · infrastructure testing, terraform, iac

I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses.

Quick take

Most infra teams either test nothing or test everything the slow way. The sweet spot is layering three approaches – static/unit, policy, and integration – and knowing which one catches which class of bug. I built a comparison from six months of actually doing this at Decloud.


I broke production with a Terraform typo last year. Changed a security group rule, fat-fingered a CIDR block, and opened port 22 to the internet for about forty minutes. Nobody caught it in review. Our CI pipeline at the time was terraform plan and vibes.

That was the wake-up call. I spent the next six months building out infrastructure testing at Decloud, trying every approach I could find. Unit tests for IaC. Policy engines. Full integration runs with real resources. Some of it was worth the effort. Some wasn’t.

Here’s what I learned.

Three Approaches, Three Different Jobs

The mistake I see most teams make is treating “infrastructure testing” as one thing. It’s not. There are three fundamentally different approaches, and they catch fundamentally different problems.

| | Static / Unit Tests | Policy / Compliance Tests | Integration Tests |
|---|---|---|---|
| What it checks | Syntax, structure, basic correctness | Security rules, org standards, regulatory requirements | Actual deployed resources behaving correctly |
| Speed | Seconds | Seconds to low minutes | 5–30 minutes |
| Cost | Zero (no resources created) | Zero (evaluates plans/manifests) | Real money (spins up real infra) |
| Catches | Typos, invalid configs, format drift | Policy violations, misconfigs that scanners know about | Networking issues, IAM problems, broken dependencies |
| Misses | Anything behavioral | Novel misconfigs, runtime behavior | Nothing – but too slow/expensive to run on every commit |
| Run when | Every commit, pre-commit hooks | Every PR, every terraform plan | Nightly, or on merges to main |
| Tools | terraform validate, tflint, kubeval | OPA/Conftest, Checkov, tfsec | Terratest, Kitchen-Terraform |

That table is the whole thesis. Each layer fills in what the previous one misses.

Static and Unit Tests: The Fast Floor

These are your terraform fmt, terraform validate, tflint, kubeval checks. They run in seconds. They cost nothing. They should be on every single commit.

terraform fmt -check -recursive
terraform validate
tflint --chdir terraform/
kubeval --strict k8s/*.yaml

What surprised me: these catch more than you’d think. At Decloud, roughly 40% of our infra PRs had at least one issue caught at this layer. Mostly formatting, occasionally an actual invalid reference or deprecated resource argument.

What they won’t catch: anything that’s syntactically valid but semantically wrong. That security group I opened to the world? Perfectly valid HCL. terraform validate had no opinion.

Policy Tests: Where the Real Value Is

This is the layer I underestimated. OPA with Conftest lets you write rules against Terraform plan JSON, Kubernetes manifests, basically any structured config.

The SSH rule that would have saved me forty minutes of panic:

package terraform

deny[msg] {
  r := input.resource_changes[_]
  r.type == "aws_security_group_rule"
  r.change.after.cidr_blocks[_] == "0.0.0.0/0"
  r.change.after.from_port == 22
  msg := "SSH must not be open to the world"
}

Small. Specific. Tied to an actual incident. That’s the pattern.

I tried building comprehensive policy libraries. Dozens of rules covering every possible misconfiguration. Bad idea. Teams bypass large generic policy sets. They add exceptions, disable checks, or just stop running them. The policies that stick are the ones written after something went wrong. Each one represents a scar.

We also run Checkov as a complement. It knows about hundreds of common misconfigs out of the box. Good for catching things you haven’t been burned by yet. But it’s a floor, not a ceiling.

My rule of thumb: if a scanner catches it, great. If it doesn’t, and it bit you, write a policy for it. Your policy library should grow from incidents, not from a wish list.

Integration Tests: Expensive and Worth It (Sometimes)

Integration tests create real AWS resources, run assertions against them, then tear everything down. Terratest is what we use. Go-based, works well with our existing stack.

These are the only tests that catch networking and IAM problems. You can validate a Terraform plan all day – it won’t tell you that your NAT gateway routing is broken or that your RDS instance can’t actually reach the VPC endpoint it needs.

But they’re slow. Our VPC module integration test takes about twelve minutes. Our EKS test takes twenty-five. And they cost money – not a lot per run, but it adds up if you’re running them on every PR.

| What we integration test | Runtime | Why |
|---|---|---|
| VPC module (routing, NAT, DNS) | ~12 min | Networking bugs are invisible until deploy |
| EKS cluster (deploy, healthcheck) | ~25 min | Too many moving parts to validate statically |
| RDS module (connectivity, encryption) | ~8 min | Encryption settings are easy to get wrong |

We don’t integration test every module. That’s the trap. Test the ones where a static check can’t tell you if it works. Our S3 bucket module? Unit tests and policy checks are enough. Our networking module? Needs the real thing.

The cleanup problem is real. Terratest has defer terraform.Destroy() but it’s not bulletproof. We’ve had orphaned resources from failed tests sit around for weeks. I wrote a nightly cleanup job that tags test resources and nukes anything older than 24 hours. Unsexy but necessary.

How We Wire It Together

Our CI pipeline at Decloud:

  1. Pre-commit hooks: terraform fmt, basic linting. Runs locally in seconds.
  2. PR checks: terraform validate, tflint, Conftest policies, Checkov. Blocks merge on failure. Takes about 90 seconds.
  3. Nightly on main: Integration tests for critical modules. Results posted to Slack. Takes about 45 minutes total.
  4. Post-deploy: Smoke tests hitting real endpoints. Health checks, latency thresholds, basic connectivity. Alerts on failure.

The key insight: fast checks on every commit, expensive checks on a schedule. I’ve seen teams try to run Terratest on every PR and give up within a month because the feedback loop is too slow.

Practical Stuff That Matters

A few things I wish someone had told me six months ago:

  • Tag everything with a test identifier. When tests fail mid-run and leave resources behind, you need to find them.
  • Set hard cost limits. We capped our test AWS account at $200/month. Hit it once. Fixed the test that was spinning up m5.4xlarge instances.
  • Never put real credentials in test fixtures. Sounds obvious. I’ve reviewed code where people did it anyway.
  • Store plan outputs. When something breaks in production, being able to diff against the last tested plan is invaluable.

The Honest Assessment

Six months in, here’s where we stand. Static checks are table stakes – zero excuse not to have them. Policy tests are the highest ROI layer and the one most teams skip. Integration tests are worth it for complex modules but not for everything.

If I were starting from scratch tomorrow, I’d add static checks on day one, write my first policy rule on day two after reading our incident history, and add integration tests only for the modules that scare me.

The goal isn’t perfect coverage. It’s knowing that the categories of mistakes that have hurt you before won’t hurt you again.