I spent years in NATO cyber defense before I got into startups. One thing that stuck with me: the gap between “we have a scanner” and “we’re secure” is enormous. Most teams I’ve seen have a container scanner somewhere in their CI pipeline. Most of them also have hundreds of unread findings, no triage process, and a vague feeling that they’re covered.
They aren’t.
What Are You Actually Scanning?
A container image is a layer cake of risk. The base OS, the system packages, the application dependencies, your code, and whatever config and secrets accidentally ended up in there. Scanning just the OS packages – which is what most default setups do – covers maybe 30% of the attack surface.
A useful scan covers:
- Base image and OS packages
- Application dependencies (npm, pip, Go modules, whatever)
- Misconfigurations – running as root, exposed ports, writable filesystems
- Secrets embedded in image layers (it happens more than you think)
If your scanner only does the first one, you have a false sense of security. Which is worse than no scanner at all, because at least without a scanner you know you’re exposed.
Where to Scan
CI, before merge. This is where fixes are cheap. Block on critical findings; let everything else through with a warning. If you block on everything, developers will route around the scanner. I’ve seen it happen at three different companies. The scanner becomes a rubber stamp or, worse, gets disabled.
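The CI gate can be a few lines of glue around the scanner’s JSON output. A sketch, assuming Trivy’s report shape (`trivy image --format json` emits a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field); the gate function itself is mine, not part of any tool:

```python
import json
import sys

def count_by_severity(report: dict) -> dict:
    """Tally findings per severity across all scan targets in a Trivy report."""
    counts = {}
    for result in report.get("Results", []):
        # "Vulnerabilities" can be absent or null for clean targets.
        for vuln in result.get("Vulnerabilities") or []:
            sev = vuln.get("Severity", "UNKNOWN")
            counts[sev] = counts.get(sev, 0) + 1
    return counts

def gate(report: dict, block_on=("CRITICAL",)) -> int:
    """Return a CI exit code: nonzero only for blocking severities."""
    counts = count_by_severity(report)
    for sev, n in sorted(counts.items()):
        print(f"{sev}: {n}")  # warn on everything, block selectively
    blocking = sum(counts.get(sev, 0) for sev in block_on)
    if blocking:
        print(f"BLOCKING: {blocking} finding(s) at {', '.join(block_on)}")
        return 1
    return 0
```

In the pipeline this sits right after the scan step: run `trivy image --format json -o report.json <image>`, load the file, and exit with `gate(json.load(open("report.json")))`.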
Registry, on a schedule. An image that was clean last Tuesday might have a critical CVE by Friday. Rescan images in the registry weekly at minimum. Alert on new critical findings. This is the part most teams skip, and it’s the part that matters most for long-lived images.
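The bookkeeping behind a scheduled rescan is simple enough to write out. A minimal sketch, not tied to any registry API – `inventory` maps image refs to last-scan times, and the alert condition is a set difference against the previous scan’s findings:

```python
from datetime import datetime, timedelta, timezone

RESCAN_INTERVAL = timedelta(days=7)  # weekly, per the schedule above

def due_for_rescan(inventory: dict, now: datetime) -> list:
    """Image refs whose last scan is older than the rescan interval."""
    return sorted(ref for ref, last_scanned in inventory.items()
                  if now - last_scanned >= RESCAN_INTERVAL)

def new_criticals(previous: set, current: set) -> set:
    """Findings present now that weren't last time – these trigger the alert.

    An image that was clean last week can pick up criticals without
    changing at all; only the *new* findings should page anyone.
    """
    return current - previous
```

A cron job that calls `due_for_rescan`, reruns the scanner on each result, and alerts on `new_criticals` covers the part most teams skip.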
Admission control. Prevent unscanned images from running in production. This doesn’t need to be complicated – a simple OPA policy or Kyverno rule that rejects images without a scan annotation. The goal is to close the gap between “we scanned it” and “we’re running it.”
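The decision an admission policy makes is just this, written as a plain function. In practice it lives in a Kyverno rule or an OPA policy; the annotation name `scan.example.com/status` here is made up for illustration:

```python
SCAN_ANNOTATION = "scan.example.com/status"  # hypothetical annotation key

def admit(pod: dict) -> tuple:
    """Allow a pod only if it carries a passing scan annotation."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    status = annotations.get(SCAN_ANNOTATION)
    if status == "passed":
        return True, "ok"
    if status is None:
        # Unscanned is treated the same as failed: reject.
        return False, f"rejected: missing {SCAN_ANNOTATION} annotation"
    return False, f"rejected: scan status is {status!r}"
```

The point is that “unscanned” and “failed” get the same answer – the gap between “we scanned it” and “we’re running it” only closes if missing evidence is a rejection, not a pass.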
Prioritization Over Volume
A scanner that reports 500 findings isn’t helping. It’s generating noise. Prioritize by four factors:
- Exploitability. Is there a known exploit in the wild? A theoretical vulnerability in a library you don’t use isn’t urgent.
- Exposure. Is this component internet-facing or buried behind three layers of internal networking?
- Criticality. Is this your payment service or your internal metrics dashboard?
- Fix availability. If there’s no patch, what exactly do you want the team to do about it?
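One way to turn those four factors into an ordering – the weights below are arbitrary placeholders of my own, chosen only so that a known-exploited finding on an internet-facing critical service outranks everything else:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    exploit_in_wild: bool    # exploitability
    internet_facing: bool    # exposure
    service_critical: bool   # criticality
    fix_available: bool      # fix availability

def priority(f: Finding) -> int:
    """Higher score = triage first. Weights are illustrative, not canonical."""
    score = 0
    score += 8 if f.exploit_in_wild else 0
    score += 4 if f.internet_facing else 0
    score += 2 if f.service_critical else 0
    score += 1 if f.fix_available else 0   # actionable findings sort ahead
    return score

def triage_order(findings: list) -> list:
    return sorted(findings, key=priority, reverse=True)
```

Even a crude scoring like this beats working a 500-item list top to bottom by CVSS alone.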
Handle exceptions with documented waivers – owner, reason, expiry date. No permanent exceptions. If a waiver expires and nobody renews it, the finding blocks again.
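Waivers are easiest to enforce when they’re data, not tribal knowledge. A sketch of the rule above – expired or missing waiver means the finding blocks again:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Waiver:
    finding_id: str
    owner: str
    reason: str
    expires: date   # mandatory – no permanent exceptions

def blocks(finding_id: str, waivers: dict, today: date) -> bool:
    """A finding blocks unless it has a current, unexpired waiver."""
    waiver = waivers.get(finding_id)
    return waiver is None or today > waiver.expires
```

Store the waivers in the repo next to the scanner config so they go through review like everything else.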
Base Images Are Dependencies
Treat them like one. Pin versions – digests, not mutable tags. Set a monthly upgrade cadence. Use minimal bases – distroless or Alpine – not because they’re trendy but because fewer packages means fewer CVEs means less noise.
I’ve seen teams running Ubuntu 20.04 base images with 200+ packages when their Go binary needs exactly zero of them. That isn’t a base image. That’s an attack surface donation.
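For that static Go binary, the fix is a two-stage build. A minimal sketch – the `./cmd/app` path is a placeholder for your own entry point, and in a real build you’d pin the `FROM` lines by digest (`@sha256:<digest>`) so the base only changes when you deliberately bump it:

```dockerfile
# Build stage: full toolchain, never shipped.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/app

# Runtime stage: distroless static base – no shell, no package manager,
# runs as a non-root user by default.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The final image carries your binary and almost nothing else, which is exactly what you want the scanner looking at.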
Sign Your Images
Scanning tells you what’s inside an image. Signing tells you it’s the image you built and nobody tampered with it between your CI pipeline and your cluster. Use cosign, use Notary, use whatever – but sign your artifacts and verify on pull.
After the SolarWinds and Codecov incidents, “trust but don’t verify” isn’t a defensible position anymore.
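With cosign the whole flow is two commands. A sketch of the keyless flow (key-based works the same with `--key`); the image ref and the GitHub org in the identity regexp are placeholders for your own:

```shell
# Sign digests, not tags – tags are mutable, digests aren't.
IMAGE=registry.example.com/app@sha256:...

# In CI, after push: attach a signature to the image in the registry.
cosign sign "$IMAGE"

# At deploy (or in an admission policy): refuse unsigned images.
cosign verify \
  --certificate-identity-regexp 'https://github.com/yourorg/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  "$IMAGE"
```

Policy engines like Kyverno can run the verify step for you at admission time, which ties this back to the unscanned-image gate above.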
The Minimum Viable Pipeline
- Scan in CI with Trivy or Grype. Block on critical, warn on everything else.
- Rescan in your registry weekly. Alert on new criticals.
- Admission control in Kubernetes. Reject unscanned images.
- Pin and regularly update base images.
- Sign images in CI. Verify at deploy.
That’s it. Five steps. None of them are hard individually. The hard part is maintaining the discipline to triage findings, update waivers, and keep base images current. Discipline over heroics, as always.