Quick take
Most organizations can’t answer “what code is running in production and how did it get there?” After SolarWinds, that’s no longer acceptable. This post covers what I’ve been doing to fix it: mapping the full chain, hardening builds, locking down dependencies, signing everything, and monitoring for tampering. Not glamorous work. But it’s the work that matters.
SolarWinds broke something in people’s heads. Good. It needed breaking.
For years, supply chain security was the thing everyone nodded about in security reviews and nobody actually funded. Then a nation-state actor slipped a backdoor into a trusted update mechanism, and suddenly 18,000 organizations – including multiple US government agencies – were running compromised code they had willingly installed.
I was already deep in work with large telecoms when the news hit. Within a week, I had three separate calls asking some variant of the same question: “Could this happen to us?” The honest answer was yes. For most of them, absolutely yes.
What “supply chain” actually means here
People hear “supply chain” and think dependencies. That’s one piece. The full chain is everything between a developer’s keyboard and running production code. Developer identities. Source control. Third-party libraries. Build systems. Artifact registries. Deployment pipelines. Update mechanisms.
SolarWinds wasn’t a dependency attack. It was a build system compromise. The attackers injected code during the build process itself. The source looked clean. The artifact was poisoned. That distinction matters because it tells you where to focus.
Step one: figure out what you actually have
I can’t tell you how many times I’ve walked into an organization and asked “show me every path code takes from commit to production” and gotten blank stares. Not because people are incompetent. Because nobody ever mapped it.
At one telecom company, we discovered that a critical billing service had three different build paths depending on which team member triggered the release. One used the CI pipeline. One used a local build script that predated the CI system by two years. One involved SSH-ing into a jump box. Same service. Three paths. Zero documentation.
You start here:
- Enumerate every build pipeline. Not just the ones in your CI tool. The scripts. The manual processes. The “temporary” workarounds that have been running for years.
- Map dependency trees. Direct and transitive. For Go projects I use `go mod graph`. For Node, `npm ls --all`. You will be horrified by what you find in transitive dependencies.
- Identify who can publish what. Who has write access to your artifact registry? Who can trigger a production deployment? If the answer is “most of the engineering team,” that’s your first problem.
- Generate SBOMs. A software bill of materials for every release. Not optional. This is the foundation everything else builds on. Tools like `syft` or `cyclonedx-gomod` make this trivial for Go projects.
At another company, the dependency inventory alone surfaced 14 packages that had been abandoned by their maintainers. Two had known CVEs. Nobody knew they were even in the tree.
Step two: harden the build
Your build system is a high-value target. If I can compromise your CI runner, I own your artifacts. Treat builds like production infrastructure.
Here’s what I implement:
- Ephemeral builders. Every build gets a fresh environment. No state leaks between jobs. If your CI runners have been alive for months with accumulated cruft, that’s a risk.
- Network isolation during builds. Builds should pull from a known, audited set of sources. Not the open internet. This means running an internal proxy or mirror for package registries.
- Scoped, short-lived secrets. Every credential used in CI should be scoped to exactly one job and expire quickly. I’ve seen enterprise CI systems with admin-level AWS credentials that hadn’t been rotated in over a year. That’s a ticking bomb.
- Input verification, output signing. Verify checksums of everything you pull in. Sign everything you produce. If an artifact isn’t signed, it doesn’t deploy. Full stop.
- Comprehensive build logs. Every step logged and retained. When something goes wrong – and it will – you need the forensic trail.
For Go specifically, I keep `GONOSUMDB` empty (and the legacy `GONOSUMCHECK` unset) so that every module download gets verified against the checksum database, `sum.golang.org` by default. I also set `GOFLAGS=-mod=vendor` for critical services so the exact source code ships with the repo.
Step three: lock down dependencies
Most modern software is more dependency than original code. I’ve seen Go services where the go.sum file has 400+ entries and the actual application is maybe 3,000 lines. That ratio should make you uncomfortable.
What works:
- Pin everything. Lock files aren’t optional. Review lock file changes in PRs the same way you review code changes.
- Proxy registries. Run an internal mirror for your package sources. Athens for Go modules. Verdaccio or Artifactory for npm. This gives you a cache, an audit trail, and a kill switch.
- Mandatory review for new dependencies. Adding a new dependency should require justification. Who maintains it? What is the bus factor? Does it pull in a transitive tree you aren’t comfortable with?
- Prune aggressively. Unused dependencies are attack surface with zero value. I run `go mod tidy` in CI and fail the build if it produces changes. If your module file isn’t clean, your build shouldn’t pass.
- Watch for typosquatting. This sounds paranoid until you realize it’s one of the most common attack vectors for npm and PyPI. Automated checks for package name similarity against known packages are cheap insurance.
One audit turned up a Go service importing a fork of a popular library. The fork was created by a former contractor who had left the company two years earlier. Nobody remembered why the fork existed. The fork was 200 commits behind upstream and had none of the security patches. That’s how supply chain risk works in practice – not dramatic, just quietly accumulating.
Step four: protect distribution
You build a clean artifact. Great. Now what? If your artifact registry has loose access controls, or your deployment pipeline pulls artifacts without verification, you have a gap.
- Immutable artifact storage. Once published, an artifact can’t be overwritten. Publish is append-only.
- Signature verification at deploy time. Not just at build time. The deployment system should independently verify that the artifact it’s about to run was signed by your build system. Cosign from the sigstore project is excellent for container images.
- Separate credentials. The build system shouldn’t have deployment credentials. The deployment system shouldn’t have build credentials. Compromising one shouldn’t give you both.
- Monitor your update channels. If you distribute software to customers (not just internal services), your update mechanism is critical infrastructure. Monitor it for unexpected changes, access patterns, and timing anomalies.
Step five: detect what you missed
Prevention is necessary. It’s not sufficient. You need detection.
- Alert on build definition changes. If someone modifies a `Dockerfile`, a CI config, or a build script, that should trigger a review. These are high-leverage files.
- Flag dependency changes. New dependencies, major version bumps, changed checksums. All should be visible and reviewed.
- Watch build timing. A build that suddenly takes 40% longer might be doing something extra. Or it might just be a slow network day. Either way, investigate.
- Block unsigned artifacts. If verification fails, the deployment doesn’t happen. No exceptions. No “we’ll fix it after this release.”
- Have a playbook. Key revocation, artifact rollback, dependency quarantine. Know how to do these before you need to do them at 2am.
The phased approach
I roll this out in three phases:
Phase 1 – Visibility (weeks 1-3). Inventory everything. Map pipelines. Generate SBOMs. This is unglamorous but essential. You can’t secure what you can’t see.
Phase 2 – Hardening (weeks 4-8). Isolate builds. Pin and proxy dependencies. Reduce privileges. Rotate credentials. This is where most of the immediate risk reduction happens.
Phase 3 – Verification (weeks 8-12). Sign artifacts. Verify at deploy. Set up monitoring and alerting. Build the incident response playbook.
The timeline varies. A startup with 10 services can do this in a month. An enterprise with hundreds of services and decades of accumulated build infrastructure? Longer. But the phases stay the same.
What I keep seeing go wrong
The number one mistake: buying a scanning tool and calling it “supply chain security.” Scanning finds problems. It doesn’t fix them. If nobody owns the remediation, you’re just generating reports that make you feel productive.
The number two mistake: hardening the build but ignoring everything after it. Your build is hermetic and signed, great. Your artifact registry allows anonymous pushes? Your deployment pipeline skips verification because “it was slow”? You have wasted your effort.
The number three mistake: making security so painful that developers route around it. If your controls add 30 minutes to every build and require three approvals for a one-line fix, people will find workarounds. Good security feels like good tooling.
Where this is heading
SolarWinds was the wake-up call. The executive order on software security followed. SBOM requirements are coming. Frameworks like SLSA are defining levels of build integrity. This isn’t going away.
The organizations that start now – even imperfectly – will be ahead when the requirements become mandatory. The ones that wait will be scrambling.
I’ve been saying this for months. Some listened after SolarWinds. Some are still “evaluating.” The evaluation period ended when a trusted update mechanism became an attack vector for a nation-state operation. Get your house in order.