Quick take
After migrating Decloud’s entire CI/CD to GitHub Actions, here are the patterns that survived contact with reality: matrix builds that don’t waste money, caching that actually works, job orchestration with artifacts, trigger filtering, and the secret-handling mistakes you’ll make exactly once.
I migrated our CI/CD at Decloud from a janky CircleCI setup to GitHub Actions about four months ago. The basic stuff was trivial. Push code, run tests, green checkmark. Fine.
Then I needed matrix builds across three OSes, caching that doesn’t invalidate every other commit, gated production deploys, and secrets that don’t leak into fork PRs. That’s where it got interesting.
This is what I learned. All of it from breaking things in production at least once.
Matrix builds: cover ground without burning minutes
The matrix strategy is the single best feature in Actions. One job definition, multiple OS and runtime combinations. But the defaults will burn through your free minutes fast if you’re not careful.
Here’s what our test workflow looks like:
```yaml
name: ci

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 15
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node: [12, 14]
        exclude:
          - os: windows-latest
            node: 12
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v1
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```
A few things to note.
`fail-fast: false`. I used to leave this at the default (true). Then a flaky Windows test would kill the whole matrix, and we'd lose the Linux and macOS results entirely. Set it to false. Let everything run. You want the full picture.
Exclude dead combos. We dropped Node 10 entirely and excluded Node 12 on Windows because we don’t ship Windows builds on that version. Every combination you exclude saves real minutes. We were running 9 combinations, trimmed it to 5. The monthly bill noticed.
`timeout-minutes` on everything. I learned this one the hard way: a hung test on macOS once ran for 6 hours before I noticed. GitHub's default timeout is also 6 hours. That's not a default, that's a trap. Set 15 minutes. If your tests take longer than that, you have a different problem.
Caching: the difference between 8 minutes and 90 seconds
Caching npm dependencies turned our builds from “go get coffee” to “it’s already done.” But the cache key design matters more than you’d think.
```yaml
- name: Cache node modules
  uses: actions/cache@v1
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
- name: Install dependencies
  run: npm ci
```
The `restore-keys` fallback is critical. Without it, any change to package-lock.json means a complete cache miss and a full download of every package. With the fallback, Actions restores the last close-enough cache and `npm ci` only has to download the packages that aren't already in it. Installs went from 4 minutes to about 20 seconds on a cache hit.
One gotcha: actions/cache@v1 has a 5GB limit per repo. We hit it after a few weeks because we were caching node_modules directly. Big mistake: cache the npm cache directory instead and let `npm ci` do the installing. Switching to ~/.npm dropped the cache size by about 80%.
Job orchestration: build once, deploy from artifacts
This is the pattern I’m most opinionated about. Never rebuild in your deploy step. Build once, upload the artifact, download it in deploy. Guarantees you deploy exactly what you tested.
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v2
      - name: Cache npm
        uses: actions/cache@v1
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-
      - run: npm ci
      - run: npm run build
      - run: npm test
      - uses: actions/upload-artifact@v2
        with:
          name: dist
          path: dist/
          retention-days: 5

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: dist
      - name: Deploy to staging
        run: ./scripts/deploy.sh staging
        env:
          DEPLOY_TOKEN: ${{ secrets.STAGING_DEPLOY_TOKEN }}

  deploy-production:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    timeout-minutes: 5
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: dist
      - name: Deploy to production
        run: ./scripts/deploy.sh production
        env:
          DEPLOY_TOKEN: ${{ secrets.PROD_DEPLOY_TOKEN }}
```
`retention-days: 5` on artifacts. The default is 90 days. You don't need three months of build artifacts. You need maybe a week for debugging. We were storing 12GB of artifacts before I added this.
The `needs` chain (build -> deploy-staging -> deploy-production) gives you a natural pipeline. Staging has to pass before production even starts. The `environment: production` on the last job lets you add required reviewers in repo settings: someone has to click "approve" before the prod deploy runs. Worth setting up.
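If you want the deployment link to show up in the GitHub UI as well, `environment` also accepts a mapping form with a `url`. A sketch (the URL here is a placeholder, not our real one):

```yaml
deploy-production:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment:
    name: production
    url: https://app.example.com   # placeholder URL; shown on the deployment in the GitHub UI
  steps:
    - run: ./scripts/deploy.sh production
```

Same protection rules apply either way; the mapping form just gives reviewers a one-click link to what was deployed.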
Trigger filtering: stop running CI on README changes
This was genuinely annoying before I figured it out. Every docs edit, every README typo fix, every changelog update triggered a full CI run. Waste.
```yaml
on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'package.json'
      - 'package-lock.json'
      - '.github/workflows/**'
  pull_request:
    paths:
      - 'src/**'
```
Only run when code actually changes. I also include .github/workflows/** in the push paths because I want CI to test itself when I modify the workflow files. Learned that one after pushing a broken workflow that I didn’t catch for two days because the paths filter excluded the workflows directory.
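If most of your repo is code and only a handful of paths are safe to skip, the inverse filter can be less error-prone than enumerating every code path:

```yaml
on:
  push:
    branches: [main]
    paths-ignore:
      - '**.md'      # README, changelog, etc.
      - 'docs/**'    # documentation-only changes
```

As far as I know, `paths` and `paths-ignore` can't be combined on the same event, so pick one style per trigger.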
For production deploys, we use workflow_dispatch so nobody accidentally deploys on merge:
```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deploy target'
        required: true
        default: 'staging'
```
Manual button in the Actions tab. Feels old-school. Works perfectly.
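The dispatched job reads the chosen target back out of the event payload. A minimal sketch, assuming the same deploy.sh as above:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v2
      - name: Deploy to selected environment
        # github.event.inputs.<name> carries whatever was typed into the dispatch form
        run: ./scripts/deploy.sh "${{ github.event.inputs.environment }}"
```

Quoting the input matters if your script does anything with the value beyond an equality check.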
Secrets and fork PRs: the security stuff that bit me
Here’s the thing nobody tells you clearly enough: fork pull requests can read your workflow files but secrets aren’t injected. This is correct behavior and you shouldn’t try to work around it.
But there’s a subtlety. If you use pull_request_target instead of pull_request, the workflow runs in the context of the base branch and secrets ARE available. I see people recommending this for “fixing” the fork PR problem. Don’t. A malicious fork can modify the workflow steps via the PR code while having access to your secrets. That’s a supply chain attack.
Our approach:
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: npm ci
      - run: npm test
      # No secrets needed for tests

  deploy:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Deploy
        run: ./scripts/deploy.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```
Tests run on PRs without secrets. Deploy only runs on push to main, where secrets are available and the code is trusted. Simple separation.
Also: always mask secrets in logs.
```yaml
- name: Mask token
  run: echo "::add-mask::${{ secrets.API_TOKEN }}"
```
GitHub does this automatically for secrets referenced directly, but if you derive a value from a secret (substring, base64 encode, whatever), the derived value isn’t masked. Found that out when a deploy URL containing an embedded token showed up in our build logs. Fun morning.
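The fix is to mask the derived value yourself before anything can print it. A sketch, with a hypothetical deploy URL format (deploy.example.com is made up):

```yaml
- name: Deploy with masked URL
  run: |
    # The raw secret is auto-masked; this derived string is not, so mask it explicitly
    DEPLOY_URL="https://deploy.example.com/push?token=${{ secrets.API_TOKEN }}"
    echo "::add-mask::$DEPLOY_URL"
    ./scripts/deploy.sh "$DEPLOY_URL"
```

Any transformation counts: substrings, encodings, URLs, JSON blobs. If a value was computed from a secret, mask it before it can hit stdout.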
Self-hosted runners: probably not yet
We considered self-hosted runners for builds that need access to our internal Docker registry. Decided against it. The maintenance overhead of patching, isolating, and monitoring runner machines wasn’t worth it for our team size. We just push to a public registry and pull in deploy.
If you do need them:
```yaml
jobs:
  build:
    runs-on: self-hosted
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v2
      - run: make build
```
Just know that self-hosted runners don’t get automatic cleanup between jobs like hosted runners do. Previous job’s files, environment variables, running processes – all still there. You need to handle that yourself or you’ll get extremely confusing cross-contamination bugs.
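One mitigation is an explicit cleanup step at the top of every job on a self-hosted runner. A rough sketch; test it against your own runner's directory layout before trusting it:

```yaml
steps:
  - name: Clean workspace left by previous jobs
    run: |
      # Hosted runners start from a fresh VM; self-hosted ones don't.
      # Wipe the workspace (including dotfiles) before checkout.
      # ${VAR:?} aborts if GITHUB_WORKSPACE is unset, so we never rm -rf /*
      rm -rf "${GITHUB_WORKSPACE:?}"/* "${GITHUB_WORKSPACE:?}"/.[!.]* || true
  - uses: actions/checkout@v2
  - run: make build
```

This handles leftover files, but not leaked environment variables or orphaned processes; those need runner-level isolation (e.g. running jobs in throwaway containers or VMs).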
The patterns that stuck
After four months in production, these are the non-negotiable habits:
- `timeout-minutes` on every job. No exceptions. 15 minutes for tests, 5 for deploys.
- Cache the package manager cache, not `node_modules`. Smaller, more reliable, fewer weird symlink issues.
- Build once, artifact it, deploy from artifact. Never rebuild in deploy.
- `fail-fast: false` on matrices. Always get the complete picture.
- Path filters on triggers. Stop wasting minutes on non-code changes.
- Never use `pull_request_target` for fork PRs. Just don't.
- `retention-days` on all artifacts. Your future storage bill will thank you.
None of this is groundbreaking. But I spent a couple months learning each of these the hard way, so maybe this saves you the same debugging sessions. GitHub Actions is genuinely good in 2020. The YAML is verbose. The ecosystem is young. But the fundamentals are solid and the GitHub integration is unbeatable.