Quick take
After migrating Decloud’s entire CI/CD to GitHub Actions, here are the patterns that survived contact with reality: matrix builds that don’t waste money, caching that actually works, job orchestration with artifacts, trigger filtering, and the secret-handling mistakes you’ll make exactly once.
I migrated our CI/CD at Decloud from a janky CircleCI setup to GitHub Actions about four months ago. The basic stuff was trivial. Push code, run tests, green checkmark. Fine.
Then I needed matrix builds across three OSes, caching that doesn’t invalidate every other commit, gated production deploys, and secrets that don’t leak into fork PRs. That’s where it got interesting.
This is what I learned. All of it from breaking things in production at least once.
Matrix builds: cover ground without burning minutes
The matrix strategy is the single best feature in Actions. One job definition, multiple OS and runtime combinations. But the defaults will burn through your free minutes fast if you’re not careful.
Here’s what our test workflow looks like:
```yaml
name: ci

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 15
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node: [12, 14]
        exclude:
          - os: windows-latest
            node: 12
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v1
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```
A few things to note.
`fail-fast: false`. I used to leave this at the default (true). Then a flaky Windows test would kill the whole matrix, and we'd lose the Linux and macOS results entirely. Set it to false. Let everything run. You want the full picture.
Exclude dead combos. We dropped Node 10 entirely and excluded Node 12 on Windows because we don’t ship Windows builds on that version. Every combination you exclude saves real minutes. We were running 9 combinations, trimmed it to 5. The monthly bill noticed.
`timeout-minutes` on everything. I learned this one the hard way: a hung test on macOS once ran for 6 hours before I noticed. GitHub's default timeout is also 6 hours. That's not a default, that's a trap. Set 15 minutes. If your tests take longer than that, you have a different problem.
Caching: the difference between 8 minutes and 90 seconds
Caching npm dependencies turned our builds from “go get coffee” to “it’s already done.” But the cache key design matters more than you’d think.
```yaml
- name: Cache node modules
  uses: actions/cache@v1
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
- name: Install dependencies
  run: npm ci
```
The `restore-keys` fallback is critical. Without it, any change to package-lock.json means a complete cache miss and a full download of every package. With the fallback, Actions restores the last close-enough cache and `npm ci` only has to download the packages that aren't already in it. Installs went from 4 minutes to about 20 seconds on a cache hit.
One gotcha: actions/cache@v1 has a 5GB limit per repo. We hit it after a few weeks because we were caching node_modules directly. Big mistake: cache the npm cache directory instead and let `npm ci` do the installing. Switching to ~/.npm dropped the cache size by about 80%.
Job orchestration: build once, deploy from artifacts
This is the pattern I’m most opinionated about. Never rebuild in your deploy step. Build once, upload the artifact, download it in deploy. Guarantees you deploy exactly what you tested.
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v2
      - name: Cache npm
        uses: actions/cache@v1
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-
      - run: npm ci
      - run: npm run build
      - run: npm test
      - uses: actions/upload-artifact@v2
        with:
          name: dist
          path: dist/
          retention-days: 5

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: dist
      - name: Deploy to staging
        run: ./scripts/deploy.sh staging
        env:
          DEPLOY_TOKEN: ${{ secrets.STAGING_DEPLOY_TOKEN }}

  deploy-production:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    timeout-minutes: 5
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: dist
      - name: Deploy to production
        run: ./scripts/deploy.sh production
        env:
          DEPLOY_TOKEN: ${{ secrets.PROD_DEPLOY_TOKEN }}
```
`retention-days: 5` on artifacts. The default is 90 days. You don't need three months of build artifacts. You need maybe a week for debugging. We were storing 12GB of artifacts before I added this.
The `needs` chain (build -> deploy-staging -> deploy-production) gives you a natural pipeline. Staging has to pass before production even starts. The `environment: production` on the last job lets you add required reviewers in repo settings: someone has to click "approve" before the prod deploy runs. Worth setting up.
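If you want the deployment link to show up in the GitHub UI as well, `environment` also accepts a mapping form with a `url`. A sketch (the URL here is a placeholder, not our real one):

```yaml
deploy-production:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment:
    name: production
    url: https://app.example.com   # placeholder URL; shown on the deployment in the GitHub UI
  steps:
    - run: ./scripts/deploy.sh production
```

Same protection rules apply either way; the mapping form just gives reviewers a one-click link to what was deployed.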
Trigger filtering: stop running CI on README changes
This was genuinely annoying before I figured it out. Every docs edit, every README typo fix, every changelog update triggered a full CI run. Waste.
```yaml
on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'package.json'
      - 'package-lock.json'
      - '.github/workflows/**'
  pull_request:
    paths:
      - 'src/**'
```
Only run when code actually changes. I also include .github/workflows/** in the push paths because I want CI to test itself when I modify the workflow files. Learned that one after pushing a broken workflow that I didn’t catch for two days because the paths filter excluded the workflows directory.
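If most of your repo is code and only a handful of paths are safe to skip, the inverse filter can be less error-prone than enumerating every code path:

```yaml
on:
  push:
    branches: [main]
    paths-ignore:
      - '**.md'      # README, changelog, etc.
      - 'docs/**'    # documentation-only changes
```

As far as I know, `paths` and `paths-ignore` can't be combined on the same event, so pick one style per trigger.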
For production deploys, we use workflow_dispatch so nobody accidentally deploys on merge:
```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deploy target'
        required: true
        default: 'staging'
```
Manual button in the Actions tab. Feels old-school. Works perfectly.
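The dispatched job reads the chosen target back out of the event payload. A minimal sketch, assuming the same deploy.sh as above:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v2
      - name: Deploy to selected environment
        # github.event.inputs.<name> carries whatever was typed into the dispatch form
        run: ./scripts/deploy.sh "${{ github.event.inputs.environment }}"
```

Quoting the input matters if your script does anything with the value beyond an equality check.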
Secrets and fork PRs: the security stuff that bit me
Here’s the thing nobody tells you clearly enough: fork pull requests can read your workflow files but secrets aren’t injected. This is correct behavior and you shouldn’t try to work around it.
But there’s a subtlety. If you use pull_request_target instead of pull_request, the workflow runs in the context of the base branch and secrets ARE available. I see people recommending this for “fixing” the fork PR problem. Don’t. A malicious fork can modify the workflow steps via the PR code while having access to your secrets. That’s a supply chain attack.
Our approach:
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: npm ci
      - run: npm test
      # No secrets needed for tests

  deploy:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Deploy
        run: ./scripts/deploy.sh
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```
Tests run on PRs without secrets. Deploy only runs on push to main, where secrets are available and the code is trusted. Simple separation.
Also: always mask secrets in logs.
```yaml
- name: Mask token
  run: echo "::add-mask::${{ secrets.API_TOKEN }}"
```
GitHub does this automatically for secrets referenced directly, but if you derive a value from a secret (substring, base64 encode, whatever), the derived value isn’t masked. Found that out when a deploy URL containing an embedded token showed up in our build logs. Fun morning.
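The fix is to mask the derived value yourself before anything can print it. A sketch, with a hypothetical deploy URL format (deploy.example.com is made up):

```yaml
- name: Deploy with masked URL
  run: |
    # The raw secret is auto-masked; this derived string is not, so mask it explicitly
    DEPLOY_URL="https://deploy.example.com/push?token=${{ secrets.API_TOKEN }}"
    echo "::add-mask::$DEPLOY_URL"
    ./scripts/deploy.sh "$DEPLOY_URL"
```

Any transformation counts: substrings, encodings, URLs, JSON blobs. If a value was computed from a secret, mask it before it can hit stdout.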
Self-hosted runners: probably not yet
We considered self-hosted runners for builds that need access to our internal Docker registry. Decided against it. The maintenance overhead of patching, isolating, and monitoring runner machines wasn’t worth it for our team size. We just push to a public registry and pull in deploy.
If you do need them:
```yaml
jobs:
  build:
    runs-on: self-hosted
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v2
      - run: make build
```
Just know that self-hosted runners don’t get automatic cleanup between jobs like hosted runners do. Previous job’s files, environment variables, running processes – all still there. You need to handle that yourself or you’ll get extremely confusing cross-contamination bugs.
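One mitigation is an explicit cleanup step at the top of every job on a self-hosted runner. A rough sketch; test it against your own runner's directory layout before trusting it:

```yaml
steps:
  - name: Clean workspace left by previous jobs
    run: |
      # Hosted runners start from a fresh VM; self-hosted ones don't.
      # Wipe the workspace (including dotfiles) before checkout.
      # ${VAR:?} aborts if GITHUB_WORKSPACE is unset, so we never rm -rf /*
      rm -rf "${GITHUB_WORKSPACE:?}"/* "${GITHUB_WORKSPACE:?}"/.[!.]* || true
  - uses: actions/checkout@v2
  - run: make build
```

This handles leftover files, but not leaked environment variables or orphaned processes; those need runner-level isolation (e.g. running jobs in throwaway containers or VMs).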
The patterns that stuck
After four months in production, these are the non-negotiable habits:
- `timeout-minutes` on every job. No exceptions. 15 minutes for tests, 5 for deploys.
- Cache the package manager cache, not `node_modules`. Smaller, more reliable, fewer weird symlink issues.
- Build once, artifact it, deploy from artifact. Never rebuild in deploy.
- `fail-fast: false` on matrices. Always get the complete picture.
- Path filters on triggers. Stop wasting minutes on non-code changes.
- Never use `pull_request_target` for fork PRs. Just don't.
- `retention-days` on all artifacts. Your future storage bill will thank you.
None of this is groundbreaking. But I spent a couple months learning each of these the hard way, so maybe this saves you the same debugging sessions. GitHub Actions is genuinely good in 2020. The YAML is verbose. The ecosystem is young. But the fundamentals are solid and the GitHub integration is unbeatable.