Rollback Document — Template

If you cannot turn an AI feature off quickly, you have shipped a liability with a product label.

This template forces the rollback path to be designed before launch. It is one page per feature, lives in the same repo as the feature, and is reviewed at every release that touches the feature.


How to use this template

One feature, one rollback document. Filled in before the feature ships. Reviewed at every change. Tested at least once before launch and once per quarter after.

If a field reads “TBD” or “we’ll figure it out,” the feature is not ready for release.


Template

Feature

  • Name:
  • Owner (human):
  • Backup owner (human):
  • Release stage: dev / canary / partial rollout / full GA
  • Last revert drill: (date — must have happened in the last 90 days)

The four questions, answered

How do we turn this off?

Name the exact mechanism. Not a description — the command, flag, or operation.

  • Kill switch mechanism:
  • Time to disable (target): ≤ X seconds
  • Tested last on:
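A kill switch only counts if the code path around the feature fails safe. A minimal sketch of the guard pattern, assuming a generic flag store (the names `flag_store`, `ai_feature`, and `fallback` are illustrative, not a real provider API):

```python
def feature_enabled(flag_store: dict, flag_key: str = "feature.enabled") -> bool:
    """Fail safe: if the flag store is unreachable or the key is missing,
    treat the feature as OFF rather than ON."""
    try:
        return bool(flag_store.get(flag_key, False))
    except Exception:
        return False

def handle_request(flag_store: dict, payload: str) -> str:
    # The manual path must exist at all times, not be bolted on during an incident.
    if not feature_enabled(flag_store):
        return fallback(payload)
    return ai_feature(payload)

def ai_feature(payload: str) -> str:
    return f"ai:{payload}"

def fallback(payload: str) -> str:
    return f"manual:{payload}"
```

The design choice that matters is the default: an absent or unreadable flag disables the feature, so flipping the switch off can never be blocked by the flag infrastructure itself.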

How do we know it is hurting us?

The signals that trigger the rollback decision. Each one is a number with a source.

  • Customer-visible signal: (e.g., support ticket rate > X% over Y minutes — measured where)
  • Quality signal: (e.g., eval-suite drift, judge-model score drop — measured where)
  • Reliability signal: (e.g., malformed-output rate, retry rate, error rate — measured where)
  • Cost signal: (e.g., $/task exceeds 2x baseline — measured where)
  • Who is paged when any of these fires:
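"A number with a source" can be made concrete as data plus a check. A sketch, with illustrative signal names and thresholds (none of these values are prescribed by the template):

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    source: str          # where the number is measured
    threshold: float     # breach when the value exceeds this
    window_minutes: int  # how long the breach must persist

def breached(signal: Signal, samples: list[float]) -> bool:
    """A signal fires only if every sample in the window exceeds the
    threshold; a single spike does not trigger a rollback."""
    return len(samples) > 0 and all(v > signal.threshold for v in samples)

def signals_to_page(signals: list[Signal], readings: dict[str, list[float]]) -> list[str]:
    # Returns the names of every firing signal; any non-empty result pages the on-call.
    return [s.name for s in signals if breached(s, readings.get(s.name, []))]
```

Encoding signals this way forces each one to carry its source and window, which is exactly what "customer complaints rise" lacks.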

How fast can we revert?

The revert path, end to end. Not “we’ll roll back the deploy” — the specific steps.

  • Revert steps in order: (each step with the command or operation)
  • Time to revert (target): ≤ X minutes from page to recovery
  • Data implications of revert: (writes to undo, state to restore, consistency caveats)
  • Tested last on:
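Specific steps can also be scripted so the drill and the real revert run the same code. A sketch, assuming each step is a callable and the target time is a hard budget (step contents are placeholders):

```python
import time

def run_revert(steps, budget_seconds: float) -> list[tuple[str, float]]:
    """Run revert steps in order, recording cumulative elapsed time per step.
    Raises if the whole revert exceeds the stated budget."""
    start = time.monotonic()
    log = []
    for name, step in steps:
        step()  # a real step would flip a flag, query a dashboard, send a page
        log.append((name, time.monotonic() - start))
    total = time.monotonic() - start
    if total > budget_seconds:
        raise RuntimeError(f"revert took {total:.1f}s, budget {budget_seconds}s")
    return log
```

The returned log doubles as evidence for the drill log: each entry is a step name with the time it was reached.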

What manual path exists if the model degrades?

What the team does when the AI is off. This is the bridge between “broken” and “fixed.”

  • Manual fallback workflow:
  • Who staffs the fallback:
  • Capacity of the fallback: (how much traffic the manual path can absorb)
  • SLA the customer sees during fallback:
  • Customer comms during fallback: (who, what channel, what message)

Coverage map — what gets reverted by the rollback

A rollback is a graph, not a single operation. Be explicit about what is rolled back and what is not.

  • Code: (which services, which version)
  • Data writes: (which tables, which queues, which idempotency keys to honor)
  • Side effects: (emails, vendor API calls, downstream events — already-sent are not recallable)
  • Caches: (which caches need invalidation, which can stay)
  • Feature flags: (the flags toggled by the rollback)
  • Customer state: (anything customers see that needs to be reset, hidden, or messaged)

Risk classification

  • Blast radius if not reverted in 1 hour: (customers affected, dollars at risk, regulatory exposure)
  • Blast radius if not reverted in 24 hours:
  • Reputational risk: (low / medium / high — why)
  • Regulatory risk: (GDPR, sector-specific, contractual SLAs — list each)

Drill log

Date, who ran it, what happened, what changed afterward. The drill is a tabletop or a real revert in a non-prod environment, depending on the feature’s blast radius.

Date | Driver | Outcome | Action items

Change log

Every material change to the feature requires this doc to be reviewed. Every quarter, owner re-confirms the doc still describes reality.

Date | Change | Updated by

Worked example — AI-assisted reply suggestions in support tooling

Feature

  • Name: support-reply-suggestions-v2
  • Owner: A. Ramos (eng lead, support platform)
  • Backup owner: J. Patel (eng manager, support platform)
  • Release stage: Full GA, EU + US.
  • Last revert drill: 2026-04-12 (tabletop with on-call).

The four questions

How do we turn this off?

  • Kill switch mechanism: Feature flag support.reply_suggestions.enabled = false (LaunchDarkly), per-region.
  • Time to disable: ≤ 30 seconds globally.
  • Tested last on: 2026-04-12.

How do we know it is hurting us?

  • Customer-visible signal: Support CSAT drops > 4 points day-over-day in the rolling 24-hour window. Measured in the support metrics warehouse.
  • Quality signal: Eval-suite pass rate drops below 88%, or hallucination rate exceeds 3% on sampled replies. Measured by the async eval job that scores 5% of suggestions.
  • Reliability signal: Malformed-suggestion rate > 1% over 15 minutes, or p99 latency > 8s. Measured in Datadog.
  • Cost signal: Cost per suggestion exceeds 2x the 30-day baseline. Measured in the FinOps dashboard.
  • Who is paged: Support platform on-call (PagerDuty schedule support-platform-primary).
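The signals above fit in one machine-readable block, which keeps thresholds reviewable in the same repo as the feature. The structure below is an illustrative sketch, not the actual Datadog or warehouse configuration; the numbers are the ones stated above:

```python
# Rollback-trigger thresholds for support-reply-suggestions-v2, as stated in this doc.
ROLLBACK_SIGNALS = {
    "csat_drop_points_dod":   {"threshold": 4,    "window": "24h rolling",  "source": "support metrics warehouse"},
    "eval_pass_rate_min":     {"threshold": 0.88, "window": "async",        "source": "eval job (5% sample)"},
    "hallucination_rate_max": {"threshold": 0.03, "window": "async",        "source": "eval job (5% sample)"},
    "malformed_rate_max":     {"threshold": 0.01, "window": "15m",          "source": "Datadog"},
    "p99_latency_s_max":      {"threshold": 8,    "window": "15m",          "source": "Datadog"},
    "cost_multiple_max":      {"threshold": 2.0,  "window": "30d baseline", "source": "FinOps dashboard"},
}
PAGE_TARGET = "support-platform-primary"  # PagerDuty schedule
```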

How fast can we revert?

  • Revert steps:
    1. Flip support.reply_suggestions.enabled = false in LaunchDarkly. Confirm 100% rollout of the disabled flag.
    2. Verify in Datadog that suggestion generation drops to zero within 60 seconds.
    3. Page the support manager on duty to confirm staffing for the manual path.
    4. Post in #support-ops Slack channel: feature disabled, reason, ETA.
  • Time to revert: ≤ 5 minutes from page to recovery.
  • Data implications: Already-shown suggestions remain visible in agents’ UIs until the next page load. No data writes to roll back — suggestions are read-only until a human accepts them.
  • Tested last on: 2026-04-12.
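Step 2 above is the kind of verification that benefits from being scripted rather than eyeballed. A sketch of a poll-until-zero check, where `read_rate` stands in for the real Datadog metrics query and the 60-second window comes from the step above:

```python
import time

def verify_traffic_stopped(read_rate, window_s: float = 60, interval_s: float = 5) -> bool:
    """Poll the suggestion-generation rate until it reads zero or the
    verification window elapses. Returns True once the rate hits zero."""
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        if read_rate() == 0:
            return True
        time.sleep(interval_s)
    return read_rate() == 0  # final check at the deadline
```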

What manual path exists?

  • Manual fallback workflow: Support agents see the customer message without an AI-suggested reply. They use existing macros and templates.
  • Who staffs the fallback: Existing support agent rotation. No additional staffing required for normal load.
  • Capacity of the fallback: Handles 100% of current support volume. Average handle time increases by ~30% without suggestions (measured during the 2026-04-12 drill).
  • SLA during fallback: Standard support SLA unchanged.
  • Customer comms: None. Customers do not see the AI suggestions directly; they see the support agent’s reply.

Coverage map

  • Code: support-reply-suggestions service, all versions.
  • Data writes: None. The agent only reads ticket context and proposes text; the human writes the reply.
  • Side effects: None outside the support tool UI.
  • Caches: Suggestion cache (Redis) — does not need invalidation; will simply not be populated.
  • Feature flags: support.reply_suggestions.enabled (the kill switch).
  • Customer state: None. Customer experience is unchanged.

Risk classification

  • 1-hour blast radius if not reverted: ~3,000 support tickets per hour see degraded suggestions. Agent productivity drops, CSAT may dip 2-4 points. Recoverable.
  • 24-hour blast radius: ~70,000 tickets affected. CSAT impact compounds. Possible SLA breaches on response time. Reputational impact moderate.
  • Reputational risk: Medium — visible in support channels but not externally branded as “AI.”
  • Regulatory risk: Low. No PII written by the agent. EU data handled per existing support tool sovereignty controls.

Drill log

Date | Driver | Outcome | Action items
2026-04-12 | J. Patel | Revert completed in 3:40. Manual fallback capacity confirmed. CSAT dip during drill window was 1.2 points — within tolerance. | Add automated dashboard for fallback capacity check
2026-01-15 | A. Ramos | Revert completed in 4:50. Flagged that the #support-ops post was manual; should be auto-posted by the kill switch action. | Implement auto-post (closed 2026-02-08)

Change log

Date | Change | Updated by
2026-05-01 | Updated quality signal to use new judge-model version | A. Ramos
2026-03-14 | Added cost signal after vendor price change | A. Ramos
2026-02-08 | Auto-post to #support-ops on kill-switch trigger implemented | J. Patel

Common mistakes to avoid

  • “Roll back the deploy” is not a rollback plan. Rollbacks of AI features often need a feature flag, not a code revert, because the model is a configured dependency.
  • No data-implications section. A feature that wrote rows or sent emails cannot be rolled back by flipping a flag. The flag stops new harm. It does not undo existing harm.
  • Drill log empty. A rollback that has never been executed is a hypothesis. Run the drill on a non-prod environment before launch.
  • Signals without a source. “Customer complaints rise” is not measurable. “Support ticket rate > X% over Y minutes, measured in the support warehouse” is.
  • Fallback capacity unverified. “Humans will handle it” is not capacity planning. Number of tickets per hour, number of humans, average handle time — those are capacity.
  • Customer comms missing. If the customer notices the rollback, the comms plan must exist before the rollback happens, not be improvised during it.
  • One revert path for everything. Different signals may justify different revert actions. A cost spike might need throttling, not a full kill. A quality drop might need a fallback to a cheaper model, not full disable. The doc may list more than one path.
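The last point can be sketched as a small signal-to-action map, where the most severe action wins when several signals fire at once. Signal and action names are illustrative:

```python
# Proportionate responses: not every firing signal justifies a full kill.
RESPONSES = {
    "cost_spike":   "throttle",        # cap request rate, keep the feature on
    "quality_drop": "fallback_model",  # route to a known-good or cheaper model
    "error_spike":  "kill_switch",     # full disable via the feature flag
}

SEVERITY = ["throttle", "fallback_model", "kill_switch"]  # least to most severe

def choose_action(firing: list[str]) -> str:
    """Pick the most severe response among all firing signals."""
    actions = [RESPONSES[s] for s in firing if s in RESPONSES]
    if not actions:
        return "no_action"
    return max(actions, key=SEVERITY.index)
```

Each path in the map still needs its own time-to-revert target and its own drill; the map only decides which path runs.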

Companion to the manifesto Build the System the Model Cannot Break. See also: Agent Reliability Contract template · Eval Suite starter kit.