Agent Reliability Contract — Template

This is the artifact every production agent should have on file before it gets a service account. It is a contract, not a checklist. If the answers below are vague, the agent is not ready for production traffic.

The template is structured in seven sections. Three filled examples follow.


How to use this template

Copy the section under Template into a doc per agent. One agent, one contract. If you have an agent that does two things, you have two agents — split it.

Every field must be answered with a sentence, a number, or a named human. “TBD” or “see Slack thread” is not a valid answer. If you cannot fill a field, the agent is not ready for the next promotion stage.

The contract is reviewed quarterly and after every incident that touches the agent.


Template

1. Identity

  • Agent name:
  • Owner (human):
  • Backup owner (human):
  • Promotion stage: internal-only / beta / GA
  • Last reviewed:

2. Scope — what it is allowed to do

  • Primary task in one sentence:
  • Inputs it accepts:
  • Tools it can call: (list each, with the permission scope of each)
  • Data it can read: (named data classes, not “the database”)
  • Data it can write: (named tables/queues, not “wherever”)
  • Users or systems it acts on behalf of:

3. Anti-scope — what it is explicitly not allowed to do

  • Actions explicitly forbidden: (e.g., sending email externally, modifying billing rows, calling vendor APIs that incur charges)
  • Data it must not read: (PII classes, regulated data, other tenants)
  • Tools it must not call:
  • Decisions it must escalate to a human:

4. Health metrics — what proves it is working

  • Adoption signal: (real users completing real workflows, weekly)
  • Reliability signal: (success rate, malformed-output rate, retry rate)
  • Quality signal: (eval-suite pass rate, golden-set anchor score, judge-model version)
  • Cost signal: (cost per task, tokens per task, retries per task)
  • Latency signal: (p50 and p99 end-to-end, not just inference)

5. Blast-radius caps — bounded failure

Each cap is a number. If the cap is approached, the agent throttles. If the cap is exceeded, the agent stops.

  • Action cap: max actions per task / per hour / per day
  • Financial cap: max spend the agent can cause per hour (vendor API costs, downstream charges, refunds)
  • Data cap: max rows read or written per task
  • Concurrency cap: max simultaneous instances
  • Time cap: max wall-clock seconds per task before automatic termination
  • Tenant scope: can the agent ever touch another customer’s data — yes/no, and how is that enforced
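The throttle-then-stop semantics above can be sketched as a small gate. This is a minimal illustration, not a prescribed implementation; the names (`Cap`, `check_cap`, the 0.8 throttle threshold) are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Cap:
    """One blast-radius cap: a name, a hard limit, and a throttle point."""
    name: str
    limit: float
    throttle_at: float  # fraction of the limit at which throttling starts


class CapExceeded(Exception):
    """Raised when a cap is exceeded; the agent must stop, not retry."""


def check_cap(cap: Cap, current: float) -> str:
    """Return 'ok' or 'throttle'; raise CapExceeded at or past the limit."""
    if current >= cap.limit:
        raise CapExceeded(f"{cap.name}: {current} >= {cap.limit}")
    if current >= cap.throttle_at * cap.limit:
        return "throttle"
    return "ok"
```

The point of the sketch is the asymmetry: approaching a cap degrades throughput, but crossing it is an exception, never a silent continue.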

6. Degradation and kill switch — what happens when it breaks

  • Degradation signals: (specific metrics that trigger fallback)
  • Fallback path: (deterministic alternative, named in code)
  • Kill switch — three operations, three latencies:
    • Stop inference: target ≤ X seconds, mechanism
    • Revoke tool credentials: target ≤ X seconds, mechanism
    • Stop in-flight side effects: target ≤ X seconds, mechanism
  • Who can pull each kill switch (named human or role):
  • Customer comms plan if the agent fails: (who, what channel, what SLA)
  • Manual fallback the on-call engineer can run at 2 a.m.:
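One reason the kill switch is three operations is that "stop inference" and "stop side effects" are separate gates a task must pass. A minimal sketch of that double gate, assuming a feature-flag store queried per task (the flag name and `FLAGS` dict are illustrative, not a real service):

```python
import time

# Hypothetical in-memory flag store; production would use a real
# feature-flag service with the latency targets from the contract.
FLAGS = {"agents.example_agent.enabled": True}


def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


def run_task(flag: str, infer, side_effect, time_cap_s: float) -> str:
    """Check the kill flag before inference AND again before the side
    effect, so flipping the flag stops both independently."""
    start = time.monotonic()
    if not flag_enabled(flag):
        return "stopped-before-inference"
    result = infer()
    if time.monotonic() - start > time_cap_s:
        return "stopped-time-cap"  # enforce the per-task time cap
    if not flag_enabled(flag):
        return "stopped-before-side-effect"
    side_effect(result)
    return "done"
```

Checking the flag a second time, after inference but before the write, is what lets "stop in-flight side effects" have a latency bounded by the time cap rather than by deploy time.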

7. Review and change control

  • Eval suite linked here: (URL or path)
  • Reliability contract is reviewed quarterly. Last review:
  • Permission audit cadence:
  • Change-control requirement for scope expansion: (e.g., new tool, new data class — what review is required, who approves)

Example 1 — Inbox triage agent

1. Identity

  • Agent name: inbox-triage-v3
  • Owner: A. Ramos (eng lead, support platform)
  • Backup owner: J. Patel (eng manager, support platform)
  • Promotion stage: GA
  • Last reviewed: 2026-04-30

2. Scope

  • Primary task: Read incoming support emails, classify by category, and route to the correct queue.
  • Inputs: Email subject, body (sanitized), and prior thread history for the same case.
  • Tools: classifier.predict, routing.assign, tags.apply — all read/write scoped to the support-mail tenant only.
  • Data it can read: Support mail tenant. Customer profile basics (account tier, region).
  • Data it can write: Tags and queue assignment on the support ticket. No customer-visible fields.
  • Acts on behalf of: The support-platform system account, never a named user.

3. Anti-scope

  • Forbidden actions: Replying to the customer, modifying billing or subscription state, escalating to legal, calling any external API.
  • Data it must not read: Billing tables, payment methods, employee data, any other tenant.
  • Tools it must not call: Anything outside the three listed.
  • Escalates to a human: Any case classified as “legal” or “abuse”, and any case where classification confidence is below 0.6.

4. Health metrics

  • Adoption: ≥ 95% of inbound support mail processed by the agent within 5 minutes.
  • Reliability: Malformed-output rate < 0.5% over a 7-day rolling window.
  • Quality: Golden-set pass rate ≥ 92%, judge-model claude-opus-4-7@2026-04 pinned.
  • Cost: ≤ $0.004 per triage decision, p95.
  • Latency: p50 < 1.5s, p99 < 5s, end to end.

5. Blast-radius caps

  • Action cap: 200 routing decisions per minute per region.
  • Financial cap: $40/hour in inference spend before throttle.
  • Data cap: Reads the current ticket plus the 5 most recent tickets in its thread. Hard cap of 6 rows per task.
  • Concurrency cap: 32 simultaneous instances per region.
  • Time cap: 8 seconds wall-clock per task.
  • Tenant scope: Single tenant per request. Enforced by per-task scoped token tied to the inbound ticket’s tenant ID.
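The per-task scoped token in the tenant-scope cap can be sketched like this. The names (`mint_task_token`, `read_row`, `TenantViolation`) are illustrative; the point is that the tenant ID is fixed at task start and every read is checked against it:

```python
class TenantViolation(Exception):
    """Raised when a task touches data outside its scoped tenant."""


def mint_task_token(ticket: dict) -> dict:
    # The token carries exactly one tenant_id, taken from the inbound
    # ticket at task start. It never changes for the life of the task.
    return {"tenant_id": ticket["tenant_id"]}


def read_row(token: dict, row: dict) -> dict:
    # Every data access is checked against the task token, so a single
    # task cannot cross tenants even if retrieval misbehaves.
    if row["tenant_id"] != token["tenant_id"]:
        raise TenantViolation(row["tenant_id"])
    return row
```

Enforcing the boundary in the data-access layer, rather than in the prompt, is what makes the yes/no answer to "can it touch another customer's data" a property of the system instead of a hope about the model.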

6. Degradation and kill switch

  • Degradation signals: Malformed-output rate > 2% over 15 minutes, OR golden-set score drops > 5 points, OR p99 latency > 10s for 10 consecutive minutes.
  • Fallback path: Tickets route to the “unclassified” queue and a human triage rotation picks them up. The agent’s classification step is bypassed entirely.
  • Kill switch:
    • Stop inference: ≤ 30 seconds via feature flag agents.inbox_triage.enabled = false.
    • Revoke tool credentials: ≤ 5 minutes via IAM policy update (auto-rotation of agent service account).
    • Stop in-flight side effects: in-flight tasks finish within their 8s time cap; once the feature flag is off, no new tasks start, so no further side effects occur.
  • Who can pull: Any support-platform on-call engineer or the eng lead. Documented in the support-platform runbook.
  • Customer comms plan: None visible to customers — the fallback is internal-only. Support manager is paged if the kill switch is used.
  • Manual fallback at 2 a.m.: Run make agents.inbox_triage.disable, confirm the unclassified queue is being staffed, page the support manager.

7. Review

  • Eval suite: /evals/agents/inbox-triage/ — 220 cases including adversarial subject lines and prompt-injection attempts in email bodies.
  • Last review: 2026-04-30. Next: 2026-07-30.
  • Permission audit: Quarterly with security team.
  • Change control: Adding a new tool requires eng-lead + security review. Adding a new data class requires governance owner sign-off.

Example 2 — Code review assist agent

1. Identity

  • Agent name: pr-reviewer-v2
  • Owner: M. Iwasaki (staff engineer, developer productivity)
  • Backup owner: S. Chen (eng manager, developer productivity)
  • Promotion stage: beta (engineers opt in repo by repo)
  • Last reviewed: 2026-05-08

2. Scope

  • Primary task: Post a non-blocking comment on opted-in pull requests with suggestions on security, correctness, and style.
  • Inputs: PR diff (max 4,000 lines), PR description, repo’s AGENTS.md if present.
  • Tools: github.read_pr, github.post_comment — scoped to repos with the opt-in label.
  • Data it can read: Source code in opted-in repos only. No production data. No customer data.
  • Data it can write: A single PR comment, marked as posted by the agent.
  • Acts on behalf of: A dedicated GitHub bot account with read-only token plus comment scope on opted-in repos.

3. Anti-scope

  • Forbidden actions: Approving or requesting changes (only non-blocking comments), merging, closing PRs, modifying CI, posting in any repo without the opt-in label.
  • Data it must not read: Repos without the opt-in label, secrets, environment files, any private gist.
  • Tools it must not call: Anything that writes code, anything that touches CI, anything that touches deployment.
  • Escalates to a human: If the diff includes files under /security/ or /billing/, the agent skips the PR and the team is notified to review manually.
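The sensitive-path escalation rule above is simple enough to be a deterministic pre-check rather than a model decision. A minimal sketch (the function name and label are hypothetical):

```python
SENSITIVE_PREFIXES = ("/security/", "/billing/")


def must_escalate(changed_files: list[str]) -> bool:
    """True if any changed file sits under a sensitive directory; the
    agent then skips the PR entirely and the team reviews manually."""
    return any(
        any(prefix in f"/{path}" for prefix in SENSITIVE_PREFIXES)
        for path in changed_files
    )
```

Running this check before the model sees the diff keeps the escalation decision out of reach of anything adversarial inside the PR itself.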

4. Health metrics

  • Adoption: Each week, ≥ 30% of opted-in PRs get an engineer thumbs-up on the agent’s comment.
  • Reliability: Malformed-comment rate < 1%. Comments posted to the wrong repo: zero tolerance.
  • Quality: Eval-suite pass rate ≥ 85% on the curated benchmark (true positives on planted bugs, low false-positive rate on clean code).
  • Cost: ≤ $0.12 per PR comment, p95.
  • Latency: p50 < 30s after PR open, p99 < 3 minutes.

5. Blast-radius caps

  • Action cap: 1 comment per PR. Hard limit.
  • Financial cap: $20/hour in inference spend.
  • Data cap: Diff truncated at 4,000 lines. Repos > 50 MB are skipped entirely.
  • Concurrency cap: 8 simultaneous instances.
  • Time cap: 5 minutes per PR before automatic termination.
  • Tenant scope: Single GitHub organization, opted-in repos only. Enforced by repo label check at task start.

6. Degradation and kill switch

  • Degradation signals: False-positive rate > 25% on the past 50 PRs (manually scored weekly), OR any comment posted to a non-opt-in repo (zero tolerance).
  • Fallback path: The agent posts no comment. Engineers continue normal review.
  • Kill switch:
    • Stop inference: ≤ 1 minute via feature flag agents.pr_reviewer.enabled = false.
    • Revoke tool credentials: ≤ 5 minutes via GitHub App permissions update.
    • Stop in-flight side effects: in-flight tasks finish within the 5-minute time cap; pending comments are dropped.
  • Who can pull: Developer productivity team or platform security on-call.
  • Customer comms plan: Internal Slack #dev-prod notification.
  • Manual fallback at 2 a.m.: Run make agents.pr_reviewer.disable. Notify the team in #dev-prod.

7. Review

  • Eval suite: /evals/agents/pr-reviewer/ — 80 curated PRs (40 with planted issues, 40 clean) plus adversarial cases (prompt injection in PR descriptions, code with disguised secrets).
  • Last review: 2026-05-08. Next: 2026-08-08.
  • Permission audit: Monthly while in beta.
  • Change control: Expanding to a new repo requires opt-in label only. Adding a new tool requires staff-engineer + security review.

Example 3 — Internal documentation Q&A agent (RAG)

1. Identity

  • Agent name: docs-qa-v1
  • Owner: R. Okafor (eng lead, internal platforms)
  • Backup owner: L. Hoffmann (technical writer, internal platforms)
  • Promotion stage: internal-only
  • Last reviewed: 2026-05-01

2. Scope

  • Primary task: Answer employee questions about internal engineering docs in Slack, with citations.
  • Inputs: A Slack message in the #ask-docs channel.
  • Tools: docs.search, docs.fetch — read-only over the engineering docs corpus. slack.post_reply — scoped to the #ask-docs channel and ephemeral DMs.
  • Data it can read: The published engineering docs corpus only. No source code, no design docs marked confidential, no HR or finance docs.
  • Data it can write: Replies in #ask-docs or DMs initiated by the asker, citing source docs by URL.
  • Acts on behalf of: The internal-platforms bot user.

3. Anti-scope

  • Forbidden actions: Posting outside #ask-docs or DMs, summarizing closed-channel content, fabricating citations, answering with a doc URL it did not retrieve.
  • Data it must not read: Confidential docs, employee records, customer data, financial data, source code repos.
  • Tools it must not call: Anything outside the three listed.
  • Escalates to a human: Questions about people, compensation, legal, or anything where retrieval returns no document above the confidence threshold.

4. Health metrics

  • Adoption: ≥ 20 employee questions answered per week, ≥ 60% thumbs-up rate.
  • Reliability: Citation accuracy ≥ 98% (every cited URL exists and was retrieved this run, verified asynchronously).
  • Quality: Golden-set pass rate ≥ 88%. Hallucination rate (claims not grounded in retrieved doc) < 2%.
  • Cost: ≤ $0.03 per answer, p95.
  • Latency: p50 < 6s, p99 < 20s.
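The citation-accuracy check above (every cited URL exists and was retrieved this run) reduces to a set comparison between the URLs in the answer and the URLs the retriever actually returned. A minimal sketch, with hypothetical names and URLs:

```python
def unbacked_citations(cited_urls: list[str], retrieved_urls: set[str]) -> list[str]:
    """Cited URLs that were NOT retrieved this run. Any entry here is a
    citation-accuracy failure and counts against the 98% target."""
    return [url for url in cited_urls if url not in retrieved_urls]
```

Because the check compares the answer against the retrieval log rather than re-fetching anything, it can run asynchronously, as the contract specifies, without adding latency to the reply.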

5. Blast-radius caps

  • Action cap: 1 reply per question. Hard limit.
  • Financial cap: $5/hour in inference spend.
  • Data cap: Retrieves max 10 documents per question. No document larger than 100 KB is loaded into context.
  • Concurrency cap: 16 simultaneous instances.
  • Time cap: 30 seconds per question.
  • Tenant scope: Internal-only. The bot is not callable from external workspaces. Enforced by Slack workspace allow-list.

6. Degradation and kill switch

  • Degradation signals: Hallucination rate > 5% on sampled responses over 24 hours, OR citation accuracy < 95% over a rolling 50 answers, OR any post outside the allowed channels.
  • Fallback path: The bot responds with “I don’t have a confident answer — try asking in #help-engineering” and posts no further content.
  • Kill switch:
    • Stop inference: ≤ 30 seconds via feature flag.
    • Revoke tool credentials: ≤ 2 minutes via Slack app token rotation.
    • Stop in-flight side effects: in-flight tasks finish within the 30s time cap.
  • Who can pull: Internal-platforms on-call, or any member of the technical writing team.
  • Customer comms plan: Slack notification in #internal-platforms.
  • Manual fallback at 2 a.m.: Disable via feature flag. The channel reverts to human Q&A. No customer impact.

7. Review

  • Eval suite: /evals/agents/docs-qa/ — 150 cases including hallucination probes, ambiguous questions, questions outside scope (must refuse), and prompt-injection attempts in retrieved docs.
  • Last review: 2026-05-01. Next: 2026-08-01.
  • Permission audit: Quarterly.
  • Change control: Expanding the docs corpus requires technical writing team approval. Expanding to a new channel requires eng-lead approval.

Common mistakes to avoid

  • Writing one contract that covers two agents. Split them.
  • Listing tools without their permission scope.
  • “Kill switch” as a single thing. There are three operations and three latencies.
  • Blast-radius caps without numbers.
  • A fallback path that is “the model will retry.” That is not a fallback. That is a retry loop.
  • Forgetting that retrieved documents are an attack surface. Adversarial content in a vendor PDF is still adversarial.
  • Reviewing the contract once and never again. The contract is a living artifact. It drifts with the system.

Companion to the manifesto Build the System the Model Cannot Break. See also: Rollback document template · Eval Suite starter kit.