Quick take
Agents need structure, not longer prompts. Plan-execute-replan, specialist orchestration, compact memory management, and explicit recovery paths are the patterns that hold up. This post walks through each one with Go implementations.
I’ve been building and reviewing agent systems for most of this year. The failure story is always the same: someone builds a single-prompt agent, it works beautifully on the happy path, and then it meets a real task and falls apart.
The fix is never “make the prompt better.” It’s always “add structure around the model.” Here are the patterns that actually survive production, with Go code you can adapt.
When Simple Agents Break
Simple agents – one prompt, one model call, maybe a tool – fail predictably once tasks get real:
- More steps than fit in one context window
- Tool calls that return errors or ambiguous results
- Multiple valid paths with unknown payoff
- Dependencies between sub-tasks that require ordering
If your task has any of these properties, you need patterns. Not hope.
Plan, Execute, Replan
The most useful pattern is also the simplest. Break the task into a plan, execute steps sequentially, and replan when reality diverges from the plan.
The plan is a draft, not a contract.
// Plan represents a sequence of steps the agent intends to execute.
// Steps can be updated mid-execution when results diverge.
type Plan struct {
    Goal      string
    Steps     []Step
    Completed []StepResult
}

type Step struct {
    ID          string
    Description string
    ToolName    string
    Input       map[string]any
}

type StepResult struct {
    StepID  string
    Output  any
    Err     error
    Blocked bool
}
// Execute runs through the plan, replanning when a step is blocked
// or produces unexpected results.
func (a *Agent) Execute(ctx context.Context, p *Plan) (*Plan, error) {
    for len(p.Steps) > 0 {
        step := p.Steps[0]
        p.Steps = p.Steps[1:]

        result := a.runStep(ctx, step)
        p.Completed = append(p.Completed, result)

        if result.Blocked || result.Err != nil {
            revised, err := a.replan(ctx, p)
            if err != nil {
                return p, fmt.Errorf("replan failed: %w", err)
            }
            p = revised
        }
    }
    return p, nil
}
// replan asks the model to revise remaining steps given what has
// happened so far. The completed results provide context.
func (a *Agent) replan(ctx context.Context, p *Plan) (*Plan, error) {
    prompt := fmt.Sprintf(
        "Goal: %s\nCompleted: %s\nRevise the remaining steps.",
        p.Goal, formatResults(p.Completed),
    )
    resp, err := a.llm.Complete(ctx, prompt)
    if err != nil {
        return p, err
    }
    p.Steps = parseSteps(resp)
    return p, nil
}
The key design choice is to replan on failure, not on every step. Replanning is expensive – it costs a model call and risks plan instability. Only trigger it when the current plan is provably broken.
I’ve seen teams replan after every step “for safety.” The result is an agent that never commits to anything and burns tokens oscillating between plans. Pick a plan, execute, and adjust on failure, not anxiety.
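One way to enforce this discipline is a small gate in front of the replanner. The sketch below is illustrative, not part of the agent above: `replanGate` and its budget policy are invented names. It fires only when a step is provably broken, and a hard budget stops the oscillation between plans.

```go
package main

import (
    "errors"
    "fmt"
)

// StepResult mirrors the struct from the plan-execute example.
type StepResult struct {
    StepID  string
    Err     error
    Blocked bool
}

// replanGate permits a replan only when the current plan is provably
// broken, and only while the budget lasts.
type replanGate struct {
    remaining int // replans left before the agent must escalate
}

// allow reports whether a replan is justified for this result.
func (g *replanGate) allow(r StepResult) (bool, error) {
    if !r.Blocked && r.Err == nil {
        return false, nil // step succeeded: keep executing the plan
    }
    if g.remaining == 0 {
        return false, errors.New("replan budget exhausted")
    }
    g.remaining--
    return true, nil
}

func main() {
    g := &replanGate{remaining: 1}

    ok, _ := g.allow(StepResult{StepID: "fetch"})
    fmt.Println("healthy step replans:", ok) // false

    ok, _ = g.allow(StepResult{StepID: "parse", Blocked: true})
    fmt.Println("blocked step replans:", ok) // true

    _, err := g.allow(StepResult{StepID: "write", Err: errors.New("tool error")})
    fmt.Println("over budget:", err)
}
```

When the budget runs out, the right move is usually escalation rather than yet another plan.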
Orchestrator-Specialist Pattern
When tasks naturally split into parallel or specialized work, a single agent doing everything is the wrong abstraction. Use an orchestrator that breaks the task down and dispatches to specialists.
// Orchestrator decomposes a task and dispatches sub-tasks to
// specialist agents. It synthesizes their results.
type Orchestrator struct {
    planner     LLM
    specialists map[string]*Specialist
}

type Specialist struct {
    Name   string
    Agent  *Agent
    Domain string // e.g. "research", "code-generation", "validation"
}

type SubTask struct {
    ID          string
    Description string
    Specialist  string
    Input       map[string]any
    DependsOn   []string
}
// Run decomposes the task, executes sub-tasks respecting dependencies,
// and synthesizes results.
func (o *Orchestrator) Run(ctx context.Context, task string) (string, error) {
    subtasks, err := o.decompose(ctx, task)
    if err != nil {
        return "", fmt.Errorf("decompose: %w", err)
    }

    // results is written by goroutines within a batch, so it needs a
    // mutex; a bare map here would be a data race.
    var mu sync.Mutex
    results := make(map[string]string)

    for _, batch := range topologicalBatches(subtasks) {
        g, gCtx := errgroup.WithContext(ctx)
        for _, st := range batch {
            st := st
            spec, ok := o.specialists[st.Specialist]
            if !ok {
                return "", fmt.Errorf("unknown specialist: %s", st.Specialist)
            }
            g.Go(func() error {
                // Inject dependency results into the sub-task input.
                mu.Lock()
                for _, dep := range st.DependsOn {
                    st.Input[dep] = results[dep]
                }
                mu.Unlock()

                res, err := spec.Agent.RunTask(gCtx, st.Description, st.Input)
                if err != nil {
                    return fmt.Errorf("specialist %s: %w", spec.Name, err)
                }

                mu.Lock()
                results[st.ID] = res
                mu.Unlock()
                return nil
            })
        }
        if err := g.Wait(); err != nil {
            return "", err
        }
    }
    return o.synthesize(ctx, task, results)
}
The topological batching is important. Sub-tasks without dependencies run in parallel. Sub-tasks that depend on earlier results wait. This gives you concurrency where it’s safe and ordering where it’s required.
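`topologicalBatches` is elided above; one way to implement it is wave by wave, a simplified form of Kahn’s algorithm. This sketch assumes acyclic input and bails out if it detects a cycle rather than looping forever:

```go
package main

import "fmt"

// SubTask mirrors the orchestrator example; only the fields needed
// for batching are included here.
type SubTask struct {
    ID        string
    DependsOn []string
}

// topologicalBatches groups sub-tasks into waves: each batch contains
// only tasks whose dependencies completed in earlier batches.
func topologicalBatches(tasks []SubTask) [][]SubTask {
    done := make(map[string]bool)
    var batches [][]SubTask

    remaining := tasks
    for len(remaining) > 0 {
        var batch, next []SubTask
        for _, t := range remaining {
            ready := true
            for _, dep := range t.DependsOn {
                if !done[dep] {
                    ready = false
                    break
                }
            }
            if ready {
                batch = append(batch, t)
            } else {
                next = append(next, t)
            }
        }
        if len(batch) == 0 {
            break // dependency cycle: nothing is ready, so stop
        }
        for _, t := range batch {
            done[t.ID] = true
        }
        batches = append(batches, batch)
        remaining = next
    }
    return batches
}

func main() {
    tasks := []SubTask{
        {ID: "research"},
        {ID: "draft", DependsOn: []string{"research"}},
        {ID: "validate", DependsOn: []string{"draft"}},
        {ID: "examples"},
    }
    for i, b := range topologicalBatches(tasks) {
        var ids []string
        for _, t := range b {
            ids = append(ids, t.ID)
        }
        fmt.Println(i, ids) // research+examples, then draft, then validate
    }
}
```

In production you would want to surface the cycle as an error instead of silently dropping tasks, but the batching logic is the same.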
Go’s errgroup is perfect for this. I’ve tried this pattern in Python with asyncio, and the error handling is significantly worse. Go’s explicit error returns make failure paths clear.
Structured Working Memory
Context windows are finite and expensive. You can’t dump every intermediate result into the prompt and hope for the best. Working memory needs structure.
// Memory manages the agent's working context with size limits
// and periodic compression.
type Memory struct {
    mu       sync.Mutex
    facts    []Fact
    maxFacts int
    llm      LLM
}

type Fact struct {
    Key       string
    Value     string
    Source    string // which step produced this
    Priority  int    // higher = keep longer
    CreatedAt time.Time
}

// Add inserts a fact, compressing if the memory is full.
func (m *Memory) Add(ctx context.Context, f Fact) error {
    m.mu.Lock()
    defer m.mu.Unlock()

    m.facts = append(m.facts, f)
    if len(m.facts) > m.maxFacts {
        return m.compress(ctx)
    }
    return nil
}

// compress asks the model to summarize low-priority facts into
// fewer entries, keeping high-priority facts intact. Called with
// m.mu held, so readers block for the duration of the model call.
func (m *Memory) compress(ctx context.Context) error {
    sort.Slice(m.facts, func(i, j int) bool {
        return m.facts[i].Priority > m.facts[j].Priority
    })

    // Keep top half as-is, compress bottom half.
    keep := m.facts[:m.maxFacts/2]
    toCompress := m.facts[m.maxFacts/2:]

    summary, err := m.llm.Complete(ctx, fmt.Sprintf(
        "Summarize these facts into 2-3 key points:\n%s",
        formatFacts(toCompress),
    ))
    if err != nil {
        // On failure, just drop the lowest priority facts.
        m.facts = keep
        return nil
    }

    m.facts = append(keep, Fact{
        Key:       "compressed_context",
        Value:     summary,
        Priority:  1,
        CreatedAt: time.Now(),
    })
    return nil
}

// ForPrompt renders the current memory as a string for inclusion
// in a prompt.
func (m *Memory) ForPrompt() string {
    m.mu.Lock()
    defer m.mu.Unlock()
    return formatFacts(m.facts)
}
The compression strategy matters. High-priority facts (decisions, constraints, key results) stay intact. Low-priority facts (intermediate outputs, exploration notes) get summarized. If compression fails, drop the least important items rather than crashing.
I keep raw tool outputs entirely outside the prompt. They go into a side store the agent can query if needed. Only extracted facts enter working memory.
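A keyed side store is enough for this. The sketch below is a hypothetical minimal version; `ToolOutputStore` and its API are invented for illustration, not part of the agent above:

```go
package main

import (
    "fmt"
    "sync"
)

// ToolOutputStore keeps raw tool outputs out of the prompt entirely.
// Outputs are keyed by step ID and fetched only on demand.
type ToolOutputStore struct {
    mu      sync.Mutex
    outputs map[string]string
}

func NewToolOutputStore() *ToolOutputStore {
    return &ToolOutputStore{outputs: make(map[string]string)}
}

// Put records a raw output for a step.
func (s *ToolOutputStore) Put(stepID, raw string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.outputs[stepID] = raw
}

// Get retrieves a raw output when a later step actually needs it.
func (s *ToolOutputStore) Get(stepID string) (string, bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    raw, ok := s.outputs[stepID]
    return raw, ok
}

func main() {
    store := NewToolOutputStore()
    store.Put("fetch-page", "<html>… large raw payload …</html>")

    // Only the extracted fact would enter working memory; the raw
    // payload stays here, queryable by step ID.
    raw, ok := store.Get("fetch-page")
    fmt.Println(ok, len(raw) > 0) // true true
}
```

In a real system this might be backed by a database or object store, but the contract is the same: facts go in the prompt, payloads stay out.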
Explicit Recovery
This is the pattern most teams skip, and it’s the one that matters most in production. Agents will encounter tool failures, stale plans, missing inputs, and model refusals. Without explicit recovery, those become silent failures or infinite loops.
// RecoveryStrategy defines how the agent handles a specific failure type.
type RecoveryStrategy struct {
    Name       string
    MaxRetries int
    Backoff    time.Duration
    Handler    func(ctx context.Context, err error) (Action, error)
}

type Action int

const (
    Retry     Action = iota
    Decompose // break the failed step into smaller steps
    Skip      // mark step as skipped, continue
    Escalate  // pause for human input
    Abort     // stop the agent
)
// Recover selects and applies the appropriate recovery strategy.
func (a *Agent) Recover(ctx context.Context, step Step, err error) (Action, error) {
    strategy := a.selectStrategy(err)
    for attempt := 0; attempt < strategy.MaxRetries; attempt++ {
        action, handlerErr := strategy.Handler(ctx, err)
        if handlerErr == nil {
            return action, nil
        }
        // Back off between attempts, but respect cancellation.
        select {
        case <-ctx.Done():
            return Abort, ctx.Err()
        case <-time.After(strategy.Backoff * time.Duration(attempt+1)):
        }
    }
    // All retries exhausted. Escalate.
    return Escalate, fmt.Errorf(
        "recovery exhausted for step %s after %d attempts: %w",
        step.ID, strategy.MaxRetries, err,
    )
}
The key insight: recovery actions are an enum, not free-form decisions. The agent picks from a fixed set of responses. Retry, decompose, skip, escalate, or abort. No improvisation. This keeps the failure paths testable and predictable.
The escalation path – pausing for human input – isn’t a failure. It’s a feature. An agent that knows when to ask for help is more reliable than one that guesses and gets it wrong.
Putting It Together
A production agent combines these patterns in layers:
- Plan-execute-replan as the outer loop
- Orchestrator-specialist for sub-task parallelism
- Structured memory to manage context within budget
- Explicit recovery at every step boundary
Each layer is independently testable. You can unit test recovery strategies, benchmark memory compression, and integration test the orchestrator without running the full agent.
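For instance, a recovery policy can be exercised with a plain table-driven check, no model calls required. `classifyFailure` here is a stand-in for whatever selection logic your agent uses; in a real test file you would call the agent’s own function:

```go
package main

import "fmt"

type Action int

const (
    Retry Action = iota
    Decompose
    Skip
    Escalate
    Abort
)

// classifyFailure is a stand-in for the agent's real strategy
// selection logic.
func classifyFailure(msg string) Action {
    switch msg {
    case "rate limited":
        return Retry
    case "missing input":
        return Decompose
    default:
        return Escalate
    }
}

func main() {
    // The same table-driven shape works under `go test`.
    cases := []struct {
        msg  string
        want Action
    }{
        {"rate limited", Retry},
        {"missing input", Decompose},
        {"model refusal", Escalate},
    }
    for _, c := range cases {
        if got := classifyFailure(c.msg); got != c.want {
            fmt.Printf("classifyFailure(%q) = %v, want %v\n", c.msg, got, c.want)
            return
        }
    }
    fmt.Println("all recovery cases pass")
}
```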
Start with plan-execute-replan and explicit recovery. Those two patterns alone will take you from “works on demos” to “works on real tasks.” Add orchestration and structured memory when your tasks demand it.
The agents that survive production aren’t clever. They’re disciplined.