Quick take
Agents need structure, not longer prompts. Plan-execute-replan, specialist orchestration, compact memory management, and explicit recovery paths are the patterns that hold up. This post walks through each one with Go implementations.
I’ve been building and reviewing agent systems for most of this year. The failure story is always the same: someone builds a single-prompt agent, it works beautifully on the happy path, and then it meets a real task and falls apart.
The fix is never “make the prompt better.” It’s always “add structure around the model.” Here are the patterns that actually survive production, with Go code you can adapt.
When Simple Agents Break
Simple agents – one prompt, one model call, maybe a tool – fail predictably once tasks get real:
- More steps than fit in one context window
- Tool calls that return errors or ambiguous results
- Multiple valid paths with unknown payoff
- Dependencies between sub-tasks that require ordering
If your task has any of these properties, you need patterns. Not hope.
Plan, Execute, Replan
The most useful pattern is also the simplest. Break the task into a plan, execute steps sequentially, and replan when reality diverges from the plan.
The plan is a draft, not a contract.
// Plan represents a sequence of steps the agent intends to execute.
// Steps can be updated mid-execution when results diverge.
type Plan struct {
    Goal      string
    Steps     []Step
    Completed []StepResult
}

type Step struct {
    ID          string
    Description string
    ToolName    string
    Input       map[string]any
}

type StepResult struct {
    StepID  string
    Output  any
    Err     error
    Blocked bool
}
// Execute runs through the plan, replanning when a step is blocked
// or produces unexpected results.
func (a *Agent) Execute(ctx context.Context, p *Plan) (*Plan, error) {
    for len(p.Steps) > 0 {
        step := p.Steps[0]
        p.Steps = p.Steps[1:]

        result := a.runStep(ctx, step)
        p.Completed = append(p.Completed, result)

        if result.Blocked || result.Err != nil {
            revised, err := a.replan(ctx, p)
            if err != nil {
                return p, fmt.Errorf("replan failed: %w", err)
            }
            p = revised
        }
    }
    return p, nil
}
// replan asks the model to revise remaining steps given what has
// happened so far. The completed results provide context.
func (a *Agent) replan(ctx context.Context, p *Plan) (*Plan, error) {
    prompt := fmt.Sprintf(
        "Goal: %s\nCompleted: %s\nRevise the remaining steps.",
        p.Goal, formatResults(p.Completed),
    )
    resp, err := a.llm.Complete(ctx, prompt)
    if err != nil {
        return p, err
    }
    p.Steps = parseSteps(resp)
    return p, nil
}
The key design choice is to replan on failure, not on every step. Replanning is expensive – it costs a model call and risks plan instability. Only trigger it when the current plan is provably broken.
I’ve seen teams replan after every step “for safety.” The result is an agent that never commits to anything and burns tokens oscillating between plans. Pick a plan, execute, and adjust on failure, not anxiety.
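One way to enforce this discipline is a small gate in front of the replanner. The sketch below is illustrative, not part of the agent above: `replanGate` and its budget policy are invented names. It fires only when a step is provably broken, and a hard budget stops the oscillation between plans.

```go
package main

import (
    "errors"
    "fmt"
)

// StepResult mirrors the struct from the plan-execute example.
type StepResult struct {
    StepID  string
    Err     error
    Blocked bool
}

// replanGate permits a replan only when the current plan is provably
// broken, and only while the budget lasts.
type replanGate struct {
    remaining int // replans left before the agent must escalate
}

// allow reports whether a replan is justified for this result.
func (g *replanGate) allow(r StepResult) (bool, error) {
    if !r.Blocked && r.Err == nil {
        return false, nil // step succeeded: keep executing the plan
    }
    if g.remaining == 0 {
        return false, errors.New("replan budget exhausted")
    }
    g.remaining--
    return true, nil
}

func main() {
    g := &replanGate{remaining: 1}

    ok, _ := g.allow(StepResult{StepID: "fetch"})
    fmt.Println("healthy step replans:", ok) // false

    ok, _ = g.allow(StepResult{StepID: "parse", Blocked: true})
    fmt.Println("blocked step replans:", ok) // true

    _, err := g.allow(StepResult{StepID: "write", Err: errors.New("tool error")})
    fmt.Println("over budget:", err)
}
```

When the budget runs out, the right move is usually escalation rather than yet another plan.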
Orchestrator-Specialist Pattern
When tasks naturally split into parallel or specialized work, a single agent doing everything is the wrong abstraction. Use an orchestrator that breaks the task down and dispatches to specialists.
// Orchestrator decomposes a task and dispatches sub-tasks to
// specialist agents. It synthesizes their results.
type Orchestrator struct {
    planner     LLM
    specialists map[string]*Specialist
}

type Specialist struct {
    Name   string
    Agent  *Agent
    Domain string // e.g. "research", "code-generation", "validation"
}

type SubTask struct {
    ID          string
    Description string
    Specialist  string
    Input       map[string]any
    DependsOn   []string
}
// Run decomposes the task, executes sub-tasks respecting dependencies,
// and synthesizes results.
func (o *Orchestrator) Run(ctx context.Context, task string) (string, error) {
    subtasks, err := o.decompose(ctx, task)
    if err != nil {
        return "", fmt.Errorf("decompose: %w", err)
    }

    // results is written by goroutines within a batch, so it needs a
    // mutex; a bare map here would be a data race.
    var mu sync.Mutex
    results := make(map[string]string)

    for _, batch := range topologicalBatches(subtasks) {
        g, gCtx := errgroup.WithContext(ctx)
        for _, st := range batch {
            st := st
            spec, ok := o.specialists[st.Specialist]
            if !ok {
                return "", fmt.Errorf("unknown specialist: %s", st.Specialist)
            }
            g.Go(func() error {
                // Inject dependency results into the sub-task input.
                mu.Lock()
                for _, dep := range st.DependsOn {
                    st.Input[dep] = results[dep]
                }
                mu.Unlock()

                res, err := spec.Agent.RunTask(gCtx, st.Description, st.Input)
                if err != nil {
                    return fmt.Errorf("specialist %s: %w", spec.Name, err)
                }

                mu.Lock()
                results[st.ID] = res
                mu.Unlock()
                return nil
            })
        }
        if err := g.Wait(); err != nil {
            return "", err
        }
    }
    return o.synthesize(ctx, task, results)
}
The topological batching is important. Sub-tasks without dependencies run in parallel. Sub-tasks that depend on earlier results wait. This gives you concurrency where it’s safe and ordering where it’s required.
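`topologicalBatches` is elided above; one way to implement it is wave by wave, a simplified form of Kahn’s algorithm. This sketch assumes acyclic input and bails out if it detects a cycle rather than looping forever:

```go
package main

import "fmt"

// SubTask mirrors the orchestrator example; only the fields needed
// for batching are included here.
type SubTask struct {
    ID        string
    DependsOn []string
}

// topologicalBatches groups sub-tasks into waves: each batch contains
// only tasks whose dependencies completed in earlier batches.
func topologicalBatches(tasks []SubTask) [][]SubTask {
    done := make(map[string]bool)
    var batches [][]SubTask

    remaining := tasks
    for len(remaining) > 0 {
        var batch, next []SubTask
        for _, t := range remaining {
            ready := true
            for _, dep := range t.DependsOn {
                if !done[dep] {
                    ready = false
                    break
                }
            }
            if ready {
                batch = append(batch, t)
            } else {
                next = append(next, t)
            }
        }
        if len(batch) == 0 {
            break // dependency cycle: nothing is ready, so stop
        }
        for _, t := range batch {
            done[t.ID] = true
        }
        batches = append(batches, batch)
        remaining = next
    }
    return batches
}

func main() {
    tasks := []SubTask{
        {ID: "research"},
        {ID: "draft", DependsOn: []string{"research"}},
        {ID: "validate", DependsOn: []string{"draft"}},
        {ID: "examples"},
    }
    for i, b := range topologicalBatches(tasks) {
        var ids []string
        for _, t := range b {
            ids = append(ids, t.ID)
        }
        fmt.Println(i, ids) // research+examples, then draft, then validate
    }
}
```

In production you would want to surface the cycle as an error instead of silently dropping tasks, but the batching logic is the same.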
Go’s errgroup is perfect for this. I’ve tried this pattern in Python with asyncio, and the error handling is significantly worse. Go’s explicit error returns make failure paths clear.
Structured Working Memory
Context windows are finite and expensive. You can’t dump every intermediate result into the prompt and hope for the best. Working memory needs structure.
// Memory manages the agent's working context with size limits
// and periodic compression.
type Memory struct {
    mu       sync.Mutex
    facts    []Fact
    maxFacts int
    llm      LLM
}

type Fact struct {
    Key       string
    Value     string
    Source    string // which step produced this
    Priority  int    // higher = keep longer
    CreatedAt time.Time
}

// Add inserts a fact, compressing if the memory is full.
func (m *Memory) Add(ctx context.Context, f Fact) error {
    m.mu.Lock()
    defer m.mu.Unlock()

    m.facts = append(m.facts, f)
    if len(m.facts) > m.maxFacts {
        return m.compress(ctx)
    }
    return nil
}

// compress asks the model to summarize low-priority facts into
// fewer entries, keeping high-priority facts intact. Called with
// m.mu held, so readers block for the duration of the model call.
func (m *Memory) compress(ctx context.Context) error {
    sort.Slice(m.facts, func(i, j int) bool {
        return m.facts[i].Priority > m.facts[j].Priority
    })

    // Keep top half as-is, compress bottom half.
    keep := m.facts[:m.maxFacts/2]
    toCompress := m.facts[m.maxFacts/2:]

    summary, err := m.llm.Complete(ctx, fmt.Sprintf(
        "Summarize these facts into 2-3 key points:\n%s",
        formatFacts(toCompress),
    ))
    if err != nil {
        // On failure, just drop the lowest priority facts.
        m.facts = keep
        return nil
    }

    m.facts = append(keep, Fact{
        Key:       "compressed_context",
        Value:     summary,
        Priority:  1,
        CreatedAt: time.Now(),
    })
    return nil
}

// ForPrompt renders the current memory as a string for inclusion
// in a prompt.
func (m *Memory) ForPrompt() string {
    m.mu.Lock()
    defer m.mu.Unlock()
    return formatFacts(m.facts)
}
The compression strategy matters. High-priority facts (decisions, constraints, key results) stay intact. Low-priority facts (intermediate outputs, exploration notes) get summarized. If compression fails, drop the least important items rather than crashing.
I keep raw tool outputs entirely outside the prompt. They go into a side store the agent can query if needed. Only extracted facts enter working memory.
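A keyed side store is enough for this. The sketch below is a hypothetical minimal version; `ToolOutputStore` and its API are invented for illustration, not part of the agent above:

```go
package main

import (
    "fmt"
    "sync"
)

// ToolOutputStore keeps raw tool outputs out of the prompt entirely.
// Outputs are keyed by step ID and fetched only on demand.
type ToolOutputStore struct {
    mu      sync.Mutex
    outputs map[string]string
}

func NewToolOutputStore() *ToolOutputStore {
    return &ToolOutputStore{outputs: make(map[string]string)}
}

// Put records a raw output for a step.
func (s *ToolOutputStore) Put(stepID, raw string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.outputs[stepID] = raw
}

// Get retrieves a raw output when a later step actually needs it.
func (s *ToolOutputStore) Get(stepID string) (string, bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    raw, ok := s.outputs[stepID]
    return raw, ok
}

func main() {
    store := NewToolOutputStore()
    store.Put("fetch-page", "<html>… large raw payload …</html>")

    // Only the extracted fact would enter working memory; the raw
    // payload stays here, queryable by step ID.
    raw, ok := store.Get("fetch-page")
    fmt.Println(ok, len(raw) > 0) // true true
}
```

In a real system this might be backed by a database or object store, but the contract is the same: facts go in the prompt, payloads stay out.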
Explicit Recovery
This is the pattern most teams skip, and it’s the one that matters most in production. Agents will encounter tool failures, stale plans, missing inputs, and model refusals. Without explicit recovery, those become silent failures or infinite loops.
// RecoveryStrategy defines how the agent handles a specific failure type.
type RecoveryStrategy struct {
    Name       string
    MaxRetries int
    Backoff    time.Duration
    Handler    func(ctx context.Context, err error) (Action, error)
}

type Action int

const (
    Retry     Action = iota
    Decompose // break the failed step into smaller steps
    Skip      // mark step as skipped, continue
    Escalate  // pause for human input
    Abort     // stop the agent
)
// Recover selects and applies the appropriate recovery strategy.
func (a *Agent) Recover(ctx context.Context, step Step, err error) (Action, error) {
    strategy := a.selectStrategy(err)
    for attempt := 0; attempt < strategy.MaxRetries; attempt++ {
        action, handlerErr := strategy.Handler(ctx, err)
        if handlerErr == nil {
            return action, nil
        }
        // Back off between attempts, but respect cancellation.
        select {
        case <-ctx.Done():
            return Abort, ctx.Err()
        case <-time.After(strategy.Backoff * time.Duration(attempt+1)):
        }
    }
    // All retries exhausted. Escalate.
    return Escalate, fmt.Errorf(
        "recovery exhausted for step %s after %d attempts: %w",
        step.ID, strategy.MaxRetries, err,
    )
}
The key insight: recovery actions are an enum, not free-form decisions. The agent picks from a fixed set of responses. Retry, decompose, skip, escalate, or abort. No improvisation. This keeps the failure paths testable and predictable.
The escalation path – pausing for human input – isn’t a failure. It’s a feature. An agent that knows when to ask for help is more reliable than one that guesses and gets it wrong.
Putting It Together
A production agent combines these patterns in layers:
- Plan-execute-replan as the outer loop
- Orchestrator-specialist for sub-task parallelism
- Structured memory to manage context within budget
- Explicit recovery at every step boundary
Each layer is independently testable. You can unit test recovery strategies, benchmark memory compression, and integration test the orchestrator without running the full agent.
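For instance, a recovery policy can be exercised with a plain table-driven check, no model calls required. `classifyFailure` here is a stand-in for whatever selection logic your agent uses; in a real test file you would call the agent’s own function:

```go
package main

import "fmt"

type Action int

const (
    Retry Action = iota
    Decompose
    Skip
    Escalate
    Abort
)

// classifyFailure is a stand-in for the agent's real strategy
// selection logic.
func classifyFailure(msg string) Action {
    switch msg {
    case "rate limited":
        return Retry
    case "missing input":
        return Decompose
    default:
        return Escalate
    }
}

func main() {
    // The same table-driven shape works under `go test`.
    cases := []struct {
        msg  string
        want Action
    }{
        {"rate limited", Retry},
        {"missing input", Decompose},
        {"model refusal", Escalate},
    }
    for _, c := range cases {
        if got := classifyFailure(c.msg); got != c.want {
            fmt.Printf("classifyFailure(%q) = %v, want %v\n", c.msg, got, c.want)
            return
        }
    }
    fmt.Println("all recovery cases pass")
}
```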
Start with plan-execute-replan and explicit recovery. Those two patterns alone will take you from “works on demos” to “works on real tasks.” Add orchestration and structured memory when your tasks demand it.
The agents that survive production aren’t clever. They’re disciplined.