Quick take
Function calling works in production when you treat it like boring infrastructure: strict schemas, validation at every boundary, explicit permissions, and structured errors. The model isn’t trusted code. It’s an external caller that happens to speak JSON. Build accordingly.
Function calling turned LLMs from text generators into system operators. That’s the opportunity and the risk. A model that can create tickets, query databases, and trigger deployments is powerful. A model that does those things with unvalidated arguments and no permission checks is a security incident waiting to happen.
I’ve built function calling integrations in past projects – mostly in Go – and the patterns that survive production are boring. That’s the point. Here’s what I’ve learned.
The mental model
Think of function calling as an API gateway where the caller is an LLM instead of a user. The model sees a list of available tools with schemas, picks one, and returns arguments as JSON. Your backend validates, executes, and returns results. The model then uses the results to continue the conversation.
User prompt + tool definitions
|
v
Model selects tool + arguments (JSON)
|
v
Backend validates arguments
|
v
Backend executes tool (with permissions)
|
v
Structured result returned to model
|
v
Model generates final response
Simple in theory. In practice, the complexity is in validation, permissions, and error handling. That’s where most teams cut corners, and where most production incidents start.
Tool definitions: treat them like API contracts
A tool definition is a contract. The model’s behavior is only as good as the schema you provide. Vague descriptions produce vague arguments. Loose types produce invalid inputs.
In Go, I define tools as structs with explicit JSON Schema generation:
// ToolDef represents a callable tool exposed to the LLM.
type ToolDef struct {
Name string `json:"name"`
Description string `json:"description"`
Parameters JSONSchema `json:"parameters"`
Handler ToolHandler `json:"-"`
Permission Permission `json:"-"`
}
type JSONSchema struct {
Type string `json:"type"`
Properties map[string]Property `json:"properties"`
Required []string `json:"required"`
}
type Property struct {
Type string `json:"type"`
Description string `json:"description,omitempty"`
Enum []string `json:"enum,omitempty"`
Default string `json:"default,omitempty"`
}
type ToolHandler func(ctx context.Context, args json.RawMessage) (*ToolResult, error)
A concrete example – a ticket creation tool:
var createTicketTool = ToolDef{
Name: "create_ticket",
Description: "Create a support ticket. Requires a verified user session.",
Parameters: JSONSchema{
Type: "object",
Properties: map[string]Property{
"subject": {Type: "string", Description: "Short summary of the issue"},
"category": {Type: "string", Enum: []string{"billing", "bug", "account", "other"}},
"priority": {Type: "string", Enum: []string{"low", "normal", "high"}, Default: "normal"},
},
Required: []string{"subject", "category"},
},
Handler: handleCreateTicket,
Permission: PermWriteApproval,
}
Notice the pattern: enums on every field with a bounded set of values, a clear description that tells the model when to use the tool, and required fields marked explicitly. The model doesn’t guess. It follows the contract.
The tool registry
Centralize tool registration. Don’t scatter tool definitions across your codebase. A single registry makes it easy to generate schemas for the model, enforce permissions, and audit what’s available.
type Registry struct {
mu sync.RWMutex
tools map[string]ToolDef
}
func NewRegistry() *Registry {
return &Registry{tools: make(map[string]ToolDef)}
}
func (r *Registry) Register(tool ToolDef) {
r.mu.Lock()
defer r.mu.Unlock()
r.tools[tool.Name] = tool
}
func (r *Registry) Schema() []map[string]any {
r.mu.RLock()
defer r.mu.RUnlock()
out := make([]map[string]any, 0, len(r.tools))
for _, t := range r.tools {
out = append(out, map[string]any{
"type": "function",
"function": map[string]any{
"name": t.Name,
"description": t.Description,
"parameters": t.Parameters,
},
})
}
return out
}
func (r *Registry) Execute(ctx context.Context, name string, args json.RawMessage) (*ToolResult, error) {
r.mu.RLock()
tool, ok := r.tools[name]
r.mu.RUnlock()
if !ok {
return &ToolResult{
Success: false,
ErrorCode: "unknown_tool",
Message: fmt.Sprintf("tool %q not found", name),
}, nil
}
return tool.Handler(ctx, args)
}
The Execute method is intentionally minimal. Validation and permission checks happen in the layers around it, not inside the registry itself. Separation of concerns matters here because you’ll want to add middleware later without rewriting the registry.
Validation: the model isn’t trusted
This is the hill I’ll die on: model-generated arguments are untrusted input. Always. Even with a tight schema, the model can produce unexpected values – empty strings, null where you expect a value, or fields that technically match the type but are nonsensical.
type CreateTicketArgs struct {
Subject string `json:"subject"`
Category string `json:"category"`
Priority string `json:"priority"`
}
func validateCreateTicketArgs(raw json.RawMessage) (*CreateTicketArgs, error) {
var args CreateTicketArgs
if err := json.Unmarshal(raw, &args); err != nil {
return nil, fmt.Errorf("invalid JSON: %w", err)
}
args.Subject = strings.TrimSpace(args.Subject)
if args.Subject == "" {
return nil, fmt.Errorf("subject must be non-empty")
}
if len(args.Subject) > 200 {
return nil, fmt.Errorf("subject exceeds 200 characters")
}
validCategories := map[string]bool{"billing": true, "bug": true, "account": true, "other": true}
if !validCategories[args.Category] {
return nil, fmt.Errorf("invalid category: %q", args.Category)
}
if args.Priority == "" {
args.Priority = "normal"
}
validPriorities := map[string]bool{"low": true, "normal": true, "high": true}
if !validPriorities[args.Priority] {
return nil, fmt.Errorf("invalid priority: %q", args.Priority)
}
return &args, nil
}
Yes, this is verbose. That’s deliberate. I don’t want clever one-liners here. I want code that a new team member can read at 3 AM during an incident and immediately understand what it checks and why.
Structured errors that the model can recover from
When validation fails, return a structured error the model can act on. Not a stack trace. Not a generic “bad request.” A clear envelope:
type ToolResult struct {
Success bool `json:"success"`
ErrorCode string `json:"error_code,omitempty"`
Message string `json:"message,omitempty"`
Data any `json:"data,omitempty"`
}
The model sees this and can retry with corrected arguments, ask the user for clarification, or explain the failure. Unstructured errors produce unstructured recovery attempts. I’ve seen models apologize to users for “server errors” when the actual problem was a missing required field.
Permission scoping
Every tool gets a permission level. Every request carries user context. The execution layer checks permissions before calling the handler. No exceptions.
type Permission int
const (
PermReadOnly Permission = iota
PermWriteApproval
PermAdminOnly
)
type ExecContext struct {
UserID string
Role string
SessionID string
}
func (r *Registry) ExecuteWithAuth(ctx context.Context, ec ExecContext, name string, args json.RawMessage) (*ToolResult, error) {
r.mu.RLock()
tool, ok := r.tools[name]
r.mu.RUnlock()
if !ok {
return &ToolResult{Success: false, ErrorCode: "unknown_tool"}, nil
}
if !hasPermission(ec.Role, tool.Permission) {
return &ToolResult{
Success: false,
ErrorCode: "permission_denied",
Message: fmt.Sprintf("role %q cannot execute %q", ec.Role, name),
}, nil
}
return tool.Handler(ctx, args)
}
func hasPermission(role string, required Permission) bool {
switch required {
case PermReadOnly:
return true
case PermWriteApproval:
return role == "user" || role == "admin"
case PermAdminOnly:
return role == "admin"
default:
return false
}
}
The model doesn’t decide permissions. The backend does. This isn’t negotiable. I’ve seen demos where the model is told “you have admin access” in the system prompt. That isn’t a permission system. That’s a suggestion.
Parallel execution with guardrails
Some models support parallel tool calls. This can cut latency significantly when tools are independent, but you still need timeouts and isolation.
func executeParallel(ctx context.Context, registry *Registry, ec ExecContext, calls []ToolCall) []*ToolResult {
ctx, cancel := context.WithTimeout(ctx, 8*time.Second)
defer cancel()
results := make([]*ToolResult, len(calls))
var wg sync.WaitGroup
for i, call := range calls {
wg.Add(1)
go func(idx int, c ToolCall) {
defer wg.Done()
result, err := registry.ExecuteWithAuth(ctx, ec, c.Name, c.Arguments)
if err != nil {
results[idx] = &ToolResult{Success: false, ErrorCode: "execution_error", Message: err.Error()}
return
}
results[idx] = result
}(i, call)
}
wg.Wait()
return results
}
The timeout is critical, but it only protects you if handlers honor ctx: wg.Wait still waits for every goroutine to return, so a handler that ignores cancellation hangs the entire batch. Return partial results and let the model work with what it has.
Observability
Log every tool call. But be smart about what you log:
- Tool name and version
- User ID and session ID
- Argument hash (not raw arguments – those may contain PII)
- Success/failure and error code
- Execution latency
This gives you enough to debug failures, detect drift (is the model suddenly calling a tool it never used before?), and identify tools that are slow, failing, or overused.
What I wish I had known earlier
After building several of these systems, a few lessons stand out:
Keep tool descriptions short and precise. The model reads them on every request. Long descriptions waste tokens and confuse tool selection. One sentence describing the action, one sentence about when to use it.
Version your tool schemas. When you change a tool’s parameters, the model’s behavior will change too. Treat schema changes like API migrations.
Test with adversarial inputs. Ask the model to call tools with garbage arguments, impossible combinations, and injection attempts. Your validation layer should handle all of these cleanly.
Function calling is the interface between language models and real systems. It works when you treat it like infrastructure: boring, reliable, and well-instrumented. The clever part is the model. Your job is to make the execution layer as predictable as possible.