Structured Output from LLMs: A Go Implementation Guide


LLMs generate text, not data structures. Here are the patterns I use in Go to get reliable, validated JSON out of models -- with schemas, retries, and repair loops.

Quick take

Structured output is a contract-enforcement problem, not a prompting problem. Define a schema, constrain the prompt, validate every response, and build a repair loop for when the model drifts. I do this in Go with about 300 lines of reusable code. Here is all of it.


I have a rule for any LLM feature that feeds a downstream system: if you can’t json.Unmarshal the response into a typed struct, it isn’t done.

That sounds obvious. In practice, it isn’t. I still see production systems parsing LLM output with string splitting and regex. They work until they don’t, and when they break, they fail in ways that are hard to diagnose because the failure is subtle data corruption, not a crash.

Structured output from LLMs is a solved problem if you treat it as contract enforcement. Define what you expect. Tell the model exactly what you expect. Validate what you get. Repair what breaks. Here is how I do it in Go.

The failure modes are predictable

LLMs generate text. They don’t generate data structures. Even with strong prompting, they will occasionally:

  • Wrap the JSON in markdown code fences or explanatory prose
  • Omit fields they consider “obvious” or irrelevant
  • Use wrong types (string "null" instead of JSON null, number as string)
  • Rename fields to something they think is more descriptive
  • Produce partial output when hitting token limits

Every pattern in this post targets one of these failures. They aren’t edge cases. They’re the normal operating reality of structured LLM output.

Define the contract as Go types

Start with the output structure. This isn’t just documentation – it’s both the validation target and the deserialization target. One definition serves both purposes.

type ContactInfo struct {
	Name    string  `json:"name"    validate:"required,min=1"`
	Email   *string `json:"email"   validate:"omitempty,email"`
	Company *string `json:"company"`
	Role    *string `json:"role"`
}

Nullable fields use pointers. Required fields use value types. The validate tags drive runtime validation. This struct is the single source of truth: the prompt references it, the validator enforces it, and the calling code consumes it.

I also generate a JSON Schema from the struct for inclusion in prompts. This keeps the prompt and validation in sync automatically:

func SchemaFor[T any]() ([]byte, error) {
	reflector := jsonschema.Reflector{
		RequiredFromJSONSchemaTags: true,
		DoNotReference:             true,
	}
	schema := reflector.Reflect(new(T))
	return json.MarshalIndent(schema, "", "  ")
}

One definition. One schema. No drift between what you ask for and what you validate.
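For reference, the schema the model sees looks roughly like this. This is a hand-written illustration for ContactInfo, not the reflector's exact output, which varies with the jsonschema library version and the tags on the struct:

```json
{
  "type": "object",
  "properties": {
    "name":    { "type": "string" },
    "email":   { "type": "string" },
    "company": { "type": "string" },
    "role":    { "type": "string" }
  },
  "required": ["name"],
  "additionalProperties": false
}
```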

Build the prompt to minimize ambiguity

The prompt should be rigid and specific. No motivational language. No “please try your best.” Just the schema, the rules, and the input.

func BuildExtractionPrompt(schema []byte, input string) string {
	return fmt.Sprintf(`Extract structured data from the input. Return ONLY valid JSON matching this schema:

%s

Rules:
- Use null for missing fields, not empty strings
- Lowercase email addresses
- No additional keys beyond the schema
- No markdown, no explanation, just the JSON object

Input:
%s

JSON:`, string(schema), input)
}

The JSON: at the end is a small trick that helps. It primes the model to start generating JSON immediately instead of opening with “Here is the extracted data:” or similar preamble.

The extraction pipeline

This is the core of the system: call the model, clean the response, parse it, validate it, and retry on failure.

type Extractor[T any] struct {
	client     LLMClient
	validator  *validator.Validate
	schema     []byte
	maxRetries int
}

func NewExtractor[T any](client LLMClient, maxRetries int) (*Extractor[T], error) {
	schema, err := SchemaFor[T]()
	if err != nil {
		return nil, fmt.Errorf("generating schema: %w", err)
	}

	return &Extractor[T]{
		client:     client,
		validator:  validator.New(),
		schema:     schema,
		maxRetries: maxRetries,
	}, nil
}

func (e *Extractor[T]) Extract(ctx context.Context, input string) (*T, error) {
	prompt := BuildExtractionPrompt(e.schema, input)
	var lastErr error

	for attempt := range e.maxRetries {
		raw, err := e.client.Generate(ctx, prompt)
		if err != nil {
			return nil, fmt.Errorf("llm call failed: %w", err)
		}

		cleaned := cleanJSONResponse(raw)

		var result T
		if err := json.Unmarshal([]byte(cleaned), &result); err != nil {
			lastErr = fmt.Errorf("attempt %d: json parse error: %w", attempt+1, err)
			prompt = buildRepairPrompt(prompt, raw, err.Error())
			continue
		}

		if err := e.validator.Struct(result); err != nil {
			lastErr = fmt.Errorf("attempt %d: validation error: %w", attempt+1, err)
			prompt = buildRepairPrompt(prompt, raw, err.Error())
			continue
		}

		return &result, nil
	}

	return nil, fmt.Errorf("extraction failed after %d attempts: %w", e.maxRetries, lastErr)
}

A few things to notice. The generic type parameter means this extractor works for any output struct: ContactInfo, InvoiceData, whatever. The cleaning step handles the most common format issues before parsing. And on failure, the repair prompt feeds the error back to the model so it can fix the specific problem.

Cleaning the response

Models love to wrap JSON in markdown code fences or add explanatory text. This function strips that away:

func cleanJSONResponse(raw string) string {
	s := strings.TrimSpace(raw)

	// Strip markdown code fences. Scan backward for the closing fence so
	// that truncated output with no closing ``` keeps its content.
	if strings.HasPrefix(s, "```") {
		lines := strings.Split(s, "\n")
		end := len(lines)
		for i := len(lines) - 1; i > 0; i-- {
			if strings.TrimSpace(lines[i]) == "```" {
				end = i // drop the closing fence and anything after it
				break
			}
		}
		s = strings.Join(lines[1:end], "\n")
	}

	// Find the first { and last } to extract the JSON object
	firstBrace := strings.Index(s, "{")
	lastBrace := strings.LastIndex(s, "}")
	if firstBrace >= 0 && lastBrace > firstBrace {
		s = s[firstBrace : lastBrace+1]
	}

	return strings.TrimSpace(s)
}

This isn’t pretty. It doesn’t need to be. It handles the three wrapping patterns I most often see in production: code fences, leading prose, and trailing explanation.

The repair prompt

When parsing or validation fails, the repair prompt tells the model exactly what went wrong:

func buildRepairPrompt(originalPrompt, badOutput, errorMsg string) string {
	return fmt.Sprintf(`%s

Your previous output was invalid:
%s

Error: %s

Fix the error and return ONLY valid JSON.

JSON:`, originalPrompt, badOutput, errorMsg)
}

This is where the retry loop earns its keep. The model gets the original instructions, sees its own bad output, and gets a specific error message to fix.

From what I’ve seen, this recovers about 80% of validation failures on the first retry. The remaining 20% usually indicate a genuinely ambiguous input that needs human review.

Use JSON mode when available

Most model APIs now offer a JSON-only response mode. Use it. It eliminates prose wrapping entirely and significantly reduces parsing failures.

func (e *Extractor[T]) Extract(ctx context.Context, input string) (*T, error) {
	prompt := BuildExtractionPrompt(e.schema, input)
	// Assumes the client's Generate accepts per-call options.
	opts := GenerateOptions{
		ResponseFormat: ResponseFormatJSON, // Use JSON mode
	}
	raw, err := e.client.Generate(ctx, prompt, opts)

	// ... rest of the extraction logic
}

But – and I can’t stress this enough – JSON mode doesn’t mean you skip validation. The model can still omit required fields, use wrong types, or produce a valid JSON object that doesn’t match your schema. JSON mode guarantees parseable JSON. It doesn’t guarantee correct JSON for your use case.

Monitoring structured output in production

Three metrics I track for every structured-output pipeline:

  1. Parse success rate. What percentage of responses parse and validate on the first attempt? If this drops below 95%, something changed: the model updated, the prompt drifted, or the input distribution shifted.
  2. Retry rate and recovery rate. How often do you need retries, and how often do retries succeed? A high retry rate with good recovery means the repair loop is working. A high retry rate with low recovery means something is fundamentally wrong.
  3. Field-level error distribution. Which fields cause the most validation failures? This tells you where the prompt needs to be more explicit or where the schema needs adjustment.

I log every extraction attempt: success or failure, first try or retry, with the raw model output. When something goes wrong in production, I want to see exactly what the model returned, not just that it failed.
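The counters behind those three metrics fit in a small struct. This is a sketch with names of my own choosing; in production the values would feed whatever metrics backend you already use:

```go
package main

import (
	"fmt"
	"sync"
)

// ExtractionMetrics tracks the three signals per pipeline:
// first-attempt success, retry/recovery rates, and per-field failures.
type ExtractionMetrics struct {
	mu          sync.Mutex
	total       int
	firstTryOK  int
	retried     int
	recovered   int
	fieldErrors map[string]int
}

func NewExtractionMetrics() *ExtractionMetrics {
	return &ExtractionMetrics{fieldErrors: make(map[string]int)}
}

// Record logs one finished extraction: how many attempts it took, whether
// it ultimately succeeded, and which fields failed validation along the way.
func (m *ExtractionMetrics) Record(attempts int, ok bool, badFields []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.total++
	switch {
	case ok && attempts == 1:
		m.firstTryOK++
	case attempts > 1:
		m.retried++
		if ok {
			m.recovered++
		}
	}
	for _, f := range badFields {
		m.fieldErrors[f]++
	}
}

// FirstTryRate is the share of extractions that succeeded on attempt 1.
func (m *ExtractionMetrics) FirstTryRate() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.total == 0 {
		return 0
	}
	return float64(m.firstTryOK) / float64(m.total)
}

func main() {
	m := NewExtractionMetrics()
	m.Record(1, true, nil)                // clean first-try success
	m.Record(3, true, []string{"email"})  // recovered after repairs
	m.Record(3, false, []string{"email"}) // gave up; needs human review
	fmt.Printf("first-try rate: %.2f\n", m.FirstTryRate()) // 0.33
}
```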

The pattern, summarized

Every structured-output pipeline I build follows the same sequence:

  1. Define the contract as a Go struct with validation tags.
  2. Generate the JSON Schema from that struct.
  3. Build a rigid prompt that includes the schema and leaves no room for interpretation.
  4. Clean the raw response to handle common wrapping patterns.
  5. Parse and validate against the struct.
  6. On failure, retry with a repair prompt that includes the specific error.
  7. Monitor parse rates, retry rates, and field-level errors.

This isn’t clever. It isn’t novel. It’s disciplined application of the same contract-enforcement thinking we use everywhere else in software engineering. The model is an unreliable data source. Treat it like one.