
LLM

Definition

LLM coverage in this archive spans 35 posts from Jan 2023 to Apr 2026 and treats LLM work as a production discipline: evaluation loops, tool boundaries, escalation paths, and cost control. The strongest adjacent threads are ai, go, and architecture. Recurring title motifs include ai, production, llm, and stop.

Key claims

The archive repeatedly argues that an LLM creates leverage only when it is wired into an existing workflow.
Early posts lean on the llm and patterns motifs, while newer posts lean on models and production, reflecting how constraints shifted.
This topic repeatedly intersects with ai, go, and architecture, so design choices here rarely stand alone.

Practical checklist

Define quality gates up front: eval sets, guardrails, and explicit rollback criteria.
Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
When boundary questions appear, cross-read ai and go before committing implementation details.
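
One way to make the first checklist item concrete is to write the quality gate as code before launch. The Go sketch below is illustrative only: the EvalResult and Gate types, their field names, and the threshold values are all hypothetical and do not come from any post in the archive.

```go
package main

import "fmt"

// EvalResult is a hypothetical summary of one run over an eval set.
type EvalResult struct {
	PassRate       float64 // fraction of eval cases that passed, 0..1
	GuardrailFails int     // outputs that a guardrail had to block
	P95LatencyMS   int     // 95th percentile latency in milliseconds
}

// Gate holds thresholds agreed on before launch (illustrative values only).
type Gate struct {
	MinPassRate       float64
	MaxGuardrailFails int
	MaxP95LatencyMS   int
}

// Check returns one message per violated criterion; an empty slice means the gate passes.
func (g Gate) Check(r EvalResult) []string {
	var violations []string
	if r.PassRate < g.MinPassRate {
		violations = append(violations,
			fmt.Sprintf("pass rate %.2f is below %.2f", r.PassRate, g.MinPassRate))
	}
	if r.GuardrailFails > g.MaxGuardrailFails {
		violations = append(violations,
			fmt.Sprintf("%d guardrail failures exceed the limit of %d", r.GuardrailFails, g.MaxGuardrailFails))
	}
	if r.P95LatencyMS > g.MaxP95LatencyMS {
		violations = append(violations,
			fmt.Sprintf("p95 latency %dms exceeds %dms", r.P95LatencyMS, g.MaxP95LatencyMS))
	}
	return violations
}

// ShouldRollback treats any violation as an explicit rollback trigger.
func (g Gate) ShouldRollback(r EvalResult) bool {
	return len(g.Check(r)) > 0
}

func main() {
	gate := Gate{MinPassRate: 0.95, MaxGuardrailFails: 0, MaxP95LatencyMS: 2000}
	run := EvalResult{PassRate: 0.91, GuardrailFails: 0, P95LatencyMS: 1500}
	for _, v := range gate.Check(run) {
		fmt.Println("violation:", v)
	}
	fmt.Println("rollback:", gate.ShouldRollback(run))
}
```

Keeping the thresholds in one struct means the rollback criteria are reviewed like any other code change rather than decided during an incident.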

Failure modes

Shipping agent behavior without hard boundaries for tools, data access, and approvals.
Optimizing for model novelty while ignoring reliability, latency, or cost drift.
Applying 2023 guidance in 2026 without revisiting assumptions that have changed since.
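
The first failure mode above can be guarded against with a deny-by-default tool boundary. The Go sketch below is a minimal illustration under stated assumptions: the Boundary and ToolPolicy types, the error values, and the tool names search_docs and issue_refund are hypothetical, not taken from the archive.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolPolicy describes what an agent may do with one tool (hypothetical shape).
type ToolPolicy struct {
	Allowed       bool // tool is on the allowlist
	NeedsApproval bool // a human must approve each call
}

var (
	ErrToolDenied       = errors.New("tool not on allowlist")
	ErrApprovalRequired = errors.New("human approval required")
)

// Boundary is a deny-by-default policy table keyed by tool name.
type Boundary struct {
	policies map[string]ToolPolicy
}

// Authorize returns nil only when the tool is allowed and any required
// approval has been granted. Unknown tools are always denied.
func (b Boundary) Authorize(tool string, approved bool) error {
	p, ok := b.policies[tool]
	if !ok || !p.Allowed {
		return fmt.Errorf("%w: %s", ErrToolDenied, tool)
	}
	if p.NeedsApproval && !approved {
		return fmt.Errorf("%w: %s", ErrApprovalRequired, tool)
	}
	return nil
}

func main() {
	b := Boundary{policies: map[string]ToolPolicy{
		"search_docs":  {Allowed: true},
		"issue_refund": {Allowed: true, NeedsApproval: true},
	}}
	for _, tool := range []string{"search_docs", "issue_refund", "drop_table"} {
		if err := b.Authorize(tool, false); err != nil {
			fmt.Println("blocked:", err)
			continue
		}
		fmt.Println("allowed:", tool)
	}
}
```

The check runs before any tool call the model requests, so a tool that is missing from the table fails closed instead of relying on the prompt to discourage it.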

Suggested reading path

Start here (current state): The Best Model Is the Smallest One That Works
Then read (operating middle): Architecting AI-Native Applications (Without the Delusion)
Finish with (foundational context): LLM Integration Patterns That Actually Survive Production

References

34 entries tagged “LLM”

Running AI Locally: A Practical Guide for Teams Who Care About Control August 18, 2025 · 6 min Local AI is no longer a hobby project. How to set it up properly: provider abstraction, versioned models, eval harnesses, and a cloud fallback. local-ai development ollama

Stop Fine-Tuning Models You Haven't Bothered to Prompt Properly June 23, 2025 · 4 min Fine-tuning is the go-to move for teams who skipped the basics. Most of the time, better prompts and proper retrieval solve the actual problem. fine-tuning llm ai

Reasoning Models in Production: A Practical Guide January 20, 2025 · 7 min Reasoning models are powerful but expensive and slow. Here's how I integrate them in Go services with routing, async patterns, and cost controls that actually work. reasoning o1 llm

Picking an AI Model for Production (Late 2024) November 25, 2024 · 5 min There's no best model. There's the model that fits your workload, latency budget, cost constraint, and ops tolerance. Here's how to compare them. ai models comparison

AI Cost Benchmarking: What Your Bill Actually Tells You October 14, 2024 · 4 min Price-per-token is the least useful number on your AI bill. Real cost benchmarking starts with your workload, not a provider's pricing page. ai cost benchmarking

RAG Retrieval That Actually Works September 30, 2024 · 7 min Most RAG failures are retrieval failures. Hybrid search, smarter chunking, query expansion, and reranking -- measured separately from generation. rag retrieval vector-search

How I Actually Test LLM Features August 19, 2024 · 6 min LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production. llm testing ai

The Best Model Is the Smallest One That Works August 5, 2024 · 3 min Everyone reaches for GPT-4 by default. Most production tasks don't need it. Small models are faster, cheaper, and often better when the task is well-defined. small-models llm ai

Stop Stuffing Your Context Window July 22, 2024 · 4 min Bigger context windows aren't an excuse to stop thinking about what goes into them. Most teams are paying for irrelevant tokens and wondering why quality degrades. context-window llm ai

Function Calling Patterns That Survive Production July 8, 2024 · 7 min Function calling is how LLMs touch real systems. Treat tools like APIs, arguments like untrusted input, and permissions like the model is an intern with root access. function-calling llm ai

Claude 3.5 Sonnet Analysis: Cost, Coding, and Model Routing June 24, 2024 · 5 min Claude 3.5 Sonnet changes model routing math for coding, cost, latency, and production AI workloads. claude anthropic ai

LLM Structured Output in Go: JSON Schema, Validation, Retries April 29, 2024 · 7 min How to get reliable JSON from LLMs in Go with schemas, validation, repair loops, and typed contracts. llm structured-output json

LLM Prompt Caching in Go: Cut Costs Without Breaking Things March 25, 2024 · 6 min Caching LLM responses is the highest-leverage optimization most teams skip. How I implement it in Go -- keys, invalidation, and safety patterns. llm caching go

Why I Run Multiple Models in Production March 18, 2024 · 4 min Betting on a single model provider is like having a single database with no failover. Here is why multi-model is the only sane production strategy. ai architecture llm

Claude 3 First Impressions: Three Models, One Decision Framework March 4, 2024 · 4 min Anthropic shipped three models instead of one. That is actually the most interesting part of the release. claude anthropic llm

LLM Evaluation: Stop Shipping on Vibes February 19, 2024 · 5 min Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do. evaluation llm testing

Architecting AI-Native Applications (Without the Delusion) February 5, 2024 · 7 min AI-native apps are fundamentally different from a model bolted onto a CRUD app. How I structure them -- with code, layers, and hard-won opinions. architecture ai design

Stop Paying OpenAI to Test Your Prompts January 22, 2024 · 4 min Local LLMs are finally good enough for development. Use them for iteration, keep the API bills for production. llm local-development ollama

AI Engineering Is Its Own Discipline Now January 8, 2024 · 4 min AI engineering is not ML research with a product hat. It is the discipline of making models behave in production -- and it demands its own skill set. ai-engineering career skills

Two Weeks With the Assistants API: What I Like, What I Hate December 4, 2023 · 4 min I built three things with the Assistants API. One shipped, one got scrapped, and one taught me where the API's limits really are. openai assistants-api ai

OpenAI DevDay Happened and I Have Opinions November 27, 2023 · 4 min OpenAI DevDay was not just a product launch. It was a platform play that changes the build-vs-buy calculus for every team shipping AI features. openai ai devday

LLM Security: A Field Guide for People Who Ship Things October 30, 2023 · 6 min LLMs bring security failure modes most teams aren't defending against. Prompt injection, data leakage, tool abuse, and cost attacks are exploitable today. security llm ai

AI Technical Debt Is Eating Your Codebase (You Just Cannot See It Yet) October 2, 2023 · 4 min AI features create a new species of technical debt that hides in prompts, data pipelines, and model versions. By the time you notice it, the cleanup bill is brutal. ai technical-debt engineering

Agent Architecture Patterns That Actually Work in Production September 18, 2023 · 6 min Most agent demos are impressive. Most agent production systems are not. Here is what separates the two. ai agents llm

LLM Observability: Your Existing Monitoring Is Not Enough August 21, 2023 · 5 min Traditional monitoring says the service is up. It won't tell you the model started returning garbage last Tuesday. How to actually observe LLM systems. observability llm ai

What I Learned Building AI Features Into a Fintech Product August 7, 2023 · 5 min Building AI features at a fintech taught me the hard part isn't the model: it's defining quality, handling failures, and not shipping a demo as a product. ai product-engineering fintech

Your LLM Bill Is Your Own Fault July 24, 2023 · 4 min Everyone's complaining about LLM costs. Almost nobody has done the basics: caching, model routing, or even measuring what they're spending per feature. ai cost llm

Fine-Tuning vs. Prompting: A Decision Framework May 15, 2023 · 4 min Most teams should exhaust prompting before they even think about fine-tuning. Here's how to decide which lever to pull. ai fine-tuning prompting

LangChain Is the New ORM: Convenient Until It Is Not May 1, 2023 · 4 min LangChain promises to simplify LLM development. Instead it adds abstraction layers you will fight against the moment your use case gets real. langchain ai llm

RAG Patterns That Actually Work in Production April 17, 2023 · 8 min RAG is the default architecture for grounding LLMs in private data. Here are the patterns that survive real traffic, with Go examples from production systems. rag ai llm

Claude vs GPT: A User's Honest Take March 27, 2023 · 3 min Anthropic's Claude takes a different approach to AI safety. Here is how it compares to GPT in practice, from someone using both daily. ai claude anthropic

My First Week Building with GPT-4 March 6, 2023 · 4 min GPT-4 landed and everything changed. What I learned in the first week of building with it, and the architecture decisions that followed. ai gpt-4 openai

Prompt Engineering Is Not Engineering February 6, 2023 · 3 min The term 'prompt engineering' oversells what is essentially clear writing. It is a useful skill, not a discipline. ai prompt-engineering llm

LLM Integration Patterns That Actually Survive Production January 23, 2023 · 6 min Practical patterns for integrating LLMs into real applications -- prompt management, structured outputs, caching, fallbacks, and tool use -- with Go examples. ai llm go