// Topic
LLM
Definition
LLM coverage in this archive spans 35 posts from Jan 2023 to Apr 2026 and treats LLMs as a production discipline: evaluation loops, tool boundaries, escalation paths, and cost control. The strongest adjacent threads are ai, go, and architecture. Recurring title motifs include ai, production, llm, and stop.
Key claims
- The archive repeatedly argues that an LLM only creates leverage when it is wired into an existing workflow.
- Early posts lean on "llm" and "patterns", while newer posts lean on "models" and "production", reflecting how constraints shifted over time.
- This topic repeatedly intersects with ai, go, and architecture, so design choices here rarely stand alone.
Practical checklist
- Define quality gates up front: eval sets, guardrails, and explicit rollback criteria.
- Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
- When boundary questions appear, cross-read ai and go before committing implementation details.
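The first checklist item can be made concrete with a small sketch. The Go example below is illustrative only and is not taken from any post in the archive: the `EvalResult` and `Gate` types, their fields, and the threshold values are all hypothetical. It shows one way to write rollback criteria down as data, so a candidate change either passes the gate or produces an explicit list of violations.

```go
package main

import "fmt"

// EvalResult is a hypothetical summary of one evaluation run.
type EvalResult struct {
	PassRate     float64 // fraction of eval cases that passed, 0..1
	P95LatencyMS int     // 95th percentile latency in milliseconds
	CostPerCall  float64 // average cost per request, in dollars
}

// Gate holds rollback thresholds agreed on before launch.
// The fields are assumptions chosen for illustration.
type Gate struct {
	MinPassRate     float64
	MaxP95LatencyMS int
	MaxCostPerCall  float64
}

// Check returns the violated criteria; an empty result means the gate passes.
func (g Gate) Check(r EvalResult) []string {
	var violations []string
	if r.PassRate < g.MinPassRate {
		violations = append(violations,
			fmt.Sprintf("pass rate %.2f below %.2f", r.PassRate, g.MinPassRate))
	}
	if r.P95LatencyMS > g.MaxP95LatencyMS {
		violations = append(violations,
			fmt.Sprintf("p95 latency %dms above %dms", r.P95LatencyMS, g.MaxP95LatencyMS))
	}
	if r.CostPerCall > g.MaxCostPerCall {
		violations = append(violations,
			fmt.Sprintf("cost per call $%.4f above $%.4f", r.CostPerCall, g.MaxCostPerCall))
	}
	return violations
}

func main() {
	// Example thresholds only; real values depend on the workload.
	gate := Gate{MinPassRate: 0.95, MaxP95LatencyMS: 2000, MaxCostPerCall: 0.01}
	candidate := EvalResult{PassRate: 0.91, P95LatencyMS: 1800, CostPerCall: 0.004}

	if v := gate.Check(candidate); len(v) > 0 {
		fmt.Println("roll back:", v)
		return
	}
	fmt.Println("gate passed")
}
```

Keeping the thresholds in a plain struct means they can be reviewed and versioned alongside prompts and model settings, rather than living in someone's head.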
Failure modes
- Shipping agent behavior without hard boundaries for tools, data access, and approvals.
- Optimizing for model novelty while ignoring reliability, latency, or cost drift.
- Applying guidance written in 2023 to a 2026 system without revisiting its assumptions as the context changed.
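As a rough illustration of the first failure mode, a hard tool boundary can be as small as a deny-by-default policy table checked before any tool call runs. The sketch below is an assumption-laden example, not the archive's implementation: `ToolPolicy`, `Authorize`, the error values, and the tool names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolPolicy describes what an agent may do with one tool (hypothetical shape).
type ToolPolicy struct {
	Allowed          bool
	RequiresApproval bool
}

var (
	ErrToolDenied    = errors.New("tool not allowed")
	ErrNeedsApproval = errors.New("tool call requires human approval")
)

// Authorize checks a requested tool call against the policy table.
// Tools missing from the table are denied by default.
func Authorize(policies map[string]ToolPolicy, tool string, approved bool) error {
	p, ok := policies[tool]
	if !ok || !p.Allowed {
		return fmt.Errorf("%w: %s", ErrToolDenied, tool)
	}
	if p.RequiresApproval && !approved {
		return fmt.Errorf("%w: %s", ErrNeedsApproval, tool)
	}
	return nil
}

func main() {
	// Illustrative tool names; a real table would come from configuration.
	policies := map[string]ToolPolicy{
		"search_docs":  {Allowed: true},
		"issue_refund": {Allowed: true, RequiresApproval: true},
	}

	fmt.Println(Authorize(policies, "search_docs", false))  // read-only tool
	fmt.Println(Authorize(policies, "issue_refund", false)) // needs a human sign-off
	fmt.Println(Authorize(policies, "drop_table", false))   // not in the table
}
```

The design choice worth noting is the default: a tool the model names but the table does not list is rejected, so adding capability is always an explicit decision.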
Suggested reading path
- Start here (current state): The Best Model Is the Smallest One That Works
- Then read (operating middle): Architecting AI-Native Applications (Without the Delusion)
- Finish with (foundational context): LLM Integration Patterns That Actually Survive Production
Related posts
- The Best Model Is the Smallest One That Works
- Running AI Locally: A Practical Guide for Teams Who Care About Control
- Stop Fine-Tuning Models You Haven’t Bothered to Prompt Properly
- Reasoning Models in Production: A Practical Guide
- Picking an AI Model for Production (Late 2024)
- AI Cost Benchmarking: What Your Bill Actually Tells You
- RAG Retrieval That Actually Works
- How I Actually Test LLM Features
References
34 posts
Running AI Locally: A Practical Guide for Teams Who Care About Control
Local AI is no longer a hobby project. Here's how to set it up properly: provider abstraction, versioned models, evaluation harnesses, and cloud fallback for when local isn't enough.
Stop Fine-Tuning Models You Haven't Bothered to Prompt Properly
Fine-tuning is the go-to move for teams who skipped the basics. Most of the time, better prompts and proper retrieval solve the actual problem.
Reasoning Models in Production: A Practical Guide
Reasoning models are powerful but expensive and slow. Here's how I integrate them in Go services with routing, async patterns, and cost controls that actually work.
Picking an AI Model for Production (Late 2024)
There's no best model. There's the model that fits your workload, latency budget, cost constraint, and ops tolerance. Here's how to compare them.
AI Cost Benchmarking: What Your Bill Actually Tells You
Price-per-token is the least useful number on your AI bill. Real cost benchmarking starts with your workload, not a provider's pricing page.
RAG Retrieval That Actually Works
Most RAG failures are retrieval failures. Fixing them requires hybrid search, smarter chunking, query expansion, and reranking -- measured independently from generation.
How I Actually Test LLM Features
LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production.
The Best Model Is the Smallest One That Works
Everyone reaches for GPT-4 by default. Most production tasks don't need it. Small models are faster, cheaper, and often better when the task is well-defined.
Stop Stuffing Your Context Window
Bigger context windows aren't an excuse to stop thinking about what goes into them. Most teams are paying for irrelevant tokens and wondering why quality degrades.
Function Calling Patterns That Survive Production
Function calling is how LLMs touch real systems. Treat tools like APIs, arguments like untrusted input, and permissions like the model is an intern with root access.
Claude 3.5 Sonnet Analysis: Cost, Coding, and Model Routing
Claude 3.5 Sonnet changes model routing math for coding, cost, latency, and production AI workloads.
LLM Structured Output in Go: JSON Schema, Validation, Retries
How to get reliable JSON from LLMs in Go with schemas, validation, repair loops, and typed contracts.
LLM Prompt Caching in Go: Cut Costs Without Breaking Things
Caching LLM responses is the highest-leverage optimization most teams are not doing. Here is how I implement it in Go, with real patterns for keys, invalidation, and safety.
Why I Run Multiple Models in Production
Betting on a single model provider is like having a single database with no failover. Here is why multi-model is the only sane production strategy.
Claude 3 First Impressions: Three Models, One Decision Framework
Anthropic shipped three models instead of one. That is actually the most interesting part of the release.
LLM Evaluation: Stop Shipping on Vibes
Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do.
Architecting AI-Native Applications (Without the Delusion)
The architecture of an AI-native app is fundamentally different from bolting a model onto a CRUD app. Here is how I structure them -- with code, layers, and hard-won opinions.
Stop Paying OpenAI to Test Your Prompts
Local LLMs are finally good enough for development. Use them for iteration, keep the API bills for production.
AI Engineering Is Its Own Discipline Now
AI engineering is not ML research with a product hat. It is the discipline of making models behave in production -- and it demands its own skill set.
Two Weeks With the Assistants API: What I Like, What I Hate
I built three things with the Assistants API. One shipped, one got scrapped, and one taught me where the API's limits really are.
OpenAI DevDay Happened and I Have Opinions
OpenAI DevDay was not just a product launch. It was a platform play that changes the build-vs-buy calculus for every team shipping AI features.
LLM Security: A Field Guide for People Who Ship Things
LLMs introduce security failure modes that most teams are not defending against. Prompt injection, data leakage, tool abuse, and cost attacks are real and exploitable today.
AI Technical Debt Is Eating Your Codebase (You Just Cannot See It Yet)
AI features create a new species of technical debt that hides in prompts, data pipelines, and model versions. By the time you notice it, the cleanup bill is brutal.
Agent Architecture Patterns That Actually Work in Production
Most agent demos are impressive. Most agent production systems are not. Here is what separates the two.
LLM Observability: Your Existing Monitoring Is Not Enough
Traditional monitoring tells you the service is up. It doesn't tell you the model started confidently returning garbage last Tuesday. Here's how to actually observe LLM systems.
What I Learned Building AI Features Into a Fintech Product
Building AI features at a fintech infrastructure company taught me that the hard part isn't the model. It's defining quality, handling failures gracefully, and resisting the urge to ship a demo as a product.
Your LLM Bill Is Your Own Fault
Everyone's complaining about LLM costs. Almost nobody has done the basics: caching, model routing, or even measuring what they're spending per feature.
Fine-Tuning vs. Prompting: A Decision Framework
Most teams should exhaust prompting before they even think about fine-tuning. Here's how to decide which lever to pull.
LangChain Is the New ORM: Convenient Until It Is Not
LangChain promises to simplify LLM development. Instead it adds abstraction layers you will fight against the moment your use case gets real.
RAG Patterns That Actually Work in Production
RAG is the default architecture for grounding LLMs in private data. Here are the patterns that survive real traffic, with Go examples from production systems.
Claude vs GPT: A User's Honest Take
Anthropic's Claude takes a different approach to AI safety. Here is how it compares to GPT in practice, from someone using both daily.
My First Week Building with GPT-4
GPT-4 landed and everything changed. What I learned in the first week of building with it, and the architecture decisions that followed.
Prompt Engineering Is Not Engineering
The term 'prompt engineering' oversells what is essentially clear writing. It is a useful skill, not a discipline.
LLM Integration Patterns That Actually Survive Production
Practical patterns for integrating LLMs into real applications -- prompt management, structured outputs, caching, fallbacks, and tool use -- with Go examples.