Everyone has a favorite AI developer tool now: code assistants, LLM frameworks, vector databases, eval harnesses, observability platforms, deployment wrappers. The landscape is overwhelming, and most of it isn’t worth your time.
That isn’t cynicism. It’s experience. I’ve watched teams adopt tools that solve problems they don’t have, add abstraction layers they can’t debug, and create dependencies they can’t unwind. The result is a stack that’s harder to understand than the problem it was supposed to simplify.
The framework trap
Here is my unpopular opinion: most teams shouldn’t be using an LLM framework. LangChain, LlamaIndex, whatever ships next week – they are solving a real problem, but they are solving it for a use case most teams haven’t reached yet.
If your application calls one model with one prompt and parses the output, you don’t need a framework. You need an HTTP client and solid error handling. A framework adds routing, memory, tool calling, and multi-step chain orchestration that you might need in six months. Right now, it mostly adds layers you can’t see through when something breaks.
Start without the framework. Add it when you can name the specific pieces it replaces and what maintenance burden it removes. Not before.
Code assistants are useful. Stop pretending they are magic.
I use Copilot daily. It’s good at boilerplate, decent at suggesting patterns I’ve seen before, and occasionally impressive on unfamiliar code. It’s also confidently wrong often enough that accepting suggestions uncritically is dangerous.
Teams getting real value from code assistants treat the output as a first draft. It goes through the same code review process as any other contribution. Teams getting hurt are the ones accepting suggestions because they “look right” without checking whether they actually are.
The productivity gain is real, but smaller than the marketing suggests. It also comes with a hidden cost: style drift. The assistant doesn’t know your team’s conventions. Over time, the codebase starts to feel inconsistent unless you actively enforce standards on AI-generated code.
What actually earns its place
After working with several teams on their AI tooling stacks, I have a short list of what I think is genuinely worth adopting:
Eval harnesses. Whatever helps you measure output quality against a test set. This can be a framework or a 200-line script. It doesn’t matter. What matters is that it exists and runs on every change.
Structured logging for LLM calls. Not a fancy observability platform – just disciplined logging of prompts, responses, latency, and token counts. You will need this data the moment something goes wrong. Which will be soon.
A simple abstraction over model providers. Not a framework. Just a thin interface that lets you swap models without rewriting calling code. I build these in Go in an afternoon. They pay for themselves the first time a provider changes their API.
That’s it. Everything else should prove its value before it gets a spot in go.mod.
The decision filter
Before adopting any AI tool, answer one question: what specific friction does this remove that I can’t solve with under a day of custom code?
If the answer is “it makes things easier” or “everyone is using it,” that isn’t good enough. If the answer is “it replaces 500 lines of boilerplate I maintain across three services,” then fine. Adopt it.
Keep the stack small. Keep it legible. The tooling landscape will look completely different in six months anyway.