I’ve been in or around AI teams since 2018 – from Google for Startups in Seoul to enterprise teams, with roots going back to my first startup. One lesson keeps repeating: teams rarely fail at AI because they lack talent. They fail because nobody owns the outcome.
That sounds harsh. It’s also true.
The Ownership Gap
Here’s how it usually goes. A company decides to “do AI.” They hire an ML engineer, maybe two. Those engineers build a demo. Leadership is impressed. Then someone asks, “Who owns this in production?” and the room goes quiet.
The ML engineer built the model. The product team didn’t spec the success criteria. The data engineer wasn’t involved. The designer has no idea what happens when the model gets it wrong. And nobody defined what “getting it wrong” even means.
I’ve seen this exact pattern at large enterprises and small startups. The blocker isn’t technology. It’s structure.
Three Models That Work
Every successful team I’ve seen fits one of three structures.
Embedded. AI engineers sit inside product teams. They ship features directly, own the evaluation, and live with the consequences of their choices. This works when AI is a feature, not a platform. The downside: practices drift across teams because there’s no central coordination.
Platform. A central team builds shared infrastructure – model serving, evaluation harnesses, prompt management, observability. Product teams consume that platform. This works when multiple products need AI. The downside: the platform team gets pulled in every direction and loses focus on any single product.
Hybrid. A platform team builds the core. Embedded engineers in product teams customize it. This is the most common pattern at companies that have scaled this successfully. It also requires the most coordination. Without clear ownership boundaries, it degenerates into blame-passing between platform and product.
Pick the model that matches your current scale, not the one you hope to need in two years.
Who to Hire
The best AI engineers I’ve worked with share a few traits that don’t show up on resumes.
They can explain how their system fails. Not just how it works, but how it breaks and what happens when it does. This is the best interview signal I’ve found.
They think in systems, not models. The model is one component. The retrieval layer, validation step, fallback path, and monitoring are just as important. A candidate who talks only about model architecture is missing the point.
They build evaluations before they build features. If you can’t measure whether the thing works, you’re guessing. The best engineers treat eval sets like test suites. They version them, maintain them, and refuse to ship without them.
They’ve shipped something to real users. Not a notebook. Not a demo. Something people used, complained about, and forced them to iterate on. Production experience changes how you think about every design choice.
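The evals-before-features trait can be made concrete. Here is a minimal sketch of an eval set treated like a test suite; `summarize`, the cases, and the ship bar are all hypothetical stand-ins, not anyone’s real system:

```python
# A minimal sketch of an eval set versioned and run like a test suite.
# `summarize`, the cases, and the 0.9 threshold are illustrative.

EVAL_SET_VERSION = "2024-06-01"  # bump when cases change, like a schema

EVAL_CASES = [
    # (input, substrings the output must contain)
    ("Refund request for order #123, item arrived broken.", ["refund"]),
    ("Where is my package? Tracking says delivered.", ["tracking"]),
]

def summarize(text: str) -> str:
    # Stand-in for the real model call.
    return text.lower()

def run_evals() -> float:
    """Return the fraction of eval cases that pass."""
    passed = 0
    for text, must_contain in EVAL_CASES:
        output = summarize(text)
        if all(s in output for s in must_contain):
            passed += 1
    return passed / len(EVAL_CASES)

if __name__ == "__main__":
    score = run_evals()
    # Refusing to ship without the evals means this gate runs in CI.
    assert score >= 0.9, f"eval score {score:.2f} below ship bar (v{EVAL_SET_VERSION})"
```

The point isn’t the scoring logic, which is deliberately trivial here; it’s that the cases live in version control and failing them blocks the release, exactly like unit tests.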
The Operating Loop
You don’t need a fancy process framework. A tight loop through four phases covers it:
Discovery. Define success in measurable terms. What does “good” look like? What are the edge cases? Is the data available? A clear definition of success is worth more than a long list of ideas.
Prototyping. Run small experiments with real examples. Document the failures, not just the successes. Bring domain experts in early – they know the edge cases you’ll miss.
Development. Build the evaluation suite first. Version prompts and retrieval logic as code. Test against known failure cases whenever models or data change.
Production. Roll out gradually. Monitor quality and cost in the same dashboard. Treat regressions as product issues with named owners, not vague “the model changed” explanations.
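One lightweight way to get quality and cost into the same view is to record both on every request, keyed by feature so regressions have a named owner. A sketch, with illustrative field names and an assumed per-token price:

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    # One row per model call; quality and cost live side by side.
    feature: str             # which product feature, so regressions have an owner
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    passed_validation: bool  # cheap automatic quality signal

RECORDS: list[RequestRecord] = []

def log_request(feature, latency_ms, input_tokens, output_tokens,
                price_per_1k=0.002, passed_validation=True):
    # price_per_1k is an assumed example rate, not a real price.
    cost = (input_tokens + output_tokens) / 1000 * price_per_1k
    RECORDS.append(RequestRecord(feature, latency_ms, input_tokens,
                                 output_tokens, cost, passed_validation))

def summary(feature):
    """Quality and cost for one feature, suitable for a single dashboard panel."""
    rows = [r for r in RECORDS if r.feature == feature]
    return {
        "requests": len(rows),
        "pass_rate": sum(r.passed_validation for r in rows) / len(rows),
        "total_cost_usd": round(sum(r.cost_usd for r in rows), 4),
    }
```

When `pass_rate` drops after a model or prompt change, the record tells you which feature regressed and who owns it, instead of a vague “the model changed.”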
What Actually Goes Wrong
The problems I see most often aren’t technical:
- Nobody owns evaluation for a specific feature. There’s a shared checklist but no named person.
- Success criteria are undefined, so feedback becomes opinion. “This doesn’t feel right” isn’t actionable.
- The pipeline is too complex for the use case. Someone built a multi-agent system for what should have been a single prompt.
- Knowledge stays in people’s heads. When someone leaves, the team loses context that took months to build.
Fix these four problems and you’re ahead of most AI teams. No new tools required. No new hires. Just clarity about who owns what and how you know it’s working.
That’s the whole secret: clear ownership, reliable evaluation, and the discipline to maintain both. Everything else is detail.