Testing AI Where It Actually Runs
Offline evals are necessary but not sufficient. Here's how I test AI features in production with shadow mode, canaries, and rollback automation -- with Go code.
This archive spans 7 posts from Nov 2017 to Mar 2026, centered on practical engineering craft: interfaces, testing, and maintainable implementation. The strongest adjacent threads are AI, testing, and code review; recurring title motifs include AI, code, evaluation, and testing.
I've been running AI code review on real PRs for months. It catches some real bugs. It also generates a staggering amount of useless commentary.
Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do.
After three months of using AI-assisted code review across multiple projects, here's what actually works and what's just noise.
Microservices fail at the seams. A layered test strategy that keeps feedback fast and catches integration issues before production.
Most code reviews are theater. Here's what actually makes them worth the time.