Testing AI Where It Actually Runs
Offline evals are necessary but not sufficient. Here's how I test AI features in production with shadow mode, canaries, and rollback automation -- with Go code.
Monitoring coverage in this archive spans 9 posts from Dec 2016 to Apr 2025 and treats reliability, delivery speed, and cost discipline as one system, not three separate concerns. The strongest adjacent threads are observability, devops, and production. Recurring words in the titles include "ai", "observability", "monitoring", and "enough".
Traditional monitoring will tell you your AI service is up. It won't tell you it's returning confident garbage. Here's what observability actually looks like for AI.
Tracing is ready. Metrics are getting there. Logs are not. Here's a practical adoption path and the code to back it up.
ODD sounds fancy. It's not. It means writing logs, metrics, and traces before you ship, not after your first outage.
eBPF promises kernel-level observability without the pain of kernel modules. The tech is real. The hype-to-adoption ratio concerns me.
Most observability advice is written for 500-engineer orgs. Here's what actually matters when you're a small distributed team trying not to drown in dashboards.
After a mystery outage that our dashboards couldn't explain, I rebuilt a fintech startup's telemetry stack around metrics, logs, and traces. Here's what I learned.
Your dashboards look green. Your users say the site is broken. That gap is the whole problem.
Most teams monitor too much and alert on the wrong things. Five metrics are enough to run a startup backend.