// Topic

Testing

Definition

The Testing topic in this archive spans 8 posts from Aug 2017 to Apr 2025 and leans into practical engineering craft: interfaces, regression checks, and maintainable implementation details. The strongest adjacent threads are AI, quality, and Go. Recurring title words include "testing", "LLM", "lying", and "AI".

What the archive argues

The through-line is clarity first: simple designs that survive change beat clever abstractions.
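Early titles lean on the words "lying" and "need" (for example, "Your Load Tests Are Lying to You" and "You Don't Need to Be Netflix to Break Things on Purpose"), while newer titles lean on "testing" and "LLM" as constraints shifted.
This topic repeatedly intersects with AI, quality, and Go, so design choices here rarely stand alone.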

Execution checklist

Keep interfaces small, automate regressions early, and make operational assumptions explicit in code.
Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
When boundary questions appear, cross-read the AI and quality topics before committing to implementation details.

Common failure modes

Abstracting before usage patterns are stable enough to justify indirection.
Treating style consistency as optional until quality and velocity both degrade.
Applying guidance from the 2017 posts in 2025 without revisiting assumptions that have changed with the context.

Suggested reading path

Start here (current state): Testing AI Where It Actually Runs
Then read (operating middle): Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
Finish with (foundational context): You Don’t Need to Be Netflix to Break Things on Purpose

References

8 posts

Testing AI Where It Actually Runs
April 14, 2025 · 6 min
Offline evals are necessary but not sufficient. Here's how I test AI features in production with shadow mode, canaries, and rollback automation -- with Go code.
Tags: testing, ai, production

How I Actually Test LLM Features
August 19, 2024 · 6 min
LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production.
Tags: llm, testing, ai

LLM Evaluation: Stop Shipping on Vibes
February 19, 2024 · 5 min
Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do.
Tags: evaluation, llm, testing

Testing Microservices Without Losing Your Mind
September 19, 2022 · 5 min
Microservices fail at the seams. A layered test strategy that keeps feedback fast and catches integration issues before production.
Tags: testing, microservices, contract-testing

Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
February 17, 2020 · 6 min
I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses.
Tags: infrastructure, testing, terraform

Your Load Tests Are Lying to You
August 26, 2019 · 3 min
Most load tests produce comforting numbers instead of useful answers. Here's what I learned the hard way about getting honest results.
Tags: testing, performance, reliability

Your Staging Environment Is Lying to You
June 3, 2019 · 5 min
Staging never catches the real bugs. Here's how I learned to test in production without burning everything down.
Tags: testing, production, feature-flags

You Don't Need to Be Netflix to Break Things on Purpose
August 21, 2017 · 4 min
Chaos engineering isn't just for the big players. Here's how a small team can start breaking things deliberately and actually learn from it.
Tags: chaos-engineering, reliability, testing