Testing
Definition
This archive's testing coverage spans 8 posts, from Aug 2017 to Apr 2025, and focuses on practical engineering craft: interfaces, test strategy, and maintainable implementation details. The strongest adjacent threads are AI, quality, and Go. Recurring title motifs include testing, LLMs, lying, and AI.
What the archive argues
- The through-line is clarity first: simple designs that survive change beat clever abstractions.
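- Earlier posts are framed around what misleads you (staging environments and load tests that are "lying") and what you don't need (Netflix-scale infrastructure for chaos engineering). Newer posts center on testing LLM features, reflecting how the constraints shifted.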
- This topic repeatedly intersects with AI, quality, and Go, so testing decisions here rarely stand alone.
Execution checklist
- Keep interfaces small, automate regressions early, and make operational assumptions explicit in code.
- Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
- When a question crosses topic boundaries (for example, how to test an LLM feature in production), cross-read the AI and quality posts before committing to implementation details.
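The three habits in the first checklist item can be sketched together in Go, the language the archive's newest post uses. This is a minimal illustration, not code from any post. `Summarizer`, `Headline`, `fakeSummarizer`, and `MaxSummaryChars` are hypothetical names, and the 280-character limit is an arbitrary assumption. Each habit maps to one part of the code:
- The one-method interface lets tests substitute a fake.
- The constant states an operational assumption in code.
- The table of cases pins regressions so they run on every build.

```go
package main

import (
	"fmt"
	"strings"
)

// Summarizer is a deliberately small, hypothetical interface: one method is
// all callers need, so a test fake is only a few lines.
type Summarizer interface {
	Summarize(text string) (string, error)
}

// MaxSummaryChars states an operational assumption in code rather than in
// someone's head. The value 280 is illustrative, not taken from any post.
const MaxSummaryChars = 280

// Headline depends only on the small interface, so tests never need a real model.
func Headline(s Summarizer, text string) (string, error) {
	out, err := s.Summarize(text)
	if err != nil {
		return "", fmt.Errorf("summarize: %w", err)
	}
	out = strings.TrimSpace(out)
	// Cut on runes, not bytes, so multi-byte characters are never split.
	if r := []rune(out); len(r) > MaxSummaryChars {
		out = string(r[:MaxSummaryChars])
	}
	return out, nil
}

// fakeSummarizer returns a canned reply (or error) for tests.
type fakeSummarizer struct {
	reply string
	err   error
}

func (f fakeSummarizer) Summarize(string) (string, error) { return f.reply, f.err }

// regressionCases pins behavior that must keep working; add a case each time
// a bug is fixed so it cannot quietly return.
var regressionCases = []struct {
	name  string
	reply string
	want  string
}{
	{"trims whitespace", "  hello  ", "hello"},
	{"caps length", strings.Repeat("a", 300), strings.Repeat("a", MaxSummaryChars)},
	{"multi-byte safe", strings.Repeat("é", 300), strings.Repeat("é", MaxSummaryChars)},
}

func main() {
	for _, tc := range regressionCases {
		got, err := Headline(fakeSummarizer{reply: tc.reply}, "input")
		status := "ok"
		if err != nil || got != tc.want {
			status = "FAIL"
		}
		fmt.Printf("%s: %s\n", tc.name, status)
	}
}
```

In a real project the loop in `main` would live in a `_test.go` file as a table-driven `testing.T` test. It is inlined here so the file runs on its own.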
Common failure modes
- Abstracting before usage patterns are stable enough to justify indirection.
- Treating style consistency as optional until quality and velocity both degrade.
- Applying guidance written in 2017 to 2025 systems without revisiting the assumptions behind it.
Suggested reading path
- Start here (current state): Testing AI Where It Actually Runs
- Then read (operating middle): Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
- Finish with (foundational context): You Don’t Need to Be Netflix to Break Things on Purpose
References
8 posts
- Testing AI Where It Actually Runs
  Offline evals are necessary but not sufficient. Here's how I test AI features in production with shadow mode, canaries, and rollback automation, with Go code.
- How I Actually Test LLM Features
  LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production.
- LLM Evaluation: Stop Shipping on Vibes
  Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do.
- Testing Microservices Without Losing Your Mind
  Microservices fail at the seams. A layered test strategy that keeps feedback fast and catches integration issues before production.
- Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
  I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses.
- Your Load Tests Are Lying to You
  Most load tests produce comforting numbers instead of useful answers. Here's what I learned the hard way about getting honest results.
- Your Staging Environment Is Lying to You
  Staging never catches the real bugs. Here's how I learned to test in production without burning everything down.
- You Don't Need to Be Netflix to Break Things on Purpose
  Chaos engineering isn't just for the big players. Here's how a small team can start breaking things deliberately and actually learn from it.
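The summary of "Testing AI Where It Actually Runs" names shadow mode, canaries, and rollback automation. What follows is a minimal Go sketch of the shadow-mode idea with a simple promotion gate. It is not the post's code. `Classifier`, `Shadow`, `MismatchRate`, and `ReadyToPromote` are hypothetical names, and the sketch assumes the candidate can be called inline on the request path.

```go
package main

import (
	"fmt"
	"sync"
)

// Classifier is a hypothetical one-method interface for an AI feature.
type Classifier interface {
	Classify(input string) (string, error)
}

// ClassifierFunc adapts a plain function to Classifier.
type ClassifierFunc func(string) (string, error)

func (f ClassifierFunc) Classify(s string) (string, error) { return f(s) }

// Shadow answers every request with Primary and also runs Candidate on the
// same input. The candidate's answer is only recorded, never returned, so a
// bad candidate cannot reach users.
type Shadow struct {
	Primary, Candidate Classifier

	mu         sync.Mutex
	compared   int
	mismatches int
}

func (s *Shadow) Classify(input string) (string, error) {
	out, err := s.Primary.Classify(input)
	if err != nil {
		return out, err // nothing trustworthy to compare against
	}
	// Simplification: the candidate runs inline. A production version would
	// likely run it off the request path with a timeout.
	cand, cerr := s.Candidate.Classify(input)
	s.mu.Lock()
	s.compared++
	if cerr != nil || cand != out {
		s.mismatches++
	}
	s.mu.Unlock()
	return out, nil
}

// MismatchRate reports how often the candidate disagreed with the primary.
func (s *Shadow) MismatchRate() float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.compared == 0 {
		return 0
	}
	return float64(s.mismatches) / float64(s.compared)
}

// ReadyToPromote is a hypothetical gate: require enough samples and a
// mismatch rate at or below maxRate before the candidate takes real traffic.
func (s *Shadow) ReadyToPromote(maxRate float64, minSamples int) bool {
	s.mu.Lock()
	compared, mismatches := s.compared, s.mismatches
	s.mu.Unlock()
	if compared < minSamples {
		return false
	}
	return float64(mismatches)/float64(compared) <= maxRate
}

func main() {
	primary := ClassifierFunc(func(string) (string, error) { return "spam", nil })
	candidate := ClassifierFunc(func(in string) (string, error) {
		if len(in) > 5 {
			return "ham", nil
		}
		return "spam", nil
	})
	sh := &Shadow{Primary: primary, Candidate: candidate}
	for _, in := range []string{"hi", "buy now!!", "ok", "hey"} {
		out, _ := sh.Classify(in)
		fmt.Println(in, "->", out) // users always see the primary's answer
	}
	fmt.Printf("mismatch rate: %.2f, promote: %v\n", sh.MismatchRate(), sh.ReadyToPromote(0.1, 4))
}
```

The same counters could drive rollback. If a promoted candidate's mismatch or error rate crosses a threshold, traffic flips back to the previous primary.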
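The summaries of "How I Actually Test LLM Features" and "LLM Evaluation: Stop Shipping on Vibes" describe two ideas: rigorous tests for non-deterministic output, and an evaluation loop that catches regressions. One common shape for that loop is sketched below under stated assumptions; it is not taken from either post. It checks properties of each output rather than exact strings, here hypothetically that the output is JSON with a non-empty `category`. It then fails the change if the pass rate falls below a recorded baseline. `EvalCase`, `validTicket`, `PassRate`, `Gate`, and the 0.9 baseline with 0.05 tolerance are all illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalCase is one entry in a hypothetical eval set: an input plus a check on
// properties of the output, since exact wording varies between runs.
type EvalCase struct {
	Name  string
	Input string
	Check func(output string) bool
}

// validTicket is an example property check: the output must be JSON with a
// non-empty "category" field. The schema is an illustrative assumption.
func validTicket(output string) bool {
	var v struct {
		Category string `json:"category"`
	}
	if err := json.Unmarshal([]byte(output), &v); err != nil {
		return false
	}
	return v.Category != ""
}

// PassRate runs every case through model and returns the fraction that pass.
func PassRate(model func(string) string, cases []EvalCase) float64 {
	if len(cases) == 0 {
		return 0
	}
	passed := 0
	for _, c := range cases {
		if c.Check(model(c.Input)) {
			passed++
		}
	}
	return float64(passed) / float64(len(cases))
}

// Gate rejects a change whose pass rate drops more than tolerance below the
// recorded baseline.
func Gate(rate, baseline, tolerance float64) error {
	if rate < baseline-tolerance {
		return fmt.Errorf("eval regression: pass rate %.2f below baseline %.2f (tolerance %.2f)", rate, baseline, tolerance)
	}
	return nil
}

func main() {
	cases := []EvalCase{
		{"billing", "I was charged twice", validTicket},
		{"login", "cannot sign in", validTicket},
	}
	// Stub standing in for a real LLM call.
	model := func(in string) string {
		if in == "cannot sign in" {
			return "Sorry, I can't help with that." // not JSON, so the check fails
		}
		return `{"category":"billing"}`
	}
	rate := PassRate(model, cases)
	fmt.Printf("pass rate: %.2f\n", rate)
	if err := Gate(rate, 0.9, 0.05); err != nil {
		fmt.Println(err)
	}
}
```

In practice the eval set would hold many more cases, and the baseline would come from the last accepted run rather than a hard-coded number.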