Testing AI Where It Actually Runs
Offline evals are necessary but not sufficient. Here's how I test AI features in production using shadow mode, canaries, and rollback automation, with Go code throughout.
This archive collects 8 posts on testing, from Aug 2017 to Apr 2025, with an emphasis on practical engineering craft: interfaces, test strategy, and maintainable implementation details. The most closely related topic threads are ai, quality, and go; recurring themes in the titles include testing, LLMs, lying, and AI.
LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production.
Your LLM feature looks great in demos and breaks in production. Here's how to build an evaluation loop that catches regressions before your users do.
Microservices fail at the seams. A layered test strategy that keeps feedback fast and catches integration issues before production.
I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses.
Most load tests produce comforting numbers instead of useful answers. Here's what I learned the hard way about getting honest results.
Staging never catches the real bugs. Here's how I learned to test in production without burning everything down.
Chaos engineering isn't just for the big players. Here's how a small team can start breaking things deliberately and actually learn from it.