// Topic
Resilience
Definition
Resilience coverage in this archive spans five posts, from Jul 2016 to Mar 2026, and treats reliability, delivery speed, and cost discipline as one system rather than three separate concerns. The strongest adjacent threads are distributed systems, teams, and leadership. Recurring words in post titles include "distributed," "production," "resilient," and "teams."
Working claims
- Most posts prioritize predictable operations over feature breadth or stack novelty.
- The consistent theme from 2016 to 2026 is disciplined execution over hype cycles.
- This topic repeatedly intersects with distributed systems, teams, and leadership, so design choices here rarely stand alone.
How to apply this
- Set SLOs first, then choose tooling that keeps deployment, observability, and rollback simple.
- Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
- When boundary questions come up, cross-read the distributed systems and teams topics before committing to implementation details.
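One way to make "set SLOs first" concrete is to turn the target into an error budget before picking tooling. The sketch below assumes an availability SLO over a fixed window and a request-based success ratio. `error_budget` and `budget_remaining` are hypothetical helpers, not code from any post in this archive.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Time the service may be 'bad' within the window, e.g. 0.999 over 30 days."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of a request-based error budget still unspent; negative means overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = total * (1.0 - slo_target)
    actual_failures = total - good
    return 1.0 - actual_failures / allowed_failures


# A 99.9% SLO over 30 days allows 43 minutes 12 seconds of bad time.
print(error_budget(0.999, timedelta(days=30)))  # 0:43:12
# 500 failures out of 1,000,000 requests spends half of a 99.9% budget.
print(round(budget_remaining(0.999, good=999_500, total=1_000_000), 3))  # 0.5
```

A common policy, offered here as one option rather than a rule from these posts, is to pause risky deploys while the remaining budget is negative and spend that time on rollback and observability work.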
Where teams get burned
- Adding platform layers faster than the team can operate and debug them.
- Chasing throughput gains without proving they improve end-user reliability.
- Applying guidance written anywhere between 2016 and 2026 without revisiting the assumptions it was built on.
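To avoid chasing throughput that users never feel, a change can be gated on throughput and reliability together. A minimal sketch, assuming both are measured over comparable before and after windows; `WindowStats` and `throughput_change_is_justified` are hypothetical names, not an API from these posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Measurements from one observation window (hypothetical shape)."""
    requests_per_sec: float  # throughput
    good_requests: int       # requests that met the SLO's definition of "good"
    total_requests: int

    @property
    def success_ratio(self) -> float:
        if self.total_requests == 0:
            return 1.0
        return self.good_requests / self.total_requests


def throughput_change_is_justified(
    before: WindowStats, after: WindowStats, slo_target: float
) -> bool:
    """Accept a throughput win only if user-facing reliability holds up too."""
    faster = after.requests_per_sec > before.requests_per_sec
    meets_slo = after.success_ratio >= slo_target
    no_regression = after.success_ratio >= before.success_ratio
    return faster and meets_slo and no_regression


before = WindowStats(requests_per_sec=1000.0, good_requests=99_950, total_requests=100_000)
# 50% more throughput, but success drops from 99.95% to 99.90%: rejected.
print(throughput_change_is_justified(before, WindowStats(1500.0, 99_900, 100_000), 0.999))  # False
# Same throughput gain with success at 99.96%: accepted.
print(throughput_change_is_justified(before, WindowStats(1500.0, 99_960, 100_000), 0.999))  # True
```

The strict comparison against the baseline is deliberately simple. With real traffic, small differences between windows are often noise, so a team would likely want a tolerance or a statistical test before trusting a single pair of windows.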
Suggested reading path
- Start here (current state): De-Risking the Black Swan: Red-Teaming Distributed Databases Before Production
- Then read (operating middle): Watching Layoffs From the Inside
- Finish with (foundational context): Building Resilient Systems: Lessons from Production Failures
Related posts
- De-Risking the Black Swan: Red-Teaming Distributed Databases Before Production
- Resilient Teams Are Boring Teams
- Watching Layoffs From the Inside
- What Building Distributed Systems at a Fintech Startup Taught Me About Failure
- Building Resilient Systems: Lessons from Production Failures
References
5 posts
De-Risking the Black Swan: Red-Teaming Distributed Databases Before Production
Structured red-teaming is a practical reliability discipline for distributed databases. Most catastrophic failures are compound scenarios nobody practiced, not black swans.
Resilient Teams Are Boring Teams
The engineering teams that survived 2022 best were not the ones with the most talent. They were the ones with the least drama.
Watching Layoffs From the Inside
What I saw during the 2022 layoff wave, and what actually helps engineering teams survive contraction without burning out.
What Building Distributed Systems at a Fintech Startup Taught Me About Failure
Hard-won lessons from designing distributed systems that survive real-world failures: timeouts, retries, bulkheads, and the operational habits that actually keep things running.
Building Resilient Systems: Lessons from Production Failures
Production incidents show where architecture bends and where it breaks. These lessons focus on designing for failure, limiting blast radius, and making recovery routine.
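The fintech post above names timeouts, retries, and bulkheads as core habits. The sketch below shows one common way to combine them in Python; it is not code from that post. It assumes each call already enforces its own timeout (for example via a client or socket timeout) and surfaces it as `TimeoutError`, and that the wrapped operation is safe to repeat. `Bulkhead`, `BulkheadFull`, and `call_with_retries` are hypothetical names, and the backoff uses capped exponential delays with full jitter.

```python
import random
import threading
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class BulkheadFull(Exception):
    """Raised when a dependency is at its concurrency cap; callers should shed load."""


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot starve everything else."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn: Callable[[], T]) -> T:
        # Fail fast instead of queueing when every slot is taken.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("dependency is at its concurrency limit")
        try:
            return fn()
        finally:
            self._slots.release()


def call_with_retries(
    fn: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    deadline: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Retry transient failures with capped, fully jittered exponential backoff.

    `deadline` bounds total time spent retrying; each call to `fn` is assumed
    to enforce its own per-call timeout. Other exceptions propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = time.monotonic()
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            delay = random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt))
            if time.monotonic() - start + delay > deadline:
                raise  # sleeping again would overrun the overall deadline
            time.sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In use, the two compose as `bulkhead.run(lambda: call_with_retries(fetch))`, where `fetch` stands for whatever client call needs protecting. Holding the slot across retries keeps the concurrency cap honest, at the cost of backoff sleeps occupying capacity. Retrying a non-idempotent operation such as a payment submission can duplicate side effects, so idempotency is the first assumption to check before wrapping anything.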