Devops

Definition

DevOps coverage in this archive spans 49 posts from Feb 2016 to Nov 2022 and treats reliability, delivery speed, and cost discipline as one system, not three separate concerns. The strongest adjacent threads are infrastructure, kubernetes, and security. Recurring title motifs include kubernetes, production, platform, and scale.

Key claims

Most posts prioritize predictable operations over feature breadth or stack novelty.
Early posts lean on kubernetes and production; newer posts lean on kubernetes and platform, reflecting how constraints shifted over time.
This topic repeatedly intersects with infrastructure, kubernetes, and security, so design choices here rarely stand alone.

Practical checklist

Set SLOs first, then choose tooling that keeps deploy, observability, and rollback simple.
Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
When boundary questions appear, cross-read infrastructure and kubernetes before committing implementation details.
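
As a rough illustration of the "set SLOs first" item above, the sketch below turns an availability target into an error budget that tooling choices can then be judged against. The function names, the 30-day window, and the example numbers are assumptions made for illustration; they are not taken from any post in this archive.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means breached."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% target with 21.6 minutes of downtime so far.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 21.6), 2))
```

One way to use a number like this: if a proposed platform layer or deploy change is likely to spend more of the budget than it saves, that is a signal to simplify before adopting it.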

Failure modes

Adding platform layers faster than the team can operate and debug them.
Chasing throughput gains without proving they improve end-user reliability.
Applying guidance from any point in the 2016 to 2022 span without revisiting its assumptions as context changed.

Suggested reading path

Start here (current state): Infrastructure as Code Patterns That Actually Scale
Then read (operating middle): The Boring Kubernetes Checklist That Actually Keeps Production Alive
Finish with (foundational context): Docker in Production: What We Learned Running Containers at Scale

References

49 entries tagged “Devops”

Infrastructure as Code Patterns That Actually Scale November 21, 2022 · 6 min Practical Terraform patterns for teams that have outgrown the tutorial stage: module design, state management, environment promotion, and policy enforcement. infrastructure-as-code terraform devops

Platform Engineering: DevOps Grew Up November 7, 2022 · 4 min Platform engineering is what happens when you realize 'you build it, you run it' does not scale past a handful of teams. platform-engineering devops developer-experience

Monorepo vs. Polyrepo: A Practical Decision Guide October 31, 2022 · 4 min Monorepo or polyrepo depends on coupling, team shape, and your appetite for build tooling. Here is how to decide without getting religious about it. architecture monorepo git

Kubernetes Requests and Limits: Lessons From Getting It Wrong September 5, 2022 · 5 min CPU is compressible. Memory is not. That one sentence explains 80% of Kubernetes resource problems. kubernetes resources capacity-planning

Hardening Kubernetes: The Stuff That Actually Matters February 7, 2022 · 7 min Kubernetes defaults favor convenience over security. A layered hardening guide covering pods, RBAC, network policies, secrets, and the control plane. kubernetes security hardening

DORA Metrics: Stop Ruining a Good Idea January 24, 2022 · 4 min DORA metrics are useful exactly until someone puts them on a performance review. Here's how to use them without destroying your engineering culture. dora metrics devops

Terraform at Scale: What Changed Since 2019 December 6, 2021 · 6 min Two years ago I wrote about Terraform patterns for growing teams. Here's what held up, what broke, and what I do differently now. terraform infrastructure-as-code devops

Most Platform Teams Are Building the Wrong Thing November 1, 2021 · 6 min Most platform teams build tools nobody asked for while developers wait in ticket queues. Lessons from maturity assessments at a dozen enterprises. platform-engineering devops developer-experience

Feature Flags at Scale: What Nobody Warns You About September 6, 2021 · 5 min Feature flags are great until you have 847 of them and nobody knows which ones are safe to remove. Practical lessons from Decloud and enterprise teams. feature-flags deployment devops

Observability-Driven Development Is Just Instrumenting Your Code June 14, 2021 · 4 min ODD sounds fancy. It's not. It means writing logs, metrics, and traces before you ship, not after your first outage. observability monitoring development

DevSecOps in Practice: What I Actually Implement April 19, 2021 · 7 min The concrete pipeline configs, policy-as-code patterns, and runtime controls I set up to bake security into delivery. devsecops security devops

Most Teams Are Not Ready for MLOps March 22, 2021 · 4 min MLOps is real, but most teams buying MLOps tooling cannot even version their training data. Fix the basics first. mlops machine-learning devops

Platform Engineering Is Just DevOps With a Rebrand October 19, 2020 · 3 min The industry loves renaming things. Platform engineering is DevOps done properly — and most companies still won't do it right. platform-engineering devops infrastructure

Observability for Small Distributed Teams (What Actually Works) September 14, 2020 · 6 min Most observability advice is written for 500-engineer orgs. Here's what actually matters when you're a small distributed team trying not to drown in dashboards. observability monitoring distributed-systems

The GitHub Actions Patterns I Actually Use in Production July 20, 2020 · 7 min Matrix builds, dependency caching, gated deploys, and the security gotchas I hit building Decloud's CI/CD pipeline on GitHub Actions. github-actions ci-cd devops

Stop Guessing Your Kubernetes Resource Limits June 1, 2020 · 6 min Most K8s clusters I audit are either wildly overprovisioned or one bad deploy away from eviction storms. Here's how I set requests, limits, and guardrails. kubernetes devops infrastructure

Comparing Infrastructure Testing Approaches: What Actually Catches Bugs February 17, 2020 · 6 min I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses. infrastructure testing terraform

My Kubernetes Predictions for 2020 (Most of Yours Are Wrong) January 6, 2020 · 4 min The adoption debate is over. 2020 is about operating Kubernetes well: managed control planes, GitOps by default, and policy enforcement. kubernetes predictions cloud-native

Zero Downtime Deploys Are a Team Habit, Not a Tool October 21, 2019 · 5 min Every team says they want zero downtime. Few want to do the boring work that actually gets them there. Here's what that boring work looks like. deployment devops kubernetes

Your Terraform Monolith Will Break. Here's How to Fix It Before It Does. September 23, 2019 · 6 min Lessons from splitting a 4000-resource Terraform state into something teams can work with: state layout, module boundaries, and workflow discipline. terraform infrastructure devops

Internal Platforms vs. Ad-Hoc Tooling: Which Developer Experience Actually Wins August 12, 2019 · 6 min Purpose-built internal platforms versus the organic tooling teams build for themselves -- and when each approach actually delivers. developer-experience platform-engineering devops

Your Incident Response Plan Is Useless Until Someone Bleeds July 15, 2019 · 7 min Most incident response plans are shelf-ware. What actually matters when your infrastructure is on fire, drawn from real breaches and national cyber-defense exercises. security incident-management devops

Your Internal Platform Is Probably a Liability March 11, 2019 · 3 min Most internal developer platforms fail because nobody treated them like a product. Lessons from building (and scrapping) platform tooling at three startups. platform devops developer-experience

GitOps: Stop SSHing Into Production February 11, 2019 · 9 min How I moved three teams off ad-hoc kubectl deployments and onto Git-driven infrastructure -- with code examples, repo layouts, and the mistakes I made along the way. gitops devops kubernetes

The Boring Kubernetes Checklist That Actually Keeps Production Alive January 14, 2019 · 5 min Most Kubernetes outages come from skipping the basics. Here's the checklist I use after running clusters at the fintech startup and now at Decloud. kubernetes devops infrastructure

IaC Patterns That Actually Work October 29, 2018 · 4 min Opinionated Terraform patterns from the fintech startup: repo layout, modules, state management, and what burns you if you ignore it. infrastructure terraform iac

Container Security in 2018: What Actually Changed August 20, 2018 · 3 min Eight months after my first container security post: what moved at the fintech startup and in the ecosystem — PodSecurityPolicy, image signing, scratch images. security containers docker

Why Monitoring Wasn't Enough and How We Built Observability at a Fintech Startup July 9, 2018 · 5 min After a mystery outage that our dashboards couldn't explain, I rebuilt the fintech startup's telemetry stack around metrics, logs, and traces. Here's what I learned. observability monitoring devops

SRE Principles Are Great. The Cargo-Culting Is Not. April 30, 2018 · 5 min The SRE hype train has everyone copying Google's playbook without asking whether it fits. What actually matters when you're not running at planet scale. sre devops reliability

Kubernetes Operators: Powerful, but Overhyped April 2, 2018 · 3 min Operators are the hot thing in the Kubernetes world right now. They're genuinely useful — but the hype is outpacing the reality for most teams. kubernetes operators devops

Two Years of Kubernetes in Production — The Boring Parts Are the Hard Parts January 22, 2018 · 7 min Year two of Kubernetes at the fintech startup: networking, resource tuning, and the operational grunt work nobody blogs about. kubernetes containers devops

What I Learned Building Our Platform Team This Year December 28, 2017 · 5 min Reflections on standing up the fintech startup's platform team in 2017 — what worked, what didn't, and why treating infra like a product changed everything. platform teams engineering

Your Containers Aren't Secure. Here's What to Actually Do About It. December 4, 2017 · 5 min Containers give you process isolation, not a security boundary. How we hardened images, locked down runtimes, and segmented networks at the fintech startup. containers docker kubernetes

Your Incident Process Will Break at 15 People. Here's What to Do. October 23, 2017 · 5 min What I learned building incident management at the fintech startup — from five people shouting across a room to actual structured response. incident-management devops on-call

You Don't Need to Be Netflix to Break Things on Purpose August 21, 2017 · 4 min Chaos engineering isn't just for the big players. Here's how a small team can start breaking things deliberately and actually learn from it. chaos-engineering reliability testing

Stop Doing Security Reviews by Hand July 17, 2017 · 4 min Your manual security gate is a bottleneck pretending to be a process. Here's how I moved security checks into the pipeline at the fintech startup. security devops devsecops

Monitoring Is Not Enough March 20, 2017 · 3 min Your dashboards look green. Your users say the site is broken. That gap is the whole problem. observability monitoring devops

A Year Running Kubernetes in Production — What Actually Happened January 16, 2017 · 6 min After a year of Kubernetes in production: the wins are real, but the sharp edges drew blood first. What paid off, what bit us, and what I'd do differently. kubernetes containers devops

Why We Deleted 42 Grafana Panels December 12, 2016 · 3 min Most teams monitor too much and alert on the wrong things. Five metrics are enough to run a startup backend. monitoring observability devops

Container Orchestration: Docker Swarm vs Kubernetes vs Mesos October 17, 2016 · 4 min Swarm, Kubernetes, and Mesos compared side by side after running all three at a mobility startup. Kubernetes is going to win, but the operational tax is real. containers docker kubernetes

Building a Security-First Engineering Culture October 3, 2016 · 5 min Security culture is not a training program or a tool purchase. It is a set of habits that leadership enforces through consistency, not speeches. security engineering culture

Log Aggregation at Scale: ELK vs Alternatives September 5, 2016 · 4 min ELK is powerful. It's also a second full-time job. Here's what I learned running it at a mobility startup, and what I'd consider instead. logging elk elasticsearch

Database Migrations Without Downtime August 15, 2016 · 7 min A practical guide to evolving schemas without maintenance windows by keeping old and new code compatible at every step. databases migrations postgresql

Why I Moved Our Infrastructure to Terraform June 20, 2016 · 6 min We moved from console-driven, script-heavy infrastructure to Terraform so changes are reviewed, reproducible, and recoverable from code. terraform infrastructure-as-code devops

Continuous Deployment Without the Chaos June 6, 2016 · 6 min Continuous deployment is a discipline problem, not a tooling problem. We deploy a mobility startup's backend dozens of times a day because we built habits first. continuous-deployment devops ci-cd

Security Incident Response for Startups May 23, 2016 · 9 min A practical incident response playbook for small teams: define incidents, assign owners, contain fast, investigate calmly, and recover with clear communication. security incident-management startups

Ansible Won Because It's the Simplest April 25, 2016 · 2 min I used all three. Ansible required the least ceremony. That's the whole argument. ansible puppet chef

Building a DevOps Culture from Scratch March 10, 2016 · 5 min DevOps is a cultural shift, not a job title. A practical path to shared responsibility, fast feedback, and resilient delivery without hand-wavy promises. devops culture engineering

Docker in Production: What We Learned Running Containers at Scale February 8, 2016 · 8 min Running Docker in production at a mobility startup forced us to get serious about image builds, networking, log aggregation, and security. What actually worked. docker containers devops