
Infrastructure

Definition

Infrastructure coverage in this archive spans 42 posts from February 2016 to April 2026. It treats reliability, delivery speed, and cost discipline as one system rather than three separate concerns. The strongest adjacent threads are devops, cloud, and kubernetes. Recurring title words include “kubernetes,” “infrastructure,” “production,” and “need.”

Working claims

Most posts prioritize predictable operations over feature breadth or stack novelty.
Early post titles lean on “production” and “kubernetes,” while newer ones lean on “infrastructure” and “engineering,” which reflects how the constraints shifted over time.
This topic repeatedly intersects with devops, cloud, and kubernetes, so design choices here rarely stand alone.

How to apply this

Set SLOs first, then choose tooling that keeps deployment, observability, and rollback simple.
Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
When boundary questions appear, cross-read devops and cloud before committing implementation details.
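“Set SLOs first” becomes concrete once the target is expressed as an error budget, meaning the amount of failure the target tolerates. The sketch below is minimal and assumes two things: a request-based availability SLI (good requests divided by total requests) and a 30-day window. The function names and the “stop shipping when the budget is spent” policy are hypothetical illustrations, not taken from any specific post or library.

```python
def _check_target(slo_target: float) -> None:
    # An SLO of 1.0 (100%) leaves no budget at all, so require a strict fraction.
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1, e.g. 0.999")


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of total unavailability an availability SLO tolerates over the window."""
    _check_target(slo_target)
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of a request-based error budget still unspent (negative means overspent)."""
    _check_target(slo_target)
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1 - slo_target) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


def deploys_allowed(slo_target: float, good: int, total: int) -> bool:
    """Hypothetical policy: keep shipping while budget remains, otherwise prioritize reliability."""
    return budget_remaining(slo_target, good, total) > 0


print(round(error_budget_minutes(0.999), 1))                   # 43.2 minutes per 30 days
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))   # 0.5
print(deploys_allowed(0.999, 998_000, 1_000_000))              # False
```

A 99.9% target over 30 days allows roughly 43 minutes of downtime. That number is what tooling choices should be judged against: if a rollback takes longer than the remaining budget, the deploy path is too complicated for the SLO.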

Where teams get burned

Adding platform layers faster than the team can operate and debug them.
Chasing throughput gains without proving they improve end-user reliability.
Applying 2016-era guidance in 2026 without revisiting the assumptions behind it as the context changed.
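One way to avoid chasing throughput for its own sake is to gate an optimization on the user-facing SLI as well as on raw speed. The following is a minimal sketch under stated assumptions. The `Window` structure and `accept_throughput_change` function are hypothetical. “Good” requests are assumed to mean requests that succeeded within a latency threshold. The strict no-regression check is deliberately simple, and real comparisons would need to account for noise between windows.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Measurements from one observation window (hypothetical structure)."""
    requests: int   # total requests served in the window
    good: int       # requests that met the user-facing SLI (e.g. succeeded under a latency threshold)
    seconds: float  # window length in seconds

    @property
    def throughput(self) -> float:
        return self.requests / self.seconds

    @property
    def sli(self) -> float:
        # With no traffic there is nothing to violate, so treat the SLI as perfect.
        return self.good / self.requests if self.requests else 1.0


def accept_throughput_change(before: Window, after: Window, slo_target: float) -> bool:
    """Accept a throughput optimization only if user-facing reliability holds too."""
    faster = after.throughput > before.throughput
    meets_slo = after.sli >= slo_target
    no_regression = after.sli >= before.sli
    return faster and meets_slo and no_regression


before = Window(requests=10_000, good=9_995, seconds=600)
after = Window(requests=15_000, good=14_900, seconds=600)
# 50% more throughput, but the SLI fell from 99.95% to about 99.33%, below a 99.9% target.
print(accept_throughput_change(before, after, slo_target=0.999))  # False
```

In this example the change looks like a win on a throughput dashboard, yet users see roughly thirteen times as many failed requests.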

Suggested reading path

Start here (current state): Beyond Cloud-Heavy Architecture: Why Agentic Systems Need Local-First, Hardware-Aware Design
Then read (operating middle): Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
Finish with (foundational context): Docker in Production: What We Learned Running Containers at Scale

References

42 entries tagged “Infrastructure”

The 2026 AI Build vs. Buy Calculus (It’s Just Operational Cost) April 30, 2026 · 3 min By mid-2026, AI build vs buy has nothing to do with novelty. It is a ruthless mathematical calculation of telemetry, context freshness, and infrastructure lock-in. build-vs-buy ai architecture

Beyond Cloud-Heavy Architecture: Why Agentic Systems Need Local-First, Hardware-Aware Design March 9, 2026 · 7 min Local-first, hardware-aware architecture is becoming the default for high-reliability AI: cloud-heavy patterns cost too much and fail unpredictably. agenticops infrastructure hardware

Your AI Pipeline Is Just ETL With Extra Steps (And That's Fine) May 26, 2025 · 5 min AI data pipelines are ETL with a retrieval layer bolted on. The discipline is the same as always: detect change, chunk intelligently, keep indexes fresh. data pipelines ai

Your AI Infrastructure Is Not Special December 9, 2024 · 4 min AI infrastructure at scale is just infrastructure. The same boring patterns -- gateways, caching, circuit breakers, budgets -- solve the same boring problems. ai infrastructure scale

Your AI Infrastructure Is Not Ready for Scale. Neither Is Mine. December 18, 2023 · 4 min GPU shortage is real, rate limits are a production constraint, and your AI demo will collapse under real traffic. Annoyed thoughts on infrastructure realism. ai infrastructure scale

Vector Databases: What They Actually Are and When You Need One April 3, 2023 · 6 min A practical guide to vector databases -- what they store, how similarity search works, and the architectural decisions that matter in production. vector-database ai embeddings

Your Cloud Bill Is Not a Mystery December 19, 2022 · 3 min Most cloud cost problems are visibility problems. Fix tagging, kill idle resources, right-size what remains, and make cost a regular engineering conversation. cost cloud infrastructure

Platform Engineering: DevOps Grew Up November 7, 2022 · 4 min Platform engineering is what happens when you realize 'you build it, you run it' does not scale past a handful of teams. platform-engineering devops developer-experience

You Do Not Need a FinOps Team October 3, 2022 · 4 min Cloud cost management is not a discipline. It is basic engineering hygiene dressed up with a consulting-friendly name. cloud cost finops

Most Platform Teams Are Building the Wrong Thing November 1, 2021 · 6 min Most platform teams build tools nobody asked for while developers wait in ticket queues. Lessons from maturity assessments at a dozen enterprises. platform-engineering devops developer-experience

Your Kubernetes Bill Is Lying to You October 18, 2021 · 5 min Most Kubernetes clusters are 40-60% over-provisioned. Here's how I help teams cut their bills without sacrificing reliability. kubernetes cost finops

Database Reliability Engineering: What I've Learned the Hard Way August 9, 2021 · 7 min Practical database reliability from running Postgres in production: configs, safe migration patterns, and the operational habits that prevent outages. databases reliability sre

Data Engineering Patterns: Batch vs. CDC vs. Streaming May 17, 2021 · 6 min A comparison of data ingestion patterns from building the fintech startup's financial data pipelines, plus when each one actually makes sense. data-engineering analytics data-pipelines

Multi-Cloud Is Mostly a Marketing Strategy April 5, 2021 · 4 min Multi-cloud sounds great in vendor pitches. In practice, it doubles your operational burden for benefits most teams will never need. multi-cloud cloud architecture

Apple Silicon Won't Replace Your Servers (Yet) November 16, 2020 · 3 min The M1 is impressive hardware. The 'ARM everywhere in the data center' takes are not. Here's what actually matters for server infrastructure. arm apple-silicon infrastructure

Platform Engineering Is Just DevOps With a Rebrand October 19, 2020 · 3 min The industry loves renaming things. Platform engineering is DevOps done properly — and most companies still won't do it right. platform-engineering devops infrastructure

I Wrote Six Kubernetes Operators. Here's What Actually Matters. August 3, 2020 · 9 min Lessons from building production operators at Decloud: the reconciliation loop, controller-runtime patterns, and the mistakes that cost us sleep. kubernetes operators go

Stop Guessing Your Kubernetes Resource Limits June 1, 2020 · 6 min Most K8s clusters I audit are either wildly overprovisioned or one bad deploy away from eviction storms. Here's how I set requests, limits, and guardrails. kubernetes devops infrastructure

Your VPN Was Never a Security Architecture May 4, 2020 · 4 min COVID broke everyone's VPN. Good. It was a terrible security model to begin with. The answer isn't scaling your VPN — it's replacing the mental model entirely. vpn zero-trust infrastructure

Your Cloud Security Is Falling Apart Right Now April 27, 2020 · 7 min Everyone's scaling cloud infrastructure overnight. Security doesn't degrade under that pressure — it collapses. Make the secure path the easy path. security cloud aws

Your Video Infrastructure Isn't Ready for What's Coming March 30, 2020 · 6 min Most companies building video calling are making the same architecture mistakes. What I keep seeing, and how to fix it before your SFUs fall over. video infrastructure scaling

Comparing Infrastructure Testing Approaches: What Actually Catches Bugs February 17, 2020 · 6 min I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses. infrastructure testing terraform

Your Terraform Monolith Will Break. Here's How to Fix It Before It Does. September 23, 2019 · 6 min Lessons from splitting a 4000-resource Terraform state into something teams can work with: state layout, module boundaries, and workflow discipline. terraform infrastructure devops

Kubernetes Ships Insecure by Default. Here's What to Do About It. April 22, 2019 · 5 min Kubernetes defaults optimize for fast adoption, not safety. A hardening checklist from running production clusters at three startups. kubernetes security infrastructure

Your Cloud Bill Is Lying to You: A Cost Optimization Comparison April 8, 2019 · 5 min A direct comparison of cloud cost optimization strategies -- what actually moves the needle vs. what just makes finance feel better. cloud aws cost

GitOps: Stop SSHing Into Production February 11, 2019 · 9 min How I moved three teams off ad-hoc kubectl deployments and onto Git-driven infrastructure -- with code examples, repo layouts, and the mistakes I made along the way. gitops devops kubernetes

The Boring Kubernetes Checklist That Actually Keeps Production Alive January 14, 2019 · 5 min Most Kubernetes outages come from skipping the basics. Here's the checklist I use after running clusters at the fintech startup and now at Decloud. kubernetes devops infrastructure

Istio: Powerful, Painful, and Probably More Than You Need November 26, 2018 · 5 min My honest take on evaluating Istio at the fintech startup — what it actually gives you, what it costs you, and why most teams should think twice before adopting it. service-mesh istio kubernetes

IaC Patterns That Actually Work October 29, 2018 · 4 min Opinionated Terraform patterns from the fintech startup: repo layout, modules, state management, and what burns you if you ignore it. infrastructure terraform iac

Kubernetes Operators: Powerful, but Overhyped April 2, 2018 · 3 min Operators are the hot thing in the Kubernetes world right now. They're genuinely useful — but the hype is outpacing the reality for most teams. kubernetes operators devops

Zero Trust Is Not a Product. Here's How We Actually Built It. February 19, 2018 · 5 min Perimeter security is dead. How I replaced castle-and-moat at the fintech startup with zero trust — identity-first, micro-segmented, no implicit trust. security architecture zero-trust

Two Years of Kubernetes in Production — The Boring Parts Are the Hard Parts January 22, 2018 · 7 min Year two of Kubernetes at the fintech startup: networking, resource tuning, and the operational grunt work nobody blogs about. kubernetes containers devops

Spectre and Meltdown Broke My Weekend January 8, 2018 · 4 min Five days after the Spectre/Meltdown disclosure: what happened, what we patched, and why it changes the game for anyone on shared infrastructure. security infrastructure cpu

Your Containers Aren't Secure. Here's What to Actually Do About It. December 4, 2017 · 5 min Containers give you process isolation, not a security boundary. How we hardened images, locked down runtimes, and segmented networks at the fintech startup. containers docker kubernetes

Multi-Region Architecture: What I Wish Someone Had Told Me October 2, 2017 · 6 min What I learned evaluating multi-region at the fintech startup: the patterns that work, the ones that burn you, and when you should even bother. architecture distributed-systems cloud

Pitching Infrastructure to People Who Don't Care About Infrastructure September 4, 2017 · 3 min Your board doesn't care about Kubernetes. They care about money, risk, and speed. Here's how I learned to pitch infra investment at the fintech startup. infrastructure leadership business

Your Cloud Bill Is Lying to You July 3, 2017 · 4 min That clean AWS pricing page has almost nothing to do with your actual invoice. I learned this the hard way at the fintech startup. cloud aws cost

A Year Running Kubernetes in Production — What Actually Happened January 16, 2017 · 6 min After a year of Kubernetes in production: the wins are real, but the sharp edges drew blood first. What paid off, what bit us, and what I'd do differently. kubernetes containers devops

Log Aggregation at Scale: ELK vs Alternatives September 5, 2016 · 4 min ELK is powerful. It's also a second full-time job. Here's what I learned running it at a mobility startup, and what I'd consider instead. logging elk elasticsearch

The Real Cost of Running Your Own Servers in 2016 July 5, 2016 · 3 min Most startups have no business running their own servers. The math is not close. cloud infrastructure aws

Ansible Won Because It's the Simplest April 25, 2016 · 2 min I used all three. Ansible required the least ceremony. That's the whole argument. ansible puppet chef

Docker in Production: What We Learned Running Containers at Scale February 8, 2016 · 8 min Running Docker in production at a mobility startup forced us to get serious about image builds, networking, log aggregation, and security. What actually worked. docker containers devops