Infrastructure

Definition

Infrastructure coverage in this archive spans 41 posts from Feb 2016 to Mar 2026. It treats reliability, delivery speed, and cost discipline as one system rather than three separate concerns. The strongest adjacent topics are devops, cloud, and kubernetes. Recurring words in post titles include kubernetes, infrastructure, production, and need.

Working claims

  • Most posts prioritize predictable operations over feature breadth or stack novelty.
  • Early posts lean on production and kubernetes; newer posts lean on infrastructure and engineering, reflecting how constraints shifted over the decade.
  • This topic repeatedly intersects with devops, cloud, and kubernetes, so design choices here rarely stand alone.

How to apply this

  • Set SLOs first, then choose tooling that keeps deploy, observability, and rollback simple.
  • Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
  • When boundary questions appear, cross-read devops and cloud before committing implementation details.
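
The "set SLOs first" step can be made concrete with a small error-budget calculation. The sketch below is illustrative only: the names Slo, allowed_downtime_minutes, budget_remaining, and can_ship_risky_change are hypothetical, and the 25% budget threshold is an assumption chosen for the example, not a figure taken from any post in this archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO over a rolling window."""
    target: float          # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int = 30  # assumed window length


def allowed_downtime_minutes(slo: Slo) -> float:
    """Minutes of full outage the SLO tolerates within its window."""
    return (1.0 - slo.target) * slo.window_days * 24 * 60


def budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    if total_requests <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def can_ship_risky_change(slo: Slo, total_requests: int, failed_requests: int,
                          min_budget: float = 0.25) -> bool:
    """Gate risky deploys on remaining budget (the threshold is an assumption)."""
    return budget_remaining(slo, total_requests, failed_requests) >= min_budget


if __name__ == "__main__":
    slo = Slo(target=0.999)
    # A 99.9% target over 30 days tolerates about 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(slo), 1))
    print(can_ship_risky_change(slo, total_requests=1_000_000, failed_requests=400))
```

A gate like this keeps the tooling decision downstream of the SLO: a deploy or rollback mechanism is useful to the extent that it helps the team stay inside the budget.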

Where teams get burned

  • Adding platform layers faster than the team can operate and debug them.
  • Chasing throughput gains without proving they improve end-user reliability.
  • Applying 2016-era guidance in 2026 without revisiting assumptions that have since changed.

Suggested reading path

References

  • The 2026 AI Build vs. Buy Calculus (It’s Just Operational Cost)
    By mid-2026, AI build vs buy has nothing to do with novelty. It is a ruthless mathematical calculation of telemetry, context freshness, and infrastructure lock-in.
    Tags: build-vs-buy ai architecture

  • Beyond Cloud-Heavy Architecture: Why Agentic Systems Need Local-First, Hardware-Aware Design
    Local-first, hardware-aware architecture is becoming the default for high-reliability AI: cloud-heavy patterns cost too much and fail unpredictably.
    Tags: agenticops infrastructure hardware

  • Your AI Pipeline Is Just ETL With Extra Steps (And That's Fine)
    AI data pipelines are ETL with a retrieval layer bolted on. The discipline is the same as always: detect change, chunk intelligently, keep indexes fresh.
    Tags: data pipelines ai

  • Your AI Infrastructure Is Not Special
    AI infrastructure at scale is just infrastructure. The same boring patterns -- gateways, caching, circuit breakers, budgets -- solve the same boring problems.
    Tags: ai infrastructure scale

  • Your AI Infrastructure Is Not Ready for Scale. Neither Is Mine.
    GPU shortage is real, rate limits are a production constraint, and your AI demo will collapse under real traffic. Annoyed thoughts on infrastructure realism.
    Tags: ai infrastructure scale

  • Vector Databases: What They Actually Are and When You Need One
    A practical guide to vector databases -- what they store, how similarity search works, and the architectural decisions that matter in production.
    Tags: vector-database ai embeddings

  • Your Cloud Bill Is Not a Mystery
    Most cloud cost problems are visibility problems. Fix tagging, kill idle resources, right-size what remains, and make cost a regular engineering conversation.
    Tags: cost cloud infrastructure

  • Platform Engineering: DevOps Grew Up
    Platform engineering is what happens when you realize 'you build it, you run it' does not scale past a handful of teams.
    Tags: platform-engineering devops developer-experience

  • You Do Not Need a FinOps Team
    Cloud cost management is not a discipline. It is basic engineering hygiene dressed up with a consulting-friendly name.
    Tags: cloud cost finops

  • Most Platform Teams Are Building the Wrong Thing
    Most platform teams build tools nobody asked for while developers wait in ticket queues. Lessons from maturity assessments at a dozen enterprises.
    Tags: platform-engineering devops developer-experience

  • Your Kubernetes Bill Is Lying to You
    Most Kubernetes clusters are 40-60% over-provisioned. Here's how I help teams cut their bills without sacrificing reliability.
    Tags: kubernetes cost finops

  • Database Reliability Engineering: What I've Learned the Hard Way
    Practical database reliability from running Postgres in production: configs, safe migration patterns, and the operational habits that prevent outages.
    Tags: databases reliability sre

  • Data Engineering Patterns: Batch vs. CDC vs. Streaming
    A comparison of data ingestion patterns from building the fintech startup's financial data pipelines, plus when each one actually makes sense.
    Tags: data-engineering analytics data-pipelines

  • Multi-Cloud Is Mostly a Marketing Strategy
    Multi-cloud sounds great in vendor pitches. In practice, it doubles your operational burden for benefits most teams will never need.
    Tags: multi-cloud cloud architecture

  • Apple Silicon Won't Replace Your Servers (Yet)
    The M1 is impressive hardware. The 'ARM everywhere in the data center' takes are not. Here's what actually matters for server infrastructure.
    Tags: arm apple-silicon infrastructure

  • Platform Engineering Is Just DevOps With a Rebrand
    The industry loves renaming things. Platform engineering is DevOps done properly -- and most companies still won't do it right.
    Tags: platform-engineering devops infrastructure

  • I Wrote Six Kubernetes Operators. Here's What Actually Matters.
    Lessons from building production operators at Decloud: the reconciliation loop, controller-runtime patterns, and the mistakes that cost us sleep.
    Tags: kubernetes operators go

  • Stop Guessing Your Kubernetes Resource Limits
    Most K8s clusters I audit are either wildly overprovisioned or one bad deploy away from eviction storms. Here's how I set requests, limits, and guardrails.
    Tags: kubernetes devops infrastructure

  • Your VPN Was Never a Security Architecture
    COVID broke everyone's VPN. Good. It was a terrible security model to begin with. The answer isn't scaling your VPN -- it's replacing the mental model entirely.
    Tags: vpn zero-trust infrastructure

  • Your Cloud Security Is Falling Apart Right Now
    Everyone's scaling cloud infrastructure overnight. Security doesn't degrade under that pressure -- it collapses. Make the secure path the easy path.
    Tags: security cloud aws

  • Your Video Infrastructure Isn't Ready for What's Coming
    Most companies building video calling are making the same architecture mistakes. What I keep seeing, and how to fix it before your SFUs fall over.
    Tags: video infrastructure scaling

  • Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
    I tested Terraform modules with unit checks, policy engines, and full integration runs side by side. Here's what each approach actually catches and what it misses.
    Tags: infrastructure testing terraform

  • Your Terraform Monolith Will Break. Here's How to Fix It Before It Does.
    Lessons from splitting a 4000-resource Terraform state into something teams can work with: state layout, module boundaries, and workflow discipline.
    Tags: terraform infrastructure devops

  • Kubernetes Ships Insecure by Default. Here's What to Do About It.
    Kubernetes defaults optimize for fast adoption, not safety. A hardening checklist from running production clusters at three startups.
    Tags: kubernetes security infrastructure

  • Your Cloud Bill Is Lying to You: A Cost Optimization Comparison
    A direct comparison of cloud cost optimization strategies -- what actually moves the needle vs. what just makes finance feel better.
    Tags: cloud aws cost

  • GitOps: Stop SSHing Into Production
    How I moved three teams off ad-hoc kubectl deployments and onto Git-driven infrastructure -- with code examples, repo layouts, and the mistakes I made along the way.
    Tags: gitops devops kubernetes

  • The Boring Kubernetes Checklist That Actually Keeps Production Alive
    Most Kubernetes outages come from skipping the basics. Here's the checklist I use after running clusters at the fintech startup and now at Decloud.
    Tags: kubernetes devops infrastructure

  • Istio: Powerful, Painful, and Probably More Than You Need
    My honest take on evaluating Istio at the fintech startup -- what it actually gives you, what it costs you, and why most teams should think twice before adopting it.
    Tags: service-mesh istio kubernetes

  • IaC Patterns That Actually Work
    Opinionated Terraform patterns from the fintech startup: repo layout, modules, state management, and what burns you if you ignore it.
    Tags: infrastructure terraform iac

  • Kubernetes Operators: Powerful, but Overhyped
    Operators are the hot thing in the Kubernetes world right now. They're genuinely useful -- but the hype is outpacing the reality for most teams.
    Tags: kubernetes operators devops

  • Zero Trust Is Not a Product. Here's How We Actually Built It.
    Perimeter security is dead. How I replaced castle-and-moat at the fintech startup with zero trust -- identity-first, micro-segmented, no implicit trust.
    Tags: security architecture zero-trust

  • Two Years of Kubernetes in Production -- The Boring Parts Are the Hard Parts
    Year two of Kubernetes at the fintech startup: networking, resource tuning, and the operational grunt work nobody blogs about.
    Tags: kubernetes containers devops

  • Spectre and Meltdown Broke My Weekend
    Five days after the Spectre/Meltdown disclosure: what happened, what we patched, and why it changes the game for anyone on shared infrastructure.
    Tags: security infrastructure cpu

  • Your Containers Aren't Secure. Here's What to Actually Do About It.
    Containers give you process isolation, not a security boundary. How we hardened images, locked down runtimes, and segmented networks at the fintech startup.
    Tags: containers docker kubernetes

  • Multi-Region Architecture: What I Wish Someone Had Told Me
    What I learned evaluating multi-region at the fintech startup: the patterns that work, the ones that burn you, and when you should even bother.
    Tags: architecture distributed-systems cloud

  • Pitching Infrastructure to People Who Don't Care About Infrastructure
    Your board doesn't care about Kubernetes. They care about money, risk, and speed. Here's how I learned to pitch infra investment at the fintech startup.
    Tags: infrastructure leadership business

  • Your Cloud Bill Is Lying to You
    That clean AWS pricing page has almost nothing to do with your actual invoice. I learned this the hard way at the fintech startup.
    Tags: cloud aws cost

  • A Year Running Kubernetes in Production -- What Actually Happened
    After a year of Kubernetes in production: the wins are real, but the sharp edges drew blood first. What paid off, what bit us, and what I'd do differently.
    Tags: kubernetes containers devops

  • Log Aggregation at Scale: ELK vs Alternatives
    ELK is powerful. It's also a second full-time job. Here's what I learned running it at Dropbyke, and what I'd consider instead.
    Tags: logging elk elasticsearch

  • The Real Cost of Running Your Own Servers in 2016
    Most startups have no business running their own servers. The math is not close.
    Tags: cloud infrastructure aws

  • Ansible Won Because It's the Simplest
    I used all three. Ansible required the least ceremony. That's the whole argument.
    Tags: ansible puppet chef

  • Docker in Production: What We Learned Running Containers at Dropbyke
    Running Docker in production at Dropbyke forced us to get serious about image builds, networking, log aggregation, and security. What actually worked.
    Tags: docker containers devops