The archive

Writing

Ten years of operating notes, in reverse chronology. 327 entries since 2016, a continuous record of the same discipline applied to a moving target.

Long-form field notes for CEOs, founders, and technical leaders working through AI under real constraints: ownership, reliability, governance, cost, decision latency, and production reality.

The canonical reading path below is the clearest entry point into the current operating-model thesis.

Each post aims to answer five questions:

What is the core claim?
Why does it matter economically?
What operating model makes it work?
Where does it fail?
What language should a leadership team reuse?

Browse by problem

346 topics

AI Operating Systems 124
AI and Systems Architecture 72
Engineering 53
Go 53
DevOps 49
Infrastructure 42
Security 40
Technical Leadership 37
LLM 34

// Canonical reading

// Topics

2026 46 entries

Agent Identity Is the New Control Plane · July 23, 2026 · 4 min
An agent that acts needs an identity—scoped, short-lived, attributable, revocable—not a shared API key.
ai reliability security

Token Prices Fell. AI Bills Did Not. · July 21, 2026 · 3 min
Per-token prices keep falling while bills climb. Manage cost per governed workflow, not price per token.
cost ai executive

The Benchmark You Didn't Build · July 16, 2026 · 4 min
Public benchmarks are contaminated and gamed. The only eval that matters runs on your traffic, your failure modes, your bar—and you own it.
ai reliability metrics

Leading Senior Engineers in the AI Era: Autonomy, Standards, and Accountability · July 14, 2026 · 4 min
Leading senior engineers on AI work needs one concrete standard: a definition-of-done built on evals, named failure modes, and escalation triggers.
leadership ai teams

Sovereignty-by-Design for AI: How to Win Regulated Enterprise Deals · July 9, 2026 · 4 min
Sovereignty is an architecture you can demonstrate, not a checklist you assert. Trust boundaries decide revenue boundaries.
ai privacy governance

Agentic Systems at Scale: The New Reliability Contract · July 7, 2026 · 4 min
Agentic systems need SRE-style reliability contracts with explicit blast-radius limits, fallback paths, and kill switches.
ai reliability operations

The Anti-Fragile AI Organization · July 2, 2026 · 4 min
The best AI organizations do not merely survive model churn and vendor shocks; they convert each one into a capability they keep.
teams ai reliability

The AI Strategy Stack: What Boards Mistake for Moats · June 30, 2026 · 4 min
Most AI moat claims are distribution theater; durable moats come from routing economics, proprietary workflow data, and operational reliability.
strategy ai executive

From Model Demos to Profit Engines: The CTO Playbook for AI Unit Economics · June 25, 2026 · 3 min
AI value is won in routing and failure-cost control, not in picking a single “best” model.
ai cost strategy

The New Talent Stack: Product, Platform, and Applied AI Must Work as One System · June 18, 2026 · 3 min
AI organizations create leverage when product, platform, and applied AI are designed as one operating system instead of three kingdoms.
teams ai platform-engineering

The Executive Case for Local-First AI Infrastructure · June 16, 2026 · 3 min
Local-first AI is not ideology. It is control over placement, margin, latency, and failure modes.
ai architecture cost

Decision Latency as a P&L Variable: The Leadership Metric Nobody Owns · June 10, 2026 · 2 min
Decision latency is measurable and should be treated as a direct cost driver.
leadership metrics strategy

Designing the AI Leadership Bench: Roles, Interfaces, and Failure Boundaries · June 10, 2026 · 2 min
AI scaling needs explicit leadership interfaces between product, platform, reliability, and governance.
leadership teams ai

The Operating Cadence: Turning AI Leadership Interfaces Into Predictable Output · June 10, 2026 · 4 min
Interfaces describe who owns what. Cadence is what turns those interfaces into compounding output.
leadership ai operations

The Post-Prototype AI Org: Operating Models That Survive Year Two · June 10, 2026 · 3 min
Year-two AI failure usually comes from org-design mismatch, not model-quality mismatch. The handoffs are where the system slows down.
ai teams leadership

The AI Vendor Negotiation Playbook for CTOs · June 9, 2026 · 3 min
Vendor leverage in AI comes from architecture readiness, eval data, and exit credibility — not procurement theater.
ai vendors cost

How to Run an AI Incident Review That Changes Architecture, Not Slides · June 2, 2026 · 2 min
Incident reviews should produce architecture deltas and control updates, not narrative theater.
reliability ai governance

How Great CTOs Design AI Roadmaps That Survive Contact With Reality · May 28, 2026 · 3 min
AI roadmaps fail when they are sequenced around ambition instead of dependency, verification, and rollback cost.
strategy ai leadership

Hiring for AI Teams: The Operator Profile That Actually Scales · May 26, 2026 · 3 min
The highest-leverage AI hires are operators who can handle ambiguity, systems tradeoffs, and verification pressure.
hiring ai leadership

Technical Leadership in the AI Era (It’s About Throughput, Not Trends) · May 21, 2026 · 3 min
Technical leadership in mid-2026: anchor decisions in throughput, verification, and operability instead of chasing the latest agent framework.
leadership ai teams

Stop Building Internal AI Tools No One Uses · May 19, 2026 · 4 min
Internal AI tools fail when teams optimize for launch instead of habit formation, trust, and workflow fit.
productivity ai leadership

Build the System the Model Cannot Break · May 14, 2026 · 12 min
A manifesto for building AI-native organizations. Twelve tenets across strategy, architecture, economics, and people — and the only test that matters in year two.
manifesto ai strategy

Why Most AI Platform Teams Become the New Bottleneck · May 14, 2026 · 3 min
AI platform teams fail when they centralize decisions instead of capabilities. The queue is the bug.
platform-engineering ai teams

The CTO Communication Protocol: Aligning Engineers, Executives, and Investors in AI Programs · May 12, 2026 · 3 min
AI programs fail when each layer hears a different success definition.
leadership communication ai

AI Governance Without Bureaucracy · May 7, 2026 · 2 min
Effective AI governance is tighter defaults, clearer ownership, and faster escalation — not more committees.
governance ai security

The Board Deck Is Lying: How to Measure AI Progress Without Theater · May 5, 2026 · 3 min
Most AI progress reporting confuses activity with value. Executive measurement should collapse around adoption, reliability, margin, and delivery speed.
metrics ai executive

The 2026 AI Build vs. Buy Calculus (It’s Just Operational Cost) · April 30, 2026 · 3 min
By mid-2026, AI build vs buy has nothing to do with novelty. It is a ruthless mathematical calculation of telemetry, context freshness, and infrastructure lock-in.
build-vs-buy ai architecture

Margin, Risk, and Speed: The Three Numbers That Should Drive AI Strategy · April 28, 2026 · 2 min
Most AI strategy becomes clearer when leadership stops tracking novelty and starts forcing every decision through three numbers.
ai metrics strategy

AI Production Governance: A Maturity Model · April 23, 2026 · 4 min
The gap between stable AI features and shipping chaos isn't tools—it's production governance. How mature teams evaluate, deploy, and roll back.
governance ai reliability

Why Most Enterprise AI Architecture Fails in Year One · April 21, 2026 · 3 min
In 2026, enterprise AI isn't failing because models are bad. It is failing because organizations are building brittle demos instead of bounded, operable systems.
architecture ai reliability

AI Capital Allocation: What Great CTOs Stop Funding First · April 16, 2026 · 4 min
Strong AI strategy starts with a kill list. If a project cannot defend margin, risk, or speed, it should not survive the next budget meeting.
ai strategy cost

AI Strategy: The CTO Perspective (It's Just Data Infrastructure) · April 14, 2026 · 3 min
A CTO's AI strategy is not about chasing models. It is about resilient data infrastructure, operational boundaries, and measured throughput.
strategy ai cto

Sovereign Systems: Building for a World Where Data Privacy Is Non-Optional · April 6, 2026 · 6 min
Privacy is an architecture constraint, not a feature toggle. Building sovereignty in early avoids painful retrofits and closes enterprise deals faster.
privacy security data-residency

The Throughput Engineer: Why Headcount Is a Lagging Metric · March 30, 2026 · 8 min
Headcount is a lagging metric. The best engineering organizations measure throughput: decision speed, defect containment, and constraint removal.
engineering-leadership productivity operations

AI Agent Operations and the Networking Bottleneck: Why AI Agents Fail on Legacy Infrastructure · March 23, 2026 · 7 min
Most AI agent failures are infrastructure failures, not model failures. Legacy networking and missing circuit breakers are the real reliability bottleneck.
agenticops networking zero-trust

De-Risking the Black Swan: Red-Teaming Distributed Databases Before Production · March 16, 2026 · 8 min
Red-teaming distributed databases before production: most catastrophic failures are compound scenarios nobody practiced, not black swans.
distributed-systems databases resilience

Beyond Cloud-Heavy Architecture: Why Agentic Systems Need Local-First, Hardware-Aware Design · March 9, 2026 · 7 min
Local-first, hardware-aware architecture is becoming the default for high-reliability AI: cloud-heavy patterns cost too much and fail unpredictably.
agenticops infrastructure hardware

AI Startup Landscape 2026 · March 2, 2026 · 3 min
By early March 2026, the AI startup market looks less like a gold rush and more like a durable industry. Here's where leverage sits and what buyers reward.
startups ai business

AI Security: Evolving Threats and Defenses · February 23, 2026 · 7 min
As of late February 2026, AI security is defined by adaptive attacks and layered, operational defenses.
security ai threats

AI Team Structures 2026: Central, Embedded, and Hybrid Models · February 16, 2026 · 8 min
A practical guide to central, embedded, and hybrid AI team structures, with roles, tradeoffs, and scaling rules.
teams ai organization

AI Inference Cost Trends 2026: Model Pricing and Token Costs · February 9, 2026 · 11 min
AI inference costs are falling, but durable savings come from routing, caching, context control, and cost per outcome.
cost ai economics

AI Regulation Is Here. Stop Acting Surprised. · February 2, 2026 · 7 min
Regulation is already in procurement, security reviews, and internal sign-off. Teams that treat compliance as engineering ship faster than those who bolt it on.
regulation ai compliance

AI-Native Architecture Patterns 2026: Production Guide · January 26, 2026 · 7 min
Production AI architecture patterns for gateways, retrieval, evaluation, fallbacks, cost control, and ownership.
architecture ai patterns

Building Reliable AI Agents in Go · January 19, 2026 · 6 min
Reliable agents are engineered, not prompted: bounded tools, validation at every step, explicit recovery paths. Here's how I build them in Go.
agents reliability ai

AI Video Applications in Practice · January 12, 2026 · 4 min
Video AI is practical for scoped workflows. This post covers what works, how to design for reliability, and where human review still matters.
video ai applications

What I Actually Expect from AI in 2026 · January 5, 2026 · 4 min
Less hype, more plumbing. Agents get real but stay bounded, routing beats monolithic models, and the winners treat AI like software, not magic.
predictions ai trends

2025 26 entries

2025: The Year AI Stopped Being Special · Dec 2025
AI in 2025: The Year It Became Boring (Finally) · Dec 2025
Scaling AI in the Enterprise Is a Management Problem · Nov 2025
AI Incidents Don't Look Like Outages. That's the Problem. · Nov 2025
AI Technical Debt Is Eating Your Team Alive (And You Can't Even See It) · Oct 2025
AI Doesn't Make Your Team Faster. Shared Infrastructure Does. · Oct 2025
Measuring AI ROI Without Lying to Yourself · Sep 2025
AI Privacy Is a Plumbing Problem, Not a Policy Problem · Sep 2025
AI Pair Programming: It's a Junior Dev, Not a Wizard · Sep 2025
Running AI Locally: A Practical Guide for Teams Who Care About Control · Aug 2025
AI Workflow Automation: Decisions Are Cheap, Actions Are Expensive · Aug 2025
AI Docs That Don't Lie to Your Users · Jul 2025
Your AI Metrics Are Measuring the Wrong Thing · Jul 2025
Stop Fine-Tuning Models You Haven't Bothered to Prompt Properly · Jun 2025
AI Customer Support That Doesn't Make People Hate You · Jun 2025
Your AI Pipeline Is Just ETL With Extra Steps (And That's Fine) · May 2025
Agent Orchestration: Four Patterns, Honest Tradeoffs · May 2025
AI Security: Same Principles, New Attack Surface · Apr 2025
Testing AI Where It Actually Runs · Apr 2025
Your AI System Looks Healthy. It Is Not. · Mar 2025
MCP in Practice: Building Tool Servers in Go · Mar 2025
AI Governance That Does Not Suck · Mar 2025
Video Understanding AI: What Actually Works · Feb 2025
AI Code Review Is Mostly Noise · Feb 2025
Reasoning Models in Production: A Practical Guide · Jan 2025
AI in 2025: The Year Discipline Wins · Jan 2025

2024 30 entries

2025 Will Reward the Boring Teams · Dec 2024
2024: The Year AI Got Boring (In a Good Way) · Dec 2024
Your AI Infrastructure Is Not Special · Dec 2024
Your AI Team Problem Is Not Technical · Dec 2024
Picking an AI Model for Production (Late 2024) · Nov 2024
AI Safety Is Just Production Engineering · Nov 2024
Agent Patterns That Survive Production · Oct 2024
AI Cost Benchmarking: What Your Bill Actually Tells You · Oct 2024
RAG Retrieval That Actually Works · Sep 2024
Let AI Write Your First Draft, Not Your Docs · Sep 2024
AI-Assisted Code Migration: What Actually Works · Sep 2024
How I Actually Test LLM Features · Aug 2024
The Best Model Is the Smallest One That Works · Aug 2024
Stop Stuffing Your Context Window · Jul 2024
Function Calling Patterns That Survive Production · Jul 2024
Claude 3.5 Sonnet Analysis: Cost, Coding, and Model Routing · Jun 2024
AI Compliance Without the Theater · Jun 2024
Why Your Enterprise AI Pilot Is Stuck · Jun 2024
Building Voice AI That People Actually Use · May 2024
GPT-4o Changed the Interface, Not the Hard Part · May 2024
LLM Structured Output in Go: JSON Schema, Validation, Retries · Apr 2024
Most AI Developer Tools Are Not Worth Adopting Yet · Apr 2024
Agentic Workflows: From Demo Magic to Production Reality · Apr 2024
LLM Prompt Caching in Go: Cut Costs Without Breaking Things · Mar 2024
Why I Run Multiple Models in Production · Mar 2024
Claude 3 First Impressions: Three Models, One Decision Framework · Mar 2024
LLM Evaluation: Stop Shipping on Vibes · Feb 2024
Architecting AI-Native Applications (Without the Delusion) · Feb 2024
Stop Paying OpenAI to Test Your Prompts · Jan 2024
AI Engineering Is Its Own Discipline Now · Jan 2024

2023 30 entries

2023: The Year Everything Changed (and I Barely Kept Up) · Dec 2023
Your AI Infrastructure Is Not Ready for Scale. Neither Is Mine. · Dec 2023
Multimodal AI: Five Use Cases That Actually Work (and Three That Do Not) · Dec 2023
Two Weeks With the Assistants API: What I Like, What I Hate · Dec 2023
OpenAI DevDay Happened and I Have Opinions · Nov 2023
I Tracked My AI-Assisted Coding for Three Months. Here Are the Numbers. · Nov 2023
LLM Security: A Field Guide for People Who Ship Things · Oct 2023
Responsible AI Is Just Risk Management. Treat It That Way. · Oct 2023
AI Technical Debt Is Eating Your Codebase (You Just Cannot See It Yet) · Oct 2023
Agent Architecture Patterns That Actually Work in Production · Sep 2023
Stop Starting With the Model: AI Product Strategy That Works · Sep 2023
LLM Observability: Your Existing Monitoring Is Not Enough · Aug 2023
What I Learned Building AI Features Into a Fintech Product · Aug 2023
Your LLM Bill Is Your Own Fault · Jul 2023
Embedding Models Compared: Retrieval Quality, Cost, and Latency · Jul 2023
Most AI Startups Are Wrappers. That's the Problem. · Jul 2023
Building Semantic Search in Go: From Embeddings to Production · Jun 2023
Restructuring Engineering Orgs After Layoffs · Jun 2023
AI Code Review: What It Actually Catches (And What It Misses) · May 2023
Fine-Tuning vs. Prompting: A Decision Framework · May 2023
LangChain Is the New ORM: Convenient Until It Is Not · May 2023
RAG Patterns That Actually Work in Production · Apr 2023
Vector Databases: What They Actually Are and When You Need One · Apr 2023
Claude vs GPT: A User's Honest Take · Mar 2023
AI Safety Is Just Security Engineering With Extra Steps · Mar 2023
My First Week Building with GPT-4 · Mar 2023
Leading Engineering Teams When Nobody Knows What Is Next · Feb 2023
Prompt Engineering Is Not Engineering · Feb 2023
LLM Integration Patterns That Actually Survive Production · Jan 2023
AI in Production Is Just Engineering. Treat It That Way. · Jan 2023

2022 30 entries

2022: The Year the Music Stopped · Dec 2022
Your Cloud Bill Is Not a Mystery · Dec 2022
Resilient Teams Are Boring Teams · Dec 2022
Five Days With ChatGPT · Dec 2022
My Honest Take on GitHub Copilot After Six Months · Nov 2022
Infrastructure as Code Patterns That Actually Scale · Nov 2022
Watching Layoffs From the Inside · Nov 2022
Platform Engineering: DevOps Grew Up · Nov 2022
Monorepo vs. Polyrepo: A Practical Decision Guide · Oct 2022
Engineering Metrics That Actually Matter · Oct 2022
You Do Not Need a FinOps Team · Oct 2022
Testing Microservices Without Losing Your Mind · Sep 2022
Kubernetes Requests and Limits: Lessons From Getting It Wrong · Sep 2022
Go Concurrency Patterns I Use in Every Service · Aug 2022
Caching: The Easy Part Is Adding It, the Hard Part Is Everything Else · Aug 2022
When to Go Async (And When to Resist the Urge) · Jul 2022
Container Scanning Without the Security Theater · Jul 2022
Rate Limiting: The Boring Feature That Saves You at 3 AM · Jun 2022
Your Engineering Docs Are Probably Useless · Jun 2022
Distributed Systems Patterns I Keep Reaching For · May 2022
TypeScript: A Go Developer's Honest Take · May 2022
PostgreSQL Performance: Measure First, Tune Second · May 2022
OAuth Tokens: Why They Keep Getting Stolen and How to Stop It · Apr 2022
You Probably Don't Need a Service Mesh · Apr 2022
Your Onboarding Is Broken. Here's the Fix. · Mar 2022
API Versioning: Pick One and Stop Overthinking It · Mar 2022
Zero-Downtime Database Migrations Without the Drama · Feb 2022
Hardening Kubernetes: The Stuff That Actually Matters · Feb 2022
DORA Metrics: Stop Ruining a Good Idea · Jan 2022
What Log4j Actually Taught Us · Jan 2022

2021 31 entries

2021: The Year Everything We Ignored Caught Fire · Dec 2021
The AWS us-east-1 Outage Was Predictable. Your Architecture Was Not Ready. · Dec 2021
Log4j Is on Fire. Here's What to Do Right Now. · Dec 2021
Terraform at Scale: What Changed Since 2019 · Dec 2021
What a 3 AM Outage Taught Me About Incident Management · Nov 2021
OpenTelemetry in Late 2021: What's Ready and What's Not · Nov 2021
Stop Renaming Your Ops Team to SRE · Nov 2021
Most Platform Teams Are Building the Wrong Thing · Nov 2021
Event Sourcing in Practice: What I Learned Building Financial Event Pipelines · Oct 2021
Your Kubernetes Bill Is Lying to You · Oct 2021
GraphQL Federation: I'm Still Skeptical · Oct 2021
Most 'Technical Debt' Is Just Decisions You Disagree With Now · Sep 2021
Feature Flags at Scale: What Nobody Warns You About · Sep 2021
Zero Trust Architecture: What It Actually Looks Like · Aug 2021
Database Reliability Engineering: What I've Learned the Hard Way · Aug 2021
WebAssembly Beyond the Browser: A 2021 Progress Report · Jul 2021
Most Teams Should Just Use Postgres · Jul 2021
GitHub Copilot: First Impressions From a Go Developer · Jun 2021
Observability-Driven Development Is Just Instrumenting Your Code · Jun 2021
Embracing Remote Work: Benefits, Dangers, and Overcoming Challenges · Jun 2021
API Gateway Patterns That Actually Work · May 2021
Data Engineering Patterns: Batch vs. CDC vs. Streaming · May 2021
Hybrid Work Is Harder Than Full Remote · May 2021
DevSecOps in Practice: What I Actually Implement · Apr 2021
Multi-Cloud Is Mostly a Marketing Strategy · Apr 2021
Most Teams Are Not Ready for MLOps · Mar 2021
Developer Portals: The Thing Nobody Wants to Build But Everyone Needs · Mar 2021
Rust for Cloud Services: A Go Developer's Honest Take · Feb 2021
GitOps + Progressive Delivery: How We Stopped Gambling on Deploys · Feb 2021
eBPF Is Interesting. I Am Not Sold Yet. · Jan 2021
Your Software Supply Chain Is Probably a Mess · Jan 2021

2020 31 entries

2020: The Year That Broke the Playbook · Dec 2020
SolarWinds Got Owned. Your Build Pipeline Might Be Next. · Dec 2020
Your Container Image Scan Passed. Now What? · Nov 2020
Apple Silicon Won't Replace Your Servers (Yet) · Nov 2020
Your VPN Is a Liability. Here's What Replaces It. · Nov 2020
Platform Engineering Is Just DevOps With a Rebrand · Oct 2020
API Gateways: Build, Buy, or Regret · Oct 2020
What Actually Works for Distributed Teams (Six Months In) · Sep 2020
Observability for Small Distributed Teams (What Actually Works) · Sep 2020
Most Developer Productivity Metrics Are Management Theater · Aug 2020
GraphQL Federation Is Probably Not For You · Aug 2020
I Wrote Six Kubernetes Operators. Here's What Actually Matters. · Aug 2020
The GitHub Actions Patterns I Actually Use in Production · Jul 2020
Event-Driven Architecture: What I Got Wrong and What Survived · Jul 2020
Serverless vs Containers: Where the Math Stops Working · Jun 2020
Most Chaos Engineering Is Theater · Jun 2020
Stop Guessing Your Kubernetes Resource Limits · Jun 2020
What I Actually Changed About Engineering Interviews Over Zoom · May 2020
gRPC Patterns That Actually Work in Production · May 2020
State Of Linux Usability 2020 · May 2020
Your VPN Was Never a Security Architecture · May 2020
Your Cloud Security Is Falling Apart Right Now · Apr 2020
Your Team Isn't Remote. It's Just on Zoom. · Apr 2020
Your Business Continuity Plan Is Corporate Theater · Apr 2020
Your Video Infrastructure Isn't Ready for What's Coming · Mar 2020
Your Team Just Went Remote. Here's What to Do Right Now. · Mar 2020
Wasm Outside the Browser: Real Promise, Real Gaps · Mar 2020
Comparing Infrastructure Testing Approaches: What Actually Catches Bugs · Feb 2020
I Tried Every API Versioning Strategy. Here's the One I Actually Use. · Feb 2020
Database Replication Patterns That Actually Matter · Jan 2020
My Kubernetes Predictions for 2020 (Most of Yours Are Wrong) · Jan 2020

2019 25 entries

2019: The Year I Quit, Built, and Started Over · Dec 2019
Your Cloud Bill Is a Design Document · Dec 2019
Most Edge Computing Projects Are Premature Optimization · Nov 2019
How I Build CLI Tools in Go (And Why I Stopped Overthinking It) · Nov 2019
Zero Downtime Deploys Are a Team Habit, Not a Tool · Oct 2019
Your Onboarding Is Broken and Everyone Knows It · Oct 2019
Your Terraform Monolith Will Break. Here's How to Fix It Before It Does. · Sep 2019
Message Queues: The Patterns Nobody Tells You About Until 3 AM · Sep 2019
Your Load Tests Are Lying to You · Aug 2019
Internal Platforms vs. Ad-Hoc Tooling: Which Developer Experience Actually Wins · Aug 2019
Data Mesh Is an Org Chart Fix, Not a Tech One · Jul 2019
Your Incident Response Plan Is Useless Until Someone Bleeds · Jul 2019
Your Monolith Is Probably Fine · Jul 2019
You Probably Don't Need Multi-Region · Jun 2019
Your Staging Environment Is Lying to You · Jun 2019
Your SLOs Are Probably Useless (Here's How to Fix Them) · May 2019
Design for Failure or It Will Design Your Weekend · May 2019
Kubernetes Ships Insecure by Default. Here's What to Do About It. · Apr 2019
Your Cloud Bill Is Lying to You: A Cost Optimization Comparison · Apr 2019
The PostgreSQL Tuning Playbook I Actually Use · Mar 2019
Your Internal Platform Is Probably a Liability · Mar 2019
Your API Is a Contract You Can't Take Back · Feb 2019
GitOps: Stop SSHing Into Production · Feb 2019
Migrating to TypeScript Without Losing Your Mind · Jan 2019
The Boring Kubernetes Checklist That Actually Keeps Production Alive · Jan 2019

2018 27 entries

2018: The Year Tech Got Humbled · Dec 2018
Async Job Processing: Patterns That Saved Us at a Fintech Startup · Dec 2018
How We Track and Prioritize Tech Debt at a Fintech Startup · Dec 2018
Istio: Powerful, Painful, and Probably More Than You Need · Nov 2018
What I Learned Scaling an Engineering Team · Nov 2018
IaC Patterns That Actually Work · Oct 2018
API Rate Limiting: What Actually Works · Oct 2018
What I Learned About Code Reviews the Hard Way · Oct 2018
What Building Distributed Systems at a Fintech Startup Taught Me About Failure · Sep 2018
Serverless: What Works, What Doesn't, and What Will Bite You · Sep 2018
Container Security in 2018: What Actually Changed · Aug 2018
Database Sharding: You Probably Don't Need It Yet · Aug 2018
Securing Microservices: What Actually Works · Jul 2018
Why Monitoring Wasn't Enough and How We Built Observability at a Fintech Startup · Jul 2018
Making Go Services Fast: What Actually Matters · Jun 2018
GraphQL in Production Is Harder Than They Tell You · Jun 2018
GDPR Week One: What Actually Happened · May 2018
GDPR for Engineers: What We Actually Built at a Fintech Startup · May 2018
SRE Principles Are Great. The Cargo-Culting Is Not. · Apr 2018
Stop Wasting Everyone's Time in Technical Interviews · Apr 2018
Kubernetes Operators: Powerful, but Overhyped · Apr 2018
Event Sourcing in Practice: What I Got Right and Wrong · Mar 2018
A Go Developer Looks at Rust for Backend Work · Mar 2018
Zero Trust Is Not a Product. Here's How We Actually Built It. · Feb 2018
Machine Learning for Backend Engineers: What Actually Matters · Feb 2018
Two Years of Kubernetes in Production — The Boring Parts Are the Hard Parts · Jan 2018
Spectre and Meltdown Broke My Weekend · Jan 2018

2017 25 entries

What I Learned Building Our Platform Team This Year · Dec 2017
Stop Trying to Fix All Your Tech Debt · Dec 2017
Async by Default: Reducing Decision Latency in Distributed Engineering Teams · Dec 2017
Your Containers Aren't Secure. Here's What to Actually Do About It. · Dec 2017
Service Mesh: You Probably Don't Need One · Nov 2017
Stop Counting Code Reviews and Start Reading Them · Nov 2017
Your Incident Process Will Break at 15 People. Here's What to Do. · Oct 2017
Engineering Manager vs Tech Lead: What's Actually Different · Oct 2017
Multi-Region Architecture: What I Wish Someone Had Told Me · Oct 2017
Your Startup Doesn't Need a Security Team. It Needs a Security Champion. · Sep 2017
Pitching Infrastructure to People Who Don't Care About Infrastructure · Sep 2017
You Don't Need to Be Netflix to Break Things on Purpose · Aug 2017
Stop Guessing: How I Fix Slow Databases · Aug 2017
Stop Doing Security Reviews by Hand · Jul 2017
Your Cloud Bill Is Lying to You · Jul 2017
Leading Without a Title — What Actually Works · Jun 2017
Serverless Patterns That Actually Work in Production · Jun 2017
API Versioning: What Actually Works and What Doesn't · May 2017
WannaCry Hit. Here's What It Actually Exposed. · May 2017
How I Build Data Pipelines That Actually Survive Production · Apr 2017
Why We Went Event-Driven (and What Nearly Broke) · Apr 2017
Monitoring Is Not Enough · Mar 2017
GDPR Is an Engineering Problem, Not a Legal One · Feb 2017
GraphQL vs REST: Pick the Boring One · Feb 2017
A Year Running Kubernetes in Production — What Actually Happened · Jan 2017

2016 26 entries

2016: The Year I Stopped Fighting Infrastructure · Dec 2016
Securing APIs: Authentication and Authorization Patterns · Dec 2016
Why We Deleted 42 Grafana Panels · Dec 2016
Building Effective Engineering Teams · Dec 2016
Why We Chose Go for Our Backend Services · Nov 2016
The Economics of State: Why Scaling Up Beats Sharding (Until It Doesn't) · Nov 2016
The CTO's Guide to Technical Due Diligence · Oct 2016
Container Orchestration: Docker Swarm vs Kubernetes vs Mesos · Oct 2016
Building a Security-First Engineering Culture · Oct 2016
Why Every Developer Should Understand Networking · Sep 2016
Log Aggregation at Scale: ELK vs Alternatives · Sep 2016
Database Migrations Without Downtime · Aug 2016
Hiring Engineers When You Can't Compete on Salary · Aug 2016
Building Resilient Systems: Lessons from Production Failures · Jul 2016
The Real Cost of Running Your Own Servers in 2016 · Jul 2016
Why I Moved Our Infrastructure to Terraform · Jun 2016
Continuous Deployment Without the Chaos · Jun 2016
Security Incident Response for Startups · May 2016
API Design Principles That Stand the Test of Time · May 2016
Ansible Won Because It's the Simplest · Apr 2016
Postgres vs MySQL in 2016: A Practical Comparison · Apr 2016
AWS Lambda: When Serverless Makes Sense (And When It Doesn't) · Mar 2016
Building a DevOps Culture from Scratch · Mar 2016
The True Cost of Technical Debt · Feb 2016
Docker in Production: What We Learned Running Containers at Scale · Feb 2016
Why Microservices Aren't Always the Answer · Jan 2016