Writing

Long-form field notes for CEOs, founders, and technical leaders working through AI under real constraints: ownership, reliability, governance, cost, decision latency, and production reality.

The canonical reading path below is the clearest entry point into the current operating-model thesis.

Each post aims to answer five questions:

  • What is the core claim?
  • Why does it matter economically?
  • What operating model makes it work?
  • Where does it fail?
  • What language should a leadership team reuse?

// Canonical reading

  1. No. 01 Build the System the Model Cannot Break
     An AI-native company is not the one that adopts the model fastest; it is the one whose operating model the model cannot break.
  2. No. 02 The Throughput Engineer: Why Headcount Is a Lagging Metric
     Headcount is a lagging metric; the real throughput ceiling is how fast an organization can decide.
  3. No. 03 The CTO Communication Protocol: Aligning Engineers, Executives, and Investors in AI Programs
     AI programs fail when leadership communication stays ad hoc instead of becoming an operating protocol.
  4. No. 04 Why Most AI Platform Teams Become the New Bottleneck
     A central AI platform team becomes a liability when every workflow improvement has to wait in its queue.

Recent

  • Technical Leadership in the AI Era (It’s About Throughput, Not Trends)
    A pragmatic view of technical leadership in mid-2026: anchor decisions in throughput, verification, and operability rather than chasing the latest autonomous agent framework.
    Tags: leadership, ai, teams

  • Stop Building Internal AI Tools No One Uses
    Internal AI tools fail when teams optimize for launch instead of habit formation, trust, and workflow fit.
    Tags: productivity, ai, leadership

  • Build the System the Model Cannot Break
    A manifesto for building AI-native organizations. Twelve tenets across strategy, architecture, economics, and people, and the only test that matters in year two.
    Tags: manifesto, ai, strategy

  • AI Governance Without Bureaucracy
    Effective AI governance is tighter defaults, clearer ownership, and faster escalation, not more committees.
    Tags: governance, ai, security

  • The Board Deck Is Lying: How to Measure AI Progress Without Theater
    Most AI progress reporting confuses activity with value. Executive measurement should collapse around adoption, reliability, margin, and delivery speed.
    Tags: metrics, ai, executive

  • The 2026 AI Build vs. Buy Calculus (It’s Just Operational Cost)
    By mid-2026, AI build vs. buy has nothing to do with novelty. It is a ruthless mathematical calculation of telemetry, context freshness, and infrastructure lock-in.
    Tags: build-vs-buy, ai, architecture

  • Margin, Risk, and Speed: The Three Numbers That Should Drive AI Strategy
    Most AI strategy becomes clearer when leadership stops tracking novelty and starts forcing every decision through three numbers.
    Tags: ai, metrics, strategy

  • AI Production Governance: A Maturity Model
    By mid-April 2026, the gap between teams shipping stable AI features and teams shipping chaos isn't tools; it's production governance. Here is how mature teams evaluate, deploy, and roll back.
    Tags: governance, ai, reliability

  • Why Most Enterprise AI Architecture Fails in Year One
    In 2026, enterprise AI isn't failing because models are bad. It is failing because organizations are building brittle demos instead of bounded, operable systems.
    Tags: architecture, ai, reliability

  • AI Capital Allocation: What Great CTOs Stop Funding First
    Strong AI strategy starts with a kill list. If a project cannot defend margin, risk, or speed, it should not survive the next budget meeting.
    Tags: ai, strategy, cost

  • AI Strategy: The CTO Perspective (It's Just Data Infrastructure)
    A CTO's AI strategy in mid-2026 is brutally simple: it is not about chasing models. It is about building resilient data infrastructure, setting operational boundaries, and measuring throughput.
    Tags: strategy, ai, cto

  • Sovereign Systems: Building for a World Where Data Privacy Is Non-Optional
    Privacy is an architecture constraint, not a feature toggle. Teams that build sovereignty into their systems early avoid painful retrofits and close enterprise deals faster.
    Tags: privacy, security, data-residency

  • AI Agent Operations and the Networking Bottleneck: Why AI Agents Fail on Legacy Infrastructure
    Most AI agent failures are infrastructure failures, not model failures. Legacy networking, flat trust boundaries, and missing circuit breakers are the real reliability bottleneck.
    Tags: agenticops, networking, zero-trust

  • De-Risking the Black Swan: Red-Teaming Distributed Databases Before Production
    Structured red-teaming is a practical reliability discipline for distributed databases. Most catastrophic failures are compound scenarios nobody practiced, not black swans.
    Tags: distributed-systems, databases, resilience

  • Beyond Cloud-Heavy Architecture: Why Agentic Systems Need Local-First, Hardware-Aware Design
    Local-first, hardware-aware architecture is becoming the default for high-reliability AI systems. The cloud-heavy pattern costs too much and fails too unpredictably for agentic workloads.
    Tags: agenticops, infrastructure, hardware

Archive

2026 9 posts
  • AI Startup Landscape 2026
  • AI Security: Evolving Threats and Defenses
  • AI Team Structures 2026: Central, Embedded, and Hybrid Models
  • AI Inference Cost Trends 2026: Model Pricing and Token Costs
  • AI Regulation Is Here. Stop Acting Surprised.
  • AI-Native Architecture Patterns 2026: Production Guide
  • Building Reliable AI Agents in Go
  • AI Video Applications in Practice
  • What I Actually Expect from AI in 2026
2025 26 posts
  • 2025: The Year AI Stopped Being Special
  • AI in 2025: The Year It Became Boring (Finally)
  • Scaling AI in the Enterprise Is a Management Problem
  • AI Incidents Don't Look Like Outages. That's the Problem.
  • AI Technical Debt Is Eating Your Team Alive (And You Can't Even See It)
  • AI Doesn't Make Your Team Faster. Shared Infrastructure Does.
  • Measuring AI ROI Without Lying to Yourself
  • AI Privacy Is a Plumbing Problem, Not a Policy Problem
  • AI Pair Programming: It's a Junior Dev, Not a Wizard
  • Running AI Locally: A Practical Guide for Teams Who Care About Control
  • AI Workflow Automation: Decisions Are Cheap, Actions Are Expensive
  • AI Docs That Don't Lie to Your Users
  • Your AI Metrics Are Measuring the Wrong Thing
  • Stop Fine-Tuning Models You Haven't Bothered to Prompt Properly
  • AI Customer Support That Doesn't Make People Hate You
  • Your AI Pipeline Is Just ETL With Extra Steps (And That's Fine)
  • Agent Orchestration: Four Patterns, Honest Tradeoffs
  • AI Security: Same Principles, New Attack Surface
  • Testing AI Where It Actually Runs
  • Your AI System Looks Healthy. It Is Not.
  • MCP in Practice: Building Tool Servers in Go
  • AI Governance That Does Not Suck
  • Video Understanding AI: What Actually Works
  • AI Code Review Is Mostly Noise
  • Reasoning Models in Production: A Practical Guide
  • AI in 2025: The Year Discipline Wins
2024 30 posts
  • 2025 Will Reward the Boring Teams
  • 2024: The Year AI Got Boring (In a Good Way)
  • Your AI Infrastructure Is Not Special
  • Your AI Team Problem Is Not Technical
  • Picking an AI Model for Production (Late 2024)
  • AI Safety Is Just Production Engineering
  • Agent Patterns That Survive Production
  • AI Cost Benchmarking: What Your Bill Actually Tells You
  • RAG Retrieval That Actually Works
  • Let AI Write Your First Draft, Not Your Docs
  • AI-Assisted Code Migration: What Actually Works
  • How I Actually Test LLM Features
  • The Best Model Is the Smallest One That Works
  • Stop Stuffing Your Context Window
  • Function Calling Patterns That Survive Production
  • Claude 3.5 Sonnet Analysis: Cost, Coding, and Model Routing
  • AI Compliance Without the Theater
  • Why Your Enterprise AI Pilot Is Stuck
  • Building Voice AI That People Actually Use
  • GPT-4o Changed the Interface, Not the Hard Part
  • LLM Structured Output in Go: JSON Schema, Validation, Retries
  • Most AI Developer Tools Are Not Worth Adopting Yet
  • Agentic Workflows: From Demo Magic to Production Reality
  • LLM Prompt Caching in Go: Cut Costs Without Breaking Things
  • Why I Run Multiple Models in Production
  • Claude 3 First Impressions: Three Models, One Decision Framework
  • LLM Evaluation: Stop Shipping on Vibes
  • Architecting AI-Native Applications (Without the Delusion)
  • Stop Paying OpenAI to Test Your Prompts
  • AI Engineering Is Its Own Discipline Now
2023 30 posts
  • 2023: The Year Everything Changed (and I Barely Kept Up)
  • Your AI Infrastructure Is Not Ready for Scale. Neither Is Mine.
  • Multimodal AI: Five Use Cases That Actually Work (and Three That Do Not)
  • Two Weeks With the Assistants API: What I Like, What I Hate
  • OpenAI DevDay Happened and I Have Opinions
  • I Tracked My AI-Assisted Coding for Three Months. Here Are the Numbers.
  • LLM Security: A Field Guide for People Who Ship Things
  • Responsible AI Is Just Risk Management. Treat It That Way.
  • AI Technical Debt Is Eating Your Codebase (You Just Cannot See It Yet)
  • Agent Architecture Patterns That Actually Work in Production
  • Stop Starting With the Model: AI Product Strategy That Works
  • LLM Observability: Your Existing Monitoring Is Not Enough
  • What I Learned Building AI Features Into a Fintech Product
  • Your LLM Bill Is Your Own Fault
  • Embedding Models Compared: Retrieval Quality, Cost, and Latency
  • Most AI Startups Are Wrappers. That's the Problem.
  • Building Semantic Search in Go: From Embeddings to Production
  • Restructuring Engineering Orgs After Layoffs
  • AI Code Review: What It Actually Catches (And What It Misses)
  • Fine-Tuning vs. Prompting: A Decision Framework
  • LangChain Is the New ORM: Convenient Until It Is Not
  • RAG Patterns That Actually Work in Production
  • Vector Databases: What They Actually Are and When You Need One
  • Claude vs GPT: A User's Honest Take
  • AI Safety Is Just Security Engineering With Extra Steps
  • My First Week Building with GPT-4
  • Leading Engineering Teams When Nobody Knows What Is Next
  • Prompt Engineering Is Not Engineering
  • LLM Integration Patterns That Actually Survive Production
  • AI in Production Is Just Engineering. Treat It That Way.
2022 30 posts
  • 2022: The Year the Music Stopped
  • Your Cloud Bill Is Not a Mystery
  • Resilient Teams Are Boring Teams
  • Five Days With ChatGPT
  • My Honest Take on GitHub Copilot After Six Months
  • Infrastructure as Code Patterns That Actually Scale
  • Watching Layoffs From the Inside
  • Platform Engineering: DevOps Grew Up
  • Monorepo vs. Polyrepo: A Practical Decision Guide
  • Engineering Metrics That Actually Matter
  • You Do Not Need a FinOps Team
  • Testing Microservices Without Losing Your Mind
  • Kubernetes Requests and Limits: Lessons From Getting It Wrong
  • Go Concurrency Patterns I Use in Every Service
  • Caching: The Easy Part Is Adding It, the Hard Part Is Everything Else
  • When to Go Async (And When to Resist the Urge)
  • Container Scanning Without the Security Theater
  • Rate Limiting: The Boring Feature That Saves You at 3 AM
  • Your Engineering Docs Are Probably Useless
  • Distributed Systems Patterns I Keep Reaching For
  • TypeScript: A Go Developer's Honest Take
  • PostgreSQL Performance: Measure First, Tune Second
  • OAuth Tokens: Why They Keep Getting Stolen and How to Stop It
  • You Probably Don't Need a Service Mesh
  • Your Onboarding Is Broken. Here's the Fix.
  • API Versioning: Pick One and Stop Overthinking It
  • Zero-Downtime Database Migrations Without the Drama
  • Hardening Kubernetes: The Stuff That Actually Matters
  • DORA Metrics: Stop Ruining a Good Idea
  • What Log4j Actually Taught Us
2021 31 posts
  • 2021: The Year Everything We Ignored Caught Fire
  • The AWS us-east-1 Outage Was Predictable. Your Architecture Was Not Ready.
  • Log4j Is on Fire. Here's What to Do Right Now.
  • Terraform at Scale: What Changed Since 2019
  • What a 3 AM Outage Taught Me About Incident Management
  • OpenTelemetry in Late 2021: What's Ready and What's Not
  • Stop Renaming Your Ops Team to SRE
  • Most Platform Teams Are Building the Wrong Thing
  • Event Sourcing in Practice: What I Learned Building Financial Event Pipelines
  • Your Kubernetes Bill Is Lying to You
  • GraphQL Federation: I'm Still Skeptical
  • Most 'Technical Debt' Is Just Decisions You Disagree With Now
  • Feature Flags at Scale: What Nobody Warns You About
  • Zero Trust Architecture: What It Actually Looks Like
  • Database Reliability Engineering: What I've Learned the Hard Way
  • WebAssembly Beyond the Browser: A 2021 Progress Report
  • Most Teams Should Just Use Postgres
  • GitHub Copilot: First Impressions From a Go Developer
  • Observability-Driven Development Is Just Instrumenting Your Code
  • Embracing Remote Work: Benefits, Dangers, and Overcoming Challenges
  • API Gateway Patterns That Actually Work
  • Data Engineering Patterns: Batch vs. CDC vs. Streaming
  • Hybrid Work Is Harder Than Full Remote
  • DevSecOps in Practice: What I Actually Implement
  • Multi-Cloud Is Mostly a Marketing Strategy
  • Most Teams Are Not Ready for MLOps
  • Developer Portals: The Thing Nobody Wants to Build But Everyone Needs
  • Rust for Cloud Services: A Go Developer's Honest Take
  • GitOps + Progressive Delivery: How We Stopped Gambling on Deploys
  • eBPF Is Interesting. I Am Not Sold Yet.
  • Your Software Supply Chain Is Probably a Mess
2020 31 posts
  • 2020: The Year That Broke the Playbook
  • SolarWinds Got Owned. Your Build Pipeline Might Be Next.
  • Your Container Image Scan Passed. Now What?
  • Apple Silicon Won't Replace Your Servers (Yet)
  • Your VPN Is a Liability. Here's What Replaces It.
  • Platform Engineering Is Just DevOps With a Rebrand
  • API Gateways: Build, Buy, or Regret
  • What Actually Works for Distributed Teams (Six Months In)
  • Observability for Small Distributed Teams (What Actually Works)
  • Most Developer Productivity Metrics Are Management Theater
  • GraphQL Federation Is Probably Not For You
  • I Wrote Six Kubernetes Operators. Here's What Actually Matters.
  • The GitHub Actions Patterns I Actually Use in Production
  • Event-Driven Architecture: What I Got Wrong and What Survived
  • Serverless vs Containers: Where the Math Stops Working
  • Most Chaos Engineering Is Theater
  • Stop Guessing Your Kubernetes Resource Limits
  • What I Actually Changed About Engineering Interviews Over Zoom
  • gRPC Patterns That Actually Work in Production
  • Your VPN Was Never a Security Architecture
  • State Of Linux Usability 2020
  • Your Cloud Security Is Falling Apart Right Now
  • Your Team Isn't Remote. It's Just on Zoom.
  • Your Business Continuity Plan Is Corporate Theater
  • Your Video Infrastructure Isn't Ready for What's Coming
  • Your Team Just Went Remote. Here's What to Do Right Now.
  • Wasm Outside the Browser: Real Promise, Real Gaps
  • Comparing Infrastructure Testing Approaches: What Actually Catches Bugs
  • I Tried Every API Versioning Strategy. Here's the One I Actually Use.
  • Database Replication Patterns That Actually Matter
  • My Kubernetes Predictions for 2020 (Most of Yours Are Wrong)
2019 25 posts
  • 2019: The Year I Quit, Built, and Started Over
  • Your Cloud Bill Is a Design Document
  • Most Edge Computing Projects Are Premature Optimization
  • How I Build CLI Tools in Go (And Why I Stopped Overthinking It)
  • Zero Downtime Deploys Are a Team Habit, Not a Tool
  • Your Onboarding Is Broken and Everyone Knows It
  • Your Terraform Monolith Will Break. Here's How to Fix It Before It Does.
  • Message Queues: The Patterns Nobody Tells You About Until 3 AM
  • Your Load Tests Are Lying to You
  • Internal Platforms vs. Ad-Hoc Tooling: Which Developer Experience Actually Wins
  • Data Mesh Is an Org Chart Fix, Not a Tech One
  • Your Incident Response Plan Is Useless Until Someone Bleeds
  • Your Monolith Is Probably Fine
  • You Probably Don't Need Multi-Region
  • Your Staging Environment Is Lying to You
  • Your SLOs Are Probably Useless (Here's How to Fix Them)
  • Design for Failure or It Will Design Your Weekend
  • Kubernetes Ships Insecure by Default. Here's What to Do About It.
  • Your Cloud Bill Is Lying to You: A Cost Optimization Comparison
  • The PostgreSQL Tuning Playbook I Actually Use
  • Your Internal Platform Is Probably a Liability
  • Your API Is a Contract You Can't Take Back
  • GitOps: Stop SSHing Into Production
  • Migrating to TypeScript Without Losing Your Mind
  • The Boring Kubernetes Checklist That Actually Keeps Production Alive
2018 27 posts
  • 2018: The Year Tech Got Humbled
  • Async Job Processing: Patterns That Saved Us at a Fintech Startup
  • How We Track and Prioritize Tech Debt at a Fintech Startup
  • Istio: Powerful, Painful, and Probably More Than You Need
  • What I Learned Scaling an Engineering Team
  • IaC Patterns That Actually Work
  • API Rate Limiting: What Actually Works
  • What I Learned About Code Reviews the Hard Way
  • What Building Distributed Systems at a Fintech Startup Taught Me About Failure
  • Serverless: What Works, What Doesn't, and What Will Bite You
  • Container Security in 2018: What Actually Changed
  • Database Sharding: You Probably Don't Need It Yet
  • Securing Microservices: What Actually Works
  • Why Monitoring Wasn't Enough and How We Built Observability at a Fintech Startup
  • Making Go Services Fast: What Actually Matters
  • GraphQL in Production Is Harder Than They Tell You
  • GDPR Week One: What Actually Happened
  • GDPR for Engineers: What We Actually Built at a Fintech Startup
  • SRE Principles Are Great. The Cargo-Culting Is Not.
  • Stop Wasting Everyone's Time in Technical Interviews
  • Kubernetes Operators: Powerful, but Overhyped
  • Event Sourcing in Practice: What I Got Right and Wrong
  • A Go Developer Looks at Rust for Backend Work
  • Zero Trust Is Not a Product. Here's How We Actually Built It.
  • Machine Learning for Backend Engineers: What Actually Matters
  • Two Years of Kubernetes in Production — The Boring Parts Are the Hard Parts
  • Spectre and Meltdown Broke My Weekend
2017 25 posts
  • What I Learned Building Our Platform Team This Year
  • Stop Trying to Fix All Your Tech Debt
  • Async by Default: Reducing Decision Latency in Distributed Engineering Teams
  • Your Containers Aren't Secure. Here's What to Actually Do About It.
  • Service Mesh: You Probably Don't Need One
  • Stop Counting Code Reviews and Start Reading Them
  • Your Incident Process Will Break at 15 People. Here's What to Do.
  • Engineering Manager vs Tech Lead: What's Actually Different
  • Multi-Region Architecture: What I Wish Someone Had Told Me
  • Your Startup Doesn't Need a Security Team. It Needs a Security Champion.
  • Pitching Infrastructure to People Who Don't Care About Infrastructure
  • You Don't Need to Be Netflix to Break Things on Purpose
  • Stop Guessing: How I Fix Slow Databases
  • Stop Doing Security Reviews by Hand
  • Your Cloud Bill Is Lying to You
  • Leading Without a Title — What Actually Works
  • Serverless Patterns That Actually Work in Production
  • API Versioning: What Actually Works and What Doesn't
  • WannaCry Hit. Here's What It Actually Exposed.
  • How I Build Data Pipelines That Actually Survive Production
  • Why We Went Event-Driven (and What Nearly Broke)
  • Monitoring Is Not Enough
  • GDPR Is an Engineering Problem, Not a Legal One
  • GraphQL vs REST: Pick the Boring One
  • A Year Running Kubernetes in Production — What Actually Happened
2016 26 posts
  • 2016: The Year I Stopped Fighting Infrastructure
  • Securing APIs: Authentication and Authorization Patterns
  • Why We Deleted 42 Grafana Panels
  • Building Effective Engineering Teams
  • Why We Chose Go for Our Backend Services
  • The Economics of State: Why Scaling Up Beats Sharding (Until It Doesn't)
  • The CTO's Guide to Technical Due Diligence
  • Container Orchestration: Docker Swarm vs Kubernetes vs Mesos
  • Building a Security-First Engineering Culture
  • Why Every Developer Should Understand Networking
  • Log Aggregation at Scale: ELK vs Alternatives
  • Database Migrations Without Downtime
  • Hiring Engineers When You Can't Compete on Salary
  • Building Resilient Systems: Lessons from Production Failures
  • The Real Cost of Running Your Own Servers in 2016
  • Why I Moved Our Infrastructure to Terraform
  • Continuous Deployment Without the Chaos
  • Security Incident Response for Startups
  • API Design Principles That Stand the Test of Time
  • Ansible Won Because It's the Simplest
  • Postgres vs MySQL in 2016: A Practical Comparison
  • AWS Lambda: When Serverless Makes Sense (And When It Doesn't)
  • Building a DevOps Culture from Scratch
  • The True Cost of Technical Debt
  • Docker in Production: What We Learned Running Containers at Dropbyke
  • Why Microservices Aren't Always the Answer