AI-Native Architecture Patterns 2026
As of late January 2026, AI-native architecture is a stable discipline with repeatable patterns for delivery, safety, and change management.
Architecture coverage in this archive spans 65 posts from Jan 2016 to Apr 2026 and deals with structural tradeoffs: coupling, failure boundaries, and long-term change cost. The strongest adjacent threads are ai, engineering, and go. Recurring title motifs include patterns, production, api, and architecture.
Multi-agent systems aren't magic. They're distributed systems with all the usual coordination headaches. Here are the four patterns I've seen work, and when each one falls apart.
Model Context Protocol promises to standardize how AI talks to tools. I built an MCP server in Go to see if the promise holds up. Here's what I found.
AI infrastructure at scale is just infrastructure. The same boring patterns -- gateways, caching, circuit breakers, budget enforcement -- solve the same boring problems.
Single-prompt agents break on real tasks. Plan-execute-replan, orchestrated specialists, structured memory, and explicit recovery -- in Go -- are what actually works.
Everyone reaches for GPT-4 by default. Most production tasks don't need it. Small models are faster, cheaper, and often better when the task is well-defined.
Bigger context windows aren't an excuse to stop thinking about what goes into them. Most teams are paying for irrelevant tokens and wondering why quality degrades.
Betting on a single model provider is like having a single database with no failover. Here is why multi-model is the only sane production strategy.
The architecture of an AI-native app is fundamentally different from bolting a model onto a CRUD app. Here is how I structure them -- with code, layers, and hard-won opinions.
OpenAI DevDay was not just a product launch. It was a platform play that changes the build-vs-buy calculus for every team shipping AI features.
Most agent demos are impressive. Most agent production systems are not. Here is what separates the two.
RAG is the default architecture for grounding LLMs in private data. Here are the patterns that survive real traffic, with Go examples from production systems.
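The retrieval half of that architecture boils down to "score chunks against the query, keep the top k, put them in the prompt." A minimal sketch of that step in Go, using cosine similarity over precomputed embeddings — the `Chunk` type and stand-in vectors here are illustrative, not from any of the production systems above:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk pairs a text passage with its precomputed embedding.
// In a real system the embedding comes from an embedding model;
// here they are tiny stand-in vectors.
type Chunk struct {
	Text      string
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK returns the k chunks most similar to the query embedding --
// the retrieval step of a retrieve-then-generate pipeline.
func TopK(query []float64, chunks []Chunk, k int) []Chunk {
	sorted := append([]Chunk(nil), chunks...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return cosine(query, sorted[i].Embedding) > cosine(query, sorted[j].Embedding)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	chunks := []Chunk{
		{"refund policy", []float64{1, 0}},
		{"shipping times", []float64{0, 1}},
	}
	best := TopK([]float64{0.9, 0.1}, chunks, 1)
	fmt.Println(best[0].Text) // the most relevant chunk goes into the prompt
}
```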
GPT-4 landed and everything changed. What I learned in the first week of building with it, and the architecture decisions that followed.
Practical patterns for integrating LLMs into real applications -- prompt management, structured outputs, caching, fallbacks, and tool use -- with Go examples.
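The fallback pattern in particular is the same shape as database failover: try providers in order, return the first success. A hedged sketch in Go — the `Provider` interface and the toy implementations are illustrative, not any vendor's real client API:

```go
package main

import (
	"errors"
	"fmt"
)

// Provider abstracts an LLM backend behind a single call.
// Illustrative only; real clients take context, options, etc.
type Provider interface {
	Complete(prompt string) (string, error)
}

// Fallback tries providers in order and returns the first success.
// If all fail, the joined errors preserve every failure reason.
func Fallback(prompt string, providers ...Provider) (string, error) {
	var errs []error
	for _, p := range providers {
		out, err := p.Complete(prompt)
		if err == nil {
			return out, nil
		}
		errs = append(errs, err)
	}
	return "", errors.Join(errs...)
}

// flaky and stable are toy providers for demonstration.
type flaky struct{}

func (flaky) Complete(string) (string, error) { return "", errors.New("rate limited") }

type stable struct{}

func (stable) Complete(p string) (string, error) { return "echo: " + p, nil }

func main() {
	out, err := Fallback("hello", flaky{}, stable{})
	fmt.Println(out, err) // the flaky provider fails, the stable one answers
}
```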
Monorepo or polyrepo depends on coupling, team shape, and your appetite for build tooling. Here is how to decide without getting religious about it.
Async patterns solve real problems -- bursty traffic, slow dependencies, decoupled teams. But the complexity tax is real. Lessons from building event-driven systems at Decloud.
Rate limiting algorithms, implementation tradeoffs, and practical lessons from building limiters for high-traffic APIs at a real-time messaging company.
The patterns that actually survive production across failure handling, consistency, messaging, coordination, and scaling.
Service meshes solve real problems at real scale. But most teams adopt them before the problems exist. Here's how to decide honestly.
API versioning is a maintenance commitment, not a design exercise. URL paths win for public APIs, headers for internal ones. The real discipline is not versioning -- it's avoiding breaking changes in the first place.
December 7 reminded everyone that us-east-1 is a single point of failure for half the internet. Again. I am annoyed.
Event sourcing is powerful but expensive to get wrong. Here's what actually works, with Go code, drawn from building event pipelines at the fintech startup.
The term 'technical debt' has become meaningless. Everything inconvenient is debt. Here's what it actually is, when it matters, and why most teams handle it wrong.
Zero trust from two perspectives: my NATO background in defense systems and work at a major telecom. The architecture patterns, the implementation path, and what most companies get wrong.
Serverless databases are solving problems most teams don't have. Here's why Postgres with a connection pooler is still the right answer.
Edge gateways, BFFs, and service mesh ingress -- what I've learned running them at Decloud and at large telecoms.
Multi-cloud sounds great in vendor pitches. In practice, it doubles your operational burden for benefits most teams will never need.
I've built a custom Go gateway, run Kong in prod, evaluated Envoy, and used managed cloud gateways. Here's what I actually recommend after doing all of them wrong at least once.
Most teams adopting GraphQL federation don't need it. A frank take on when it makes sense, when REST is fine, and why conference talks are a terrible basis for architecture decisions.
Lessons from building event-driven systems at the fintech startup and Decloud. What actually works, what silently corrupts your data, and Go patterns for handling events without losing your mind.
Serverless is great until it isn't. A comparison of serverless and containers at different traffic scales, with actual numbers on where the economics flip.
After dealing with versioning messes at multiple companies, I landed on URL path versioning for anything public. Here's why the alternatives didn't survive contact with reality.
A practical breakdown of replication modes, topologies, and the tradeoffs between consistency, availability, and not losing your users' data at 3am.
Cloud cost management isn't a finance problem. It's an architecture problem disguised as a spreadsheet. Here's how to treat your AWS bill like the engineering signal it actually is.
Edge computing is real, but most teams adopting it don't have an edge problem. They have an architecture problem they're solving with geography.
Queues look simple on a whiteboard. Then you deploy them. Here are the messaging patterns I've learned the hard way across three startups, with Go code and real failure stories.
Most data problems are ownership problems. Data mesh gets that right. But adopting it as an architecture diagram exercise misses the point entirely.
Most teams shouldn't be migrating to microservices. Here's how to tell if you actually should, and how to do it without wrecking your delivery for eighteen months.
Multi-region architecture is a strategic decision most teams make too early. Here's when it actually pays off, the patterns that work, and why data is the part that will ruin your week.
Failure is not an edge case. It is the default state you temporarily hold off with good engineering. A few hard-won rules for building systems that bend instead of shatter.
Hard-won patterns for reliable background job processing -- queues, retries, idempotency, and the failures that taught me to care about all three.
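Idempotency is the piece that makes retries safe: track a key per job so a redelivery runs the side effect at most once. A minimal in-memory sketch — a production version uses durable storage, e.g. a unique-keyed database row, and the names here are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// Processed tracks idempotency keys so a retried job runs its
// side effect at most once. In-memory for illustration only.
type Processed struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewProcessed() *Processed {
	return &Processed{seen: make(map[string]bool)}
}

// Run executes fn only if key has not been seen, and reports
// whether fn ran. Queue redeliveries with the same key are no-ops.
func (p *Processed) Run(key string, fn func()) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.seen[key] {
		return false
	}
	p.seen[key] = true
	fn()
	return true
}

func main() {
	p := NewProcessed()
	charge := func() { fmt.Println("charging card") }
	fmt.Println(p.Run("order-42", charge)) // first delivery: true
	fmt.Println(p.Run("order-42", charge)) // retry: false, no double charge
}
```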
Algorithms, headers, and deployment patterns for rate limiting APIs -- drawn from building financial data services at the fintech startup.
Hard-won lessons from designing distributed systems that survive real-world failures -- timeouts, retries, bulkheads, and the operational habits that actually keep things running.
Real patterns and antipatterns from running serverless at the fintech startup. Where Lambda shines, where it hurts, and how to tell the difference before it's too late.
Most teams shard too early. Here's how we thought about it at the fintech startup, when it actually makes sense, and the SQL-level decisions that matter most.
You split the monolith. Now every service-to-service call is an attack surface. Here's how I think about identity, authorization, encryption, and secrets management in distributed systems.
Lessons from building event-sourced systems at the fintech startup -- the patterns that held up, the modeling mistakes that bit us, and the operational realities nobody warns you about.
Perimeter security is dead. At the fintech startup, I ripped out the castle-and-moat model and replaced it with zero trust -- identity-first, micro-segmented, no implicit trust anywhere. Here's what that actually looked like.
A two-number scoring system for tech debt that tells you what to fix now, what to schedule, and what to quietly accept.
We serve financial data to users across Europe at the fintech startup. Here's what I've learned about going multi-region -- the patterns that work, the ones that burn you, and when you should even bother.
Most serverless tutorials teach you the wrong thing. Here's what matters when you're running it for real.
We tried multiple API versioning approaches at the fintech startup. URL path versioning won. Here's why, plus how to handle deprecation without burning your consumers.
Every pipeline I've built at the fintech startup broke at some point. Here's the design approach that made them recoverable instead of catastrophic.
Lessons from building event-driven systems at the fintech startup and Dropbyke -- what worked, what broke, and why I'd do it again.
Everyone wants to debate GraphQL vs REST like it's a religion. It's not. One reduces round trips, the other is dead simple to cache. Here's how I actually decide.
How Go became the default backend language at Dropbyke and a fintech startup, what it replaced, and the honest tradeoffs we accepted along the way.
A production-grounded case for exhausting single-server headroom with pooling, replicas, and partitioning before taking on sharding complexity.
Production incidents show where architecture bends and where it breaks. These lessons focus on designing for failure, limiting blast radius, and making recovery routine.
Lessons from building the fintech startup's financial data API: the REST conventions that actually matter, the ones that don't, and why consistency beats cleverness every time.
A grounded look at PostgreSQL and MySQL as of April 2016, focusing on integrity, query power, and operational tradeoffs rather than benchmark hype.
Lambda is a sharp tool for specific jobs. The problem is everyone wants to use it for everything.
A pragmatic look at technical debt in 2016: what it is, how it shows up, how to measure it, and how to make a business case for paying it down without stalling delivery.
Most teams adopt microservices too early and pay for complexity they don't need yet. A well-structured monolith is faster, simpler, and keeps your options open.