LLM Prompt Caching in Go: Cut Costs Without Breaking Things
Caching LLM responses is the highest-leverage optimization most teams are not doing. Here is how I implement it in Go, with real patterns for keys, invalidation, and safety.
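The key pattern starts with deterministic key construction: anything that changes the completion (model, sampling parameters, the full prompt) must feed the key, or you will serve stale answers for the wrong request. A minimal sketch, assuming a hypothetical `cacheKey` helper; the field list is illustrative, not exhaustive:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey builds a deterministic cache key for an LLM call.
// Model, temperature, and the full prompt all participate, so any
// change to them yields a different key. Hypothetical helper, not
// from any library.
func cacheKey(model string, temperature float64, prompt string) string {
	h := sha256.New()
	// Separator prevents ambiguous concatenations like "ab"+"c" vs "a"+"bc".
	fmt.Fprintf(h, "%s|%.2f|%s", model, temperature, prompt)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k1 := cacheKey("gpt-4o", 0.0, "Summarize this document.")
	k2 := cacheKey("gpt-4o", 0.7, "Summarize this document.")
	fmt.Println(k1 == k2) // → false: different temperature, different key
}
```

Hashing rather than concatenating keeps keys fixed-length regardless of prompt size, which matters when the key ends up in Redis or a similar store.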
Cache-aside, write-through, invalidation strategies, and the failure modes that will wake you up at night. With Go examples.
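Cache-aside is the pattern that fits LLM responses best: check the cache, and only on a miss pay for the expensive call, then populate the cache for next time. A minimal in-memory sketch, assuming hypothetical `Cache` and `Get` names; a real deployment would use Redis or similar with TTLs:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a toy in-memory string cache guarded by a mutex.
type Cache struct {
	mu    sync.Mutex
	store map[string]string
}

func NewCache() *Cache {
	return &Cache{store: make(map[string]string)}
}

// Get implements cache-aside: return the cached value if present,
// otherwise run the expensive loader and store its result.
func (c *Cache) Get(key string, load func() string) string {
	c.mu.Lock()
	if v, ok := c.store[key]; ok {
		c.mu.Unlock()
		return v
	}
	c.mu.Unlock()
	v := load() // miss: do the expensive work outside the lock
	c.mu.Lock()
	c.store[key] = v
	c.mu.Unlock()
	return v
}

func main() {
	c := NewCache()
	calls := 0
	loader := func() string { calls++; return "completion text" }
	c.Get("k", loader)
	c.Get("k", loader)
	fmt.Println(calls) // → 1: the second Get was a cache hit
}
```

Note the deliberate gap: two concurrent misses for the same key both call the loader. That is usually acceptable for idempotent LLM calls, but request coalescing (e.g. `golang.org/x/sync/singleflight`) closes it if the cost matters.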
Most Postgres performance problems are indexing problems. The rest are vacuum problems. Here's how to find and fix both.
I write Go for a living. Rust is not replacing it. But I have to be honest about where Rust wins.
eBPF promises kernel-level observability without the pain of kernel modules. The tech is real. The hype-to-adoption ratio concerns me.
Most load tests produce comforting numbers instead of useful answers. Here's what I learned the hard way about getting honest results.
Battle-tested PostgreSQL tuning from running fintech and startup workloads: connection pooling, memory sizing, index discipline, vacuum management, and the queries that tell you what's broken.
Practical patterns for squeezing performance out of Go services — profiling, allocation control, bounded concurrency, and HTTP/DB tuning from real production work.
I write Go every day at a fintech startup. Here's why I've been spending evenings with Rust, what impressed me, and where it still hurts.
The repeatable process I use at a fintech startup to diagnose and fix database performance problems instead of throwing random indexes at the wall.