eBPF Is Interesting. I Am Not Sold Yet.

| 3 min read |
ebpf observability linux monitoring

eBPF promises kernel-level observability without the pain of kernel modules. The tech is real. The hype-to-adoption ratio concerns me.

eBPF is the most overhyped technology in the observability space right now, and it might also be the most important.

That’s not a contradiction. I’ve been running Linux in production since before containers were a thing. The idea of safely running custom programs inside the kernel – attaching to tracepoints, kprobes, uprobes, without writing a kernel module or rebooting anything – is genuinely exciting. When I first ran bpftrace against a production system and got per-process syscall counts in real time with near-zero overhead, I understood the appeal immediately.

```shell
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```

That one-liner gives you more insight into what your system is actually doing than most monitoring stacks costing six figures a year. The kernel verifier checks your program for safety. Data flows through maps and perf buffers. No agent bloat. No sampling artifacts. Just direct observation at the source.

So why am I skeptical?

Because the gap between “this is technically possible” and “my team can operate this in production” is enormous. And the eBPF community seems uninterested in acknowledging that gap.

The promise is real

I don’t want to undersell what eBPF enables. Traditional monitoring gives you counters, logs, and coarse sampling. Fine for dashboards. Terrible for understanding why a specific request took 800ms when the p50 is 12ms. eBPF lets you attach instrumentation at the exact point where something happens. Syscall latency. TCP retransmits by destination. Filesystem I/O by process. All with filtering done in-kernel so you aren’t drowning user space in data.
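As a sketch of what that in-kernel filtering and aggregation looks like, here is a bpftrace program that histograms syscall latency per process. Treat it as illustrative: it assumes a kernel where the raw_syscalls tracepoints are enabled, and the map names are mine.

```shell
# Sketch: per-process syscall latency, aggregated in-kernel as histograms.
# Assumes the raw_syscalls tracepoints are available (most modern kernels).
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
  @usecs[comm] = hist((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}'
```

Nothing crosses into user space per-event here; the histograms live in kernel maps and are printed when the program exits.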

For container-dense environments – which is everything I work with these days – the ability to map kernel events to cgroups and namespaces is a game changer. Short-lived processes that vanish before your log collector notices? eBPF sees them.
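To make the short-lived-process point concrete, a minimal sketch: hook the exec tracepoint itself, so even a process that lives for microseconds gets logged. The output format is my own choice.

```shell
# Sketch: log every exec, including processes that exit before any
# poll-based collector would notice them.
sudo bpftrace -e '
tracepoint:sched:sched_process_exec {
  printf("%-8d %s\n", pid, str(args->filename));
}'
```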

The reality check

Here’s my problem. Every conference talk shows eBPF solving elegant debugging puzzles. Nobody talks about the operational burden.

Kernel version compatibility is a real issue. eBPF features vary across kernel versions, and the enterprise Linux distributions I see in production aren’t exactly bleeding edge. A program that works on kernel 5.10 might not work on 4.18. BTF (BPF Type Format) availability is inconsistent. CO-RE (Compile Once, Run Everywhere) helps but isn’t universally supported yet.
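Before assuming a program is portable, it is worth probing what the target kernel actually provides. A rough checklist, assuming bpftool is installed and tracefs is mounted at the usual location:

```shell
# Is BTF present? (needed for CO-RE relocations)
ls /sys/kernel/btf/vmlinux

# What eBPF features does this kernel actually support?
sudo bpftool feature probe kernel | head

# Which tracepoints exist on this machine?
sudo cat /sys/kernel/debug/tracing/available_events | grep sched_process
```

If any of those come back empty on your fleet's oldest kernel, that is the kernel you are really targeting.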

Then there’s the expertise problem. Writing eBPF programs isn’t like writing application code. You need to understand kernel internals, verifier constraints, and the performance implications of your hook points. Most engineering teams I work with can’t spare someone to become the eBPF specialist. They need tools that work out of the box.

BCC, bpftrace, and the growing ecosystem of pre-built tools help. Brendan Gregg’s work has been invaluable. But “install bcc-tools and run execsnoop” is a long way from “build a production observability pipeline backed by eBPF.”
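The "out of the box" experience really is one command, which is part of why the gap is easy to underestimate. The install path varies by distribution; the one below is common where the bcc-tools package lands in /usr/share.

```shell
# Trace every new process with timestamps, using a pre-built BCC tool.
# Path and package name vary by distro (on some it is execsnoop-bpfcc).
sudo /usr/share/bcc/tools/execsnoop -t
```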

Where I land

eBPF is infrastructure technology. It’ll become the foundation that observability vendors build on. Cilium is already proving this for networking. The profiling tools are getting there. Give it two or three more years and it will be invisible plumbing that powers your monitoring stack.

But right now, in early 2021, if someone tells me they’re building their observability strategy around eBPF, I ask two questions: what kernel version are you running, and who on your team understands the verifier? If they can’t answer both, they should start with existing tools – opensnoop, tcpconnect, biolatency – and build intuition before writing custom programs.
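That starter toolkit, concretely, assuming the common bcc-tools layout:

```shell
sudo /usr/share/bcc/tools/opensnoop     # which files are opened, by whom
sudo /usr/share/bcc/tools/tcpconnect    # TCP connections as they are initiated
sudo /usr/share/bcc/tools/biolatency    # block I/O latency as a histogram
```

Each of these answers a question your existing monitoring stack probably cannot, with no custom code written.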

The technology is sound. The ecosystem is maturing. I’m watching closely. I’m just not rewriting my monitoring stack around it today.