What eBPF Changed for Storage Observability

For decades, storage monitoring meant reading counters. You polled /proc/diskstats, computed deltas, and hoped the interval was short enough to catch what mattered. This approach worked well enough when disks were slow and milliseconds of resolution seemed fine. But as storage got faster, the gap between what we could observe and what actually happened inside the kernel grew wider.

eBPF changed this. Not incrementally, but fundamentally. It made it possible to instrument the kernel itself, capturing events as they occur rather than inferring them from counters after the fact. For storage observability, this shift matters more than it might first appear.

What Traditional Approaches Missed

The standard Linux interface for disk statistics, /proc/diskstats, exposes cumulative counters: total reads, total writes, total time spent in I/O. To get rates, you sample at intervals and compute differences. The kernel documentation notes that, apart from the count of I/Os currently in flight, the fields are cumulative and monotonic, resetting only at boot or device reattachment.[1]
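As a rough illustration of the sample-and-subtract approach, the sketch below parses two /proc/diskstats lines and derives IOPS and mean latency for the interval. The two sample lines and the 30-second interval are made-up values, and the field positions follow the kernel's documented layout (reads completed, time reading, writes completed, time writing); treat it as a minimal sketch rather than a complete collector.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads: int      # reads completed (cumulative)
    read_ms: int    # milliseconds spent reading (cumulative)
    writes: int     # writes completed (cumulative)
    write_ms: int   # milliseconds spent writing (cumulative)


def parse_diskstats_line(line):
    """Parse one /proc/diskstats line into (device_name, DiskSample)."""
    parts = line.split()
    name = parts[2]                       # columns 0 and 1 are major/minor
    f = [int(x) for x in parts[3:]]       # statistics fields, 0-indexed here
    return name, DiskSample(reads=f[0], read_ms=f[3], writes=f[4], write_ms=f[7])


def interval_stats(prev, curr, seconds):
    """Turn two cumulative samples into rates for the interval between them."""
    ios = (curr.reads - prev.reads) + (curr.writes - prev.writes)
    io_ms = (curr.read_ms - prev.read_ms) + (curr.write_ms - prev.write_ms)
    return {
        "iops": ios / seconds,
        "avg_latency_ms": io_ms / ios if ios else 0.0,
    }


# Hypothetical samples taken 30 seconds apart (values invented for illustration).
LINE_T0 = "   8       0 sda 1000 0 8000 2000 500 0 4000 1000 0 1500 3000"
LINE_T30 = "   8       0 sda 4000 0 32000 8000 2000 0 16000 4000 0 6000 12000"

name, prev = parse_diskstats_line(LINE_T0)
_, curr = parse_diskstats_line(LINE_T30)
stats = interval_stats(prev, curr, seconds=30)
print(name, stats)
```

Everything the function returns is an average over the whole interval; nothing in the counters says how latency was distributed within it.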

This creates a fundamental resolution problem. If your polling interval is 30 seconds, you get an average over those 30 seconds. A 50-millisecond latency spike that caused application timeouts becomes invisible, smoothed into a normal-looking mean. Instantaneous values such as the in-flight I/O count are worse off: if your storage subsystem experiences latency problems from seconds 1 through 29 but recovers by second 30, the sample shows nothing wrong, because you're working with "moments in time" information that misses the actual behavior.[2]
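The smoothing effect is easy to reproduce with arithmetic. The numbers below are assumptions chosen for illustration (4,500 I/Os in a 30-second window, all at 2 ms except one at 50 ms), not measurements from any real system.

```python
# Hypothetical 30-second window: 4,499 I/Os at 2 ms and a single 50 ms outlier.
latencies_ms = [2.0] * 4499 + [50.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
worst_ms = max(latencies_ms)

print(f"interval mean: {mean_ms:.2f} ms")   # what a counter delta can report
print(f"worst I/O:     {worst_ms:.1f} ms")  # what the application experienced
```

The mean moves by roughly a hundredth of a millisecond, well within normal noise, while the one request that mattered took 25 times longer than its neighbors.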

The alternative was tracing. Tools like blktrace could capture every block I/O event, but at a cost. The original blktrace paper reported less than 2% overhead in stressful I/O situations,[3] but this came with significant operational complexity: trace data had to be written somewhere, often generating gigabytes of output per minute under load. And the traces were post-hoc artifacts, useful for forensic analysis but not for real-time visibility.

[Figure: latency (0 to 50 ms) over time (0 to 60 s), comparing actual latency with what monitoring sees at sample points every 30 s. A 50 ms spike between samples is invisible to polling, which reports an average of about 2 ms.]
Polling-based monitoring samples at fixed intervals, missing latency spikes that occur between samples.

eBPF: Kernel Instrumentation Without the Cost

eBPF (extended Berkeley Packet Filter) originated as a packet filtering mechanism but evolved into something broader: a way to run sandboxed programs inside the Linux kernel. These programs attach to specific kernel events and execute when those events occur. The kernel verifies each program before loading, rejecting any that could loop without bound or access memory unsafely, then typically JIT-compiles it to native machine code for speed.

What makes this different from prior kernel tracing is where the computation happens. Traditional tracing captures events and streams them to user space for processing. eBPF processes events in kernel context, maintaining data structures like histograms and counters without crossing the kernel-user boundary for every event. Only aggregated results transfer to user space.

Academic benchmarking confirms what practitioners observe: eBPF imposes measurable but modest overhead, and critically, data transfer affects eBPF's performance far less than alternatives like SystemTap.[4] The practical result is that you can instrument high-frequency events, including block I/O on busy systems, without significantly impacting the workloads you're trying to observe.

What This Means for Storage

The block I/O layer is where storage requests enter and exit the kernel. Every read and write passes through this layer before reaching the device driver, and completion notifications return through it. This makes it the natural place to observe storage behavior, and eBPF makes that observation practical.

Tools like biolatency from the BCC toolkit attach eBPF programs to instrumentation points in the block layer (tracepoints or kprobes, depending on the tool version), recording the time between request submission and completion. Rather than streaming individual events, the program maintains a power-of-two histogram in kernel memory. Only the histogram buckets transfer to user space. The result is a latency distribution measured in microseconds, with bucket widths that double at each step, and, as the tool's documentation notes, negligible overhead in typical use.[5]
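The aggregation logic itself is small. The sketch below simulates it in plain Python rather than in an eBPF program: each completed I/O increments one slot chosen by the bit length of its latency in microseconds, so the state stays a fixed-size array no matter how many events arrive. The names (log2_slot, LatencyHistogram, NUM_SLOTS) and the exact slot boundaries are assumptions for illustration and are not taken from the BCC source.

```python
NUM_SLOTS = 32  # fixed-size state, independent of event count (assumed size)


def log2_slot(latency_us):
    """Map a latency to a power-of-two slot: 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ..."""
    return min(latency_us.bit_length(), NUM_SLOTS - 1)


def slot_range(slot):
    """Inclusive (low, high) latency range in microseconds covered by a slot."""
    if slot == 0:
        return (0, 0)
    return (1 << (slot - 1), (1 << slot) - 1)


class LatencyHistogram:
    """Stand-in for the in-kernel map: one counter per power-of-two slot."""

    def __init__(self):
        self.slots = [0] * NUM_SLOTS

    def record(self, start_ns, end_ns):
        # Called once per completed I/O with its submit and completion timestamps.
        latency_us = (end_ns - start_ns) // 1000
        self.slots[log2_slot(latency_us)] += 1


hist = LatencyHistogram()
# Hypothetical (start_ns, end_ns) pairs: 90 us, 600 us, 700 us, and 5 ms.
for start, end in [(0, 90_000), (0, 600_000), (0, 700_000), (0, 5_000_000)]:
    hist.record(start, end)

for slot, count in enumerate(hist.slots):
    if count:
        low, high = slot_range(slot)
        print(f"{low:>6} -> {high:<6} us : {count}")
```

Only the slots array would need to cross into user space, which is why the cost of reading the results does not grow with the I/O rate.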

This distribution is exactly what counter-based monitoring cannot provide. You see not just the average but the shape: the mode where most I/Os complete, the long tail where some I/Os take far longer, and the outliers that might indicate emerging problems. Research on storage behavior at scale has shown that this tail often tells you more than the mean.[6]
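Once the buckets are in user space, tail percentiles fall out of a cumulative sum. The sketch below is one simple way to do it, assuming the histogram arrives as (upper bound in microseconds, count) pairs; the bucket counts are invented, and the result is only as precise as the bucket that contains the percentile.

```python
def percentile_upper_bound(buckets, pct):
    """Return the upper bound (us) of the bucket containing the given percentile.

    buckets: list of (upper_bound_us, count) pairs sorted by upper bound.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    threshold = total * pct / 100
    seen = 0
    for upper, count in buckets:
        seen += count
        if seen >= threshold:
            return upper
    return buckets[-1][0]


# Hypothetical distribution: most I/Os land in the 512-1023 us bucket.
buckets = [
    (127, 50), (255, 300), (511, 900), (1023, 8600),
    (2047, 100), (4095, 30), (8191, 15), (16383, 5),
]

p50 = percentile_upper_bound(buckets, 50)
p99 = percentile_upper_bound(buckets, 99)
p999 = percentile_upper_bound(buckets, 99.9)
print(f"p50 <= {p50} us, p99 <= {p99} us, p99.9 <= {p999} us")
```

In this made-up example the median and the 99.9th percentile sit three buckets apart, a spread that no single average could express.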

[Figure: histogram of I/O count by latency on a log scale, with buckets from 64 us to 16 ms. The mode is at 512 us; normal I/O and the tail (>p99) are marked separately.]
eBPF-based histograms reveal the full latency distribution, including tail latencies that averages obscure.

In Practice

The companies operating the largest storage fleets have adopted eBPF extensively. Netflix's performance engineering team developed many of the open-source block I/O tools now in common use, including biosnoop and biolatency.[7] They use these tools to identify machines whose drives show unacceptably bimodal or high latency and remove them from distributed database tiers before they cause user-visible problems.

Meta uses eBPF across its infrastructure for profiling, load balancing, and security enforcement. Optimizations found with its Strobelight profiler have reportedly reduced CPU cycles and server load by up to 20%.[8] The scheduler extensions project (sched_ext), a joint effort between Meta and Google, uses eBPF to make the Linux scheduler itself extensible at runtime.

Academic research continues to push the boundaries. The XRP project, presented at USENIX OSDI '22, demonstrated that eBPF hooks placed near the NVMe driver can improve B-tree lookup throughput by up to 2.5x compared to normal system calls, by keeping more of the I/O path in kernel context.[9] This suggests eBPF's role in storage will expand beyond observability into active optimization.

Why This Matters Now

Modern NVMe drives complete I/O in tens of microseconds. A drive that averages 100 microseconds but occasionally spikes to 5 milliseconds has a 50x tail, and that tail might be what your database experiences during a critical transaction. Counter-based monitoring at one-second resolution cannot see this. Thirty-second resolution certainly cannot.

eBPF makes the gap visible. It enables continuous observation of every I/O event, summarized efficiently in kernel space, with overhead measured in nanoseconds per event rather than percentage points of system load. For storage teams managing fleets of hundreds or thousands of devices, this changes what questions you can answer. Not just "is the device responding" but "how is latency distributed across all devices right now, and which ones are developing tails that weren't there yesterday."
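The "tails that weren't there yesterday" question can be sketched as a comparison of per-device histograms from two days. Everything specific here is an assumption for illustration: the device names, the 4096 us tail threshold, the 0.5% minimum increase, and the histogram layout (bucket lower bound in microseconds mapped to a count).

```python
def tail_fraction(hist, threshold_us):
    """Fraction of I/Os in buckets whose lower bound is at or above the threshold."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    slow = sum(count for lower, count in hist.items() if lower >= threshold_us)
    return slow / total


def devices_with_new_tails(yesterday, today, threshold_us=4096, min_increase=0.005):
    """Return devices whose slow-I/O fraction grew by at least min_increase.

    A device with no histogram from yesterday is compared against a zero baseline.
    """
    flagged = []
    for device, hist in today.items():
        before = tail_fraction(yesterday.get(device, {}), threshold_us)
        after = tail_fraction(hist, threshold_us)
        if after - before >= min_increase:
            flagged.append(device)
    return sorted(flagged)


# Hypothetical fleet data: bucket lower bound (us) -> I/O count.
yesterday = {
    "nvme0n1": {64: 9000, 128: 1000},
    "nvme1n1": {64: 9000, 128: 1000},
}
today = {
    "nvme0n1": {64: 8900, 128: 1100},
    "nvme1n1": {64: 8800, 128: 1000, 4096: 200},
}

flagged = devices_with_new_tails(yesterday, today)
print(flagged)
```

Both devices would report nearly identical averages; only the second has 2% of its I/Os landing in a bucket that was empty the day before.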

Conclusion

eBPF didn't invent kernel tracing. What it did was make kernel tracing practical for production use. By moving computation into the kernel, eliminating the per-event cost of user space transfer, and providing safety guarantees that let operators deploy instrumentation without fear of crashes, it crossed a threshold that prior approaches could not.

For storage observability specifically, this means latency distributions instead of latency averages. It means seeing the tail behavior that drives user experience. It means catching degradation patterns as they emerge rather than after applications have already suffered.

Storage devices got fast enough that the old monitoring approaches stopped working. eBPF is what closed the gap.

References

[1] I/O statistics fields. Linux Kernel Documentation.
[2] pt-diskstats: I/O Monitoring Tool. Percona Toolkit Documentation.
[3] Block I/O Layer Tracing: blktrace. Brunelle, Gelato-ICE, 2006.
[4] Towards eBPF Overhead Quantification: An Exemplary Comparison of eBPF and SystemTap. ACM/SPEC ICPE, 2025.
[5] Linux eBPF Tracing Tools. Brendan Gregg.
[6] The Tail at Scale. Dean & Barroso, Communications of the ACM, 2013.
[7] BPF Performance Tools: Linux System and Application Observability. Brendan Gregg, Addison-Wesley, 2019.
[8] eBPF Case Studies. eBPF Foundation.
[9] XRP: In-Kernel Storage Functions with eBPF. Zhong et al., USENIX OSDI, 2022.