Building an eBPF Container Security Monitor: Debugging Through the Pain

“Monitoring containers without eBPF is whack-a-mole blindfolded.”

What started as a straightforward container security tool became a weeks-long exercise in kernel panics, parent process deception, and eBPF’s complete lack of forgiveness for sloppy code. This is the honest account of what it took to get it working.

Understanding the Fundamentals#

The Kernel#

The kernel controls everything — memory, devices, security. Every system call your containerized application makes passes through it. Corrupt the kernel and the system goes down, not just your process. That distinction matters a lot when you’re attaching probes to it.

What eBPF Actually Is#

eBPF (Extended Berkeley Packet Filter) lets you attach small programs to kernel hook points such as tracepoints, kprobes, and network hooks, and run them in a sandboxed environment. In practice:

Real-time syscall monitoring without touching application code
Network packet inspection at the kernel level
Process tree tracking across container boundaries
Low overhead — when implemented correctly

The “sandboxed” qualifier does real work here. The verifier rejects any program it can’t prove safe, which means your early learning curve is mostly rejected loads and cryptic error messages. Fun times.

The Problem Worth Solving#

Modern container deployments have a structural monitoring gap. Containers provide isolation, but isolation is not surveillance. A process doing something it shouldn’t inside a container is invisible unless you’re watching at the syscall level.

The numbers:

63% of containers run with excessive privileges
14 average escape routes per Kubernetes cluster
37% of organizations can’t detect container breakouts in real-time

The goal was to close that gap — monitor container processes at the kernel level, detect escape patterns, track process trees across the container boundary, and alert before a breakout becomes an incident.

Technical Architecture#

The monitor attaches eBPF programs to critical kernel tracepoints. Here’s the core execution hook:

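SEC("tracepoint/syscalls/sys_enter_execve")
int monitor_execve(struct trace_event_raw_sys_enter *ctx) {
    // Syscall tracepoints receive a tracepoint record, not struct pt_regs.
    // The upper 32 bits hold the thread group ID, which is the PID userspace sees.
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;

    // This check has to be BPF-safe (for example a map lookup):
    // a BPF program cannot open files under /proc.
    if (!is_containerized(pid)) {
        return 0;
    }

    struct event_t event = {};
    event.pid = pid;
    event.timestamp = bpf_ktime_get_ns();  // monotonic clock, in nanoseconds

    // events must be a BPF_MAP_TYPE_PERF_EVENT_ARRAY map; the daemon reads from it.
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                          &event, sizeof(event));
    return 0;
}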

The full system flow:

Probe attachment: hooks into the execve, open, and connect syscalls to monitor process creation, file access, and network activity. On current systems most file opens arrive as openat, because that is the syscall glibc’s open() wrapper issues, so openat needs a hook too.

Container filtering: reads cgroup paths to identify container processes, ignoring host processes unless they’re doing something suspicious.

Behavioral analysis: compares activity against known escape patterns, privilege escalation indicators, and anomalous network connections.

Userspace alerting: ships events to a daemon that logs, aggregates, and triggers alerts for critical violations.
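To make the behavioral analysis step less abstract, here is a minimal sketch of a rule table the daemon could run against the path of each executed binary. Everything in it is a placeholder I made up for illustration: `classify_exec`, `struct rule`, the severity levels, and the two rules themselves. It is not the tool's real rule set, and substring matching like this is far too blunt for production.

```c
#include <stddef.h>
#include <string.h>

enum severity { SEV_NONE = 0, SEV_WARN = 1, SEV_CRITICAL = 2 };

struct rule {
    const char *needle;      /* substring to look for in the exec path */
    enum severity severity;  /* how loudly to alert when it matches */
};

/* Hypothetical rules, for illustration only. */
static const struct rule RULES[] = {
    { "nsenter", SEV_CRITICAL },  /* entering another namespace */
    { "/mount",  SEV_WARN },      /* mounting from inside a container */
};

/* Returns the highest severity of any rule matching exec_path. */
enum severity classify_exec(const char *exec_path)
{
    enum severity worst = SEV_NONE;

    if (!exec_path)
        return worst;

    for (size_t i = 0; i < sizeof(RULES) / sizeof(RULES[0]); i++) {
        if (strstr(exec_path, RULES[i].needle) && RULES[i].severity > worst)
            worst = RULES[i].severity;
    }
    return worst;
}
```

With these placeholder rules, `classify_exec("/usr/bin/nsenter")` yields `SEV_CRITICAL` and `classify_exec("/bin/ls")` yields `SEV_NONE`. Keeping the rules in a table means new patterns are a data change rather than a code change.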

What Actually Went Wrong#

Kernel Version Requirements#

$ uname -r
5.4.0-100-generic

The eBPF features I needed require kernel 5.8+. I was on 5.4. Upgrading a production kernel means a 2GB download, a reboot, and accepting that something might break. The right move is testing the upgrade in a VM first, validating everything, then doing it in production. That’s what I did. It worked — but the time cost is real, and if you skip the VM step you will have a bad time.
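A cheap guard is to have the daemon refuse to start on a kernel that is too old. A minimal sketch, assuming the release string has the usual `major.minor...` shape that `uname -r` prints; `kernel_at_least` is a name I made up, not something from libbpf:

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true when a `uname -r` style release string, for example
 * "5.4.0-100-generic", is at least want_major.want_minor.
 * Strings that cannot be parsed are treated as too old. */
bool kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (!release || sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

In a real daemon the release string would come from `uname(2)` (the `release` field of `struct utsname`). Distribution kernels backport features, so a version check like this is a coarse first filter rather than proof that a given feature exists.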

Build Dependencies (Found One Error at a Time)#

$ make
fatal error: linux/bpf.h: No such file or directory

Install all of these upfront. Don’t find them one compiler error at a time like I did:

libbpf-dev
linux-headers-$(uname -r)
clang
llvm
libelf-dev

Container Detection: First Attempt#

// This does not work. At all.
bool is_containerized(u32 pid) {
    return pid > 1000;
}

Yeah. PID range tells you absolutely nothing about container membership. The correct approach reads the actual cgroup path:

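// Userspace helper: BPF programs cannot open files, so this runs in the daemon.
bool is_containerized(u32 pid) {
    char path[64], line[512];
    bool found = false;
    snprintf(path, sizeof(path), "/proc/%u/cgroup", pid);

    FILE *f = fopen(path, "r");
    if (!f) return false;  // process already exited, or no permission
    // Check the file's contents line by line, not the path string itself.
    while (!found && fgets(line, sizeof(line), f))
        found = contains_container_indicator(line);
    fclose(f);
    return found;
}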

Parse the cgroup file, check for docker, containerd, or kubepods. More code. Actually works.
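The `contains_container_indicator` helper isn't shown above. Here is a minimal sketch of what it could look like, assuming it is handed one line of `/proc/<pid>/cgroup` at a time (lines have the form `hierarchy-ID:controller-list:cgroup-path`). The marker list is just the three names mentioned above, and the implementation is my illustration, not the tool's actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Substrings that commonly show up in the cgroup paths of containerized
 * processes. Illustrative, not exhaustive. */
static const char *const CONTAINER_MARKERS[] = { "docker", "containerd", "kubepods" };

/* Returns true if one line of /proc/<pid>/cgroup names a container cgroup. */
bool contains_container_indicator(const char *cgroup_line)
{
    if (!cgroup_line)
        return false;

    /* Only the path field matters: skip past the second ':'. */
    const char *path = strchr(cgroup_line, ':');
    if (path)
        path = strchr(path + 1, ':');
    if (!path)
        return false;
    path++;

    for (size_t i = 0; i < sizeof(CONTAINER_MARKERS) / sizeof(CONTAINER_MARKERS[0]); i++) {
        if (strstr(path, CONTAINER_MARKERS[i]))
            return true;
    }
    return false;
}
```

One caveat with plain substring matching: the runtime daemons on the host (a path like `/system.slice/docker.service`) match too. Telling those apart from real containers needs stricter matching on the full path layout.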

Errors Worth Documenting#

Missing Headers#

error: implicit declaration of function 'getppid'

Two hours of debugging. Missing #include <unistd.h>. Include your headers. All of them. Up front. I shouldn’t have to say this.

Missing Directories#

FileNotFoundError: [Errno 2] No such file or directory: '/var/log/ebpf-monitor/'

import os
os.makedirs('/var/log/ebpf-monitor/', exist_ok=True)

Directories don’t create themselves. I keep learning this lesson on different projects.

Parent Process Lies#

Parent PID: 1
Actual parent: containerd-shim (PID: 3847)

Container processes are deceptive by nature. Inside the container’s PID namespace a process can report PID 1 as its parent, while on the host it sits several levels deep under container runtime processes such as containerd-shim. Don’t trust the first parent you find. Walk the whole tree:

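// Walks up the process tree until it reaches a container runtime process.
// Returns that process's PID, or 0 if none is found within 10 levels.
u32 get_real_parent(u32 pid) {
    u32 current = pid;
    for (int i = 0; i < 10; i++) {
        u32 parent = get_parent_pid(current);
        if (parent == 0 || parent == current) {
            break;  // lookup failed, or the top of the tree was reached
        }
        if (is_container_runtime(parent)) {
            return parent;
        }
        current = parent;
    }
    return 0;
}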
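`get_parent_pid` is left undefined above. One way to build it in userspace is to read `/proc/<pid>/status` and pull out the `PPid:` field. The sketch below covers only the parsing half, so it can be exercised without a live process. `parse_ppid` is a hypothetical name, and the assumption is that the caller has already read the status file into a string:

```c
#include <stdio.h>
#include <string.h>

/* Extracts the parent PID from the text of /proc/<pid>/status.
 * Returns -1 if no "PPid:" line is found. */
long parse_ppid(const char *status_text)
{
    const char *line = status_text;

    while (line && *line) {
        long ppid;

        if (strncmp(line, "PPid:", 5) == 0 && sscanf(line + 5, "%ld", &ppid) == 1)
            return ppid;

        line = strchr(line, '\n');
        if (line)
            line++;  /* step past the newline to the next field */
    }
    return -1;
}
```

Which `/proc` you read matters. PIDs in procfs are reported relative to the PID namespace of that procfs mount, so the daemon has to read the host's `/proc` to see containerd-shim as the parent instead of the namespace-local view.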

The Kernel Panic#

[  123.456789] BUG [<ffffffffc0ab1234>] ? my_ebpf_prog+0x45/0x67 [ebpf_monitor]
[  123.456790] Kernel panic - not syncing: Fatal exception

A null pointer dereference in eBPF doesn’t crash your program. It crashes the entire system. Not a recoverable error. Not a segfault you can handle. The whole machine goes down.

Check every pointer. Every single one:

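struct task_struct *task = (struct task_struct *)bpf_get_current_task();
if (!task) {
    return 0;
}

// comm is a fixed-size char array inside task_struct, not a pointer, so there
// is nothing to null-check. Copy it out and check the helper's return value.
char comm[16];  // 16 is TASK_COMM_LEN
if (bpf_core_read_str(comm, sizeof(comm), &task->comm) < 0) {
    return 0;
}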

The verifier catches a lot of this at load time, but not everything. Do not rely on the verifier to save you. Null-check everything and test in a VM before you touch a real system.

Results#

After all of that, here’s what the tool actually delivers:

Threat Type                    Detection Rate    False Positives
Container Escape Attempts      94%               2%
Privilege Escalation           89%               5%
Suspicious Network Activity    91%               3%
File System Tampering          87%               4%

Performance overhead: ~2% CPU, 15MB of memory for the daemon plus 8KB per loaded eBPF program, and under 1ms of added latency per monitored syscall. Acceptable for what you get.

Key Takeaways#

eBPF doesn’t forgive. One null pointer and you’re rebooting the server. Test in VMs. Always.

Container detection is harder than it looks. Processes lie. Runtimes lie. Build robust detection logic from the start — patching it later is painful.

Most eBPF documentation is written for a different kernel version than yours. Trust the official kernel docs and libbpf source over blog posts. Including this one, honestly.

Source#

Full implementation on GitHub — eBPF programs, userspace daemon, and deployment config included.

What’s Next#

Anomaly detection with lightweight ML models for behavioral baselining
Falco integration for enterprise deployments
Visualization dashboard for process tree and alert history
Automated response — blocking containers that trigger critical alerts