guestagent: ticker: watch sys_exit_bind with eBPF by AkihiroSuda · Pull Request #4066 · lima-vm/lima · GitHub

Conversation

AkihiroSuda
Member

The event watcher is now triggered immediately on sys_exit_bind, not waiting for the next 3-second tick.

This commit resolves the long-standing TODO since the initial commit:

newTicker := func() (<-chan time.Time, func()) {
// TODO: use an equivalent of `bpftrace -e 'tracepoint:syscalls:sys_*_bind { printf("tick\n"); }')`,
// without depending on `bpftrace` binary.
// The agent binary will need CAP_BPF file cap.
ticker := time.NewTicker(tick)
return ticker.C, ticker.Stop
}

Close #3067
Close #3766
Close #4021
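
Not the code from this PR, but a rough sketch of the idea: keep the 3-second tick as a fallback and additionally fire a tick as soon as a bind event arrives. watchBindEvents is a hypothetical placeholder for the eBPF sys_exit_bind watcher (e.g. one built with github.com/cilium/ebpf); everything below is illustrative.

package main

import (
	"context"
	"time"
)

// tick is the existing 3-second polling interval described above.
const tick = 3 * time.Second

// watchBindEvents is a hypothetical stand-in for the eBPF watcher that fires
// whenever sys_exit_bind returns in the guest (the real change attaches an
// eBPF program to that tracepoint). It is stubbed out here only so the
// sketch is self-contained.
func watchBindEvents(ctx context.Context) <-chan struct{} {
	ch := make(chan struct{})
	_ = ctx // the real watcher would stop reading tracepoint events on ctx.Done()
	return ch
}

// newEventTicker keeps the periodic tick as a fallback but also fires
// immediately when a bind event arrives, coalescing bursts into one tick.
func newEventTicker() (<-chan time.Time, func()) {
	ctx, cancel := context.WithCancel(context.Background())
	out := make(chan time.Time, 1)
	fallback := time.NewTicker(tick)
	bindEvents := watchBindEvents(ctx)

	notify := func(t time.Time) {
		select {
		case out <- t: // deliver a tick to the event watcher
		default: // a tick is already pending; coalesce
		}
	}

	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case t := <-fallback.C:
				notify(t)
			case <-bindEvents:
				// bind() just returned in the guest: rescan ports right away
				// instead of waiting up to `tick` for the next periodic poll.
				notify(time.Now())
			}
		}
	}()

	return out, func() { fallback.Stop(); cancel() }
}

func main() {
	ticks, stop := newEventTicker()
	defer stop()
	<-ticks // blocks until the first periodic or bind-triggered tick
}

The non-blocking send in notify coalesces a burst of bind() calls into a single pending tick, so the consumer never lags behind a queue of stale events.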

@balajiv113
Member

Does this work for UDP as well?

@jandubois
Member

I don't have time to test right now, but I think this will not trigger when the port is closed, right?

One common problem with our polling is that test suites create a container with exposed ports and, after stopping it, create another container in the next test that exposes the same ports. Due to the 3-second polling, the old forwards may not have been removed yet, blocking the new container from exposing the same ports.

Maybe there is another tracepoint that could also kick the ticker, like sys_exit_close or tcp:tcp_close, if one of those works reliably?

@AkihiroSuda
Member Author
AkihiroSuda commented Sep 23, 2025

Does this work for UDP as well ??

Yes

sys_exit_close

Seems too frequent

$ sudo bpftrace -e 'tracepoint:syscalls:sys_exit_close { printf("%d %s %s\n", elapsed / 1000 / 1000 / 1000, probe, comm); }'
Attaching 1 probe...                                   
0 tracepoint:syscalls:sys_exit_close tmux: server      
0 tracepoint:syscalls:sys_exit_close tmux: server      
0 tracepoint:syscalls:sys_exit_close tmux: server      
1 tracepoint:syscalls:sys_exit_close tmux: server      
1 tracepoint:syscalls:sys_exit_close tmux: server      
2 tracepoint:syscalls:sys_exit_close tmux: server      
2 tracepoint:syscalls:sys_exit_close tmux: server      
2 tracepoint:syscalls:sys_exit_close containerd        
2 tracepoint:syscalls:sys_exit_close containerd        
2 tracepoint:syscalls:sys_exit_close containerd        
2 tracepoint:syscalls:sys_exit_close containerd 
...

tcp:tcp_close

Doesn't seem to exist in /sys/kernel/debug/tracing/available_events
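
For reference, the same availability check can be done programmatically by looking for the tracepoint's directory under tracefs. A minimal Go sketch, assuming tracefs is mounted at one of the usual paths and that the process has permission to read it:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// tracepointExists reports whether the kernel exposes a given tracepoint,
// equivalent to grepping available_events. It assumes tracefs is mounted at
// one of the usual locations and is readable by the caller.
func tracepointExists(group, name string) bool {
	for _, root := range []string{"/sys/kernel/tracing", "/sys/kernel/debug/tracing"} {
		if _, err := os.Stat(filepath.Join(root, "events", group, name)); err == nil {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("syscalls:sys_exit_bind:", tracepointExists("syscalls", "sys_exit_bind"))
	fmt.Println("tcp:tcp_close:", tracepointExists("tcp", "tcp_close"))
}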

The event watcher is now triggered immediately on `sys_exit_bind`,
not waiting for the next 3-second tick.

This commit resolves the long-standing TODO since the initial commit:
https://github.com/lima-vm/lima/blob/7459f4587987ed014c372f17b82de1817feffa2e/cmd/lima-guestagent/daemon_linux.go#L57-L63

Close PR 3067, 3766, 4021

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Member Author

One common problem with our polling is that test suites create a container with exposed ports and, after stopping it, create another container in the next test that exposes the same ports. Due to the 3-second polling, the old forwards may not have been removed yet, blocking the new container from exposing the same ports.

How does this happen? The guest kernel isn't aware of the usage of the host port, so it doesn't block bind() in the guest.
After bind() in the guest succeeds, the ticker event is triggered, and the hostagent updates the host port status accordingly.
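
To make the flow concrete, here is a simplified sketch of the general diff-and-notify approach (an assumption for illustration, not lima's actual code): on every tick the guestagent re-scans listening ports and reports only the difference, and the hostagent then adds or removes the corresponding host forwards.

package main

import (
	"fmt"
	"time"
)

// watchPorts re-scans listening ports on every tick and reports only the
// diff; with the eBPF trigger, a tick arrives right after bind() returns
// instead of up to 3 seconds later.
func watchPorts(ticks <-chan time.Time, list func() map[string]bool, notify func(added, removed []string)) {
	prev := map[string]bool{}
	for range ticks {
		cur := list() // e.g. parsed from /proc/net/tcp and /proc/net/udp in the guest
		var added, removed []string
		for p := range cur {
			if !prev[p] {
				added = append(added, p)
			}
		}
		for p := range prev {
			if !cur[p] {
				removed = append(removed, p)
			}
		}
		if len(added) > 0 || len(removed) > 0 {
			notify(added, removed) // e.g. sent to the hostagent, which updates host forwards
		}
		prev = cur
	}
}

func main() {
	ticks := make(chan time.Time, 1)
	ticks <- time.Now() // simulate a single tick (e.g. triggered by sys_exit_bind)
	close(ticks)
	watchPorts(ticks,
		func() map[string]bool { return map[string]bool{"tcp/0.0.0.0:8080": true} },
		func(added, removed []string) { fmt.Println("added:", added, "removed:", removed) },
	)
}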

Member
@balajiv113 balajiv113 left a comment


LGTM 👍

Looks like I overcomplicated it by thinking of getting a real event with the open ports.

@jandubois
Member

How does this happen? The guest kernel isn't aware of the usage of the host port, so it doesn't block bind() in the guest.
After bind() in the guest succeeds, the ticker event is triggered, and the hostagent updates the host port status accordingly.

I guess the issue used to be that the test would connect to the dangling port on the host. So maybe just looking for bind events is enough; will need to write a test to verify.

jandubois added a commit to jandubois/lima that referenced this pull request Sep 23, 2025
It verifies that when a container is destroyed, its ports can be reused
immediately and are not subject to being freed by a polling loop. See lima-vm#4066.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
Member
@jandubois jandubois left a comment


I haven't reviewed the code (@balajiv113 seems to have done so already), but I wrote a test (#4077) to verify it works properly, which it seems to do.

This seems like the most straightforward fix to the issue!

@jandubois jandubois merged commit 45c9527 into lima-vm:master Sep 23, 2025
140 of 144 checks passed
@norio-nomura
Contributor
norio-nomura commented Sep 25, 2025

After this was merged, the CPU utilization of lima-guestagent is always close to 90%.

[screenshot: 2025-09-25 9:36:07]

I confirmed that it is resolved when this change is reverted.

jandubois added a commit to jandubois/lima that referenced this pull request Sep 28, 2025
It verifies that when a container is destroyed, its ports can be reused
immediately and are not subject to being freed by a polling loop. See lima-vm#4066.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>