Description
There seems to be a goroutine leak in coderd.(*api).workspaceAgentTurn
.
This could be seen as two bugs:
- Goroutine leak
- Use of
(*http.Request).Context()
afterHijack
in more than one place
Steps to Reproduce
- Enable pprof for coder server
coder ssh dev
ctlr+d
- Goto 1
- Check pprof (
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine
)
The leak is in part due to reliance on the http.Request
context and use of websockets. The underlying websocket library calls (*http.Request).Hijack
which disables context propagation.
This happens here:
coder/coderd/workspaceagents.go
Line 304 in 668a671
And the following contexts will not cancel until the http handler completes:
coder/coderd/workspaceagents.go
Line 316 in 668a671
coder/coderd/workspaceagents.go
Line 320 in 668a671
We must avoid using r.Context()
after hijack, unless we are using it with the expectation that the http handler will exit (at which point the context will complete).
I'm unfamiliar with the pion/turn
package, but another factor could be wrt how it handles connection closure, perhaps it does not propagate as we expect since we're not calling wsConn.Close()
due to context reliance?
Similar reliance on request context after hijack is done elsewhere, we should rethink all of them. Example:
coder/coderd/workspaceagents.go
Line 155 in 668a671