refactor: replace startup script logs EOF with starting/ready time #8082

mafredri · 2023-06-19T15:04:52Z

This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields that track when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify the logic, since we don't need to work out whether the
current state is before or after the state we're interested in. They can
also be used to show data such as how long the startup script took to
execute. This also allowed us to remove the EOF field from the logs. That
implementation was problematic: when the EOF log entry was returned in a
response, requesting logs after that ID would return nothing, so the API
would lose track of EOF.

mafredri · 2023-06-19T15:06:57Z

agent/agent.go

-// Only the latest state is reported, intermediate states may be
-// lost if the agent can't communicate with the API.
+// reportLifecycleLoop reports the current lifecycle state once. All state
+// changes are reported in order.


Review: Originally, this was an odd, performance-driven choice to only submit the latest status, which simply meant we had to document the behavior, adding complexity. We've now changed it so that the agent always reports all states (still non-blocking); we achieve this by including the timestamp of each event in the payload.

mafredri · 2023-06-19T15:08:55Z

agent/agent.go

 		a.lifecycleMu.Unlock()
 		return
 	}
-	a.lifecycleState = state
-	a.logger.Debug(ctx, "set lifecycle state", slog.F("state", state), slog.F("last", lastState))
+	a.lifecycleStates = append(a.lifecycleStates, report)


Review: Guaranteed not to grow unboundedly (len(a.lifecycleStates) <= the number of lifecycle enum values).
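A small sketch of why the slice stays bounded, assuming a transition is only accepted when it moves strictly forward in a fixed state ordering. The ordering below and the helpers `indexOf` and `appendIfForward` are simplified assumptions, not the actual codersdk enum or agent code.

```go
package main

import "fmt"

// lifecycleOrder is an assumed, simplified ordering of lifecycle states;
// a transition counts as valid only if it moves forward in this list.
var lifecycleOrder = []string{"created", "starting", "ready", "shutting_down", "off"}

// indexOf returns the position of state in lifecycleOrder, or -1 if unknown.
func indexOf(state string) int {
	for i, s := range lifecycleOrder {
		if s == state {
			return i
		}
	}
	return -1
}

// appendIfForward appends state only if it comes strictly after the last
// recorded state. Since every append moves strictly forward, the slice can
// never hold more than len(lifecycleOrder) entries.
func appendIfForward(states []string, state string) ([]string, bool) {
	idx := indexOf(state)
	if idx < 0 {
		return states, false // unknown state
	}
	if n := len(states); n > 0 && idx <= indexOf(states[n-1]) {
		return states, false // not a forward transition
	}
	return append(states, state), true
}

func main() {
	var states []string
	for _, s := range []string{"created", "starting", "starting", "created", "ready", "off", "ready"} {
		states, _ = appendIfForward(states, s)
	}
	fmt.Println(states) // [created starting ready off]
}
```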

mafredri · 2023-06-19T15:10:39Z

agent/agent.go

+		} else {
+			logger.Info(ctx, "script completed", slog.F("execution_time", execTime), slog.F("exit_code", exitCode))
+		}
+	}()


Review: Small cleanup to unify logging between startup and shutdown scripts.

mafredri · 2023-06-19T15:14:30Z

coderd/database/dbauthz/dbauthz.go

-		return err
-	}
-
-	workspace, err := q.db.GetWorkspaceByAgentID(ctx, agent.ID)


Review: Perhaps I shouldn't have made this simplification? I see mixed usage in dbauthz and I'm not sure why we were doing both.


coderd/database/migrations/000128_drop_startup_logs_eof_and_add_completion.down.sql

coderd/workspaceagents.go

scripts/coder-dev.sh

mtojek · 2023-06-20T09:42:50Z

agent/agent.go

@@ -461,15 +470,20 @@ func (a *agent) reportLifecycleLoop(ctx context.Context) {
 // setLifecycle sets the lifecycle state and notifies the lifecycle loop.
 // The state is only updated if it's a valid state transition.
 func (a *agent) setLifecycle(ctx context.Context, state codersdk.WorkspaceAgentLifecycle) {
+	report := agentsdk.PostLifecycleRequest{
+		State:     state,
+		ChangedAt: database.Now(),


database.Now() or time.Now()?

Intentionally used database.Now() here for consistency, since it's a value that will be stored in the DB. Startup logs also use database.Now(), but we do seem to have some mixed usage in the agent, and some other places may be wrong?

Thoughts @kylecarbs? For all intents and purposes this shouldn't matter, since the DB fields are timestamptz. What's the motivation for using database.Now(), which always uses UTC? Logging purposes?

> what's the motivation for using database.Now() that always uses UTC? Logging purposes?

It might be a good candidate for the linter rule.


mtojek

👍

Thanks for addressing all comments. If CI is happy to merge it, I'm cool with that too! You can work on improvements in follow-ups.

kylecarbs · 2023-07-14T22:14:13Z

This change breaks envbuilder, logstream-kube, and anything that writes startup logs to the UI after the startup script might be complete.

We should rename startup logs to something more general, like agent logs, to avoid this confusion in the future. The way it worked before (I believe) was that the EOF was sent incorrectly, but we really shouldn't have been sending an EOF at all: there is no EOF, because infrastructure logs are never actually complete. A pod can always restart, or something can trigger a reboot of the agent.

kylecarbs · 2023-07-14T22:20:11Z

The only break is the EOF btw, which is like 6 lines of code ;p I'm just removing that for now after talking with Mathias, and we'll work on a better long-term fix soon.

kylecarbs · 2023-07-14T22:28:32Z

PR here: #8528

This just reverts the EOF, which should bring back the prior behavior.

github-actions bot assigned mafredri Jun 19, 2023

mafredri changed the title from "refactor: Replace startup script logs EOF with starting/ready time" to "refactor: replace startup script logs EOF with starting/ready time" Jun 19, 2023

mafredri commented Jun 19, 2023

mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch 2 times, most recently from d8a51d1 to cedf8a4 June 19, 2023 15:13

mafredri commented Jun 19, 2023

mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch 3 times, most recently from 488b0ae to 1334344 June 19, 2023 15:38

mafredri requested review from kylecarbs and mtojek June 19, 2023 15:45

mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch from 1334344 to 73f19d6 June 19, 2023 15:46

mafredri marked this pull request as ready for review June 19, 2023 15:51

mtojek reviewed Jun 20, 2023

mafredri added 3 commits June 20, 2023 10:39

PR comment fixes

43a0f87

Merge branch 'main' into mafredri/refactor-startup-script-eof-to-star…

8ff8c02

…t-end

./coderd/database/migrations/fix_migration_numbers.sh

b4f78eb

mtojek self-requested a review June 20, 2023 11:05

mtojek approved these changes Jun 20, 2023

mafredri merged commit 8dac035 into main Jun 20, 2023

mafredri deleted the mafredri/refactor-startup-script-eof-to-start-end branch June 20, 2023 11:41

github-actions bot locked and limited conversation to collaborators Jun 20, 2023
