8000 refactor: replace startup script logs EOF with starting/ready time by mafredri · Pull Request #8082 · coder/coder · GitHub
[go: up one dir, main page]

Skip to content

refactor: replace startup script logs EOF with starting/ready time #8082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 20, 2023

Conversation

mafredri
Copy link
Member

This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting after that ID would give no logs and the API
would thus lose track of EOF.

@mafredri mafredri changed the title refactor: Replace startup script logs EOF with starting/ready time refactor: replace startup script logs EOF with starting/ready time Jun 19, 2023
// Only the latest state is reported, intermediate states may be
// lost if the agent can't communicate with the API.
// reportLifecycleLoop reports the current lifecycle state once. All state
// changes are reported in order.
Copy link
Member Author
@mafredri mafredri Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Originally, this was a weird/performance choice to only submit the latest status which simply resulted in needing to document the behavior increasing the complexity. We now change it so that the agent always reports all states (still non-blocking), we achieve this by adding the timestamp for the event in the payload.

a.lifecycleMu.Unlock()
return
}
a.lifecycleState = state
a.logger.Debug(ctx, "set lifecycle state", slog.F("state", state), slog.F("last", lastState))
a.lifecycleStates = append(a.lifecycleStates, report)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Guaranteed to not grow unboundedly (len(a.lifecycleStates) <= enum entries).

} else {
logger.Info(ctx, "script completed", slog.F("execution_time", execTime), slog.F("exit_code", exitCode))
}
}()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Small cleanup to unify logging between startup and shutdown scripts.

@mafredri mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch 2 times, most recently from d8a51d1 to cedf8a4 Compare June 19, 2023 15:13
return err
}

workspace, err := q.db.GetWorkspaceByAgentID(ctx, agent.ID)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Perhaps I shouldn't have made this simplification? I see mixed use in dbauthz not sure why we were doing both.

@mafredri mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch 3 times, most recently from 488b0ae to 1334344 Compare June 19, 2023 15:38
@mafredri mafredri requested review from kylecarbs and mtojek June 19, 2023 15:45
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
@mafredri mafredri force-pushed the mafredri/refactor-startup-script-eof-to-start-end branch from 1334344 to 73f19d6 Compare June 19, 2023 15:46
@mafredri mafredri marked this pull request as ready for review June 19, 2023 15:51
@@ -461,15 +470,20 @@ func (a *agent) reportLifecycleLoop(ctx context.Context) {
// setLifecycle sets the lifecycle state and notifies the lifecycle loop.
// The state is only updated if it's a valid state transition.
func (a *agent) setLifecycle(ctx context.Context, state codersdk.WorkspaceAgentLifecycle) {
report := agentsdk.PostLifecycleRequest{
State: state,
ChangedAt: database.Now(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database.Now() or time.Now()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally used database.Now() here for consistency since it's a value that will be stored in the DB. Startup logs also use database.Now() but we do seem to have some mixed use in the agent and some other places may be wrong?

Thoughts @kylecarbs? For all intents and purposes, this shouldn't matter since the DB fields are timestamptz, what's the motivation for using database.Now() that always uses UTC? Logging purposes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the motivation for using database.Now() that always uses UTC? Logging purposes?

  • It might be a good candidate for the linter rule.

@mtojek mtojek self-requested a review June 20, 2023 11:05
Copy link
Member
@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Thanks for addressing all comments. If CI is happy to merge it, I'm cool with that too! You can work on improvements in follow-ups.

@mafredri mafredri merged commit 8dac035 into main Jun 20, 2023
@mafredri mafredri deleted the mafredri/refactor-startup-script-eof-to-start-end branch June 20, 2023 11:41
@github-actions github-actions bot locked and limited conversation to collaborators Jun 20, 2023
@kylecarbs
Copy link
Member

This change breaks envbuilder, logstream-kube, and anything that writes startup logs to the UI after the startup script might be complete.

We should rename startup logs to something more generalized like agent logs to remove this confusion in the future. The way it worked before (I believe) was that the EOF was sent incorrectly, but we really shouldn't have been sending an EOF at all - there is no EOF, because infrastructure logs are never actually complete. A pod can always restart, or something can trigger a reboot of the agent.

@kylecarbs
Copy link
Member

The only break is the EOF btw, which is like 6 lines of code ;p I'm just removing that for now after talking with Mathias, and we'll work on a better long-term fix soon.

@kylecarbs
Copy link
Member

PR here: #8528

This just reverts the EOF, which should bring back the prior behavior.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0