DRA resourceslice controller: fix recreation after quick delete by pohly · Pull Request #132683 · kubernetes/kubernetes · GitHub

DRA resourceslice controller: fix recreation after quick delete #132683


Merged

Conversation

pohly
Contributor
@pohly pohly commented Jul 2, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

If a ResourceSlice got published by the ResourceSlice controller in a DRA driver and then that ResourceSlice got deleted quickly (within one minute, the mutation cache TTL) by someone (for example, the kubelet because of a restart), then the controller did not react properly to the deletion unless some other event triggered the syncing of the pool.

Found while adding upgrade/downgrade tests with a driver which keeps running across the upgrade/downgrade.

Which issue(s) this PR is related to:

KEP: kubernetes/enhancements#4381

Special notes for your reviewer:

The exact sequence of events leading to this was:

  • controller adds ResourceSlice, schedules a sync for one minute in the future (the TTL)
  • someone else deletes the ResourceSlice
  • add and delete events schedule another sync 30 seconds in the future (the delay), overwriting the other scheduled sync
  • sync runs once, finds deleted slices in the mutation cache, does not re-create them, and also does not run again

One possible fix would be to set a resync period. But then work is done periodically, whether it's necessary or not.

Another fix is to ensure that the TTL is shorter than the delay. Then when a sync occurs, all locally stored additional slices are expired. But that renders the whole storing of recently created slices in the cache pointless.

So the fix used here is to keep track of when another sync has to run because of added slices. At the end of each sync, the next sync gets scheduled if (and only if) needed, until eventually syncing can stop.
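
For illustration, here is a minimal Go sketch of that pattern, assuming a delaying workqueue keyed by pool name; the type and member names (poolSyncer, lastAddByPool, mutationTTL, recordAdd, syncPool) are placeholders for this example, not the controller's actual API:

```go
package resourceslice

import (
	"sync"
	"time"

	"k8s.io/client-go/util/workqueue"
)

type poolSyncer struct {
	mutex sync.Mutex
	// lastAddByPool maps the name of a pool to the time when a slice
	// was last created for it and stored in the mutation cache.
	lastAddByPool map[string]time.Time
	queue         workqueue.TypedDelayingInterface[string]
	// mutationTTL is how long created slices stay in the mutation cache,
	// e.g. one minute.
	mutationTTL time.Duration
}

// recordAdd remembers that a slice was just created for the pool.
func (s *poolSyncer) recordAdd(poolName string) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.lastAddByPool[poolName] = time.Now()
}

// syncPool reconciles one pool. At the end it re-schedules itself if (and
// only if) recently created slices are still only known through the mutation
// cache, instead of relying on a single delayed add that a later
// event-driven add could replace.
func (s *poolSyncer) syncPool(poolName string) {
	// ... compare desired slices against informer store + mutation cache,
	// create/update/delete as needed ...

	s.mutex.Lock()
	lastAdd, pending := s.lastAddByPool[poolName]
	s.mutex.Unlock()
	if !pending {
		return
	}
	if remaining := s.mutationTTL - time.Since(lastAdd); remaining > 0 {
		// Check again once the locally stored slices have expired from
		// the mutation cache and the informer is authoritative again.
		s.queue.AddAfter(poolName, remaining)
		return
	}
	s.mutex.Lock()
	delete(s.lastAddByPool, poolName)
	s.mutex.Unlock()
}
```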

Does this PR introduce a user-facing change?

DRA drivers: the ResourceSlice controller sometimes didn't react properly when the kubelet or someone else deleted a recently created ResourceSlice. It incorrectly assumed that the ResourceSlice still existed and didn't recreate it.

/assign @mortent

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 2, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. labels Jul 2, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 2, 2025
@k8s-ci-robot k8s-ci-robot requested review from bart0sh and klueska July 2, 2025 13:56
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2025
@@ -573,12 +574,14 @@ func testPrioritizedList(tCtx ktesting.TContext, enabled bool) {
}).WithTimeout(10 * time.Second).WithPolling(time.Second).Should(schedulingAttempted)
}

func testPublishResourceSlices(tCtx ktesting.TContext, disabledFeatures ...featuregate.Feature) {
func testPublishResourceSlices(tCtx ktesting.TContext, haveLatestAPI bool, disabledFeatures ...featuregate.Feature) {
Contributor Author

We don't have event handler tests in the unit tests of the package. Testing work queue behavior is tricky there, and there's always the question of whether the implemented behavior works "in the real world".

Therefore I added an integration test instead. I also used the opportunity to test the "update slice after unwanted update" scenario.

Both tests need the v1beta2 API, whereas the original "create slices" test doesn't.

disabled = append(disabled, string(feature))
// Manually turn into the expected slices, considering that some fields get dropped.
expectedResources := resources.DeepCopy()
var expectedSliceSpecs []resourcev1beta2.ResourceSliceSpec
Contributor Author

This comparison was avoided earlier because it depends on a specific API version, but it's useful to have it at least for one of them (more specifically, the latest one).

pohly added a commit to pohly/kubernetes that referenced this pull request Jul 2, 2025
}),
"NodeName": matchPointer(spec.NodeName),
"NodeSelector": matchPointer(spec.NodeSelector),
"AllNodes": gstruct.PointTo(gomega.BeTrue()),
Contributor Author

The linter suggests that this should be BeTrueBecause. That hint can be ignored here; inside a struct match we only care about the true/false comparison.
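
For context, the matcher sits inside a gstruct field match along these lines (the field names and the matchPointer helper are taken from the diff above; the surrounding MatchFields structure is assumed for illustration):

```go
gstruct.MatchFields(gstruct.IgnoreExtras, gstruct.Fields{
	"NodeName":     matchPointer(spec.NodeName),
	"NodeSelector": matchPointer(spec.NodeSelector),
	// The linter suggests BeTrueBecause, but inside the struct match the
	// field name already identifies what failed, so plain BeTrue is enough.
	"AllNodes": gstruct.PointTo(gomega.BeTrue()),
})
```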

@pohly
Contributor Author
pohly commented Jul 2, 2025

/test pull-kubernetes-e2e-gce

Failed to come up.

/skip

One linter hint which can be ignored.

@@ -420,11 +433,13 @@ func (c *Controller) initInformer(ctx context.Context) error {
&cache.ListWatch{
ListWithContextFunc: func(ctx context.Context, options metav1.ListOptions) (runtime.Object, error) {
tweakListOptions(&options)
return c.resourceClient.ResourceSlices().List(ctx, options)
slices, err := c.resourceClient.ResourceSlices().List(ctx, options)
Member

Why not just return directly?

Contributor Author

At some point I had some logging inserted between the call and the return. I'll revert to directly returning the values.

Contributor Author

Done.

// It's not sufficient to schedule a delayed sync because
// another sync scheduled by an event overwrites the older
// one, so we would sync too soon and then not again.
lastAddByPool map[string]time.Time
Member

Can we clarify here that the key will be the name of a pool. It wasn't obvious to me until I saw the assignment.

Contributor Author

Done.
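
The clarified comment ends up along these lines (a sketch; the exact wording in the final code may differ):

```go
// lastAddByPool maps the name of a pool to the time when a ResourceSlice
// was last added for it. It's not sufficient to schedule a delayed sync
// because another sync scheduled by an event overwrites the older one,
// so we would sync too soon and then not again.
lastAddByPool map[string]time.Time
```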

},
WatchFuncWithContext: func(ctx context.Context, options metav1.ListOptions) (watch.Interface, error) {
tweakListOptions(&options)
return c.resourceClient.ResourceSlices().Watch(ctx, options)
w, err := c.resourceClient.ResourceSlices().Watch(ctx, options)
Member

same as above.

Contributor Author

Done.
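
With both call sites reverted to direct returns, the ListWatch setup reads roughly as follows (a fragment reusing the identifiers from the diff above, not the exact final code):

```go
lw := &cache.ListWatch{
	ListWithContextFunc: func(ctx context.Context, options metav1.ListOptions) (runtime.Object, error) {
		tweakListOptions(&options)
		return c.resourceClient.ResourceSlices().List(ctx, options)
	},
	WatchFuncWithContext: func(ctx context.Context, options metav1.ListOptions) (watch.Interface, error) {
		tweakListOptions(&options)
		return c.resourceClient.ResourceSlices().Watch(ctx, options)
	},
}
```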

@pohly pohly force-pushed the dra-resourceslice-recreate-fix branch from 009eec4 to 2a53b1c Compare July 3, 2025 06:18
@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jul 3, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Jul 3, 2025
@pohly pohly force-pushed the dra-resourceslice-recreate-fix branch from 2a53b1c to 2e96624 Compare July 3, 2025 06:20
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@pohly: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-linter-hints 2e96624 link false /test pull-kubernetes-linter-hints

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@pohly
Contributor Author
pohly commented Jul 3, 2025

/skip

@mortent
Member
mortent commented Jul 3, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 3, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 010ce4b69fe631d737debe71a049696003490177

@k8s-ci-robot k8s-ci-robot merged commit 52787dd into kubernetes:master Jul 3, 2025
16 of 18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jul 3, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in SIG Node CI/Test Board Jul 3, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Apps Jul 3, 2025
@pohly pohly moved this from 🆕 New to ✅ Done in Dynamic Resource Allocation Jul 7, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: Done
3 participants