DRA: core update for 1.34 · kubernetes/website@36162b1 · GitHub


Commit 36162b1

DRA: core update for 1.34
The feature gate and API examples are updated. Enabling DRA is now simpler; changes are only needed for backward compatibility. One particular troubleshooting step fits into the existing user-facing "allocate-devices-dra.md". Admin-facing troubleshooting and documentation of metrics that might be of interest can follow separately.
1 parent ab5c2db commit 36162b1

File tree

7 files changed: +79 −67 lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 12 additions & 13 deletions
@@ -219,7 +219,7 @@ creating or modifying ResourceSlices.
 Consider the following example ResourceSlice:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 metadata:
   name: cat-slice
@@ -233,14 +233,13 @@ spec:
   allNodes: true
   devices:
   - name: "large-black-cat"
-    basic:
-      attributes:
-        color:
-          string: "black"
-        size:
-          string: "large"
-        cat:
-          boolean: true
+    attributes:
+      color:
+        string: "black"
+      size:
+        string: "large"
+      cat:
+        boolean: true
 ```
 
 This ResourceSlice is managed by the `resource-driver.example.com` driver in the
 `black-cat-pool` pool. The `allNodes: true` field indicates that any node in the
@@ -399,7 +398,7 @@ admin access grants access to in-use devices and may enable additional
 permissions when making the device available in a container:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
   name: large-black-cat-claim-template
@@ -441,7 +440,7 @@ allocated if it is available. But if it is not and two small white devices are a
 the pod will still be able to run.
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
   name: prioritized-list-claim-template
@@ -495,7 +494,7 @@ handles this and it is transparent to the consumer as the ResourceClaim API is n
 
 ```yaml
 kind: ResourceSlice
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 metadata:
   name: resourceslice
 spec:
@@ -632,4 +631,4 @@ spec:
 - [Allocate devices to workloads using DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra/)
 - For more information on the design, see the
   [Dynamic Resource Allocation with Structured Parameters](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters)
-  KEP.
+  KEP.
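Taken together, the hunks above flatten each entry under `devices` by dropping the `basic:` wrapper in the v1 API. As an orientation sketch only, a complete v1 ResourceSlice assembled from the fragments in this diff might look like the following (the `pool` values are assumed, since the diff does not show them):

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: cat-slice
spec:
  driver: resource-driver.example.com
  pool:
    name: black-cat-pool
    generation: 1          # assumed value, not shown in the diff
    resourceSliceCount: 1  # assumed value, not shown in the diff
  allNodes: true
  devices:
  - name: "large-black-cat"
    attributes:
      color:
        string: "black"
      size:
        string: "large"
      cat:
        boolean: true
```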

content/en/docs/reference/command-line-tools-reference/feature-gates/DynamicResourceAllocation.md

Lines changed: 6 additions & 1 deletion
@@ -13,8 +13,13 @@ stages:
 - stage: beta
   defaultValue: false
   fromVersion: "1.32"
+  toVersion: "1.33"
+- stage: stable
+  defaultValue: true
+  locked: false
+  fromVersion: "1.34"
 
-# TODO: as soon as this is locked to "true" (= GA), comments about other DRA
+# TODO: as soon as this is locked to "true" (= some time after GA, *not* yet in 1.34), comments about other DRA
 # feature gate(s) like "unless you also enable the `DynamicResourceAllocation` feature gate"
 # can be removed (for example, in dra-admin-access.md).
 
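Because the gate graduated with `locked: false`, it is on by default in 1.34 but can still be turned off explicitly, for example for a rollback. A sketch of disabling it on the kubelet via its configuration file (the `featureGates` field is part of KubeletConfiguration; disabling is not something a normal cluster should need):

```yaml
# Sketch: explicitly disable the gate (rollback scenario only);
# in 1.34 it defaults to true and needs no configuration to be on.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: false
```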

content/en/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra.md

Lines changed: 16 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Allocate Devices to Workloads with DRA
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 20
 ---
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
@@ -157,6 +157,20 @@ claims in different containers.
    kubectl apply -f https://k8s.io/examples/dra/dra-example-job.yaml
    ```
 
+Try the following troubleshooting steps:
+
+1. When the workload does not start as expected, drill down from Job
+   to Pods to ResourceClaims and check the objects
+   at each level with `kubectl describe` to see whether there are any
+   status fields or events which might explain why the workload is
+   not starting.
+1. When creating a Pod fails with `must specify one of: resourceClaimName,
+   resourceClaimTemplateName`, check that all entries in `pod.spec.resourceClaims`
+   have exactly one of those fields set. If they do, then it is possible
+   that the cluster has a mutating Pod webhook installed which was built
+   against APIs from Kubernetes < 1.32. Work with your cluster administrator
+   to check this.
+
 ## Clean up {#clean-up}
 
 To delete the Kubernetes objects that you created in this task, follow these
@@ -183,4 +197,4 @@ steps:
 
 ## {{% heading "whatsnext" %}}
 
-* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
+* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
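The new troubleshooting step about `pod.spec.resourceClaims` concerns the rule that each entry must set exactly one reference field. A minimal sketch of a valid entry (the Pod name, container name, and image are hypothetical; the template name is taken from the repo's resourceclaimtemplate.yaml example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # hypothetical name
spec:
  resourceClaims:
  - name: gpu
    # Exactly one of resourceClaimName / resourceClaimTemplateName may be set.
    resourceClaimTemplateName: example-resource-claim-template
  containers:
  - name: app
    image: registry.example/app:latest   # hypothetical image
    resources:
      claims:
      - name: gpu            # must match an entry in spec.resourceClaims
```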

content/en/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster.md

Lines changed: 42 additions & 48 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Set Up DRA in a Cluster"
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 10
 ---
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
@@ -37,30 +37,20 @@ For details, see
 
 <!-- steps -->
 
-## Enable the DRA API groups {#enable-dra}
+## Optional: enable legacy DRA API groups {#enable-dra}
 
-To let Kubernetes allocate resources to your Pods with DRA, complete the
-following configuration steps:
+DRA graduated to stable in Kubernetes 1.34 and is enabled by default.
+Some older DRA drivers or workloads might still need the
+v1beta1 API from Kubernetes 1.30 or v1beta2 from Kubernetes 1.32.
+If and only if support for those is desired, then enable the following
+{{< glossary_tooltip text="API groups" term_id="api-group" >}}:
+
+* `resource.k8s.io/v1beta1`
+* `resource.k8s.io/v1beta2`
+
+For more information, see
+[Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
 
-1. Enable the `DynamicResourceAllocation`
-   [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-   on all of the following components:
-
-   * `kube-apiserver`
-   * `kube-controller-manager`
-   * `kube-scheduler`
-   * `kubelet`
-
-1. Enable the following
-   {{< glossary_tooltip text="API groups" term_id="api-group" >}}:
-
-   * `resource.k8s.io/v1beta1`: required for DRA to function.
-   * `resource.k8s.io/v1beta2`: optional, recommended improvements to the user
-     experience.
-
-   For more information, see
-   [Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
-
 ## Verify that DRA is enabled {#verify}
 
 To verify that the cluster is configured correctly, try to list DeviceClasses:
@@ -81,15 +71,15 @@ similar to the following:
 ```
 error: the server doesn't have a resource type "deviceclasses"
 ```
+
 Try the following troubleshooting steps:
 
-1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation`
-   feature gate enabled *and* uses the
-   [v1 configuration API](/docs/reference/config-api/kube-scheduler-config.v1/).
-   If you use a custom configuration, you might need to perform additional steps
-   to enable the `DynamicResource` plugin.
-1. Restart the `kube-apiserver` component and the `kube-controller-manager`
-   component to propagate the API group changes.
+1. Reconfigure and restart the `kube-apiserver` component.
+
+1. If the complete `.spec.resourceClaims` field gets removed from Pods, or if
+   Pods get scheduled without considering the ResourceClaims, then verify
+   that the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is not turned off
+   for kube-apiserver, kube-controller-manager, kube-scheduler or the kubelet.
 
 ## Install device drivers {#install-drivers}
 
@@ -112,6 +102,12 @@ cluster-1-device-pool-1-driver.example.com-lqx8x cluster-1-node-1 driver
 cluster-1-device-pool-2-driver.example.com-29t7b  cluster-1-node-2  driver.example.com  cluster-1-device-pool-2-446z  8s
 ```
 
+Try the following troubleshooting steps:
+
+1. Check the health of the DRA driver and look for error messages about
+   publishing ResourceSlices in its log output. The vendor of the driver
+   may have further instructions about installation and troubleshooting.
+
 ## Create DeviceClasses {#create-deviceclasses}
 
 You can define categories of devices that your application operators can
@@ -135,27 +131,25 @@ operators.
 The output is similar to the following:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 # lines omitted for clarity
 spec:
   devices:
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-0
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-1
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-0
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-1
   driver: driver.example.com
   nodeName: cluster-1-node-1
 # lines omitted for clarity
@@ -186,4 +180,4 @@ kubectl delete -f https://k8s.io/examples/dra/deviceclass.yaml
 ## {{% heading "whatsnext" %}}
 
 * [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
-* [Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra)
+* [Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra)
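For clusters managed with kubeadm, the optional legacy API groups from the section above could be enabled by passing `--runtime-config` to the API server. A sketch assuming kubeadm's v1beta4 `extraArgs` mechanism (adjust for however your cluster configures kube-apiserver flags):

```yaml
# Sketch: enable the legacy DRA API versions only if older
# drivers or workloads still need them; v1 is always served.
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
  - name: runtime-config
    value: "resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true"
```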

content/en/examples/dra/deviceclass.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
apiVersion: resource.k8s.io/v1beta2
1+
apiVersion: resource.k8s.io/v1
22
kind: DeviceClass
33
metadata:
44
name: example-device-class

content/en/examples/dra/resourceclaim.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
apiVersion: resource.k8s.io/v1beta2
1+
apiVersion: resource.k8s.io/v1
22
kind: ResourceClaim
33
metadata< 38BA /span>:
44
name: example-resource-claim

content/en/examples/dra/resourceclaimtemplate.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
apiVersion: resource.k8s.io/v1beta2
1+
apiVersion: resource.k8s.io/v1
22
kind: ResourceClaimTemplate
33
metadata:
44
name: example-resource-claim-template
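The three example files above change only `apiVersion`. For orientation, a sketch of what a complete minimal v1 ResourceClaim built on these examples might contain, using the `exactly` request form with the DeviceClass from deviceclass.yaml (the request name is hypothetical; verify field names against the v1 API reference):

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-resource-claim
spec:
  devices:
    requests:
    - name: gpu   # hypothetical request name
      exactly:
        deviceClassName: example-device-class
```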
