[Feature] Deployment & Members Condition metrics (#1610) · arangodb/kube-arangodb@9980e66

Commit 9980e66

[Feature] Deployment & Members Condition metrics (#1610)
1 parent 1a2097c commit 9980e66

43 files changed (+304, -35 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 - (Feature) JobScheduler Volumes, Probes, Lifecycle and Ports integration
 - (Feature) Merge ArangoDB Usage Metrics
 - (Bugfix) Check Connection to the ArangoDB before creating Backup
+- (Feature) Deployment & Members Condition metrics

 ## [1.2.38](https://github.com/arangodb/kube-arangodb/tree/1.2.38) (2024-02-22)
 - (Feature) Extract GRPC Server

docs/generated/metrics/README.md

Lines changed: 2 additions & 0 deletions
@@ -24,11 +24,13 @@ has_toc: false
 | [arangodb_operator_agency_cache_member_serving](./arangodb_operator_agency_cache_member_serving.md) | arangodb_operator | agency_cache | Gauge | Determines if agency member is reachable |
 | [arangodb_operator_agency_cache_present](./arangodb_operator_agency_cache_present.md) | arangodb_operator | agency_cache | Gauge | Determines if local agency cache is present |
 | [arangodb_operator_agency_cache_serving](./arangodb_operator_agency_cache_serving.md) | arangodb_operator | agency_cache | Gauge | Determines if agency is serving |
+| [arangodb_operator_deployment_conditions](./arangodb_operator_deployment_conditions.md) | arangodb_operator | deployment | Gauge | Representation of the ArangoDeployment condition state (true/false) |
 | [arangodb_operator_engine_assertions](./arangodb_operator_engine_assertions.md) | arangodb_operator | engine | Counter | Number of assertions invoked during Operator runtime |
 | [arangodb_operator_engine_ops_alerts](./arangodb_operator_engine_ops_alerts.md) | arangodb_operator | engine | Counter | Counter for actions which requires ops attention |
 | [arangodb_operator_engine_panics_recovered](./arangodb_operator_engine_panics_recovered.md) | arangodb_operator | engine | Counter | Number of Panics recovered inside Operator reconciliation loop |
 | [arangodb_operator_kubernetes_client_request_errors](./arangodb_operator_kubernetes_client_request_errors.md) | arangodb_operator | kubernetes_client | Counter | Number of Kubernetes Client request errors |
 | [arangodb_operator_kubernetes_client_requests](./arangodb_operator_kubernetes_client_requests.md) | arangodb_operator | kubernetes_client | Counter | Number of Kubernetes Client requests |
+| [arangodb_operator_members_conditions](./arangodb_operator_members_conditions.md) | arangodb_operator | members | Gauge | Representation of the ArangoMember condition state (true/false) |
 | [arangodb_operator_members_unexpected_container_exit_codes](./arangodb_operator_members_unexpected_container_exit_codes.md) | arangodb_operator | members | Counter | Counter of unexpected restarts in pod (Containers/InitContainers/EphemeralContainers) |
 | [arangodb_operator_rebalancer_enabled](./arangodb_operator_rebalancer_enabled.md) | arangodb_operator | rebalancer | Gauge | Determines if rebalancer is enabled |
 | [arangodb_operator_rebalancer_moves_current](./arangodb_operator_rebalancer_moves_current.md) | arangodb_operator | rebalancer | Gauge | Define how many moves are currently in progress |
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+---
+layout: page
+title: arangodb_operator_deployment_conditions
+parent: List of available metrics
+---
+
+# arangodb_operator_deployment_conditions (Gauge)
+
+## Description
+
+Representation of the ArangoDeployment condition state (true/false)
+
+## Labels
+
+| Label | Description |
+|:---------:|:---------------------|
+| namespace | Deployment Namespace |
+| name | Deployment Name |
+| condition | Condition Name |
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+---
+layout: page
+title: arangodb_operator_members_conditions
+parent: List of available metrics
+---
+
+# arangodb_operator_members_conditions (Gauge)
+
+## Description
+
+Representation of the ArangoMember condition state (true/false)
+
+## Labels
+
+| Label | Description |
+|:---------:|:---------------------|
+| namespace | Deployment Namespace |
+| name | Deployment Name |
+| member | Member ID |
+| condition | Condition Name |
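
For orientation, the label tables above suggest that each tracked condition becomes one gauge sample with value 1 (true) or 0 (false). The following is a minimal sketch of what such a sample could look like in the Prometheus text exposition format. `conditionSample` is a hypothetical helper, not part of the operator, and the namespace, deployment name, member ID and condition string are made-up values; label order in a real scrape may differ.

```go
package main

import "fmt"

// conditionSample renders one members-condition gauge sample as a text
// exposition line. Hypothetical helper for illustration only; the operator
// itself pushes metrics through its generated metric_descriptions package.
func conditionSample(namespace, name, member, condition string, state bool) string {
	value := 0
	if state {
		value = 1
	}
	// %q yields a double-quoted string, which matches the exposition
	// format for plain ASCII label values like the ones used here.
	return fmt.Sprintf(
		"arangodb_operator_members_conditions{namespace=%q,name=%q,member=%q,condition=%q} %d",
		namespace, name, member, condition, value)
}

func main() {
	// All argument values below are invented for the example.
	fmt.Println(conditionSample("default", "example", "PRMR-abc123", "Ready", true))
}
```

The deployment-level gauge would look the same minus the `member` label.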

internal/metrics.go.tmpl

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 //
 // DISCLAIMER
 //
-// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
+// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.

internal/metrics.item.go.tmpl

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 //
 // DISCLAIMER
 //
-// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
+// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.

internal/metrics.yaml

Lines changed: 25 additions & 0 deletions
@@ -219,6 +219,18 @@ namespaces:
             description: "DeploymentReplication Namespace"
           - key: name
             description: "DeploymentReplication Name"
+    deployment:
+      conditions:
+        shortDescription: "Representation of the ArangoDeployment condition state (true/false)"
+        description: "Representation of the ArangoDeployment condition state (true/false)"
+        type: "Gauge"
+        labels:
+          - key: namespace
+            description: "Deployment Namespace"
+          - key: name
+            description: "Deployment Name"
+          - key: condition
+            description: "Condition Name"
     members:
       unexpected_container_exit_codes:
         shortDescription: "Counter of unexpected restarts in pod (Containers/InitContainers/EphemeralContainers)"
@@ -239,6 +251,19 @@ namespaces:
             description: "ExitCode"
           - key: reason
             description: "Reason"
+      conditions:
+        shortDescription: "Representation of the ArangoMember condition state (true/false)"
+        description: "Representation of the ArangoMember condition state (true/false)"
+        type: "Gauge"
+        labels:
+          - key: namespace
+            description: "Deployment Namespace"
+          - key: name
+            description: "Deployment Name"
+          - key: member
+            description: "Member ID"
+          - key: condition
+            description: "Condition Name"
     engine:
       panics_recovered:
         shortDescription: "Number of Panics recovered inside Operator reconciliation loop"

pkg/deployment/deployment_inspector.go

Lines changed: 17 additions & 0 deletions
@@ -153,6 +153,18 @@ func (d *Deployment) inspectDeployment(lastInterval util.Interval) util.Interval
 	d.metrics.Deployment.Propagated = updated.Status.Conditions.IsTrue(api.ConditionTypeSpecPropagated)
 	d.metrics.Deployment.UpToDate = updated.Status.Conditions.IsTrue(api.ConditionTypeUpToDate)

+	d.metrics.Conditions.RefreshDeployment(updated.Status.Conditions,
+		api.ConditionTypeSpecAccepted,
+		api.ConditionTypeSpecPropagated,
+		api.ConditionTypeUpToDate)
+
+	d.metrics.Conditions.RefreshMembers(updated.Status.Members.AsList(),
+		api.ConditionTypeServing,
+		api.ConditionTypeScheduled,
+		api.ConditionTypeReachable,
+		api.ConditionTypeStarted,
+		api.ConditionTypeReady)
+
 	// Is the deployment in failed state, if so, give up.
 	if d.GetPhase() == api.DeploymentPhaseFailed {
 		d.log.Debug("Deployment is in Failed state.")
@@ -482,6 +494,11 @@ func (d *Deployment) isUpToDateStatus(status api.DeploymentStatus) (upToDate boo
 			reason = "PVC is resizing"
 			return
 		}
+		if !member.Conditions.IsTrue(api.ConditionTypeReady) {
+			upToDate = false
+			reason = "Not all members are ready"
+			return
+		}
 	}

 	return

pkg/deployment/metrics.go

Lines changed: 6 additions & 1 deletion
@@ -1,7 +1,7 @@
 //
 // DISCLAIMER
 //
-// Copyright 2016-2022 ArangoDB GmbH, Cologne, Germany
+// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
@@ -41,6 +41,8 @@ type Metrics struct {
 	Deployment struct {
 		Accepted, UpToDate, Propagated bool
 	}
+
+	Conditions ConditionsMetrics
 }

 func (d *Deployment) CollectMetrics(m metrics.PushMetric) {
@@ -90,4 +92,7 @@ func (d *Deployment) CollectMetrics(m metrics.PushMetric) {
 	if r := d.resources; r != nil {
 		r.CollectMetrics(m)
 	}
+
+	// Conditions
+	d.metrics.Conditions.CollectMetrics(d.namespace, d.name, m)
 }

pkg/deployment/metrics_conditions.go

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+//
+// DISCLAIMER
+//
+// Copyright 2024 ArangoDB GmbH, Cologne, Germany
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// Copyright holder is ArangoDB GmbH, Cologne, Germany
+//
+
+package deployment
+
+import (
+	"sync"
+
+	api "github.com/arangodb/kube-arangodb/pkg/apis/deployment/v1"
+	"github.com/arangodb/kube-arangodb/pkg/generated/metric_descriptions"
+	"github.com/arangodb/kube-arangodb/pkg/util"
+	"github.com/arangodb/kube-arangodb/pkg/util/metrics"
+)
+
+type ConditionsMetricsMap map[api.ConditionType]bool
+
+type ConditionsMetrics struct {
+	lock sync.Mutex
+
+	conditions ConditionsMetricsMap
+
+	memberConditions map[string]ConditionsMetricsMap
+}
+
+func (c *ConditionsMetrics) CollectMetrics(namespace, name string, m metrics.PushMetric) {
+	c.lock.Lock()
+	defer c.lock.Unlock()
+
+	for k, v := range c.conditions {
+		m.Push(metric_descriptions.ArangodbOperatorDeploymentConditionsGauge(util.BoolSwitch[float64](v, 1, 0), namespace, name, string(k)))
+	}
+
+	for member := range c.memberConditions {
+		for k, v := range c.memberConditions[member] {
+			m.Push(metric_descriptions.ArangodbOperatorMembersConditionsGauge(util.BoolSwitch[float64](v, 1, 0), namespace, name, member, string(k)))
+		}
+	}
+}
+
+func (c *ConditionsMetrics) RefreshDeployment(conditions api.ConditionList, types ...api.ConditionType) {
+	c.lock.Lock()
+	defer c.lock.Unlock()
+
+	c.conditions = c.extractConditionsMap(conditions, types...)
+}
+
+func (c *ConditionsMetrics) RefreshMembers(members api.DeploymentStatusMemberElements, types ...api.ConditionType) {
+	c.lock.Lock()
+	defer c.lock.Unlock()
+
+	if len(members) == 0 {
+		c.memberConditions = nil
+		return
+	}
+
+	ret := make(map[string]ConditionsMetricsMap, len(members))
+
+	for _, member := range members {
+		ret[member.Member.ID] = c.extractConditionsMap(member.Member.Conditions, types...)
+	}
+
+	c.memberConditions = ret
+}
+
+func (c *ConditionsMetrics) extractConditionsMap(conditions api.ConditionList, types ...api.ConditionType) ConditionsMetricsMap {
+	if len(types) == 0 {
+		return nil
+	}
+
+	ret := make(ConditionsMetricsMap, len(types))
+	for _, t := range types {
+		ret[t] = conditions.IsTrue(t)
+	}
+
+	return ret
+}
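
The refresh/collect split in this file can be tried out in isolation. Below is a minimal, self-contained sketch of the same pattern: a mutex-guarded snapshot that is rebuilt wholesale on every refresh, so members that have disappeared stop being reported. `ConditionType`, `Condition`, `memberConditions`, `isTrue` and `Samples` are simplified stand-ins invented for this sketch, not the operator's `api` types or its `metrics.PushMetric` interface.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ConditionType and Condition are simplified stand-ins (assumptions) for
// the operator's api.ConditionType and api.Condition.
type ConditionType string

type Condition struct {
	Type   ConditionType
	Status bool
}

// isTrue reports whether a condition of type t is present and true.
// A missing condition counts as false.
func isTrue(conds []Condition, t ConditionType) bool {
	for _, c := range conds {
		if c.Type == t {
			return c.Status
		}
	}
	return false
}

// memberConditions caches the tracked condition states per member ID.
type memberConditions struct {
	lock   sync.Mutex
	states map[string]map[ConditionType]bool
}

// Refresh rebuilds the whole snapshot instead of patching the old one,
// so entries for removed members are dropped automatically.
func (m *memberConditions) Refresh(members map[string][]Condition, types ...ConditionType) {
	m.lock.Lock()
	defer m.lock.Unlock()

	next := make(map[string]map[ConditionType]bool, len(members))
	for id, conds := range members {
		tracked := make(map[ConditionType]bool, len(types))
		for _, t := range types {
			tracked[t] = isTrue(conds, t)
		}
		next[id] = tracked
	}
	m.states = next
}

// Samples returns one "member/condition=value" string per tracked pair,
// sorted for stable output; true maps to 1 and false to 0.
func (m *memberConditions) Samples() []string {
	m.lock.Lock()
	defer m.lock.Unlock()

	var out []string
	for id, tracked := range m.states {
		for t, v := range tracked {
			value := 0
			if v {
				value = 1
			}
			out = append(out, fmt.Sprintf("%s/%s=%d", id, t, value))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	var cache memberConditions
	cache.Refresh(map[string][]Condition{
		"member-a": {{Type: "Ready", Status: true}},
		"member-b": {},
	}, "Ready", "Serving")
	fmt.Println(cache.Samples())
}
```

Only the condition types passed to `Refresh` are reported, which mirrors how the commit passes an explicit list of types to `RefreshDeployment` and `RefreshMembers`; a tracked type that is absent from a member's condition list shows up as 0 rather than being omitted.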

pkg/deployment/reconcile/plan_builder_rotate_upgrade.go

Lines changed: 11 additions & 2 deletions
@@ -1,7 +1,7 @@
 //
 // DISCLAIMER
 //
-// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
+// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
@@ -466,6 +466,8 @@ func groupReadyForRestart(context PlanBuilderContext, status api.DeploymentStatu
 		return true, "Bootstrap not completed, restart is allowed"
 	}

+	members := status.Members.MembersOfGroup(group)
+
 	// If current member did not become ready even once. Kill it
 	if !member.Conditions.IsTrue(api.ConditionTypeStarted) {
 		return true, "Member is not started"
@@ -476,10 +478,17 @@ func groupReadyForRestart(context PlanBuilderContext, status api.DeploymentStatu
 		return true, "Member is not serving"
 	}

-	if !status.Members.MembersOfGroup(group).AllMembersServing() {
+	if !members.AllMembersServing() {
 		return false, "Not all members are serving"
 	}

+	if member.Conditions.IsTrue(api.ConditionTypeReady) {
+		// Our pod is ready, lets check other pods
+		if !members.AllMembersReady() {
+			return false, "Not all members are ready"
+		}
+	}
+
 	switch group {
 	case api.ServerGroupDBServers:
 		agencyState, ok := context.GetAgencyCache()
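
The ordering of the checks in `groupReadyForRestart` is easier to follow with plain booleans. The sketch below reproduces only the checks visible in the hunks above, using a hypothetical `memberState` struct in place of the operator's member status. The `Serving` test on the member itself is an assumption inferred from the "Member is not serving" message (the `if` line falls between the two hunks), and the group-specific `switch` that follows in the real function is omitted.

```go
package main

import "fmt"

// memberState is a simplified stand-in (assumption) for a member's
// Started/Serving/Ready conditions.
type memberState struct {
	Started, Serving, Ready bool
}

// all reports whether pred holds for every member in the group.
func all(group []memberState, pred func(memberState) bool) bool {
	for _, m := range group {
		if !pred(m) {
			return false
		}
	}
	return true
}

// readyForRestart mirrors the order of the checks shown in the diff.
func readyForRestart(member memberState, group []memberState) (bool, string) {
	if !member.Started {
		return true, "Member is not started"
	}
	if !member.Serving { // assumed condition; only the return is visible in the diff
		return true, "Member is not serving"
	}
	if !all(group, func(m memberState) bool { return m.Serving }) {
		return false, "Not all members are serving"
	}
	// New in this commit: a ready member waits until its peers are ready too.
	if member.Ready && !all(group, func(m memberState) bool { return m.Ready }) {
		return false, "Not all members are ready"
	}
	return true, "No blocking condition (group-specific checks omitted)"
}

func main() {
	healthy := memberState{Started: true, Serving: true, Ready: true}
	lagging := memberState{Started: true, Serving: true, Ready: false}
	group := []memberState{healthy, lagging}

	fmt.Println(readyForRestart(healthy, group))
	fmt.Println(readyForRestart(lagging, group))
}
```

In this reading, a member that is itself not ready is not held back by the new check, while a ready member is blocked for as long as any peer in its group is not ready.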

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_health_present.go

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_healthy.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_leaders.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_member_commit_offset.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_member_serving.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_present.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_cache_serving.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_errors.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_fetches.go

Lines changed: 1 addition & 1 deletion

pkg/generated/metric_descriptions/arangodb_operator_agency_index.go

Lines changed: 1 addition & 1 deletion
