Add process and go runtime metrics for controller by mindw · Pull Request #6966 · cert-manager/cert-manager · GitHub

Merged
pkg/metrics/metrics.go (8 additions, 1 deletion)
@@ -32,6 +32,7 @@ import (

"github.com/go-logr/logr"
"github.com/prometheus/client_golang/prometheus"
+"github.com/prometheus/client_golang/prometheus/collectors"
"github.com/prometheus/client_golang/prometheus/promhttp"
"k8s.io/utils/clock"

@@ -186,10 +187,16 @@ func New(log logr.Logger, c clock.Clock) *Metrics {
)
)

+// Create Registry and register the recommended collectors
Member

Suggest rewording this comment to explain why rather than what the following code does.
Perhaps:

We want to avoid using and mutating global state, so we avoid
prometheus.DefaultRegisterer. Instead, we create a local Registry and manually register
the same two collectors that are used by DefaultRegisterer.

Process collector

Exports the current state of process metrics including CPU, memory and file
descriptor usage as well as the process start time. The detailed behavior is
defined by the provided ProcessCollectorOpts. The zero value of
ProcessCollectorOpts creates a collector for the current process with an empty
namespace string and no error reporting.

The collector only works on operating systems with a Linux-style proc
filesystem and on Microsoft Windows. On other operating systems, it will not
collect any metrics.

Go collector

Exports metrics about the current Go process using debug.GCStats (base
metrics) and runtime/metrics.

+registry := prometheus.NewRegistry()
+registry.MustRegister(
+collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
+collectors.NewGoCollector(),
+)
Member
I must have been confused in my last review.
I thought the default collector included the debug metrics, but I see now that it does not.

We can add the debug metrics in a later PR if people call for it.

// Create server and register Prometheus metrics handler
m := &Metrics{
log: log.WithName("metrics"),
-registry: prometheus.NewRegistry(),
+registry: registry,

clockTimeSeconds: clockTimeSeconds,
clockTimeSecondsGauge: clockTimeSecondsGauge,

test/integration/certificates/metrics_controller_test.go (2 additions, 1 deletion)
@@ -145,7 +145,8 @@ func TestMetricsController(t *testing.T) {
return err
}

-if strings.TrimSpace(string(output)) != strings.TrimSpace(expectedOutput) {
+trimmedOutput := strings.SplitN(string(output), "# HELP go_gc_duration_seconds", 2)[0]
+if strings.TrimSpace(trimmedOutput) != strings.TrimSpace(expectedOutput) {
Member

So this splits the metrics output in two at the first go_ metric (# HELP go_gc_duration_seconds) and compares only the first half, assuming that it contains all of the cert-manager metrics.
Might be a bit brittle... likely to break if we ever introduce other metrics, but I guess we can cross that bridge when we come to it.

$ kubectl get --raw /api/v1/namespaces/cert-manager/pods/cert-manager-5844947869-zn7jg:9402/proxy/metrics | fgrep -C 3 '# HELP go_gc_duration_seconds'
certmanager_controller_sync_error_count{controller="certificates-key-manager"} 713
certmanager_controller_sync_error_count{controller="certificates-readiness"} 23
certmanager_controller_sync_error_count{controller="certificates-trigger"} 21
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.8496e-05
go_gc_duration_seconds{quantile="0.25"} 0.000419895
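A stdlib-only sketch of the trade-off discussed above. beforeGoMetrics reproduces the test's SplitN approach; onlyPrefixed is a hypothetical alternative that keeps only the lines belonging to families with a given prefix (for example certmanager_), so it does not depend on which family happens to sort right after the cert-manager metrics.

```go
package main

import (
	"bufio"
	"strings"
)

// beforeGoMetrics reproduces the integration test's approach: keep
// everything before the first "# HELP go_gc_duration_seconds" line. It works
// because Gather sorts families by name, so certmanager_* families come
// before go_* and process_* ones.
func beforeGoMetrics(output string) string {
	return strings.SplitN(output, "# HELP go_gc_duration_seconds", 2)[0]
}

// onlyPrefixed keeps the HELP/TYPE comments and sample lines of families
// whose name starts with prefix, regardless of where they sort. A sketch:
// it assumes one entry per line in the text exposition format and ignores
// scanner errors (only possible for lines longer than bufio's 64 KiB limit).
func onlyPrefixed(output, prefix string) string {
	var b strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := scanner.Text()
		// Strip a leading "# HELP " or "# TYPE " so comments are matched by
		// the family name they describe.
		name := strings.TrimPrefix(strings.TrimPrefix(line, "# HELP "), "# TYPE ")
		if strings.HasPrefix(name, prefix) {
			b.WriteString(line)
			b.WriteByte('\n')
		}
	}
	return b.String()
}
```

With output shaped like the kubectl snippet above, both functions return the same certmanager_ lines. If a family such as a hypothetical controller_runtime_reconcile_total sorted between certmanager_ and go_, beforeGoMetrics would include it and the comparison would fail, while onlyPrefixed would still drop it.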

return fmt.Errorf("got unexpected metrics output\nexp:\n%s\ngot:\n%s\n",
expectedOutput, output)
}