8000 Add number of NIC replicas to telemetry data by jjngx · Pull Request #5245 · nginx/kubernetes-ingress · GitHub
[go: up one dir, main page]

Skip to content

Conversation

jjngx
Copy link
Contributor
@jjngx jjngx commented Mar 13, 2024

Proposed changes

This PR adds NIC replica count information to the telemetry data.

commands

13187  k scale -n nginx-ingress deployment nginx-ingress --replicas=1
13188  k scale -n nginx-ingress deployment nginx-ingress --replicas=3
13189  k scale -n nginx-ingress deployment nginx-ingress --replicas=6
13190  k scale -n nginx-ingress deployment nginx-ingress --replicas=9
13191  k scale -n nginx-ingress deployment nginx-ingress --replicas=1

NIC log:

I0313 17:37:20.342002       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:1}}
I0313 17:37:25.537657       1 collector.go:91] Collecting telemetry data
I0313 17:37:25.555606       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:30.664850       1 collector.go:91] Collecting telemetry data
I0313 17:37:30.686694       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:35.749217       1 collector.go:91] Collecting telemetry data
I0313 17:37:35.780244       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:41.250974       1 collector.go:91] Collecting telemetry data
I0313 17:37:41.281874       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:6}}
I0313 17:37:46.482953       1 collector.go:91] Collecting telemetry data
I0313 17:37:46.520021       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:6}}
I0313 17:37:51.660169       1 collector.go:91] Collecting telemetry data
I0313 17:37:51.696565       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:9}}
I0313 17:37:56.975177       1 collector.go:91] Collecting telemetry data
I0313 17:37:56.980994       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:1}}

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@jjngx jjngx requested a review from a team as a code owner March 13, 2024 17:50
@github-actions github-actions bot added enhancement Pull requests for new features/feature enhancements helm_chart Pull requests that update the Helm Chart labels Mar 13, 2024
@jjngx jjngx added this to the v3.5.0 milestone Mar 13, 2024
@jjngx jjngx linked an issue Mar 13, 2024 that may be closed by this pull request
shaun-nx
< 8000 div class="flex-auto flex-md-self-center"> shaun-nx suggested changes Mar 13, 2024
Copy link
Contributor
@shaun-nx shaun-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test with this branch using Daemonset as the deployment kind.
In this case, Replicas is 0 since we don't have any ReplicaSets to pull from.
So we will need some way to know if NIC is deployed as a Daemonset or a Deployment for this.

Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:a5419913-6bca-47b0-b7b8-e6f28ff5a1be ClusterVersion:v1.27.4+k3s1 ClusterPlatform:k3s InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:1 VirtualServerRoutes:0 TransportServers:0 Replicas:0}}

One approach we can take is to check the CurrentNumberScheduled field for a Daemonset

Here is a code snippet of what I tried:

// ReplicaCount returns a number of running NIC replicas.
func (c *Collector) ReplicaCount(ctx context.Context) (int, error) {
	var rs *appsV1.ReplicaSet
	var ds *appsV1.DaemonSet
	var replicas int
	pod, err := c.Config.K8sClientReader.CoreV1().Pods(c.Config.PodNSName.Namespace).Get(ctx, c.Config.PodNSName.Name, metaV1.GetOptions{})
	if err != nil {
		return 0, err
	}
	podRef := pod.GetOwnerReferences()
	if len(podRef) != 1 {
		return 0, fmt.Errorf("expected pod owner reference to be 1, got %d", len(podRef))
	}

	// Switch based on the owner reference
	switch podRef[0].Kind {
	case "ReplicaSet":
		rs, err = c.Config.K8sClientReader.AppsV1().ReplicaSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(*rs.Spec.Replicas)
	case "DaemonSet":
		ds, err = c.Config.K8sClientReader.AppsV1().DaemonSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(ds.Status.CurrentNumberScheduled)
	default:
		return 0, fmt.Errorf("expected pod owner reference to be either ReplicaSet or DaemonSet, got %s", pod.OwnerReferences[0].Kind)
	}
	return replicas, nil
}

If we do go with this approach, we will need to add more permissions to our RBAC:

E0313 18:44:02.647830       1 collector.go:94] Error collecting telemetry data: daemonsets.apps "test-nginx-ingress-controller" is forbidden: User "system:serviceaccount:default:test-nginx-ingress" cannot get resource "daemonsets" in API group "apps" in the namespace "default"

@jjngx
Copy link
Contributor Author
jjngx commented Mar 13, 2024

NIC logs on multinode cluster, DaemonSet

I0313 20:43:27.718603       1 collector.go:91] Collecting telemetry data
I0313 20:43:27.732759       1 collector.go:119
8000
] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:2bbabd21-3a0f-48cd-9e02-eb3f1fa2febd ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:3} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:2}}

@jjngx jjngx requested a review from shaun-nx March 13, 2024 20:51
@jjngx
Copy link
Contributor Author
jjngx commented Mar 13, 2024

I did a test with this branch using Daemonset as the deployment kind. In this case, Replicas is 0 since we don't have any ReplicaSets to pull from. So we will need some way to know if NIC is deployed as a Daemonset or a Deployment for this.

Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:a5419913-6bca-47b0-b7b8-e6f28ff5a1be ClusterVersion:v1.27.4+k3s1 ClusterPlatform:k3s InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:1 VirtualServerRoutes:0 TransportServers:0 Replicas:0}}

One approach we can take is to check the CurrentNumberScheduled field for a Daemonset

Here is a code snippet of what I tried:

// ReplicaCount returns a number of running NIC replicas.
func (c *Collector) ReplicaCount(ctx context.Context) (int, error) {
	var rs *appsV1.ReplicaSet
	var ds *appsV1.DaemonSet
	var replicas int
	pod, err := c.Config.K8sClientReader.CoreV1().Pods(c.Config.PodNSName.Namespace).Get(ctx, c.Config.PodNSName.Name, metaV1.GetOptions{})
	if err != nil {
		return 0, err
	}
	podRef := pod.GetOwnerReferences()
	if len(podRef) != 1 {
		return 0, fmt.Errorf("expected pod owner reference to be 1, got %d", len(podRef))
	}

	// Switch based on the owner reference
	switch podRef[0].Kind {
	case "ReplicaSet":
		rs, err = c.Config.K8sClientReader.AppsV1().ReplicaSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(*rs.Spec.Replicas)
	case "DaemonSet":
		ds, err = c.Config.K8sClientReader.AppsV1().DaemonSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(ds.Status.CurrentNumberScheduled)
	default:
		return 0, fmt.Errorf("expected pod owner reference to be either ReplicaSet or DaemonSet, got %s", pod.OwnerReferences[0].Kind)
	}
	return replicas, nil
}

If we do go with this approach, we will need to add more permissions to our RBAC:

E0313 18:44:02.647830       1 collector.go:94] Error collecting telemetry data: daemonsets.apps "test-nginx-ingress-controller" is forbidden: User "system:serviceaccount:default:test-nginx-ingress" cannot get resource "daemonsets" in API group "apps" in the namespace "default"

Thanks for catching this @shaun-nx ! Added DaemonSet to the telemetry.

@jjngx jjngx merged commit 4533e23 into main Mar 14, 2024
@jjngx jjngx deleted the feat/tel-nic-repolicas branch March 14, 2024 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Pull requests for new features/feature enhancements helm_chart Pull requests that update the Helm Chart
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect number of deployment/daemonset replicas of NIC
3 participants
0