Cheatsheet: Kubernetes Monitoring
Cluster state metrics
DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                      COMMAND
Running pods                                            kube_pod_status_phase                           kubectl get pods
Number of pods desired for a Deployment                 kube_deployment_spec_replicas                   kubectl get deployment <DEPLOYMENT>
Number of pods desired for a DaemonSet                  kube_daemonset_status_desired_number_scheduled  kubectl get daemonset <DAEMONSET>
Number of pods currently running in a Deployment        kube_deployment_status_replicas                 kubectl get deployment <DEPLOYMENT>
Number of pods currently running in a DaemonSet         kube_daemonset_status_current_number_scheduled  kubectl get daemonset <DAEMONSET>
Number of pods currently available in a Deployment      kube_deployment_status_replicas_available       kubectl get deployment <DEPLOYMENT>
Number of pods currently available in a DaemonSet       kube_daemonset_status_number_available          kubectl get daemonset <DAEMONSET>
Number of pods currently not available in a Deployment  kube_deployment_status_replicas_unavailable     kubectl get deployment <DEPLOYMENT>
Number of pods currently not available in a DaemonSet   kube_daemonset_status_number_unavailable        kubectl get daemonset <DAEMONSET>

Container metrics
DESCRIPTION                     NAME IN KUBE-STATE-METRICS                COMMAND
Containers running on a pod     kube_pod_container_info                   kubectl describe pod <POD_NAME>
Containers restarted on a pod   kube_pod_container_status_restarts_total  kubectl describe pod <POD_NAME>
Containers terminated on a pod  kube_pod_container_status_terminated      kubectl describe pod <POD_NAME>

Disk I/O & Network metrics
DESCRIPTION              PROMETHEUS METRIC NAME                   COMMAND
Network in per node      container_network_receive_bytes_total    kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network out per node     container_network_transmit_bytes_total   kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk writes per node     container_fs_writes_bytes_total          kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk reads per node      container_fs_reads_bytes_total           kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network errors per node  container_network_receive_errors_total,  kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
                         container_network_transmit_errors_total
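
The cAdvisor network and disk metrics are cumulative counters, so a single reading says little on its own; throughput comes from the difference between two readings. The following is a minimal sketch of that arithmetic in Python, assuming two text-format payloads saved from the kubectl get --raw command 10 seconds apart. The payloads, pod names, and function names are hypothetical, the sketch sums every series it finds (which may double count if the endpoint reports overlapping series), and counter resets are not handled.

```python
import re

# One Prometheus text-format sample: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{.*\})?\s+(\S+)')


def sum_metric(exposition_text, metric_name):
    """Sum every sample of metric_name found in a text-format payload."""
    total = 0.0
    for line in exposition_text.splitlines():
        # Comment lines start with '#' and blank lines are empty, so neither matches.
        match = SAMPLE_RE.match(line.strip())
        if match and match.group(1) == metric_name:
            total += float(match.group(2))
    return total


def bytes_per_second(first_scrape, second_scrape, metric_name, interval_seconds):
    """Average rate between two scrapes (assumes the counter did not reset)."""
    delta = sum_metric(second_scrape, metric_name) - sum_metric(first_scrape, metric_name)
    return delta / interval_seconds


# Hypothetical payloads, trimmed to the lines that matter for this example.
FIRST_SCRAPE = """
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{interface="eth0",pod="web-1"} 1000
container_network_receive_bytes_total{interface="eth0",pod="web-2"} 500
"""
SECOND_SCRAPE = """
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{interface="eth0",pod="web-1"} 4000
container_network_receive_bytes_total{interface="eth0",pod="web-2"} 1500
"""

rate = bytes_per_second(
    FIRST_SCRAPE, SECOND_SCRAPE, "container_network_receive_bytes_total", 10
)
print(rate)  # 400.0
```

The same helper should work for the transmit, disk, and error counters by changing the metric name.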
Node resource and status metrics
DESCRIPTION                                NAME IN KUBE-STATE-METRICS                         COMMAND
Current health status of a node (kubelet)  kube_node_status_condition                         kubectl describe node <NODE_NAME>
Total memory requests (bytes) per node     kube_pod_container_resource_requests_memory_bytes  kubectl describe node <NODE_NAME>
Total memory in use on a node              N/A                                                kubectl describe node <NODE_NAME>
Total CPU requests (cores) per node        kube_pod_container_resource_requests_cpu_cores     kubectl describe node <NODE_NAME>
Total CPU in use on a node                 N/A                                                kubectl describe node <NODE_NAME>

Kubernetes events
DESCRIPTION  COMMAND
List events  kubectl get events
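
The request metrics in the node table are reported per container, so a per-node total has to be summed across containers. A minimal Python sketch of that grouping follows, assuming the samples arrive in Prometheus text format and carry a node label. The payload, pod names, node names, and the function name requests_per_node are hypothetical.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)')   # name{labels} value
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')         # key="value" pairs


def requests_per_node(exposition_text, metric_name):
    """Sum the samples of metric_name, grouped by their node label."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        # Series without a node label are grouped under a placeholder key.
        totals[labels.get("node", "<unknown>")] += float(match.group(3))
    return dict(totals)


# Hypothetical payload: three containers spread over two nodes.
SAMPLE = """
# TYPE kube_pod_container_resource_requests_memory_bytes gauge
kube_pod_container_resource_requests_memory_bytes{namespace="default",pod="web-1",container="app",node="node-a"} 134217728
kube_pod_container_resource_requests_memory_bytes{namespace="default",pod="web-2",container="app",node="node-a"} 268435456
kube_pod_container_resource_requests_memory_bytes{namespace="default",pod="db-1",container="db",node="node-b"} 536870912
"""

memory_requests = requests_per_node(
    SAMPLE, "kube_pod_container_resource_requests_memory_bytes"
)
print(memory_requests)  # {'node-a': 402653184.0, 'node-b': 536870912.0}
```

Passing kube_pod_container_resource_requests_cpu_cores as the metric name should give the CPU request totals in the same way.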
Job metrics
DESCRIPTION                NAME IN KUBE-STATE-METRICS  COMMAND
Number of successful jobs  kube_job_status_succeeded   kubectl get jobs --all-namespaces | grep "succeeded"
Number of failed jobs      kube_job_status_failed      kubectl get jobs --all-namespaces | grep "failed"
Number of active jobs      kube_job_status_active      kubectl get jobs --all-namespaces
Number of CronJobs         kube_cronjob_info           kubectl get cronjobs --all-namespaces
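
Filtering the job list with grep depends on the exact text kubectl prints, which may differ between versions. As an alternative, here is a minimal Python sketch that tallies jobs from the JSON form of the same listing, assuming it was saved with kubectl get jobs --all-namespaces -o json. It classifies a job by its Complete and Failed conditions and otherwise by a non-zero active pod count; the job names and the function name summarize_jobs are hypothetical.

```python
import json


def summarize_jobs(jobs_json):
    """Count jobs that completed, failed, or are still active."""
    summary = {"succeeded": 0, "failed": 0, "active": 0}
    for job in json.loads(jobs_json).get("items", []):
        status = job.get("status", {})
        # Keep only the condition types whose status is the string "True".
        true_conditions = {
            cond.get("type")
            for cond in (status.get("conditions") or [])
            if cond.get("status") == "True"
        }
        if "Complete" in true_conditions:
            summary["succeeded"] += 1
        elif "Failed" in true_conditions:
            summary["failed"] += 1
        elif status.get("active", 0) > 0:
            summary["active"] += 1
    return summary


# Hypothetical listing, trimmed to the fields this sketch reads.
SAMPLE_JOBS = json.dumps({
    "items": [
        {"metadata": {"name": "backup"},
         "status": {"succeeded": 1,
                    "conditions": [{"type": "Complete", "status": "True"}]}},
        {"metadata": {"name": "migrate"},
         "status": {"failed": 3,
                    "conditions": [{"type": "Failed", "status": "True"}]}},
        {"metadata": {"name": "report"},
         "status": {"active": 1}},
    ]
})

job_summary = summarize_jobs(SAMPLE_JOBS)
print(job_summary)  # {'succeeded': 1, 'failed': 1, 'active': 1}
```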
Service metrics
DESCRIPTION                        NAME IN KUBE-STATE-METRICS         COMMAND
Service types per cluster          kube_service_info                  kubectl get services --all-namespaces
Number of pods running by service  kubectl get jobs --all-namespaces  kubectl get pods --selector=<SERVICE_SELECTOR> -o=name
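
The <SERVICE_SELECTOR> placeholder stands for the label selector of the service. One way to obtain it is to read spec.selector from the service definition and join the pairs with commas. The Python sketch below assumes the definition was saved with kubectl get service <SERVICE_NAME> -o json; the placeholder <SERVICE_NAME>, the sample labels, and the function name selector_argument are hypothetical.

```python
import json


def selector_argument(service_json):
    """Turn a Service's spec.selector map into a key=value,key=value string."""
    selector = json.loads(service_json).get("spec", {}).get("selector") or {}
    # Sorting only makes the output stable; kubectl does not require an order.
    return ",".join(f"{key}={value}" for key, value in sorted(selector.items()))


# Hypothetical service definition, trimmed to the selector.
SAMPLE_SERVICE = json.dumps({
    "metadata": {"name": "web"},
    "spec": {"selector": {"tier": "frontend", "app": "web"}},
})

selector = selector_argument(SAMPLE_SERVICE)
print(f"kubectl get pods --selector={selector} -o=name")
# kubectl get pods --selector=app=web,tier=frontend -o=name
```

A service without a selector yields an empty string here, in which case the pod listing should not be run with it.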
Cheatsheet: Kubernetes Monitoring with Datadog
1. Cluster state metrics
METRIC DESCRIPTION                                      DATADOG STATUS CHECK/METRIC NAME
Running pods                                            kubernetes.pods.running
Number of pods desired for a Deployment                 kubernetes_state.deployment.replicas_desired
Number of pods desired for a DaemonSet                  kubernetes_state.daemonset.desired
Number of pods currently running in a Deployment        kubernetes_state.deployment.replicas
Number of pods currently running in a DaemonSet         kubernetes_state.daemonset.scheduled
Number of pods currently available in a Deployment      kubernetes_state.deployment.replicas_available
Number of pods currently available in a DaemonSet       kubernetes_state.daemonset.ready
Number of pods currently not available in a Deployment  kubernetes_state.deployment.replicas_unavailable
Number of pods currently not available in a DaemonSet   kubernetes_state.daemonset.desired - kubernetes_state.daemonset.ready
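
The last row is a derived value rather than a single metric: desired minus ready. A minimal Python sketch of that subtraction follows, assuming the two values have already been fetched per DaemonSet. The DaemonSet names, the input layout, and the function name unavailable_daemonset_pods are hypothetical, and clamping at zero is a choice made for this sketch.

```python
def unavailable_daemonset_pods(samples):
    """Return desired minus ready for each DaemonSet that has a shortfall."""
    shortfalls = {}
    for name, values in samples.items():
        desired = values["kubernetes_state.daemonset.desired"]
        ready = values["kubernetes_state.daemonset.ready"]
        missing = max(desired - ready, 0)  # clamp in case ready briefly exceeds desired
        if missing > 0:
            shortfalls[name] = missing
    return shortfalls


# Hypothetical readings for two DaemonSets.
SAMPLES = {
    "fluentd": {
        "kubernetes_state.daemonset.desired": 5,
        "kubernetes_state.daemonset.ready": 3,
    },
    "node-exporter": {
        "kubernetes_state.daemonset.desired": 5,
        "kubernetes_state.daemonset.ready": 5,
    },
}

shortfalls = unavailable_daemonset_pods(SAMPLES)
print(shortfalls)  # {'fluentd': 2}
```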
2. Node resource and status metrics
METRIC DESCRIPTION                         DATADOG METRIC NAME
Current health status of a node (kubelet)  kubernetes.kubelet.check
Total memory requests (bytes) per node     kubernetes.memory.requests
Total memory in use on a node              kubernetes.memory.usage
Total CPU requests (cores) per node        kubernetes.cpu.requests
Total CPU in use on a node                 kubernetes.cpu.usage.total
3. Job metrics
METRIC DESCRIPTION         DATADOG METRIC NAME
Number of successful jobs  kubernetes_state.job.succeeded
Number of failed jobs      kubernetes_state.job.failed
Number of active jobs      kubernetes_state.job.count
Number of CronJobs         kubernetes_state.job.count (filtered by the owner_kind:cronjob tag)
4. Service metrics
METRIC DESCRIPTION                 DATADOG METRIC NAME
Service types per cluster          kubernetes_state.service.count
Number of pods running by service  kubernetes.pods.running
5. Container metrics
METRIC DESCRIPTION              DATADOG METRIC NAME
Containers running on a pod     kubernetes_state.container.running
Containers restarted on a pod   kubernetes_state.container.restarts
Containers terminated on a pod  kubernetes_state.container.terminated
6. Disk I/O & Network metrics
METRIC DESCRIPTION       DATADOG METRIC NAME
Network in per node      kubernetes.network.rx_bytes
Network out per node     kubernetes.network.tx_bytes
Disk writes per node     kubernetes.io.write_bytes
Disk reads per node      kubernetes.io.read_bytes
Network errors per node  kubernetes.network.rx_errors, kubernetes.network.tx_errors
7. Events
Kubernetes events appear in the Datadog Events Explorer and in event widgets on dashboards.