Ensuring compatibility of webhook certificates before upgrading to v1.23


Starting from version 1.23, Kubernetes no longer supports server identity validation using the X.509 Common Name (CN) field in certificates. Instead, Kubernetes relies only on the information in the X.509 Subject Alternative Name (SAN) fields.

To avoid disruption to your clusters, you must replace incompatible certificates that lack SANs on your webhook and aggregated API server backends before upgrading your clusters to Kubernetes version 1.23.

Why Kubernetes no longer supports backend certificates without SANs

GKE runs open-source Kubernetes, whose kube-apiserver component contacts your webhook and aggregated API server backends over Transport Layer Security (TLS). The kube-apiserver component is written in the Go programming language.

Before Go 1.15, TLS clients validated the identity of the servers they connected to using a two-step process:

  1. Check if the DNS name (or IP address) of the server is present as one of the SANs on the server's certificate.
  2. As a fallback, check if the DNS name (or IP address) of the server is equal to the CN on the server's certificate.

RFC 6125 fully deprecated server identity validation based on the CN field in 2011. Browsers and other security-critical applications no longer use the field.

To align with the wider TLS ecosystem, Go 1.15 removed Step 2 from its validation process, but left a debug switch (x509ignoreCN=0) that re-enabled the old behavior to ease migration. Kubernetes version 1.19 was the first version built using Go 1.15. GKE clusters on versions 1.19 through 1.22 enabled the debug switch by default to give customers more time to replace the certificates of affected webhook and aggregated API server backends.
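
For illustration, a program built with Go 1.15 or 1.16 could re-enable the legacy behavior through the GODEBUG environment variable; my-tls-client below is a hypothetical binary, and the switch has no effect from Go 1.17 onward:

GODEBUG=x509ignoreCN=0 ./my-tls-client   # hypothetical binary; restores legacy CN matching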

Kubernetes version 1.23 is built with Go 1.17, which removes the debug switch. Once GKE upgrades your clusters to version 1.23, calls from your cluster's control plane to webhooks or aggregated API services that do not present a valid X.509 certificate with an appropriate SAN will fail.

Identifying affected clusters

For clusters running patch versions 1.21.9 or later, or 1.22.3 or later

For clusters on patch versions 1.21.9 or later, or 1.22.3 or later, with Cloud Logging enabled, GKE provides a Cloud Audit Logs log entry to identify calls to affected backends from your cluster. You can use the following filter to search for the logs:

logName =~ "projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
resource.type = "k8s_cluster"
operation.producer = "k8s.io"
"invalid-cert.webhook.gke.io"

If your clusters have not called backends with affected certificates, you won't see any logs. If you do see such an audit log, it will include the hostname of the affected backend.
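
You can also run this query from the command line. The following is a sketch using the Google Cloud CLI, assuming gcloud is installed and authenticated; PROJECT_ID is a placeholder, and the 30-day freshness window is an example value:

gcloud logging read '
  logName =~ "projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
  resource.type = "k8s_cluster"
  operation.producer = "k8s.io"
  "invalid-cert.webhook.gke.io"' \
  --project=PROJECT_ID --freshness=30d --format=json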

The following is an example log entry for a webhook backend served by a Service named example-webhook in the default namespace:

{
  ...
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "location": "us-central1-c",
      "cluster_name": "example-cluster",
      "project_id": "example-project"
    }
  },
  "labels": {
    "invalid-cert.webhook.gke.io/example-webhook.default.svc": "No subjectAltNames returned from example-webhook.default.svc:8443",
    ...
  },
  "logName": "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    ...
    "producer": "k8s.io",
    ...
  },
  ...
}

The hostnames of the affected services (for example, example-webhook.default.svc) are included as suffixes of the label names that start with invalid-cert.webhook.gke.io/. You can also get the name of the cluster that made the call from the resource.labels.cluster_name label, which has the value example-cluster in this example.
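
The affected hostnames can also be extracted mechanically. The following sketch assumes the output of the gcloud query above was saved to a file named logs.json and that jq is installed:

jq -r '.[].labels // {} | keys[]
  | select(startswith("invalid-cert.webhook.gke.io/"))
  | ltrimstr("invalid-cert.webhook.gke.io/")' logs.json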

Deprecation insights

You can learn which clusters use incompatible certificates from deprecation insights. Insights are available for clusters running version 1.22.6-gke.1000 or later.

Other cluster versions

If your cluster is on a 1.22 patch version earlier than 1.22.3, or on any patch version earlier than 1.21.9, you have two options for determining whether it is affected by this deprecation:

Option 1 (recommended): Upgrade your cluster to a patch version that supports identifying affected certificates with logs. Make sure that Cloud Logging is enabled for your cluster. After your cluster has been upgraded, the identifying Cloud Audit Logs entries are produced each time the cluster attempts to call a Service that does not provide a certificate with an appropriate SAN. Because the logs are only produced on a call attempt, we recommend waiting 30 days after the upgrade to allow enough time for all call paths to be exercised.

Using logs to identify affected services is recommended because it minimizes manual effort: the logs are produced automatically and name the affected services.

Option 2: Inspect the certificates used by webhooks and aggregated API servers in your clusters to determine whether any of them lack SANs:

  1. Get the list of Webhooks and Aggregated API Servers in your cluster and identify their backends (Services or URLs).
  2. Inspect the certificates used by the backend services.

Given the manual effort required to inspect all certificates this way, follow this method only if you need to assess the impact of the deprecation in Kubernetes version 1.23 before you can upgrade your cluster to version 1.21. If you can upgrade your cluster to 1.21, upgrade it first and then follow the instructions in Option 1 to avoid the manual effort.

Identifying backend services to inspect

To identify backends that might be affected by the deprecation, get the list of Webhooks and Aggregated API Services and their associated backends in the cluster.

To list all relevant webhooks in the cluster, use the following kubectl commands:

kubectl get mutatingwebhookconfigurations   # mutating admission webhooks

kubectl get validatingwebhookconfigurations # validating admission webhooks

You can get the backend Service or URL for a given webhook by examining the clientConfig.service or clientConfig.url field in each entry of the webhook configuration's webhooks list:

kubectl get mutatingwebhookconfigurations example-webhook -o yaml

The output of this command is similar to the following:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
webhooks:
- admissionReviewVersions:
  clientConfig:
    service:
      name: example-service
      namespace: default
      port: 443

Note that clientConfig can specify its backend as a Kubernetes Service (clientConfig.service), or as a URL (clientConfig.url).
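
Rather than examining each configuration by hand, you can list every webhook backend in one pass. The following sketch assumes jq is installed; the output format is illustrative:

kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o json \
  | jq -r '.items[].webhooks[]?.clientConfig
      | (.service | select(.) | "service: \(.namespace)/\(.name)"),
        (.url | select(.) | "url: \(.)")'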

To list all relevant Aggregated API Services in the cluster, use the following kubectl command:

kubectl get apiservices | grep -v Local        # aggregated API services

The output of this command is similar to the following:

NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        237d

This example returns the metrics-server Service in the kube-system namespace.

You can get the backend Service for a given aggregated API by examining the spec.service field:

kubectl get apiservices v1beta1.metrics.k8s.io -o yaml

The output of this command is similar to the following:

...
apiVersion: apiregistration.k8s.io/v1
kind: APIService
spec:
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
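
Similarly, the following sketch prints the backend Service of every APIService that has one (that is, every non-local APIService), assuming jq is installed:

kubectl get apiservices -o json \
  | jq -r '.items[] | select(.spec.service != null)
      | "\(.metadata.name) -> \(.spec.service.namespace)/\(.spec.service.name):\(.spec.service.port // 443)"'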

Inspecting the certificate of a Service

Once you have identified relevant backend Services to inspect, you can inspect the certificate of each specific Service, such as example-service:

  1. Find the selector and target port of the service:

    kubectl describe service example-service
    

    The output of this command is similar to the following:

    Name: example-service
    Namespace: default
    Labels: run=nginx
    Selector: run=nginx
    Type: ClusterIP
    IP: 172.21.xxx.xxx
    Port: 443
    TargetPort: 444
    

    In this example, example-service has the selector run=nginx and the target port 444.

  2. Find a pod matching the selector:

    kubectl get pods --selector=run=nginx
    

    The output of the command is similar to the following:

    NAME          READY   STATUS    RESTARTS   AGE
    example-pod   1/1     Running   0          21m
    
  3. Set up a port forward from the machine running kubectl to the pod:

    kubectl port-forward pods/example-pod LOCALHOST_PORT:TARGET_PORT &   # port forwarding in background
    

    Replace the following in the command:

    • LOCALHOST_PORT: the local port to listen on.
    • TARGET_PORT: the TargetPort from Step 1.
  4. Use openssl to print the certificate used by the Service:

    openssl s_client -connect localhost:LOCALHOST_PORT </dev/null | openssl x509 -noout -text
    

    This example output shows a valid certificate (with SAN entries):

    Subject: CN = example-service.default.svc
    X509v3 extensions:
      X509v3 Subject Alternative Name:
        DNS:example-service.default.svc
    

    This example output shows a certificate with a missing SAN:

    Subject: CN = example-service.default.svc
    X509v3 extensions:
      X509v3 Key Usage: critical
        Digital Signature, Key Encipherment
      X509v3 Extended Key Usage:
        TLS Web Server Authentication
      X509v3 Authority Key Identifier:
        keyid:1A:5F:29:D8:E9:3C:54:3C:35:CC:D8:AB:D1:21:FD:C3:56:25:C0:74
    
  5. Remove the port forward from running in the background with the following commands:

    $ jobs
    [1]+  Running                 kubectl port-forward pods/example-pod 8888:444 &
    $ kill %1
    [1]+  Terminated              kubectl port-forward pods/example-pod 8888:444
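
For convenience, steps 3 to 5 can be combined into a short script. The following sketch uses the example values from this section (pod example-pod, target port 444, local port 8888):

kubectl port-forward pods/example-pod 8888:444 &   # step 3: forward in background
sleep 2                                            # give the forward time to establish
openssl s_client -connect localhost:8888 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 "Subject Alternative Name" \
  || echo "no SAN found on the serving certificate"  # step 4
kill $!                                            # step 5: stop the port forward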
    

Inspecting the certificate of a URL backend

If the webhook uses a URL backend, connect directly to the host and port specified in the URL. For example, if the URL is https://example.com:123/foo/bar, use the following openssl command to print the certificate used by the backend:

  openssl s_client -connect example.com:123 </dev/null | openssl x509 -noout -text
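
If the backend relies on Server Name Indication (SNI) to choose its serving certificate, you might need to pass the hostname explicitly. A variant of the command above:

  openssl s_client -connect example.com:123 -servername example.com </dev/null | openssl x509 -noout -text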

Mitigating the risk of the 1.23 upgrade

Once you have identified affected clusters and the backend services that use certificates without SANs, you must update the webhook and aggregated API server backends to use certificates with appropriate SANs before upgrading the clusters to version 1.23.
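
How you replace a certificate depends on how your backend's certificates are issued, but the key requirement is a subjectAltName entry that matches the DNS name the control plane uses to reach the backend. The following is a minimal sketch that issues a self-signed serving certificate for the hypothetical Service example-webhook in the default namespace; the -addext flag requires OpenSSL 1.1.1 or later:

openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=example-webhook.default.svc" \
  -addext "subjectAltName=DNS:example-webhook.default.svc"

In production, the certificate would typically be issued by your own CA or a tool such as cert-manager; the essential point is that the SAN, not only the CN, carries the backend's DNS name.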

GKE will not automatically upgrade clusters on versions 1.22.6-gke.1000 or later with backends using incompatible certificates until you replace the certificates or until version 1.22 reaches end of standard support.

If your cluster is on a GKE version earlier than 1.22.6-gke.1000, you can temporarily prevent automatic upgrades by configuring a maintenance exclusion to prevent minor upgrades.
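
The following is a sketch of such an exclusion using the Google Cloud CLI; CLUSTER_NAME, the exclusion name, and the dates are placeholders to adapt to your own maintenance plans:

gcloud container clusters update CLUSTER_NAME \
  --add-maintenance-exclusion-name block-minor-upgrade-to-123 \
  --add-maintenance-exclusion-start 2022-05-01T00:00:00Z \
  --add-maintenance-exclusion-end 2022-06-01T00:00:00Z \
  --add-maintenance-exclusion-scope no_minor_upgrades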

Resources

See the following resources for additional information on this change:

  • Kubernetes 1.23 release notes
    • Kubernetes is built using Go 1.17. This version of Go removes the ability to use a GODEBUG=x509ignoreCN=0 environment setting to re-enable deprecated legacy behavior of treating the CN of X.509 serving certificates as a host name.
  • Kubernetes 1.19 and Kubernetes 1.20 release notes
    • The deprecated, legacy behavior of treating the CN field on X.509 serving certificates as a host name when no SANs are present is now disabled by default.
  • Go 1.17 release notes
    • The temporary GODEBUG=x509ignoreCN=0 flag has been removed.
  • Go 1.15 release notes
    • The deprecated, legacy behavior of treating the CN field on X.509 certificates as a host when no SANs are present is now disabled by default.
  • RFC 6125 (page 46)
    • Although the use of the CN value is existing practice, it is deprecated, and Certificate Authorities are encouraged to provide subjectAltName values instead.
  • Admission webhooks