linkerd2-proxy panics with "period must be non-zero" after destination pod OOM, causing permanent "buffer's worker closed unexpectedly" state requiring sidecar restart · Issue #14954 · linkerd/linkerd2 · GitHub

@kmramit

Description

What is the issue?

After the linkerd-proxy sidecar container in the linkerd-destination control plane pod was OOMKilled, all data plane sidecar proxies entered a permanently broken state where every outbound request fails with buffer's worker closed unexpectedly. The proxies did not self-heal even after the destination pod fully recovered. A manual restart of the data plane sidecar proxies (via pod restart) was required to restore connectivity.

The root cause is a panic in control.rs at line 118:
thread 'main' panicked at linkerd/app/core/src/control.rs:118:49: period must be non-zero.

This panic was triggered when DNS resolution for linkerd-dst-headless.linkerd.svc.cluster.local returned zero results during the brief window when the destination pod was recovering from OOM. The zero-result DNS response caused a resolution period of zero to be computed, which triggered the panic.
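To illustrate the failure mode described above, here is a minimal standard-library sketch, not the proxy's actual code. The names `strict_period`, `clamped_period`, and `MIN_PERIOD` are hypothetical, and the one-second floor is an arbitrary assumption. It contrasts a strict check that panics on a zero period with a defensive variant that clamps the value so that an empty DNS answer cannot bring down the task.

```rust
use std::time::Duration;

/// Hypothetical lower bound for a re-resolution period (assumption, not a Linkerd setting).
const MIN_PERIOD: Duration = Duration::from_secs(1);

/// Strict variant: panics on a zero period, mirroring the message in the report.
fn strict_period(period: Duration) -> Duration {
    assert!(!period.is_zero(), "period must be non-zero");
    period
}

/// Defensive variant: raises any period below the floor (including zero) to MIN_PERIOD.
fn clamped_period(period: Duration) -> Duration {
    period.max(MIN_PERIOD)
}
```

With this sketch, `clamped_period(Duration::ZERO)` yields `MIN_PERIOD`, while `strict_period(Duration::ZERO)` panics. Whether clamping or returning an error is the right fix for the proxy is a design decision for the maintainers.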
The panic killed the internal balance queue worker task (at linkerd/proxy/balance/queue/src/service.rs:73), after which the proxy's outbound pipeline was permanently broken. Every subsequent outbound connection attempt failed with:

WARN outbound: linkerd_app_core::serve: Server failed to become ready error=buffer's worker closed unexpectedly
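The permanent failure can be pictured with a simplified model of a worker behind a channel. This is a sketch under stated assumptions, using standard-library threads and channels rather than the proxy's async buffer; `Request`, `spawn_worker`, and `call` are hypothetical names. Once the worker panics, its receiving end is dropped, so every later call fails in the same way until a new worker is created.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: a value plus a channel on which to send the reply.
type Request = (u32, mpsc::Sender<u32>);

/// Spawns a worker that doubles each value, but panics when it sees `panic_on`.
fn spawn_worker(panic_on: u32) -> (mpsc::Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let handle = thread::spawn(move || {
        for (value, reply) in rx {
            if value == panic_on {
                // Stands in for the panic reported in control.rs.
                panic!("period must be non-zero");
            }
            let _ = reply.send(value * 2);
        }
    });
    (tx, handle)
}

/// Sends one request and waits for the reply; any closed channel maps to one error.
fn call(tx: &mpsc::Sender<Request>, value: u32) -> Result<u32, &'static str> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send((value, reply_tx))
        .map_err(|_| "worker closed unexpectedly")?;
    reply_rx.recv().map_err(|_| "worker closed unexpectedly")
}
```

In this model a caller can only recover by calling `spawn_worker` again, which corresponds to the sidecar restart described in this report. The model does not show how the proxy should rebuild its worker.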

This is related to #14333

How can it be reproduced?

No reliable reproduction steps are known. Restarting the linkerd-destination pod several times may help to reproduce the issue.

Logs, error output, etc

Phase 1: Normal startup

INFO linkerd2_proxy: release 2.316.0 (0a932ea) by linkerd on 2025-08-27T03:53:54Z
INFO dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=10.x.x.x:x
INFO dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=10.x.x.x:x
INFO dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=10.x.x.x:x
INFO linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:x

Phase 2: OOM event on destination pod proxy - gRPC streams break

WARN watch{port=8080}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Unknown error grpc.message="h2 protocol error: error reading a body from connection"
WARN policy:controller:endpoint{addr=10.x.x.x:x}: linkerd_reconnect: Service failed error=endpoint 10.x.x.x:x: channel closed
WARN policy:controller:endpoint{addr=10.x.x.x:x}: linkerd_reconnect: Failed to connect error=endpoint 10.x.x.x:x: Connection refused (os error 111)

Phase 3: DNS resolution fails for destination service
WARN dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: proto error: no records found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: SRV, query_class: IN }; failed to resolve A record: proto error: no records found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }

Phase 4: Fatal panic - period must be non-zero
thread 'main' panicked at linkerd/app/core/src/control.rs:118:49: period must be non-zero.
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Phase 5: Permanent broken state
thread 'main' panicked at /__w/linkerd2-proxy/linkerd2-proxy/linkerd/proxy/balance/queue/src/service.rs:73:18: worker must set a failure if it exits prematurely
WARN outbound: linkerd_app_core::serve: Server failed to become ready error=buffer's worker closed unexpectedly client.addr=10.x.x.x:x

output of linkerd check -o short

NA

Environment

Linkerd version: stable-2.316.0 (proxy release 2.316.0, built 2025-08-27)

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None
