[main][Automation] Update elastic/beats to 1dc4d7a6380e by github-actions[bot] · Pull Request #9952 · elastic/elastic-agent · GitHub

github-actions[bot] (Contributor) commented Sep 15, 2025

What

Update elastic/beats to the latest version on branch main.

Changeset


Bump beats

Update to elastic/beats@1dc4d7a6380e

ran shell command ".ci/scripts/update-beats.sh 1dc4d7a6380e"

GitHub Action workflow link

Created automatically by Updatecli

Options:

Most of Updatecli configuration is done via its manifest(s).

  • If you close this pull request, Updatecli will automatically reopen it, the next time it runs.
  • If you close this pull request and delete the base branch, Updatecli will automatically recreate it, erasing all previous commits made.

Feel free to report any issues at github.com/updatecli/updatecli.
If you find this tool useful, do not hesitate to star our GitHub repository as a sign of appreciation, and/or to tell us directly on our chat!

elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

mergify bot (Contributor) commented Sep 15, 2025

This pull request does not have a backport label. Could you fix it @github-actions[bot]? 🙏
To fix up this pull request, add the backport labels for the needed branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

ebeahan (Member) commented Sep 15, 2025

I didn't investigate each failure exhaustively, but in a quick review the tests I see failing are related to liveness probing/monitoring:

=== FAIL: testing/integration/ess TestMonitoringPreserveTextConfig/TestMonitoringLiveness (2.14s)
...
=== FAIL: testing/integration/ess TestMonitoringPreserveTextConfig (57.85s)
...
=== FAIL: testing/integration/ess TestMonitoringLivenessReloadable/TestMonitoringLiveness (2.12s)
...
=== FAIL: testing/integration/ess TestMonitoringLivenessReloadable (53.82s)

@nkvoll could these failures relate to the recent liveness changes made in #9673?

@github-actions github-actions bot changed the title [main][Automation] Update elastic/beats to 577fc8462619 [main][Automation] Update elastic/beats to 6495466c8868 Sep 16, 2025
@swiatekm swiatekm self-assigned this Sep 16, 2025
@swiatekm swiatekm force-pushed the updatecli_main_updatecli-update-beats-main branch from 0fa4321 to 0cd3684 September 16, 2025 11:01
nkvoll (Member) commented Sep 16, 2025

> @nkvoll could these failures relate to the recent liveness changes made in #9673?

At first glance I don't think so: as far as I can tell, these tests basically exec elastic-agent status --output json, whereas the linked PR only changes the HTTP liveness endpoint.

cmacknz (Member) commented Sep 16, 2025

These failures also involve the /processes endpoint, which is separate from the /liveness endpoint.

=== RUN   TestMonitoringLivenessReloadable/TestMonitoringLiveness
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65589'
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65583'
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65564'
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65589'
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65583'
    monitoring_probe_reload_test.go:177: component state: Healthy: communicating with pid '65564'
    monitoring_probe_reload_test.go:160: 
        	Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1758025637851735731/elastic/elastic-agent/testing/integration/ess/monitoring_probe_reload_test.go:160
        	            				/opt/buildkite-agent/builds/bk-agent-prod-gcp-1758025637851735731/elastic/elastic-agent/testing/integration/ess/monitoring_probe_reload_test.go:143
        	            				/opt/buildkite-agent/.asdf/installs/golang/1.24.7/go/src/runtime/asm_amd64.s:1700
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:6792/liveness": dial tcp 127.0.0.1:6792: connect: connection refused
        	Test:       	TestMonitoringLivenessReloadable/TestMonitoringLiveness
    --- FAIL: TestMonitoringLivenessReloadable/TestMonitoringLiveness (2.14s)

Looking at the JSON test logs, the test fails at 2025-09-16T12:59:32.168290271Z:

{"Time":"2025-09-16T12:59:32.168290271Z","Action":"output","Package":"github.com/elastic/elastic-agent/testing/integration/ess","Test":"TestMonitoringLivenessReloadable/TestMonitoringLiveness","Output":"        \t            \tGet \"http://localhost:6792/liveness\": dial tcp 127.0.0.1:6792: connect: connection refused\n"}

Looking at the agent diagnostics, the metrics endpoint starts listening three times, and the switch to 127.0.0.1:6792 happens last, at 2025-09-16T12:59:33.669Z, roughly 1.5s after the failure at 2025-09-16T12:59:32.168290271Z:

{"log.level":"info","@timestamp":"2025-09-16T12:59:09.248Z","log.logger":"api","log.origin":{"function":"github.com/elastic/elastic-agent-libs/api.(*Server).Start.func1","file.name":"api/server.go","file.line":87},"message":"Metrics endpoint listening on: 127.0.0.1:6791 (configured: http://localhost:6791)","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

{"log.level":"info","@timestamp":"2025-09-16T12:59:31.394Z","log.logger":"api","log.origin":{"function":"github.com/elastic/elastic-agent-libs/api.(*Server).Start.func1","file.name":"api/server.go","file.line":87},"message":"Metrics endpoint listening on: 127.0.0.1:6791 (configured: http://localhost:6791)","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

{"log.level":"info","@timestamp":"2025-09-16T12:59:33.669Z","log.logger":"api","log.origin":{"function":"github.com/elastic/elastic-agent-libs/api.(*Server).Start.func1","file.name":"api/server.go","file.line":87},"message":"Metrics endpoint listening on: 127.0.0.1:6792 (configured: http://localhost:6792)","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

The way the test is written seems inherently racy: it makes a policy change via the override API, then waits for all components to be healthy (which does not indicate that the policy change was applied), then makes a single HTTP request and expects it to succeed.

overrideEndpoint := fmt.Sprintf("/api/fleet/agent_policies/%s", runner.policyID)
statusCode, overrideResp, err := runner.info.KibanaClient.Request("PUT", overrideEndpoint, nil, nil, reader)
require.NoError(runner.T(), err)
require.Equal(runner.T(), http.StatusOK, statusCode, "non-200 status code; got response: %s", string(overrideResp))
runner.AllComponentsHealthy(ctx)
updatedEndpoint := "http://localhost:6792/processes"
// second stage: ensure the HTTP config has updated
req, err = http.NewRequestWithContext(ctx, "GET", updatedEndpoint, nil)
require.NoError(runner.T(), err)

The simplest fix is probably just to add some retries; a slightly better fix would be to poll until the policy revision changes, or even until the log line indicating the metrics server port change appears.

cmacknz (Member) commented Sep 16, 2025

The snippet I linked above is from the /processes test; the /liveness test does the same thing:

overrideEndpoint := fmt.Sprintf("/api/fleet/agent_policies/%s", runner.policyID)
statusCode, overrideResp, err := runner.info.KibanaClient.Request("PUT", overrideEndpoint, nil, nil, reader)
require.NoError(runner.T(), err)
require.Equal(runner.T(), http.StatusOK, statusCode, "non-200 status code; got response: %s", string(overrideResp))
runner.AllComponentsHealthy(ctx)
// check to make sure that we now have a liveness probe response
req, err = http.NewRequestWithContext(ctx, "GET", endpoint, nil)
require.NoError(runner.T(), err)

ycombinator (Contributor) commented Sep 16, 2025

I've narrowed down the cause of the TestClientWithCertificate FIPS unit test failure. It has something to do with the github.com/stretchr/testify dependency being bumped from v1.10.0 to v1.11.1. In fact, the test also fails if the dependency is bumped from v1.10.0 to v1.11.0 (the next release after v1.10.0). I'm looking into what specifically changed between these versions that would cause the test to start failing.

[UPDATE 1] The change I made in 8525a53 should get this test to pass again.

[UPDATE 2] I haven't taken a closer look, but I suspect something in stretchr/testify#1427, which was included in v1.11.0, caused the breakage 🤷.

@ycombinator ycombinator force-pushed the updatecli_main_updatecli-update-beats-main branch from ce94411 to 8525a53 September 16, 2025 23:11
@github-actions github-actions bot changed the title [main][Automation] Update elastic/beats to 6495466c8868 [main][Automation] Update elastic/beats to e3c6cce935b8 Sep 17, 2025
github-actions bot and others added 5 commits September 17, 2025 22:33
Made with ❤️ by updatecli
* Add rollback field to UpgradeRequest

* Introduce rollback parameter to upgrade

* Concurrently retry taking over watcher

* Gracefully shutdown agent watcher

* Add rollbacks available to upgrade marker

* disable rollback window by default

* Add formal checks to manual rollback arguments

* Add minimum version check for creating rollbacks entries in update marker

* Gracefully terminate watcher process on windows

* Allow watcher to listen to signals only during watch loop

* make watcher rollback only if the agent has not been already rolled back

* Remove parent death signal for watcher on linux

* Distinguish between upgrade and rollback operations in upgrade subcommand

* remove DESIRED_OUTCOME in favor of watch --rollback

* Add version agent rollbacks to in manual rollback reason

* Check upgrade details state before allowing a manual rollback
…tion CFT environment (#10007)

* Have FIPS integration tests spin up deployments in Production CFT environment

* Add explanatory comment

* Run extended tests if FIPS integration tests pipeline changes

* Revert "Run extended tests if FIPS integration tests pipeline changes"

This reverts commit ae0e89d.

* Fix monitoring reloading tests

Ensure we verify the policy update was actually applied before querying
the monitoring endpoint. Add a retry on said request as well.

* Use the explicit policy revision returned by Kibana
@cmacknz cmacknz force-pushed the updatecli_main_updatecli-update-beats-main branch from 129623d to 183b202 September 18, 2025 02:33
@cmacknz cmacknz requested a review from a team as a code owner September 18, 2025 02:33
cmacknz (Member) commented Sep 18, 2025

Closing in favor of #10019: this branch had several conflicts with the fixes on main, and my attempt to force-rebase the fixes out failed.

@cmacknz cmacknz closed this Sep 18, 2025
elasticmachine (Collaborator) commented Sep 18, 2025

💔 Build Failed

Failed CI Steps


cc @swiatekm

7 participants