8000 Add live graph backend (merge to feature branch) by wildum · Pull Request #2645 · grafana/alloy · GitHub
[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add live graph backend (merge to feature branch) #2645

Open
wants to merge 9 commits into
base: alloy-live-graph
Choose a base branch
from

Conversation

wildum
Copy link
Contributor
@wildum wildum commented Feb 7, 2025

PR Description

This is the backend part of the new live graph.

The approach is similar to live debugging, except that it adds the callback to all components in the given module.

The live debugging feed is now a struct that contains data needed to build the new graph. Because the data string is not needed for the graph, it's passed around as a function. This way we can use the same struct for the live debugging and the live graph with limited performance overhead.

Which issue(s) this PR fixes

Fixes #2608

Notes to the Reviewer

This was tested manually via curl:

  • curl -N http://localhost:12345/api/v0/web/graph to get the data from all components at the root
  • curl -N http://localhost:12345/api/v0/web/graph/{moduleID} to get data from a particular module

PR Checklist

  • [na] CHANGELOG.md updated (will be done in the feature branch)
  • [na] Documentation added (will be done in another branch to the feature branch)
  • Tests updated
  • [na] Config converters updated

@wildum wildum requested a review from a team as a code owner February 7, 2025 11:07
Copy link
Contributor
@thampiotr thampiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! some comments, but nothing big

internal/component/discovery/discovery.go Outdated Show resolved Hide resolved
for _, cp := range components {
if c, ok := cp.Component.(component.LiveDebugging); ok {
// notify the component of the change
c.LiveDebugging(len(s.callbacks[ComponentID(cp.ID.String())]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question: when LiveDebugging calls Update, won't it race with the runtime calling Update at the same time? I think runtime won't call Update concurrently, but with LiveDebugging it's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, definitely something we missed in the first implementation. I protected the Update functions that had no mutex with mutexes. Lmk if you have a better solution in mind to avoid this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's also not too great because other Update methods don't need mutexes, so it will be confusing and someone may one day remove it... so unless we have tests that verify this, it's a bit fragile.

8000

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After taking another look: I'd be fine with mutex solution if you also add a comment on why this mutex is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you were right about the fact that it's a bit fragile. After testing, I triggered a data race because in the LiveDebugging function I use p.args which is being modified in the Update function :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uuuuh, that's hairy... maybe worth looking into providing a callback fucntion to clean up subscriptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with a different approach here (for the receiver only) : 0dd571d

Copy link
Contributor
@thampiotr thampiotr Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach still reads kinda weird to me, because the path from the runtime calling is Update() -> update() and it's odd that we grab a lock, release it and then call update() and grab the same lock again....

I know you can refactor it further to make the mutex approach slightly more readable, but I have general concerns about this structure.

Can't we refactor the livedebuggingconsumer to:

  • have function pointers that allow to override behaviour like interceptor in Prometheus components, so it's more universal
  • rename it to interceptconsumer or something like that
  • always add it to the pipeline if livedebugging is enabled (as a feature), even if it's inactive for this component - if we do a lean implementation in the interceptors that checks whether it's active and quits, I think the overhead should be minimal? benchmarks should be able to confirm this
  • then we don't need the whole magic of notifying callbacks via LiveDebugging() method calling the Update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like this should work, I guess we could simply intercept at the fanout consumer. I can give it a shot tomorrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked it following your guidance, let me know if it matches what you expected. I can polish further with tests and comments if needed.
The livedebugging method is not called anymore and is just there for the interface marker. Because there is no Update magic, there is no need to notify the components anymore when a callback is added/removed. This simplified the live debugging service code. I like the approach that you suggested, it's much better than what we had :)

internal/web/api/api.go Outdated Show resolved Hide resolved
internal/web/api/api.go Show resolved Hide resolved
internal/web/api/api.go Outdated Show resolved Hide resolved
internal/web/api/api.go Outdated Show resolved Hide resolved
@wildum wildum force-pushed the live-graph-backend branch from a79c730 to 0dd571d Compare February 12, 2025 14:38
Comment on lines 55 to 57
if lazy, ok := cons.(*lazyconsumer.Consumer); ok {
ids = append(ids, lazy.ComponentID())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fragile too IMO because next time someone refactors and adds another layer of consumer wrappers for some reason, this will stop working

Can we avoid having to type cast entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved it a bit by adding an interface. Not super happy with the name of the interface though. I did not see another simple way to get the componentIDs from the next consumers

Copy link
Contributor
@thampiotr thampiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much, much cleaner without the callbacks to LiveDebugging(...) function from the service. Few more comments, but we're almost there!

mutatesData bool // must be set to true if the provided opts modifies the data
}

func Logs(nextLogs otelconsumer.Logs, mutatesData bool, f LogsInterceptorFunc) otelconsumer.Logs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much prefer to avoid having bool function parameters - check out some reasoning about it here: https://alexkondov.com/should-you-pass-boolean-to-functions/

Would be better to have Logs() and LogsMutating() or something like this.

@@ -0,0 +1,37 @@
package interceptorconsumer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package interceptorconsumer
package interceptconsumer

shorter and still makes sense

"go.opentelemetry.io/collector/pdata/plog"
)

type LogsInterceptorFunc func(context.Context, plog.Logs) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the 3 telemetry signals, can we use Go generics to simplify the code here? Seems like the variation would be only really on plogs.Logs and otelconsumer.Logs types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works for plogs.Logs but I don't think that we can use a generic type for otelconsumers.Logs because we call ConsumeLogs with it. That means we will still need an interceptor type for each telemetry signal and each will still have its corresponding ConsumeMetrics|ConsumeLogs|ConsumeTraces func

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Maybe that could be avoided with a func but if not, then we'll need to live with this duplication :(

"go.opentelemetry.io/collector/pdata/ptrace"
)

func extractIds(consumers []otelcol.Consumer) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: keep exported functions on top of the file and helpers at the bottom

Comment on lines +15 to +18
type ConsumerWithComponentID interface {
Consumer
ComponentID() string
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type ConsumerWithComponentID interface {
Consumer
ComponentID() string
}
// ComponentMetadata can be implemented by, for example, consumers exported by components, to provide the ID of the component which is exporting given consumer. This is used for Live Graph / Live Debugging features.
type ComponentMetadata interface {
ComponentID() string
}

I think that's pretty good. I also don't think it needs to be a Consumer.

return ids
}

func PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, nextLogs []otelcol.Consumer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think nextLogs needs to be otelcol.Consumer, you only ever use them to extract component ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but it's quite handy for the components because it's easy to provide.

Would you prefer that I change it to:
PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, componentIDs []string)
and that I export the extractIDs function?

Or do you have a different idea in mind? I'd like to keep the code as little intrusive as possible in the components

DataFunc func() string
}

func NewData(componentID ComponentID, dataType DataType, count uint64, dataFunc func() string, opts ...DataOption) *Data {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested with returning Data by value instead of the pointer? some stack optimisations could reduce allocation and improve performance, but that needs benchmarks.

moduleID = livedebugging.ModuleID(vars["moduleID"])
}

window := setWindow(w, r.URL.Query().Get("window")) // in seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
window := setWindow(w, r.URL.Query().Get("window")) // in seconds
windowSeconds := setWindow(w, r.URL.Query().Get("window"))

Generally if we can make code more readable without comments, it's better to do that - comments may be treated as a duct tape to help make sense of a code that is otherwise not too readable ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can't window become a time.Duration right away please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use it as a float to calculate the rate:
data.Rate = float64(data.Count) / float64(window)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a problem that cannot be overcome if window is a duration

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0