Add live graph backend (merge to feature branch) #2645

wildum · 2025-02-07T11:07:58Z

PR Description

This is the backend part of the new live graph.

The approach is similar to live debugging, except that it adds the callback to all components in the given module.

The live debugging feed is now a struct that contains data needed to build the new graph. Because the data string is not needed for the graph, it's passed around as a function. This way we can use the same struct for the live debugging and the live graph with limited performance overhead.
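To illustrate the "data string passed around as a function" idea, here is a minimal sketch. The `Data` struct, its fields other than `DataFunc`, and the `graphView`/`debugView` helpers are hypothetical simplifications for this example and not the PR's actual types. It only shows that a consumer that does not need the string never pays for building it.

```go
package main

import (
	"fmt"
	"strings"
)

// Data is a hypothetical, simplified version of the feed struct.
type Data struct {
	ComponentID string
	Count       uint64
	// DataFunc defers building the (potentially expensive) string until a
	// consumer actually asks for it.
	DataFunc func() string
}

// graphView only needs the counters, so it never calls DataFunc.
func graphView(d Data) string {
	return fmt.Sprintf("%s: %d", d.ComponentID, d.Count)
}

// debugView renders the full payload, so it does call DataFunc.
func debugView(d Data) string {
	return fmt.Sprintf("%s: %s", d.ComponentID, d.DataFunc())
}

func main() {
	calls := 0
	d := Data{
		ComponentID: "example.component",
		Count:       3,
		DataFunc: func() string {
			calls++
			return strings.Join([]string{"log a", "log b", "log c"}, ", ")
		},
	}

	fmt.Println(graphView(d))
	fmt.Println("DataFunc calls after graph view:", calls)
	fmt.Println(debugView(d))
	fmt.Println("DataFunc calls after debug view:", calls)
}
```

The graph path reads only `ComponentID` and `Count`, which is why sharing one struct between both features can stay cheap.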

Which issue(s) this PR fixes

Fixes #2608

Notes to the Reviewer

This was tested manually via curl:

curl -N http://localhost:12345/api/v0/web/graph to get the data from all components at the root
curl -N http://localhost:12345/api/v0/web/graph/{moduleID} to get data from a particular module

PR Checklist

[na] CHANGELOG.md updated (will be done in the feature branch)
[na] Documentation added (will be done in another branch to the feature branch)
Tests updated
[na] Config converters updated

thampiotr

Looking good! Some comments, but nothing big.

internal/component/discovery/discovery.go

internal/component/otelcol/internal/lazyconsumer/lazyconsumer.go

internal/component/otelcol/internal/lazyconsumer/lazyconsumer_test.go

internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go

thampiotr · 2025-02-11T11:48:09Z

internal/service/livedebugging/livedebugging.go

+	for _, cp := range components {
+		if c, ok := cp.Component.(component.LiveDebugging); ok {
+			// notify the component of the change
+			c.LiveDebugging(len(s.callbacks[ComponentID(cp.ID.String())]))


Another question: when LiveDebugging calls Update, won't it race with the runtime calling Update at the same time? I think runtime won't call Update concurrently, but with LiveDebugging it's possible.

nice catch, definitely something we missed in the first implementation. I protected the Update functions that had no mutex with mutexes. Lmk if you have a better solution in mind to avoid this

Yeah, it's also not too great because other Update methods don't need mutexes, so it will be confusing and someone may one day remove it... so unless we have tests that verify this, it's a bit fragile.

After taking another look: I'd be fine with mutex solution if you also add a comment on why this mutex is necessary.

you were right about the fact that it's a bit fragile. After testing, I triggered a data race because in the LiveDebugging function I use p.args which is being modified in the Update function :(
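For readers following the thread, the race described here can be sketched as below. `Receiver`, `Arguments`, and the return value of `LiveDebugging` are hypothetical stand-ins, not the component from the PR. The sketch only shows the mutex-based fix under discussion: both the runtime-driven `Update` and the service-driven `LiveDebugging` take the same lock before touching `args`, and the comment on the mutex records why it exists.

```go
package main

import (
	"fmt"
	"sync"
)

// Arguments is a hypothetical stand-in for a component's configuration.
type Arguments struct {
	Endpoint string
}

// Receiver is a hypothetical component whose args are written by Update and
// read by LiveDebugging.
type Receiver struct {
	// mut guards args. The runtime never calls Update concurrently with
	// itself, but the live debugging service may call LiveDebugging from
	// another goroutine while Update is running.
	mut  sync.Mutex
	args Arguments
}

// Update is called by the runtime when the configuration changes.
func (r *Receiver) Update(args Arguments) {
	r.mut.Lock()
	defer r.mut.Unlock()
	r.args = args
}

// LiveDebugging is called by the service when subscribers come and go.
func (r *Receiver) LiveDebugging(subscribers int) string {
	r.mut.Lock()
	defer r.mut.Unlock()
	return fmt.Sprintf("%s has %d subscriber(s)", r.args.Endpoint, subscribers)
}

func main() {
	r := &Receiver{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			r.Update(Arguments{Endpoint: fmt.Sprintf("localhost:%d", 4317+i%2)})
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = r.LiveDebugging(i)
		}
	}()
	wg.Wait()
	fmt.Println(r.LiveDebugging(1))
}
```

Removing either `Lock` call and running with `go run -race` should report the kind of data race mentioned above.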

uuuuh, that's hairy... maybe worth looking into providing a callback function to clean up subscriptions?

I went with a different approach here (for the receiver only) : 0dd571d

This approach still reads kinda weird to me, because the path from the runtime calling is Update() -> update() and it's odd that we grab a lock, release it and then call update() and grab the same lock again....

I know you can refactor it further to make the mutex approach slightly more readable, but I have general concerns about this structure.

Can't we refactor the livedebuggingconsumer to:

- have function pointers that allow to override behaviour like interceptor in Prometheus components, so it's more universal
- rename it to interceptconsumer or something like that
- always add it to the pipeline if livedebugging is enabled (as a feature), even if it's inactive for this component - if we do a lean implementation in the interceptors that checks whether it's active and quits, I think the overhead should be minimal? benchmarks should be able to confirm this

then we don't need the whole magic of notifying callbacks via LiveDebugging() method calling the Update

something like this should work, I guess we could simply intercept at the fanout consumer. I can give it a shot tomorrow

I reworked it following your guidance, let me know if it matches what you expected. I can polish further with tests and comments if needed.
The livedebugging method is not called anymore and is just there for the interface marker. Because there is no Update magic, there is no need to notify the components anymore when a callback is added/removed. This simplified the live debugging service code. I like the approach that you suggested, it's much better than what we had :)
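The reworked shape can be sketched roughly as follows. All names here (`Consumer`, `ConsumerFunc`, `Interceptor`, `Publisher`, `PublishIfActive`) are hypothetical stand-ins that use plain string batches instead of OpenTelemetry types, so this is an illustration of the pattern and not the code in the PR. The interceptor is always in the pipeline, and the hook returns immediately when nobody is subscribed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Consumer is a hypothetical stand-in for an OpenTelemetry consumer.
type Consumer interface {
	Consume(batch []string) error
}

// ConsumerFunc adapts a plain function to the Consumer interface.
type ConsumerFunc func(batch []string) error

func (f ConsumerFunc) Consume(batch []string) error { return f(batch) }

// Interceptor is always placed in front of next and runs an optional hook
// before forwarding the batch.
type Interceptor struct {
	next      Consumer
	onConsume func(batch []string)
}

func (i *Interceptor) Consume(batch []string) error {
	if i.onConsume != nil {
		i.onConsume(batch)
	}
	return i.next.Consume(batch)
}

// Publisher is a hypothetical stand-in for the live debugging publisher.
type Publisher struct {
	active    atomic.Bool
	published atomic.Int64
}

// PublishIfActive quits right away when there are no subscribers, so an
// inactive interceptor costs one atomic load per batch.
func (p *Publisher) PublishIfActive(batch []string) {
	if !p.active.Load() {
		return
	}
	p.published.Add(int64(len(batch)))
}

func main() {
	p := &Publisher{}
	forwarded := 0
	pipeline := &Interceptor{
		next: ConsumerFunc(func(b []string) error {
			forwarded += len(b)
			return nil
		}),
		onConsume: p.PublishIfActive,
	}

	_ = pipeline.Consume([]string{"a"}) // inactive: forwarded only
	p.active.Store(true)
	_ = pipeline.Consume([]string{"b", "c"}) // active: forwarded and published

	fmt.Println("forwarded:", forwarded, "published:", p.published.Load())
}
```

Because the component no longer needs to be told when a subscriber appears, nothing has to call back into `Update`, which is what removes the race discussed earlier in this thread.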

internal/web/api/api.go

internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go

thampiotr · 2025-02-12T14:54:15Z

internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go

+		if lazy, ok := cons.(*lazyconsumer.Consumer); ok {
+			ids = append(ids, lazy.ComponentID())
+		}


This is fragile too IMO because next time someone refactors and adds another layer of consumer wrappers for some reason, this will stop working

Can we avoid having to type cast entirely?

I improved it a bit by adding an interface. Not super happy with the name of the interface though. I did not see another simple way to get the componentIDs from the next consumers

internal/web/api/api.go

thampiotr

Looks much, much cleaner without the callbacks to LiveDebugging(...) function from the service. Few more comments, but we're almost there!

thampiotr · 2025-02-17T11:47:54Z

internal/component/otelcol/internal/interceptorconsumer/logs.go

+	mutatesData   bool // must be set to true if the provided opts modifies the data
+}
+
+func Logs(nextLogs otelconsumer.Logs, mutatesData bool, f LogsInterceptorFunc) otelconsumer.Logs {


I very much prefer to avoid having bool function parameters - check out some reasoning about it here: https://alexkondov.com/should-you-pass-boolean-to-functions/

Would be better to have Logs() and LogsMutating() or something like this.
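A minimal sketch of what the two-constructor suggestion could look like. The types here use plain string slices instead of `plog.Logs` and drop the `context.Context` parameter to stay self-contained, and `NextLogs`, `MutatesData`, and `ConsumeLogs` are assumptions made for this example. The point is only that the call site names the behaviour instead of passing a bare `true` or `false`.

```go
package main

import "fmt"

// LogsInterceptorFunc is a simplified stand-in for the PR's function type.
type LogsInterceptorFunc func(logs []string) error

// NextLogs is a hypothetical stand-in for the next logs consumer.
type NextLogs func(logs []string) error

// LogsInterceptor runs f before handing the logs to next.
type LogsInterceptor struct {
	next        NextLogs
	f           LogsInterceptorFunc
	mutatesData bool
}

// Logs builds an interceptor whose function only reads the data.
func Logs(next NextLogs, f LogsInterceptorFunc) *LogsInterceptor {
	return &LogsInterceptor{next: next, f: f, mutatesData: false}
}

// LogsMutating builds an interceptor whose function may modify the data.
func LogsMutating(next NextLogs, f LogsInterceptorFunc) *LogsInterceptor {
	return &LogsInterceptor{next: next, f: f, mutatesData: true}
}

// MutatesData reports whether the interceptor may modify the data.
func (l *LogsInterceptor) MutatesData() bool { return l.mutatesData }

// ConsumeLogs runs the interceptor function and then forwards the logs.
func (l *LogsInterceptor) ConsumeLogs(logs []string) error {
	if l.f != nil {
		if err := l.f(logs); err != nil {
			return err
		}
	}
	return l.next(logs)
}

func main() {
	sink := func(logs []string) error {
		fmt.Println("received:", logs)
		return nil
	}

	readOnly := Logs(sink, func(logs []string) error {
		fmt.Println("observed", len(logs), "logs")
		return nil
	})
	mutating := LogsMutating(sink, func(logs []string) error {
		for i := range logs {
			logs[i] += " (tagged)"
		}
		return nil
	})

	_ = readOnly.ConsumeLogs([]string{"a", "b"})
	_ = mutating.ConsumeLogs([]string{"c"})
	fmt.Println(readOnly.MutatesData(), mutating.MutatesData())
}
```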

thampiotr · 2025-02-17T11:48:08Z

internal/component/otelcol/internal/interceptorconsumer/logs.go

@@ -0,0 +1,37 @@
+package interceptorconsumer


Suggested change

-package interceptorconsumer
+package interceptconsumer

shorter and still makes sense

thampiotr · 2025-02-17T11:50:00Z

internal/component/otelcol/internal/interceptorconsumer/logs.go

+	"go.opentelemetry.io/collector/pdata/plog"
+)
+
+type LogsInterceptorFunc func(context.Context, plog.Logs) error


For all 3 telemetry signals, can we use Go generics to simplify the code here? It seems the only real variation would be in the plog.Logs and otelconsumer.Logs types.

That works for plog.Logs, but I don't think we can use a generic type for otelconsumer.Logs because we call ConsumeLogs on it. That means we will still need an interceptor type for each telemetry signal, and each will still have its corresponding ConsumeMetrics|ConsumeLogs|ConsumeTraces func.

I see. Maybe that could be avoided with a func but if not, then we'll need to live with this duplication :(
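One way the "with a func" idea might look, as a sketch under stated assumptions: the shared logic is generic over the payload type and forwards through a plain function value, while each signal keeps a thin wrapper that supplies its `Consume...` method. `Logs` and `Metrics` are hypothetical stand-ins for the pdata types, and the `context.Context` parameter of the real signatures is dropped for brevity. It is not confirmed that the PR ended up with this shape.

```go
package main

import "fmt"

// Logs and Metrics are hypothetical stand-ins for plog.Logs and pmetric.Metrics.
type Logs []string
type Metrics []float64

// Interceptor holds the logic shared by every signal. It forwards through a
// function value, so it does not need to know the method name on the next
// consumer.
type Interceptor[T any] struct {
	intercept func(T) error
	next      func(T) error
}

func (i *Interceptor[T]) consume(data T) error {
	if i.intercept != nil {
		if err := i.intercept(data); err != nil {
			return err
		}
	}
	return i.next(data)
}

// The per-signal wrappers are still required, because each consumer
// interface names its method differently, but they shrink to one line each.
type LogsInterceptor struct{ Interceptor[Logs] }

func (l *LogsInterceptor) ConsumeLogs(ld Logs) error { return l.consume(ld) }

type MetricsInterceptor struct{ Interceptor[Metrics] }

func (m *MetricsInterceptor) ConsumeMetrics(md Metrics) error { return m.consume(md) }

func main() {
	logs := &LogsInterceptor{Interceptor: Interceptor[Logs]{
		intercept: func(ld Logs) error { fmt.Println("saw", len(ld), "logs"); return nil },
		next:      func(ld Logs) error { fmt.Println("forwarded", ld); return nil },
	}}
	metrics := &MetricsInterceptor{Interceptor: Interceptor[Metrics]{
		intercept: func(md Metrics) error { fmt.Println("saw", len(md), "metrics"); return nil },
		next:      func(md Metrics) error { fmt.Println("forwarded", md); return nil },
	}}

	_ = logs.ConsumeLogs(Logs{"a", "b"})
	_ = metrics.ConsumeMetrics(Metrics{1.5})
}
```

This keeps some duplication (one wrapper type per signal), which matches the concern raised above, but moves the interception logic itself into a single place.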

thampiotr · 2025-02-17T11:51:06Z

internal/component/otelcol/internal/livedebuggingpublisher/livedebuggingpublisher.go

+	"go.opentelemetry.io/collector/pdata/ptrace"
+)
+
+func extractIds(consumers []otelcol.Consumer) []string {


Nit: keep exported functions on top of the file and helpers at the bottom

thampiotr · 2025-02-17T12:00:58Z

internal/component/otelcol/consumer.go

+type ConsumerWithComponentID interface {
+	Consumer
+	ComponentID() string
+}


Suggested change

-type ConsumerWithComponentID interface {
-	Consumer
-	ComponentID() string
-}
+// ComponentMetadata can be implemented by, for example, consumers exported by components, to provide the ID of the component which is exporting given consumer. This is used for Live Graph / Live Debugging features.
+type ComponentMetadata interface {
+	ComponentID() string
+}

I think that's pretty good. I also don't think it needs to be a Consumer.
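A short sketch of how an ID-extraction helper could rely on such an interface instead of a concrete type cast. `Consumer`, `lazyConsumer`, `wrappedConsumer`, `plainConsumer`, and this version of `extractIDs` are hypothetical stand-ins written for the example. Any wrapper that forwards `ComponentID()` keeps working, which addresses the fragility concern raised earlier about casting to `*lazyconsumer.Consumer`.

```go
package main

import "fmt"

// ComponentMetadata mirrors the interface suggested above.
type ComponentMetadata interface {
	ComponentID() string
}

// Consumer is a hypothetical stand-in for otelcol.Consumer.
type Consumer interface {
	Consume(data string) error
}

// lazyConsumer knows which component exported it.
type lazyConsumer struct{ id string }

func (l *lazyConsumer) Consume(string) error { return nil }
func (l *lazyConsumer) ComponentID() string  { return l.id }

// wrappedConsumer is an extra layer that forwards the ID of what it wraps.
type wrappedConsumer struct{ inner *lazyConsumer }

func (w *wrappedConsumer) Consume(d string) error { return w.inner.Consume(d) }
func (w *wrappedConsumer) ComponentID() string    { return w.inner.ComponentID() }

// plainConsumer carries no component metadata.
type plainConsumer struct{}

func (plainConsumer) Consume(string) error { return nil }

// extractIDs collects IDs from every consumer that exposes them and skips
// the ones that do not, without depending on any concrete type.
func extractIDs(consumers []Consumer) []string {
	ids := make([]string, 0, len(consumers))
	for _, c := range consumers {
		if md, ok := c.(ComponentMetadata); ok {
			ids = append(ids, md.ComponentID())
		}
	}
	return ids
}

func main() {
	consumers := []Consumer{
		&lazyConsumer{id: "component.a"},
		&wrappedConsumer{inner: &lazyConsumer{id: "component.b"}},
		plainConsumer{},
	}
	fmt.Println(extractIDs(consumers))
}
```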

thampiotr · 2025-02-17T12:01:50Z

internal/component/otelcol/internal/livedebuggingpublisher/livedebuggingpublisher.go

+	return ids
+}
+
+func PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, nextLogs []otelcol.Consumer) {


I don't think nextLogs needs to be otelcol.Consumer, you only ever use them to extract component ID.

True but it's quite handy for the components because it's easy to provide.

Would you prefer that I change it to:
PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, componentIDs []string)
and that I export the extractIDs function?

Or do you have a different idea in mind? I'd like to keep the code as little intrusive as possible in the components

thampiotr · 2025-02-17T12:04:53Z

internal/service/livedebugging/data.go

+	DataFunc func() string
+}
+
+func NewData(componentID ComponentID, dataType DataType, count uint64, dataFunc func() string, opts ...DataOption) *Data {


thampiotr · 2025-02-17T12:06:45Z

internal/web/api/api.go

+			moduleID = livedebugging.ModuleID(vars["moduleID"])
+		}
+
+		window := setWindow(w, r.URL.Query().Get("window")) // in seconds


Suggested change

-		window := setWindow(w, r.URL.Query().Get("window")) // in seconds
+		windowSeconds := setWindow(w, r.URL.Query().Get("window"))

Generally, if we can make code more readable without comments, it's better to do that. Comments can end up as duct tape that helps make sense of code that is otherwise not very readable ;)

Actually, can't window become a time.Duration right away please?

we use it as a float to calculate the rate:
data.Rate = float64(data.Count) / float64(window)

I don't think it's a problem that cannot be overcome if window is a duration
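As a sketch of how the rate calculation could work with a `time.Duration`: `Duration.Seconds()` returns a `float64`, so the division stays the same. `parseWindow`, `defaultWindow`, and `rate` are hypothetical names, and treating the `window` query parameter as an integer number of seconds is an assumption, since the behaviour of the real `setWindow` is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// defaultWindow is a hypothetical default used when no window is given.
const defaultWindow = 5 * time.Second

// parseWindow converts the raw query value (assumed to be whole seconds)
// into a time.Duration, so the unit travels with the value.
func parseWindow(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultWindow, nil
	}
	secs, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid window %q: %w", raw, err)
	}
	if secs <= 0 {
		return 0, fmt.Errorf("window must be positive, got %d", secs)
	}
	return time.Duration(secs) * time.Second, nil
}

// rate returns events per second over the given window.
func rate(count uint64, window time.Duration) float64 {
	if window <= 0 {
		return 0
	}
	return float64(count) / window.Seconds()
}

func main() {
	window, err := parseWindow("10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("window=%s rate=%.1f/s\n", window, rate(50, window))
}
```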

add live graph backend

1775219

wildum requested a review from a team as a code owner February 7, 2025 11:07

update discovery process

e18e0ca

thampiotr reviewed Feb 11, 2025

wildum added 4 commits February 11, 2025 17:34

review feedback1

9ae5904

review feedback 2

5be0102

change back to splitting functions in livedebugging service

4cf6bc5

fix race condition in otel receiver

0dd571d

wildum force-pushed the live-graph-backend branch from a79c730 to 0dd571d Compare February 12, 2025 14:38

add comment about memory leak

ec53660

thampiotr reviewed Feb 12, 2025

internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go

thampiotr reviewed Feb 12, 2025

internal/web/api/api.go

wildum added 2 commits February 14, 2025 14:53

rework live debugging for otel components

c2c9bf4

add comment about window unit

c36ebef

thampiotr reviewed Feb 17, 2025
