-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add live graph backend (merge to feature branch) #2645
base: alloy-live-graph
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! some comments, but nothing big
internal/component/otelcol/internal/lazyconsumer/lazyconsumer.go
Outdated
Show resolved
Hide resolved
internal/component/otelcol/internal/lazyconsumer/lazyconsumer_test.go
Outdated
Show resolved
Hide resolved
internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go
Outdated
Show resolved
Hide resolved
internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go
Outdated
Show resolved
Hide resolved
for _, cp := range components { | ||
if c, ok := cp.Component.(component.LiveDebugging); ok { | ||
// notify the component of the change | ||
c.LiveDebugging(len(s.callbacks[ComponentID(cp.ID.String())])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question: when LiveDebugging calls Update, won't it race with the runtime calling Update at the same time? I think runtime won't call Update concurrently, but with LiveDebugging it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, definitely something we missed in the first implementation. I protected the Update functions that had no mutex with mutexes. Lmk if you have a better solution in mind to avoid this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's also not too great because other Update methods don't need mutexes, so it will be confusing and someone may one day remove it... so unless we have tests that verify this, it's a bit fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking another look: I'd be fine with mutex solution if you also add a comment on why this mutex is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you were right about the fact that it's a bit fragile. After testing, I triggered a data race because in the LiveDebugging function I use p.args which is being modified in the Update function :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuuuh, that's hairy... maybe worth looking into providing a callback fucntion to clean up subscriptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with a different approach here (for the receiver only) : 0dd571d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach still reads kinda weird to me, because the path from the runtime calling is Update() -> update()
and it's odd that we grab a lock, release it and then call update() and grab the same lock again....
I know you can refactor it further to make the mutex approach slightly more readable, but I have general concerns about this structure.
Can't we refactor the livedebuggingconsumer
to:
- have function pointers that allow to override behaviour like interceptor in Prometheus components, so it's more universal
- rename it to
interceptconsumer
or something like that - always add it to the pipeline if livedebugging is enabled (as a feature), even if it's inactive for this component - if we do a lean implementation in the interceptors that checks whether it's active and quits, I think the overhead should be minimal? benchmarks should be able to confirm this
- then we don't need the whole magic of notifying callbacks via LiveDebugging() method calling the Update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this should work, I guess we could simply intercept at the fanout consumer. I can give it a shot tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked it following your guidance, let me know if it matches what you expected. I can polish further with tests and comments if needed.
The livedebugging method is not called anymore and is just there for the interface marker. Because there is no Update magic, there is no need to notify the components anymore when a callback is added/removed. This simplified the live debugging service code. I like the approach that you suggested, it's much better than what we had :)
a79c730
to
0dd571d
Compare
internal/component/otelcol/internal/livedebuggingconsumer/livedebuggingconsumer.go
Outdated
Show resolved
Hide resolved
if lazy, ok := cons.(*lazyconsumer.Consumer); ok { | ||
ids = append(ids, lazy.ComponentID()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fragile too IMO because next time someone refactors and adds another layer of consumer wrappers for some reason, this will stop working
Can we avoid having to type cast entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improved it a bit by adding an interface. Not super happy with the name of the interface though. I did not see another simple way to get the componentIDs from the next consumers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much, much cleaner without the callbacks to LiveDebugging(...)
function from the service. Few more comments, but we're almost there!
mutatesData bool // must be set to true if the provided opts modifies the data | ||
} | ||
|
||
func Logs(nextLogs otelconsumer.Logs, mutatesData bool, f LogsInterceptorFunc) otelconsumer.Logs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much prefer to avoid having bool
function parameters - check out some reasoning about it here: https://alexkondov.com/should-you-pass-boolean-to-functions/
Would be better to have Logs()
and LogsMutating()
or something like this.
@@ -0,0 +1,37 @@ | |||
package interceptorconsumer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package interceptorconsumer | |
package interceptconsumer |
shorter and still makes sense
"go.opentelemetry.io/collector/pdata/plog" | ||
) | ||
|
||
type LogsInterceptorFunc func(context.Context, plog.Logs) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all the 3 telemetry signals, can we use Go generics to simplify the code here? Seems like the variation would be only really on plogs.Logs
and otelconsumer.Logs
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that works for plogs.Logs but I don't think that we can use a generic type for otelconsumers.Logs because we call ConsumeLogs with it. That means we will still need an interceptor type for each telemetry signal and each will still have its corresponding ConsumeMetrics|ConsumeLogs|ConsumeTraces func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe that could be avoided with a func
but if not, then we'll need to live with this duplication :(
"go.opentelemetry.io/collector/pdata/ptrace" | ||
) | ||
|
||
func extractIds(consumers []otelcol.Consumer) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: keep exported functions on top of the file and helpers at the bottom
type ConsumerWithComponentID interface { | ||
Consumer | ||
ComponentID() string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type ConsumerWithComponentID interface { | |
Consumer | |
ComponentID() string | |
} | |
// ComponentMetadata can be implemented by, for example, consumers exported by components, to provide the ID of the component which is exporting given consumer. This is used for Live Graph / Live Debugging features. | |
type ComponentMetadata interface { | |
ComponentID() string | |
} |
I think that's pretty good. I also don't think it needs to be a Consumer
.
return ids | ||
} | ||
|
||
func PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, nextLogs []otelcol.Consumer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think nextLogs
needs to be otelcol.Consumer
, you only ever use them to extract component ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True but it's quite handy for the components because it's easy to provide.
Would you prefer that I change it to:
PublishLogsIfActive(debugDataPublisher livedebugging.DebugDataPublisher, componentID string, ld plog.Logs, componentIDs []string)
and that I export the extractIDs function?
Or do you have a different idea in mind? I'd like to keep the code as little intrusive as possible in the components
DataFunc func() string | ||
} | ||
|
||
func NewData(componentID ComponentID, dataType DataType, count uint64, dataFunc func() string, opts ...DataOption) *Data { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested with returning Data
by value instead of the pointer? some stack optimisations could reduce allocation and improve performance, but that needs benchmarks.
moduleID = livedebugging.ModuleID(vars["moduleID"]) | ||
} | ||
|
||
window := setWindow(w, r.URL.Query().Get("window")) // in seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
window := setWindow(w, r.URL.Query().Get("window")) // in seconds | |
windowSeconds := setWindow(w, r.URL.Query().Get("window")) |
Generally if we can make code more readable without comments, it's better to do that - comments may be treated as a duct tape to help make sense of a code that is otherwise not too readable ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, can't window
become a time.Duration
right away please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use it as a float to calculate the rate:
data.Rate = float64(data.Count) / float64(window)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a problem that cannot be overcome if window is a duration
PR Description
This is the backend part of the new live graph.
The approach is similar to live debugging, except that it adds the callback to all components in the given module.
The live debugging feed is now a struct that contains data needed to build the new graph. Because the data string is not needed for the graph, it's passed around as a function. This way we can use the same struct for the live debugging and the live graph with limited performance overhead.
Which issue(s) this PR fixes
Fixes #2608
Notes to the Reviewer
This was tested manually via curl:
curl -N http://localhost:12345/api/v0/web/graph
to get the data from all components at the rootcurl -N http://localhost:12345/api/v0/web/graph/{moduleID}
to get data from a particular modulePR Checklist