8000 Improve shutdown handling of custom OpenTelemetry tracer by Aaron1011 · Pull Request #3805 · tensorzero/tensorzero · GitHub
[go: up one dir, main page]

Skip to content

Conversation

Aaron1011
Copy link
Member
@Aaron1011 Aaron1011 commented Oct 3, 2025

The shutdown handling is now done in CustomTracer::drop.
Almost all of the logic around custom tracers is now called from
'tensorzero_tracing_middleware' (including the possible creation
of a fresh CustomTracer).

The 'build_with_context' is now very simple - it just looks up
an entry in the context, and delegates build_with_context on
an already-constructed SdkTracer.

By performing the shutdown logic in Drop, we ensure that we only run
the shutdown after all associated Spans have already been dropped,
avoiding the problem of shutting down the tracer before we've finished
executing all of our spans (e.g. if we create a span for the
rate-limiting return_tickets call, which happens in the background)

This also eliminates the need for a moka eviction handler, since
dropping the final Arc<CustomTracer> from anywhere is now enough
to perform a correct shutdown.


Important

Improve shutdown handling of custom OpenTelemetry tracer by moving logic to CustomTracer::drop and updating middleware.

  • Behavior:
    • Move shutdown logic to CustomTracer::drop to ensure shutdown occurs after all spans are dropped.
    • tensorzero_tracing_middleware now handles most custom tracer logic, including creating new CustomTracer.
    • Simplify build_with_context to delegate to existing SdkTracer.
    • Eliminate need for moka eviction handler; dropping final Arc<CustomTracer> triggers shutdown.
  • Middleware:
    • Rename tensorzero_otlp_headers_middleware to tensorzero_tracing_middleware.
    • tensorzero_tracing_middleware now creates top-level span and handles OTLP headers.
  • Misc:
    • Update apply_otel_http_trace_layer to accept otel_tracer parameter in observability/mod.rs.
    • Minor updates to comments and documentation in observability/mod.rs.

This description was created by Ellipsis for 6961268. You can customize this summary. It will automatically update as commits are pushed.

Previously, custom OTLP header extraction was separated from the
creation of the top-level request span - we used both a `TraceLayer`
and a custom middleware function.

We now register a single middleware function, which performs all of the
necessarily logic. This removes the need to store data in
`request.extesnions` - we can now just use normal function calls.
The shutdown handling is now done in `CustomTracer::drop`.
Almost all of the logic around custom tracers is now called from
'tensorzero_tracing_middleware' (including the possible creation
of a fresh `CustomTracer`).

The 'build_with_context' is now very simple - it just looks up
an entry in the context, and delegates `build_with_context` on
an already-constructed `SdkTracer`.

By performing the shutdown logic in `Drop`, we ensure that we only run
the shutdown after all associated `Spans` have already been dropped,
avoiding the problem of shutting down the tracer before we've finished
executing all of our spans (e.g. if we create a span for the
rate-limiting `return_tickets` call, which happens in the background)

This also eliminates the need for a moka eviction handler, since
dropping the final `Arc<CustomTracer>` from anywhere is now enough
to perform a correct shutdown.
@Copilot Copilot AI review requested due to automatic review settings October 3, 2025 19:40
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the OpenTelemetry custom tracer handling to improve shutdown behavior by implementing Drop for CustomTracer and simplifying the middleware architecture. The key improvement ensures that tracer shutdown only occurs after all associated spans have been dropped, preventing premature shutdown issues.

Key changes:

  • Implements Drop for CustomTracer to handle automatic shutdown when the tracer is no longer in use
  • Consolidates custom tracer logic into a single tensorzero_tracing_middleware function that handles both header parsing and tracer creation
  • Removes the need for cache eviction handlers by using Arc<CustomTracer> reference counting

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
tensorzero-core/src/observability/mod.rs Refactors custom tracer architecture with Drop implementation, new middleware structure, and updated context handling
gateway/src/main.rs Updates router to pass tracer wrapper to the new middleware signature

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

//! This is automatically propagated to descendant spans by `tracing-opentelemetry`. Once all of descendant spans are dropped
//! (i.e. all background processing for our request is finished), the `CustomTracer` will get automatically dropped,
//! which triggers shutdown in our `Drop` impl for `CustomTracer`
//! 4. When a span is exported using `TracerWrapper::build_with_context`, we inspect the `Context` for a `CustomTracerContextEntry`.
Copy link
Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step numbering is inconsistent - there are two steps labeled '4.' in the documentation. The second occurrence should be step '5.'

Suggested change
//! 4. When a span is exported using `TracerWrapper::build_with_context`, we inspect the `Context` for a `CustomTracerContextEntry`.
//! 5. When a span is exported using `TracerWrapper::build_with_context`, we inspect the `Context` for a `CustomTracerContextEntry`.

Copilot uses AI. Check for mistakes.

Comment on lines +249 to +250
if let Some(key) = parent_cx.get::<CustomTracerContextEntry>() {
key.inner.inner.build_with_context(builder, parent_cx)
Copy link
Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context lookup should check for CustomTracerContextEntry but the variable name suggests it expects a CustomTracerKey. This appears to be correct since CustomTracerContextEntry is the new context entry type, but the variable name key is misleading.

Suggested change
if let Some(key) = parent_cx.get::<CustomTracerContextEntry>() {
key.inner.inner.build_with_context(builder, parent_cx)
if let Some(entry) = parent_cx.get::<CustomTracerContextEntry>() {
entry.inner.inner.build_with_context(builder, parent_cx)

Copilot uses AI. Check for mistakes.

}
}
fn apply_otel_http_trace_layer(self, otel_tracer: Option<Arc<TracerWrapper>>) -> Self {
// If OpenTelemetry is disable, then we don't need to create extra spans
Copy link
Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'disable' to 'disabled'.

Suggested change
// If OpenTelemetry is disable, then we don't need to create extra spans
// If OpenTelemetry is disabled, then we don't need to create extra spans

Copilot uses AI. Check for mistakes.

@Aaron1011
Copy link
Member Author

Depends on #3801

@virajmehta virajmehta self-assigned this Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0