fix(openai): make streaming telemetry monotonic + privacy-safe by kevinshowkat · Pull Request #3405 · traceloop/openllmetry · GitHub

Conversation

@kevinshowkat kevinshowkat commented Oct 10, 2025

This PR addresses three ongoing telemetry issues in the OpenAI wrapper: inaccurate latency measurements, loss of streaming model metadata, and raw prompt/header payloads in span attributes.



What ships

  • Monotonic timing. Every duration (span lifetime, TTFT, streaming generate)
    now reads from perf_counter(), so clock slews no longer create negative
    spans or spike histograms; a short sketch follows this list.
  • Streaming metadata retention. We keep the first non-empty model/id values
    across chunks, handle multimodal deltas gracefully, and continue collecting
    completions when a filtered choice appears.
  • Prompt hygiene. Prompt/completion attributes become a 128-character preview
    plus SHA-256 hash so teams can correlate traffic without leaking full text
    or exploding cardinality.
  • Header scrubbing & semconv parity. Authorization headers are redacted,
    llm.reproducible_run=true is emitted when a seed is present, and
    response IDs are mirrored into llm.receipt_id. Added the new constants
    to SpanAttributes.
  • Unit coverage. test_chat_wrappers_unit.py exercises model persistence,
    prompt hashing, header scrubbing, receipt propagation, and streaming
    accumulation logic.





Why it matters

  • Latency SLOs remain steady through NTP or manual clock adjustments;
    operators can trust the telemetry.
  • Model attribution survives streaming workloads; multi-model routing and
    debugging remain accurate.
  • Span attributes stay privacy-conscious and low-cardinality by default,
    reducing downstream risk.
  • Receipts and reproducibility hints enable richer postmortems and “rerun
    this call” tooling.

Validation

python3 -m pytest packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py

No flags required; behaviour improves on upgrade.


Important

Fix telemetry issues in OpenAI wrapper by implementing monotonic timing, metadata retention, prompt privacy, and header scrubbing, with added unit tests.

  • Behavior:
    • Use perf_counter() for monotonic timing in chat_wrappers.py and completion_wrappers.py to prevent negative spans.
    • Retain streaming metadata by keeping first non-empty model/id values in chat_wrappers.py and completion_wrappers.py.
    • Truncate prompt/completion attributes to 128 characters and add SHA-256 hash in chat_wrappers.py and completion_wrappers.py.
    • Redact authorization headers and add llm.reproducible_run and llm.receipt_id attributes in shared/__init__.py.
  • Tests:
    • Add test_chat_wrappers_unit.py to test model persistence, prompt hashing, header scrubbing, and receipt propagation.
  • Misc:
    • Add LLM_RECEIPT_ID, LLM_REPRODUCIBLE_RUN, and LLM_CONTENT_HASH_ATTRIBUTE to SpanAttributes in semconv_ai/__init__.py.

This description was created by Ellipsis for df3643c. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features
    • Added span attributes for receipt ID, reproducible runs (when seed is provided), and prompt content hash with preview (128 chars).
  • Bug Fixes
    • Switched to monotonic timing for accurate durations and streaming measurements.
    • Redacted sensitive request headers while preserving non-sensitive ones; only emit headers when present.
    • Improved streaming aggregation to safely handle delta content and preserve initial model/ID.
    • Ensured robust stream cleanup and span closure.
  • Documentation
    • Added Unreleased changelog entry describing OpenAI instrumentation fixes.
  • Tests
    • Added unit tests covering prompts, hashing, headers, receipts, and streaming behavior.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


kevin-ops seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

coderabbitai bot commented Oct 10, 2025

Walkthrough

Adds monotonic timing for chat/completion wrappers, prompt preview and SHA-256 hashing, header scrubbing for sensitive headers, new span attributes (receipt ID, reproducible run, content hash), response ID propagation, and related unit tests. Updates CHANGELOG and semantic conventions with new constants. Streaming accumulation logic tightened and cleanup improved.

Changes

  • Semantic conventions constants
    packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
    Adds SpanAttributes: LLM_RECEIPT_ID, LLM_REPRODUCIBLE_RUN, LLM_CONTENT_HASH_ATTRIBUTE.
  • OpenAI shared helpers
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
    Introduces header scrubbing for sensitive keys; merges and conditionally records headers; sets LLM_REPRODUCIBLE_RUN when seed provided; writes both GEN_AI_RESPONSE_ID and LLM_RECEIPT_ID.
  • Chat wrappers timing & hashing
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
    Replaces time.time with monotonic timing; computes prompt preview and SHA-256 hash; serializes list/dict content; refines stream accumulation; improves ChatStream cleanup and duration metrics.
  • Completion wrappers prompts & streaming
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
    Adds PROMPT_PREVIEW_LIMIT and prompt preview/hash helper; records per-prompt preview/hash; only sets model/id when present during streaming accumulation.
  • Unit tests for wrappers
    packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py
    Adds tests for stream accumulation, prompt truncation and hashing, header scrubbing, reproducible run flag, receipt ID propagation.
  • Changelog
    CHANGELOG.md
    Adds Unreleased notes for monotonic streaming spans, header redaction, prompt hash, and receipt attributes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant Wrapper as OpenAI Wrapper
  participant Clock as MonotonicClock
  participant OTEL as Span

  App->>Wrapper: create chat/completion request (content, headers, seed?)
  Wrapper->>Clock: now()
  Clock-->>Wrapper: t0 (monotonic)
  Note over Wrapper: Compute prompt preview + SHA-256 hash
  Wrapper->>OTEL: set LLM_PROMPTS.* + content_hash
  Wrapper->>Wrapper: scrub headers (auth/proxy-auth redacted)
  alt headers present
    Wrapper->>OTEL: set LLM_HEADERS (scrubbed)
  end
  alt seed provided
    Wrapper->>OTEL: set llm.reproducible_run=true
  end
  App->>Wrapper: stream chunks
  loop each chunk
    Wrapper->>Wrapper: accumulate delta (preserve model/id if first)
    Wrapper->>Clock: now()
    Clock-->>Wrapper: tN
    Wrapper->>OTEL: update streaming timings (tN - t0)
  end
  Wrapper->>Clock: now()
  Clock-->>Wrapper: tEnd
  Wrapper->>OTEL: set duration (tEnd - t0), response.id
  Wrapper->>OTEL: set gen_ai.response_id + llm.receipt_id
  Wrapper-->>App: final response
sequenceDiagram
  participant Stream as ChatStream
  participant Guard as CleanupGuard
  participant OTEL as Span
  participant Clock as MonotonicClock

  Stream->>Clock: now() for first token
  Stream->>Stream: accumulate(model/id/content) safely
  note over Stream: Handle list/dict deltas → text
  Stream->>Guard: cleanup()
  alt success
    Guard->>OTEL: end span (idempotent)
  else failure
    Guard->>OTEL: end span (best-effort)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • nirga

Poem

Thump-thump goes my rabbit clock, tick-free and clean,
I scrubbed the headers till no secrets are seen.
A hash of the prompt, a receipt in paw,
Streams stitched together—without timing flaw.
With monotone time and tidy tracks,
I hop through traces, leaving perfect hacks. 🐇⌚️

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 3.33%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title succinctly conveys the primary fixes by highlighting the shift to monotonic timing for streaming telemetry and the addition of privacy safeguards. It directly reflects the key objectives of the pull request without extraneous wording. Colleagues reviewing history will immediately grasp the core improvements delivered.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • SHA-256: Entity not found: Issue - Could not find referenced Issue.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor
@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to df3643c in 1 minute and 33 seconds.
  • Reviewed 554 lines of code in 6 files
  • Skipped 3 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:29
  • Draft comment:
    Good use of a set for sensitive header keys. Consider adding a brief comment about why only these keys are scrubbed (e.g. privacy concerns) and whether any additional headers might be needed in future.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. While the suggestion to document security-related code is generally good practice, this comment is more of a nice-to-have suggestion rather than identifying a clear issue that needs fixing. The headers being scrubbed (authorization and proxy-authorization) are obviously sensitive by their names alone. The comment also speculates about future headers which violates our rule about speculative comments. Documentation about security decisions can be important. Maybe there's a real security risk in not documenting why only these specific headers are scrubbed. The headers being scrubbed are standard authorization headers with obvious sensitivity. Additional documentation would be redundant and the speculation about future headers isn't helpful. Delete this comment as it's more of a nice-to-have suggestion rather than identifying a clear issue, and it includes speculative elements about future headers.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py:160
  • Draft comment:
    Nice merging of headers from 'headers' and 'extra_headers' with scrubbing and sorted JSON. Adding a brief comment to clarify that this ensures low cardinality and privacy-safe telemetry would help future maintainers.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is purely informative and suggests adding a comment for future maintainers. It doesn't provide a specific code suggestion or point out a potential issue with the code.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:68
  • Draft comment:
    Using perf_counter via _monotonic_now() for timing is a solid improvement over time.time(), ensuring measurements are monotonic even if the system clock skews.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, praising the use of perf_counter over time.time(). It doesn't provide any actionable feedback or suggestions for improvement.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:72
  • Draft comment:
    The _prompt_preview_and_hash function effectively produces a 128-character preview and a SHA-256 hash. Ensure that slicing the Unicode string does not inadvertently split a multi-byte character (though Python slicing is generally safe, a comment could clarify this behavior).
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 30% <= threshold 50%. The comment is informative and suggests ensuring that slicing does not split a multi-byte character, which is a valid concern in some contexts. However, it also acknowledges that Python slicing is generally safe, making the comment more informative than actionable. It does not directly suggest a code change or ask for a test, which would make it more useful.
5. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py:48
  • Draft comment:
    The completion wrapper now reuses the prompt preview and hash logic for input sanitization. This is consistent with the chat wrapper. A comment noting that this prevents leaking full prompt details and avoids high-cardinality attributes would be beneficial.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is purely informative, explaining what the code does and suggesting adding a comment for clarity. It doesn't ask for a specific code change or test, nor does it point out a potential issue or ask for confirmation of intent. It violates the rule against making purely informative comments.
6. packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py:72
  • Draft comment:
    Tests now cover prompt truncation and hash generation for both chat and completion wrappers. This is a robust validation for privacy-safe prompt handling.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, describing what the tests now cover. It doesn't provide any actionable feedback or suggestions for improvement.
7. packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py:94
  • Draft comment:
    New constants (LLM_RECEIPT_ID, LLM_REPRODUCIBLE_RUN, LLM_CONTENT_HASH_ATTRIBUTE) are added. Their naming and placement follow the conventions. Ensure all instrumentation elsewhere is updated to reference these consistently.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to ensure that all instrumentation is updated to reference the new constants consistently. This falls under the rule of not asking the author to ensure or double-check things. The comment does not provide a specific suggestion or point out a specific issue with the code.

Workflow ID: wflow_oN9z8Bb1X089nluM

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (1)

45-51: LGTM! Prompt preview and hashing helper is correct.

The function properly:

  • Encodes content as UTF-8 before hashing (correct for byte-level hashing)
  • Computes SHA-256 digest (appropriate for content fingerprinting)
  • Returns truncated preview (128 chars) and hex digest

Note: This function and PROMPT_PREVIEW_LIMIT are duplicated in chat_wrappers.py. Consider extracting to a shared utility module as an optional refactor to reduce duplication.
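
Based on the behavior described above, the helper plausibly looks like the following sketch (the exact body is not reproduced in this review, so treat it as an approximation):

import hashlib

PROMPT_PREVIEW_LIMIT = 128  # characters kept as the span-attribute preview

def _prompt_preview_and_hash(content: str):
    # Hash the full UTF-8 text, but only expose a short preview on the span.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return content[:PROMPT_PREVIEW_LIMIT], digest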

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e66894f and df3643c.

⛔ Files ignored due to path filters (3)
  • 1.png is excluded by !**/*.png
  • 2.png is excluded by !**/*.png
  • 3.png is excluded by !**/*.png
📒 Files selected for processing (6)
  • CHANGELOG.md (1 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (17 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py (1 hunks)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py
🧬 Code graph analysis (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)
  • _prompt_preview_and_hash (72-75)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-264)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (54-61)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-264)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (1)
  • _prompt_preview_and_hash (48-51)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)
  • _set_span_attribute (54-61)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-264)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-264)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (2)
  • _accumulate_stream_items (1148-1225)
  • _set_prompts (453-512)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (2)
  • _set_prompts (166-188)
  • _accumulate_streaming_response (276-302)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (2)
  • _set_request_attributes (128-221)
  • _set_response_attributes (225-275)
🪛 Ruff (0.13.3)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py

32-32: Unused lambda argument: args (ARG005)
32-32: Unused lambda argument: kwargs (ARG005)
33-33: Unused lambda argument: instance (ARG005)
34-34: Unused lambda argument: base_url (ARG005)

🔇 Additional comments (16)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

95-97: LGTM! New semantic convention attributes align with PR objectives.

The three new constants support receipt tracking, reproducibility signaling, and content hashing as described in the PR. The naming follows existing conventions and the values are appropriate for span attributes.

CHANGELOG.md (1)

1-5: LGTM! Changelog entry accurately describes the changes.

The entry correctly summarizes the three main improvements: monotonic timing, header redaction, and prompt hash/receipt attributes. The categorization as a Fix is appropriate.

packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_wrappers_unit.py (2)

30-40: LGTM! Test fixture correctly mocks shared helpers.

The autouse fixture ensures tests are isolated from external dependencies. The unused lambda arguments flagged by static analysis are intentional no-ops for mocking, so those warnings are false positives.


43-148: Excellent test coverage for new functionality.

The six test functions comprehensively validate:

  • Streaming accumulation preserving model/id and concatenating content (chat and completion)
  • Prompt truncation to 128 chars with SHA-256 hashing
  • Per-prompt hashing for batch prompts
  • Header scrubbing (Authorization redacted, others preserved)
  • Reproducible run flag when seed is present
  • Receipt ID propagation from response

Tests are well-structured and verify the core behaviors described in the PR objectives.
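
To give a flavor of these tests, here is a self-contained sketch of the metadata-retention case; the accumulator below is a stand-in written for this example, not the wrapper's _accumulate_stream_items.

from types import SimpleNamespace

def _accumulate(chunk, state):
    # Stand-in: keep the first non-empty model/id and append delta text.
    if getattr(chunk, "model", None) and not state["model"]:
        state["model"] = chunk.model
    if getattr(chunk, "id", None) and not state["id"]:
        state["id"] = chunk.id
    delta = chunk.choices[0].delta
    if delta.content:
        state["content"] += delta.content

def _chunk(model=None, id=None, content=""):
    return SimpleNamespace(
        model=model, id=id,
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))],
    )

def test_first_nonempty_model_and_id_are_preserved():
    state = {"model": "", "id": "", "content": ""}
    _accumulate(_chunk(model="gpt-4o-mini", id="chatcmpl-1", content="Hel"), state)
    _accumulate(_chunk(content="lo"), state)  # later chunks often omit metadata
    assert (state["model"], state["id"], state["content"]) == ("gpt-4o-mini", "chatcmpl-1", "Hello")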

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (3)

29-51: LGTM! Header scrubbing implementation is sound.

The function correctly:

  • Returns None for empty/invalid input
  • Uses case-insensitive comparison for header names (appropriate for HTTP)
  • Redacts sensitive headers (authorization, proxy-authorization)
  • Preserves other headers unchanged

This maintains privacy while enabling debugging with request metadata.
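
A sketch of that behavior (the redaction placeholder and the exact key set are assumptions here):

SENSITIVE_HEADERS = {"authorization", "proxy-authorization"}

def _scrub_headers(headers):
    # Nothing (or nothing usable) to record.
    if not isinstance(headers, dict) or not headers:
        return None
    return {
        name: "REDACTED" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }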


159-175: LGTM! Request attribute handling improved with header scrubbing and reproducibility tracking.

The changes:

  • Merge and scrub headers from both headers and extra_headers kwargs
  • Only emit LLM_HEADERS when there's actual header data (good defensive programming)
  • Set LLM_REPRODUCIBLE_RUN=True when a seed is provided (enables rerun correlation)
  • Use sorted JSON for deterministic header serialization

These align with the PR's privacy and reproducibility objectives.
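
A sketch of how that wiring can look; the attribute strings stand in for the SpanAttributes constants (only llm.reproducible_run is confirmed by the PR description):

import json

SENSITIVE = {"authorization", "proxy-authorization"}

def _record_request_metadata(span, kwargs):
    # Merge explicit and extra headers, scrub them, and only then touch the span.
    merged = {**(kwargs.get("headers") or {}), **(kwargs.get("extra_headers") or {})}
    scrubbed = {k: ("REDACTED" if k.lower() in SENSITIVE else v) for k, v in merged.items()}
    if scrubbed:  # emit the attribute only when there is real header data
        span.set_attribute("llm.headers", json.dumps(scrubbed, sort_keys=True))
    if kwargs.get("seed") is not None:
        span.set_attribute("llm.reproducible_run", True)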


241-243: LGTM! Dual receipt attribution enables correlation.

Setting both GEN_AI_RESPONSE_ID (OpenTelemetry standard) and LLM_RECEIPT_ID (custom) from the same response ID enables correlation between OpenTelemetry traces and provider-specific receipts for debugging and postmortems.
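
In sketch form (the literal attribute strings stand in for the GEN_AI_RESPONSE_ID and LLM_RECEIPT_ID constants):

def _record_receipt(span, response):
    response_id = getattr(response, "id", None)
    if response_id:
        span.set_attribute("gen_ai.response.id", response_id)  # OTel GenAI convention
        span.set_attribute("llm.receipt_id", response_id)       # custom mirror added here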

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (2)

166-188: LGTM! Per-prompt hashing enables batch prompt correlation.

The updated logic correctly:

  • Handles both single prompts and batch prompts (lists)
  • Computes independent preview+hash for each prompt item
  • Uses dynamic attribute names (LLM_PROMPTS.{idx}.user, LLM_PROMPTS.{idx}.content_hash)

This enables correlating prompts across requests without storing full text, maintaining privacy while enabling debugging.
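
A sketch of the per-prompt loop; the attribute strings below stand in for the LLM_PROMPTS-derived names described above and are assumptions:

import hashlib

PROMPT_PREVIEW_LIMIT = 128

def _record_prompts(span, prompt):
    # Accept either a single prompt string or a batch (list) of prompts.
    prompts = prompt if isinstance(prompt, (list, tuple)) else [prompt]
    for idx, item in enumerate(prompts):
        text = item if isinstance(item, str) else str(item)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        span.set_attribute(f"gen_ai.prompt.{idx}.user", text[:PROMPT_PREVIEW_LIMIT])
        span.set_attribute(f"gen_ai.prompt.{idx}.content_hash", digest)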


276-285: Good defensive programming: conditional model/id assignment.

Checking if model: and if response_id: before assignment prevents overwriting existing values with empty/None values from subsequent chunks. This preserves the first non-empty model and ID during streaming accumulation.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (7)

65-75: LGTM! Monotonic timing and hashing utilities are well-structured.

The additions:

  • _monotonic_now() uses perf_counter() (correct choice for measuring durations immune to system clock adjustments)
  • _prompt_preview_and_hash() provides privacy-safe content fingerprinting with 128-char preview + SHA-256 digest

These utilities support the PR's monotonic timing and privacy objectives. Note: Minor duplication with completion_wrappers.py (see earlier comment).


108-112: Excellent: Monotonic timing prevents negative span durations.

Replacing time.time() with perf_counter() for start/end times ensures durations are immune to system clock adjustments (NTP corrections, manual changes, and other wall-clock slews). This eliminates negative spans and histogram spikes in telemetry, enabling accurate latency SLOs.

Applied consistently across sync and async wrappers.

Also applies to: 207-211


452-487: LGTM! Content serialization and hashing maintain privacy while enabling correlation.

The enhanced logic:

  • Serializes complex content (lists → JSON, dicts → JSON, others → string) before hashing
  • Processes base64 images (if configured) before serialization
  • Computes 128-char preview + SHA-256 digest
  • Stores preview in content attribute and digest in content_hash attribute

This approach:

  • Reduces cardinality (fixed-length preview + deterministic hash)
  • Maintains privacy (no full prompt text in spans)
  • Enables correlation (hash matching across requests)

Aligns perfectly with the PR's privacy and low-cardinality objectives.
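
The serialization step, sketched on its own (the helper name is hypothetical; the preview/hash half matches the helper sketched earlier):

import json

def _content_to_text(content):
    # Chat messages can carry structured content (a list of parts or a dict);
    # turn it into one string so the preview and hash are computed over text.
    if isinstance(content, (list, dict)):
        return json.dumps(content, default=str)
    return "" if content is None else str(content)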


537-537: Good fix: Continue ensures full choice iteration.

Changing return to continue ensures the loop processes all choices even when:

  • Line 537: A choice has finish_reason="content_filter" (filtered content)
  • Line 541: A choice has no message (edge case)

This prevents early exit and ensures completions for all choices are recorded when possible.

Also applies to: 541-541
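
In sketch form (function and attribute names here are stand-ins, not the wrapper's actual identifiers):

def _record_choices(span, choices):
    for choice in choices:
        if getattr(choice, "finish_reason", None) == "content_filter":
            continue  # the earlier code returned here, dropping all later choices
        message = getattr(choice, "message", None)
        if message is None:
            continue  # defensive: a choice may arrive without a message
        span.set_attribute(f"gen_ai.completion.{choice.index}.content",
                           message.content or "")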


741-741: Systematic monotonic timing applied across all timing measurements.

All duration calculations now use _monotonic_now():

  • Span lifetime (duration histogram)
  • Time-to-first-token (TTFT)
  • Streaming time-to-generate
  • Partial metrics in cleanup

This comprehensive change ensures all telemetry timing is immune to clock adjustments, providing stable and accurate latency measurements for SLOs and monitoring.

Also applies to: 780-780, 789-789, 849-849, 904-904, 931-931, 937-937, 975-975, 1002-1002, 1008-1008


1152-1196: Robust streaming accumulation with metadata retention and multimodal support.

The enhanced logic:

  • Preserves first non-empty model and id (Lines 1152-1157): Prevents overwriting with empty values from later chunks
  • Handles multimodal content chunks (Lines 1188-1196): When delta.content is a list (e.g., mixed text/image), joins parts into text by extracting "text" keys or stringifying

This addresses the PR's "streaming metadata retention" objective and handles edge cases like multimodal deltas gracefully.
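
A sketch of that normalization (the helper name is illustrative):

import json

def _delta_to_text(content):
    # Multimodal deltas may arrive as a list of parts; pull out "text" fields
    # and stringify anything else so accumulation never raises on mixed content.
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, dict) and "text" in part:
                parts.append(part["text"])
            elif isinstance(part, str):
                parts.append(part)
            else:
                parts.append(json.dumps(part, default=str))
        return "".join(parts)
    return "" if content is None else str(content)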


806-843: Thread-safe cleanup with comprehensive error handling.

The _ensure_cleanup() method:

  • Uses _cleanup_lock for thread safety
  • Prevents duplicate cleanup with _cleanup_completed flag
  • Records partial metrics before closing span (ensures data is captured even on early cleanup)
  • Has fallback error handling to always mark cleanup as complete (prevents infinite loops)

This robust design ensures spans are always closed and metrics are recorded even in error scenarios, improving telemetry reliability.
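
The guard pattern, sketched outside the ChatStream class for clarity (the class name and callback here are illustrative):

import threading

class _CleanupGuard:
    def __init__(self, span, record_partial_metrics):
        self._span = span
        self._record_partial_metrics = record_partial_metrics
        self._lock = threading.Lock()
        self._done = False

    def ensure_cleanup(self):
        with self._lock:
            if self._done:
                return
            try:
                self._record_partial_metrics()  # capture whatever was measured so far
            except Exception:
                pass  # metrics are best-effort during teardown
            try:
                self._span.end()
            except Exception:
                pass  # ending the span must never raise out of cleanup
            self._done = True  # always mark done so cleanup never re-runs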
