
[tracer] Serialize span links in meta #6656


Closed
wants to merge 12 commits

Conversation

@nhulston (Contributor) commented Feb 11, 2025:

Summary of changes

This PR does two things:

  1. Serialize span links in meta under Span[meta][_dd.span_links]. Previously, span links were only serialized under Span[span_links].
  2. Always serialize trace flags in span links, even if the sampling priority is unset.

Reason for change

  1. Span links should be serialized under both Span[span_links] and Span[meta][_dd.span_links]. This is how span links work in all other tracers. See https://datadoghq.atlassian.net/wiki/spaces/TS/pages/3404366375/Troubleshooting+Span+Links
  2. Also, we should always serialize trace flags in span links, even if the value is empty (0); this is how the other tracers work. Some downstream services on the backend or agent expect this field; for example, this code errors if the trace flags field is missing: https://github.com/DataDog/libdatadog/blob/787d15e9867a9b8998b34d281a9cf5ce0301cd10/trace-utils/src/trace_utils.rs#L42-L47

These fixes are prerequisites for a new feature I'm working on called 'span pointers'. Span pointers will be implemented in a future PR, but I opened a similar PR in Java if you want more context: DataDog/dd-trace-java#8075.

Implementation details

Updates SpanMessagePackFormatter.cs to (1) always serialize trace flags, and (2) construct the JSON string of the span links to store under the span's meta. The JSON string construction has some performance overhead, but that code only runs when a span actually has span links.
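
Roughly, the always-serialize-flags part amounts to something like the sketch below (the class and parameter names are hypothetical, not the actual code in SpanMessagePackFormatter.cs):

internal static class SpanLinkFlagsSketch
{
    // Flag values match the snapshot diffs further down in this PR:
    //   2147483648 (0x80000000) = flags field present, no sampling decision
    //   2147483649 (0x80000001) = flags field present, link sampled
    public static uint GetTraceFlags(int? samplingPriority)
    {
        // Always set the top bit so downstream consumers (e.g. the libdatadog
        // trace-utils code linked above) see an explicit flags field rather
        // than a missing one.
        uint flags = 0x80000000;

        if (samplingPriority is > 0)
        {
            flags |= 1u; // the "sampled" bit
        }

        return flags;
    }
}

With this, a link with no sampling decision still serializes flags as 2147483648 instead of omitting the field, and a sampled link serializes 2147483649.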

Test coverage

Unit tests in SpanMessagePackFormatterTests.cs.

Also, manual testing shows we no longer get panics in libdatadog due to failing to pass the trace flags.

I also tested manually, and the span links are correctly serialized in Span[meta][_dd.span_links], which allows the frontend to properly detect them.

Integration tests were updated:

./build.sh Clean BuildTracerHome
./build.sh BuildAndRunOsxIntegrationTests --framework "net6.0" --filter "NetActivitySdkTests" --SampleName "NetActivitySdk"
./build.sh BuildAndRunOsxIntegrationTests --framework "net6.0" --filter "OpenTelemetrySdkTests" --SampleName "OpenTelemetrySdk"

Other details

@andrewlock (Member) commented Feb 11, 2025:

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with a statistical significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
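
Presumably this band is mean ± z·StdDev with z ≈ 2.58 for a two-sided 99% interval; for example, the FakeDbCommand baseline below has a 69 ms mean and a 66–72 ms band, which implies a StdDev of roughly (72 − 66) / (2 × 2.58) ≈ 1.2 ms.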

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (69ms)  : 66, 72
     .   : milestone, 69,
    master - mean (69ms)  : 66, 72
     .   : milestone, 69,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (993ms)  : 967, 1018
     .   : milestone, 993,
    master - mean (992ms)  : 960, 1024
     .   : milestone, 992,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (102ms)  : 99, 105
     .   : milestone, 102,
    master - mean (102ms)  : 100, 104
     .   : milestone, 102,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (672ms)  : 656, 688
     .   : milestone, 672,
    master - mean (674ms)  : 659, 690
     .   : milestone, 674,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (89ms)  : 87, 91
     .   : milestone, 89,
    master - mean (89ms)  : 87, 91
     .   : milestone, 89,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (627ms)  : 609, 644
     .   : milestone, 627,
    master - mean (630ms)  : 615, 645
     .   : milestone, 630,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (191ms)  : 187, 195
     .   : milestone, 191,
    master - mean (191ms)  : 188, 195
     .   : milestone, 191,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (1,110ms)  : 1082, 1137
     .   : milestone, 1110,
    master - mean (1,105ms)  : 1077, 1133
     .   : milestone, 1105,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (271ms)  : 266, 277
     .   : milestone, 271,
    master - mean (272ms)  : 266, 278
     .   : milestone, 272,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (867ms)  : 837, 897
     .   : milestone, 867,
    master - mean (868ms)  : 834, 903
     .   : milestone, 868,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6656) - mean (263ms)  : 259, 266
     .   : milestone, 263,
    master - mean (262ms)  : 258, 266
     .   : milestone, 262,

    section CallTarget+Inlining+NGEN
    This PR (6656) - mean (841ms)  : 816, 865
     .   : milestone, 841,
    master - mean (849ms)  : 811, 886
     .   : milestone, 849,


@andrewlock (Member) commented Feb 11, 2025:

Benchmarks Report for tracer 🐌

Benchmarks for #6656 compared to master:

  • 1 benchmark is faster, with a geometric mean speed-up of 1.177
  • 1 benchmark is slower, with a geometric mean slow-down of 1.187
  • 1 benchmark has more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a statistical significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 7.98μs 45.4ns 311ns 0.0155 0.00774 0 5.61 KB
master StartStopWithChild netcoreapp3.1 10μs 52.6ns 258ns 0.0195 0.00973 0 5.8 KB
master StartStopWithChild net472 16μs 53.4ns 207ns 1.03 0.294 0.0952 6.21 KB
#6656 StartStopWithChild net6.0 7.71μs 43ns 275ns 0.0191 0.0114 0.00381 5.6 KB
#6656 StartStopWithChild netcoreapp3.1 10.4μs 52.1ns 255ns 0.0155 0.00517 0 5.81 KB
#6656 StartStopWithChild net472 16.1μs 44.7ns 167ns 1.04 0.312 0.104 6.21 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 493μs 578ns 2.24μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 647μs 263ns 984ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 840μs 545ns 2.11μs 0.417 0 0 3.3 KB
#6656 WriteAndFlushEnrichedTraces net6.0 490μs 734ns 2.65μs 0 0 0 2.7 KB
#6656 WriteAndFlushEnrichedTraces netcoreapp3.1 645μs 501ns 1.94μs 0 0 0 2.7 KB
#6656 WriteAndFlushEnrichedTraces net472 835μs 346ns 1.34μs 0.417 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 130μs 447ns 1.73μs 0.131 0 0 14.47 KB
master SendRequest netcoreapp3.1 150μs 185ns 716ns 0.225 0 0 17.27 KB
master SendRequest net472 0.00081ns 0.000362ns 0.0014ns 0 0 0 0 b
#6656 SendRequest net6.0 133μs 402ns 1.56μs 0.193 0 0 14.47 KB
#6656 SendRequest netcoreapp3.1 147μs 407ns 1.58μs 0.216 0 0 17.27 KB
#6656 SendRequest net472 6.27E‑05ns 6.27E‑05ns 0.000235ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #6656

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑net6.0 41.58 KB 41.85 KB 276 B 0.66%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 587μs 3.11μs 16.2μs 0.561 0 0 41.58 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 662μs 3.48μs 17.4μs 0.372 0 0 41.89 KB
master WriteAndFlushEnrichedTraces net472 837μs 3.84μs 14.9μs 8.22 2.47 0.411 53.31 KB
#6656 WriteAndFlushEnrichedTraces net6.0 602μs 3.38μs 24.4μs 0.604 0 0 41.85 KB
#6656 WriteAndFlushEnrichedTraces netcoreapp3.1 713μs 4.01μs 28.6μs 0.357 0 0 41.85 KB
#6656 WriteAndFlushEnrichedTraces net472 847μs 4.11μs 16.4μs 8.45 2.53 0.422 53.28 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.43μs 1.15ns 4.44ns 0.0146 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.78μs 1.67ns 6.45ns 0.0133 0 0 1.02 KB
master ExecuteNonQuery net472 2.08μs 1.54ns 5.95ns 0.157 0.00104 0 987 B
#6656 ExecuteNonQuery net6.0 1.31μs 1.9ns 7.36ns 0.0143 0 0 1.02 KB
#6656 ExecuteNonQuery netcoreapp3.1 1.72μs 2.21ns 8.58ns 0.0138 0 0 1.02 KB
#6656 ExecuteNonQuery net472 2.05μs 2.47ns 9.55ns 0.156 0.00102 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.22μs 1.95ns 7.55ns 0.0135 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.61μs 3.18ns 12.3ns 0.0127 0 0 976 B
master CallElasticsearch net472 2.53μs 2.74ns 10.6ns 0.157 0 0 995 B
master CallElasticsearchAsync net6.0 1.39μs 0.608ns 2.36ns 0.0133 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.75μs 0.527ns 1.97ns 0.0139 0 0 1.02 KB
master CallElasticsearchAsync net472 2.62μs 1.67ns 6.47ns 0.167 0 0 1.05 KB
#6656 CallElasticsearch net6.0 1.26μs 0.552ns 2.14ns 0.014 0 0 976 B
#6656 CallElasticsearch netcoreapp3.1 1.57μs 0.442ns 1.71ns 0.0133 0 0 976 B
#6656 CallElasticsearch net472 2.63μs 2.33ns 9.04ns 0.157 0 0 995 B
#6656 CallElasticsearchAsync net6.0 1.28μs 0.662ns 2.57ns 0.0135 0 0 952 B
#6656 CallElasticsearchAsync netcoreapp3.1 1.66μs 1.14ns 4.4ns 0.0142 0 0 1.02 KB
#6656 CallElasticsearchAsync net472 2.6μs 1.56ns 6.04ns 0.166 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6656

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net6.0 1.187 1,206.61 1,432.59

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.21μs 0.645ns 2.5ns 0.0134 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.63μs 0.947ns 3.67ns 0.013 0 0 952 B
master ExecuteAsync net472 1.84μs 0.772ns 2.78ns 0.145 0 0 915 B
#6656 ExecuteAsync net6.0 1.43μs 1.52ns 5.7ns 0.0136 0 0 952 B
#6656 ExecuteAsync netcoreapp3.1 1.66μs 1.7ns 6.36ns 0.0123 0 0 952 B
#6656 ExecuteAsync net472 1.87μs 0.59ns 2.13ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.4μs 2.09ns 8.08ns 0.033 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.44μs 3.52ns 13.6ns 0.0379 0 0 2.85 KB
master SendAsync net472 7.57μs 18.5ns 71.6ns 0.495 0 0 3.12 KB
#6656 SendAsync net6.0 4.34μs 1.52ns 5.28ns 0.0325 0 0 2.31 KB
#6656 SendAsync netcoreapp3.1 5.29μs 2.07ns 7.74ns 0.0373 0 0 2.85 KB
#6656 SendAsync net472 7.58μs 1.57ns 6.07ns 0.493 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.53μs 1.95ns 7.55ns 0.0228 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.25μs 0.92ns 3.56ns 0.0225 0 0 1.64 KB
master EnrichedLog net472 2.66μs 1.13ns 4.39ns 0.25 0 0 1.57 KB
#6656 EnrichedLog net6.0 1.49μs 1.43ns 5.52ns 0.0227 0 0 1.64 KB
#6656 EnrichedLog netcoreapp3.1 2.22μs 1.39ns 5.02ns 0.0223 0 0 1.64 KB
#6656 EnrichedLog net472 2.7μs 0.498ns 1.93ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 115μs 309ns 1.2μs 0.0568 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 117μs 204ns 791ns 0.0578 0 0 4.28 KB
master EnrichedLog net472 150μs 137ns 529ns 0.67 0.223 0 4.46 KB
#6656 EnrichedLog net6.0 115μs 207ns 802ns 0 0 0 4.28 KB
#6656 EnrichedLog netcoreapp3.1 116μs 113ns 409ns 0 0 0 4.28 KB
#6656 EnrichedLog net472 150μs 211ns 817ns 0.674 0.225 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 3.03μs 2.27ns 8.79ns 0.0317 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.14μs 3.12ns 12.1ns 0.0291 0 0 2.2 KB
master EnrichedLog net472 4.84μs 1.26ns 4.89ns 0.32 0 0 2.02 KB
#6656 EnrichedLog net6.0 3.17μs 0.593ns 2.3ns 0.0301 0 0 2.2 KB
#6656 EnrichedLog netcoreapp3.1 4.19μs 1.61ns 6.01ns 0.0293 0 0 2.2 KB
#6656 EnrichedLog net472 4.92μs 0.896ns 3.47ns 0.32 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.3μs 0.649ns 2.34ns 0.0157 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.81μs 0.752ns 2.91ns 0.0153 0 0 1.14 KB
master SendReceive net472 2.05μs 1.18ns 4.56ns 0.183 0 0 1.16 KB
#6656 SendReceive net6.0 1.38μs 0.549ns 2.05ns 0.0159 0 0 1.14 KB
#6656 SendReceive netcoreapp3.1 1.88μs 0.837ns 3.24ns 0.015 0 0 1.14 KB
#6656 SendReceive net472 2.09μs 0.624ns 2.34ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.67μs 0.988ns 3.83ns 0.0227 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.9μs 1.95ns 7.56ns 0.0215 0 0 1.65 KB
master EnrichedLog net472 4.4μs 4.33ns 16.8ns 0.323 0 0 2.04 KB
#6656 EnrichedLog net6.0 2.69μs 1.18ns 4.41ns 0.0215 0 0 1.6 KB
#6656 EnrichedLog netcoreapp3.1 3.96μs 6.4ns 24.8ns 0.0216 0 0 1.65 KB
#6656 EnrichedLog net472 4.26μs 5.46ns 19.7ns 0.324 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #6656

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑netcoreapp3.1 1.177 767.99 652.22

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 394ns 0.572ns 2.22ns 0.00803 0 0 576 B
master StartFinishSpan netcoreapp3.1 642ns 1.53ns 5.92ns 0.00767 0 0 576 B
master StartFinishSpan net472 632ns 0.781ns 3.02ns 0.0918 0 0 578 B
master StartFinishScope net6.0 482ns 0.67ns 2.6ns 0.00988 0 0 696 B
master StartFinishScope netcoreapp3.1 769ns 0.957ns 3.71ns 0.00928 0 0 696 B
master StartFinishScope net472 867ns 1.12ns 4.34ns 0.104 0 0 658 B
#6656 StartFinishSpan net6.0 394ns 0.401ns 1.55ns 0.00802 0 0 576 B
#6656 StartFinishSpan netcoreapp3.1 603ns 0.48ns 1.8ns 0.00781 0 0 576 B
#6656 StartFinishSpan net472 636ns 1.27ns 4.91ns 0.0917 0 0 578 B
#6656 StartFinishScope net6.0 476ns 0.434ns 1.68ns 0.0097 0 0 696 B
#6656 StartFinishScope netcoreapp3.1 650ns 1.33ns 5.14ns 0.00948 0 0 696 B
#6656 StartFinishScope net472 832ns 1.79ns 6.92ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 640ns 0.8ns 3.1ns 0.00982 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 938ns 1.89ns 7.31ns 0.00943 0 0 696 B
master RunOnMethodBegin net472 1.15μs 2.2ns 8.51ns 0.104 0 0 658 B
#6656 RunOnMethodBegin net6.0 614ns 0.576ns 2.23ns 0.00992 0 0 696 B
#6656 RunOnMethodBegin netcoreapp3.1 898ns 1.4ns 5.43ns 0.00955 0 0 696 B
#6656 RunOnMethodBegin net472 1.15μs 1.61ns 6.23ns 0.104 0 0 658 B

github-actions bot (Contributor) commented Feb 12, 2025:

Snapshots difference summary

The following differences have been observed in committed snapshots; this summary is meant to help the reviewer.
The diff is simplistic, so please still check some of the files while we improve it.

1 occurrence of:

+      _dd.span_links: [{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483648,"tracestate":"","attributes":{"some_string":"five","some_string[].0":"a","some_string[].1":"b","some_string[].2":"c","some_bool":"False","some_bool[].0":"True","some_bool[].1":"False","some_int":"5","some_int[].0":"5","some_int[].1":"55","some_int[].2":"555"}},{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483649,"tracestate":"dd=s:2;p:TsParentId;t.dm:-4,foo=1,bar=baz","attributes":{}}]

1 occurrence of:

+      _dd.span_links: [{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483648,"tracestate":"","attributes":{}}]

8 occurrences of:

+      _dd.span_links: [{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483649,"tracestate":"","attributes":{}}]

8 occurrences of:

+      _dd.span_links: [{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483649,"tracestate":"","attributes":{}},{"trace_id":link_trace_id_low,"trace_id_high":link_trace_id_high,"span_id":link_span_id,"flags":2147483649,"tracestate":"","attributes":{}}]

@nhulston nhulston marked this pull request as ready for review February 13, 2025 15:50
@nhulston nhulston requested a review from a team as a code owner February 13, 2025 15:50
@tonyredondo (Member) commented:

I wonder if this should be ported to the CI Visibility serializer as well (https://github.com/DataDog/dd-trace-dotnet/tree/master/tracer/src/Datadog.Trace/Ci/Agent/MessagePack)

@andrewlock (Member) commented:

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where the throughput results for the PR are worse than the latest master (a drop of 5% or greater) are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6656) (10.022M)   : 0, 10021920
    master (11.317M)   : 0, 11316593
    benchmarks/2.9.0 (11.045M)   : 0, 11045405

    section Automatic
    This PR (6656) (7.302M)   : 0, 7302242
    master (7.283M)   : 0, 7282654
    benchmarks/2.9.0 (7.885M)   : 0, 7885346

    section Trace stats
    master (7.494M)   : 0, 7494455

    section Manual
    master (11.171M)   : 0, 11170673

    section Manual + Automatic
    This PR (6656) (6.817M)   : 0, 6816505
    master (6.734M)   : 0, 6734216

    section DD_TRACE_ENABLED=0
    master (10.296M)   : 0, 10295906

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6656) (9.543M)   : 0, 9543493
    benchmarks/2.9.0 (9.586M)   : 0, 9586476

    section Automatic
    This PR (6656) (6.347M)   : 0, 6347158

    section Manual + Automatic
    This PR (6656) (6.038M)   : 0, 6038405

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6656) (10.442M)   : 0, 10441583

    section Automatic
    This PR (6656) (6.715M)   : 0, 6715038

    section Manual + Automatic
    This PR (6656) (6.307M)   : 0, 6307131


@lucaspimentel (Member) left a comment:

> we should always serialize trace flags in span links, even if it's just empty (0). (This is how the other tracers also work). Some downstream services on the backend or agent expect this field. e.g. this code errors if the trace flags field is missing.

The Go agent accepts skipping empty fields (which we do for most fields to keep payloads smaller), so I think the Rust agent should match that behavior for compatibility reasons, even though payload size is not a big concern for this particular field. This has been an issue before, see DataDog/libdatadog#781. Not sure if the fix in that PR can apply to this as well.

@lucaspimentel (Member) left a comment:

> Span links should be serialized under both Span[span_links] and Span[meta][_dd.span_links]. This is how span links work in all other tracers.

We try to keep payloads as small as possible to minimize resource usage and improve performance, and this feels like a waste of payload bytes. If this data really needs to be duplicated (why? 😭), couldn't it be done in the agent or the backend?

edit: the Span Links RFC specifically says that the JSON format in Span[meta][_dd.span_links] is only a fallback for agent endpoint v0.5, which doesn't support Span[span_links]. We only support v0.4 in dd-trace-dotnet, so we shouldn't need the JSON tag fallback.

@lucaspimentel lucaspimentel requested a review from a team February 13, 2025 23:25
@@ -765,6 +767,61 @@ private void InitializeAasTags()
}
}

private string ConvertSpanLinksToJson(IReadOnlyList<SpanLink> links)
{
var result = new List<Dictionary<string, object>>(links.Count);
@lucaspimentel (Member) commented Feb 13, 2025:

Between the heap allocations and the JSON serializer's use of reflection, this won't be great for performance. The MessagePack library has a MessagePackSerializer.ConvertToJson(), but I don't think we can use it easily here since it takes the MessagePack bytes as an argument.

If we really need to include this json (which is a separate question, see my other comments), we should serialize it manually, like we do for the MessagePack itself. Here's an example from the Newtonsoft.Json docs:

public static string ToJson(this Person p)
{
    StringWriter sw = new StringWriter();
    JsonTextWriter writer = new JsonTextWriter(sw);

    // {
    writer.WriteStartObject();

    // "name" : "Jerry"
    writer.WritePropertyName("name");
    writer.WriteValue(p.Name);

    // "likes": ["Comedy", "Superman"]
    writer.WritePropertyName("likes");
    writer.WriteStartArray();
    foreach (string like in p.Likes)
    {
        writer.WriteValue(like);
    }
    writer.WriteEndArray();

    // }
    writer.WriteEndObject();

    return sw.ToString();
}
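
Applying that pattern to the span-link JSON discussed in this PR could look roughly like the sketch below. The LinkData record and its members are placeholders (the tracer's actual SpanLink type is not reproduced here), and Newtonsoft.Json is assumed to be available:

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Placeholder shape for a span link; the real type exposes equivalent data
// through its own members.
internal sealed record LinkData(
    ulong TraceIdLow,
    ulong TraceIdHigh,
    ulong SpanId,
    uint Flags,
    string TraceState,
    IReadOnlyDictionary<string, string> Attributes);

internal static class SpanLinkJsonWriterSketch
{
    public static string ToJson(IReadOnlyList<LinkData> links)
    {
        var sw = new StringWriter();
        using var writer = new JsonTextWriter(sw);

        // [ { "trace_id": ..., "span_id": ..., "flags": ..., ... }, ... ]
        writer.WriteStartArray();

        foreach (var link in links)
        {
            writer.WriteStartObject();

            writer.WritePropertyName("trace_id");
            writer.WriteValue(link.TraceIdLow);

            writer.WritePropertyName("trace_id_high");
            writer.WriteValue(link.TraceIdHigh);

            writer.WritePropertyName("span_id");
            writer.WriteValue(link.SpanId);

            writer.WritePropertyName("flags");
            writer.WriteValue(link.Flags);

            writer.WritePropertyName("tracestate");
            writer.WriteValue(link.TraceState);

            writer.WritePropertyName("attributes");
            writer.WriteStartObject();
            foreach (var kvp in link.Attributes)
            {
                writer.WritePropertyName(kvp.Key);
                writer.WriteValue(kvp.Value);
            }

            writer.WriteEndObject(); // attributes
            writer.WriteEndObject(); // link
        }

        writer.WriteEndArray();
        return sw.ToString();
    }
}

This writes the JSON in a single pass, with no intermediate dictionaries and no reflection, at the cost of a little more code.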

@lucaspimentel lucaspimentel requested a review from a team February 13, 2025 23:42
@nhulston (Contributor, Author) commented:

> We try to keep payloads as small as possible to minimize resource usage and improve performance, and this feels like a waste of payload bytes. If this data really needs to be duplicated (why? 😭), couldn't it be done in the agent or the backend?
>
> edit: the Span Links RFC specifically says that the JSON format in Span[meta][_dd.span_links] is only a fallback for agent endpoint v0.5. We only support v0.4 in dd-trace-dotnet, and both v0.4 and v0.7 support Span[span_links], so we do not need the fallback.

(Posting here so other reviewers can see)

As discussed in Slack, querying on the frontend doesn't always work with the v0.5 approach. I am starting a conversation with the team responsible for indexing on the backend, and hopefully we can fully support v0.5 and avoid this data duplication.

Worst case, we support v0.4 for now, and then revert this change once the backend fully supports v0.5 querying.

I will follow up after discussing with the team.

@nhulston (Contributor, Author) commented:

> we should always serialize trace flags in span links, even if it's just empty (0). (This is how the other tracers also work). Some downstream services on the backend or agent expect this field. e.g. this code errors if the trace flags field is missing.
>
> The Go agent accepts skipping empty fields (which we do for most fields to keep payloads smaller), so I think the Rust agent should match that behavior for compatibility reasons, even though payload size is not a big concern for this particular field. This has been an issue before, see DataDog/libdatadog#781. Not sure if the fix in that PR can apply to this as well.

I can definitely just make a change in libdatadog

@nhulston (Contributor, Author) commented:
  • Regarding the serialization of optional fields like trace flags -- I have opened a PR in libdatadog: Fix span link deserialization when missing optional fields libdatadog#887
  • Regarding the serialization of span links in meta -- I've started a conversation with the backend team about resolving this. Closing for now, but might reopen if they are unable to find a solution.

@nhulston nhulston closed this Feb 20, 2025