Add clean,deploy and port-forward scripts and values for Jaeger + OpenSearch + OTel Demo by danish9039 · Pull Request #7516 · jaegertracing/jaeger

danish9039
Contributor

Which problem is this PR solving?

Part of mentorship work.

Description of the changes

Summary
This PR introduces a complete, repeatable deployment of the OpenTelemetry Demo with Jaeger, the HotROD app, and OpenSearch (including OpenSearch Dashboards) under examples/otel-demo. It provides a single entrypoint script that supports both upgrade and clean-install modes, plus the necessary Helm values and a ClusterIP service for Jaeger Query.

Deployment script

• Added deploy-all.sh with modes: upgrade (default) and clean (uninstall + fresh install)
◦ Pre-flight checks for required CLIs (bash, git, curl, kubectl, helm) and cluster availability
◦ Validates presence of required values files
◦ Clones jaegertracing/helm-charts v2 and builds dependencies
◦ Deploys in order: OpenSearch -> OpenSearch Dashboards -> Jaeger (all-in-one) -> OTel Demo
◦ Waits for StatefulSets/Deployments to be ready with rollout status and timeouts
◦ Creates a dedicated ClusterIP service for Jaeger Query
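
The pre-flight checks and rollout waits listed above could be sketched roughly as follows. This is an illustration only, not the contents of deploy-all.sh: the function names `require_cmds`, `require_files`, and `wait_rollout` are hypothetical, and the only external behavior relied on is `command -v` and `kubectl rollout status` with its `--timeout` flag.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the checks described above.

# Report every missing CLI; return non-zero if any is absent.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report every missing values file; return non-zero if any is absent.
require_files() {
  local f missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing required file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Block until a workload finishes rolling out or the timeout expires.
# Usage: wait_rollout NAMESPACE KIND NAME [TIMEOUT]
wait_rollout() {
  local ns="$1" kind="$2" name="$3" timeout="${4:-600s}"
  kubectl rollout status "$kind/$name" -n "$ns" --timeout="$timeout"
}
```

A script might call `require_cmds bash git curl kubectl helm` and `require_files` with the values files before touching the cluster, then `wait_rollout` after each Helm release. Which workload names to wait on depends on the charts and is not specified here.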

• Added values files:

◦ opensearch-values.yaml
◦ opensearch-dashboard-values.yaml
◦ jaeger-values.yaml
◦ jaeger-config.yaml (userconfig for Jaeger)
◦ otel-demo-values.yaml
• Added Jaeger Query service:
◦ jaeger-query-service.yaml (ClusterIP service in jaeger namespace)
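
For orientation, a ClusterIP service for Jaeger Query might look like the manifest printed by the sketch below. This is not the jaeger-query-service.yaml from the PR: the function name `render_query_service`, the Service name, and the selector label are assumptions chosen for illustration, and the selector must match whatever labels the Jaeger pods actually carry. Port 16686 is the default Jaeger Query UI port.

```shell
#!/usr/bin/env bash
# render_query_service [NAMESPACE] [APP_LABEL_VALUE]
# Prints a ClusterIP Service manifest for the Jaeger Query UI port.
# The Service name and selector label below are illustrative assumptions.
render_query_service() {
  local ns="${1:-jaeger}" app="${2:-jaeger}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: jaeger-query-clusterip
  namespace: ${ns}
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: ${app}
  ports:
    - name: query-http
      port: 16686
      targetPort: 16686
EOF
}
```

The output can be piped to `kubectl apply -f -` once the selector has been checked against the running pods.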

How was this change tested?

  • Tested on a local Minikube cluster and on a production Oracle cluster

Checklist

…OTel Demo

Signed-off-by: danish9039 <danishsiddiqui040@gmail.com>

## Automatic port-forward using scrpit
Contributor

There's a small typo in the heading: scrpit should be script in "Automatic port-forward using script"

Suggested change
- ## Automatic port-forward using scrpit
+ ## Automatic port-forward using script

Spotted by Diamond

Comment on lines 113 to 114
done
log "Cleanup complete"
Contributor

The indentation of the done statement appears to be misaligned with its corresponding for loop. The current indentation suggests it's closing the inner loop, but based on the code structure, it should be aligned with the outer for loop. This could cause confusion during maintenance or potentially lead to unexpected behavior if modified later.

# Current:
for ns in jaeger otel-demo opensearch; do
  for i in {1..120}; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      break
    fi
    sleep 2
  done
  done
log "Cleanup complete"

# Suggested:
for ns in jaeger otel-demo opensearch; do
  for i in {1..120}; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      break
    fi
    sleep 2
  done
done
log "Cleanup complete"
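
The wait loop can also be factored into a helper that reports a timeout instead of silently falling through after the last attempt. A minimal sketch, assuming the hypothetical function name `wait_for_namespace_deletion` and reusing the 120 attempts at 2-second intervals from the snippet above as defaults:

```shell
#!/usr/bin/env bash
# wait_for_namespace_deletion NAMESPACE [ATTEMPTS] [DELAY_SECONDS]
# Polls until the namespace no longer exists.
# Returns 0 once it is gone, 1 if it still exists after all attempts.
wait_for_namespace_deletion() {
  local ns="$1" attempts="${2:-120}" delay="${3:-2}" i
  for ((i = 0; i < attempts; i++)); do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for namespace $ns to be deleted" >&2
  return 1
}
```

With this helper the outer loop reduces to `for ns in jaeger otel-demo opensearch; do wait_for_namespace_deletion "$ns"; done`, which leaves only one `done` to indent.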

Spotted by Diamond

codecov bot commented Sep 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.63%. Comparing base (b966cc5) to head (794f492).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7516      +/-   ##
==========================================
+ Coverage   96.59%   96.63%   +0.03%     
==========================================
  Files         383      383              
  Lines       23238    23238              
==========================================
+ Hits        22447    22455       +8     
+ Misses        602      596       -6     
+ Partials      189      187       -2     
| Flag | Coverage Δ |
|---|---|
| badger_v1 | 9.02% <ø> (ø) |
| badger_v2 | 1.70% <ø> (ø) |
| cassandra-4.x-v1-manual | 11.67% <ø> (ø) |
| cassandra-4.x-v2-auto | 1.69% <ø> (ø) |
| cassandra-4.x-v2-manual | 1.69% <ø> (ø) |
| cassandra-5.x-v1-manual | 11.67% <ø> (ø) |
| cassandra-5.x-v2-auto | 1.69% <ø> (ø) |
| cassandra-5.x-v2-manual | 1.69% <ø> (ø) |
| elasticsearch-6.x-v1 | 16.57% <ø> (ø) |
| elasticsearch-7.x-v1 | 16.61% <ø> (ø) |
| elasticsearch-8.x-v1 | 16.75% <ø> (ø) |
| elasticsearch-8.x-v2 | 1.70% <ø> (ø) |
| elasticsearch-9.x-v2 | 1.70% <ø> (ø) |
| grpc_v1 | 10.22% <ø> (ø) |
| grpc_v2 | 1.70% <ø> (ø) |
| kafka-3.x-v1 | 9.66% <ø> (ø) |
| kafka-3.x-v2 | 1.70% <ø> (ø) |
| memory_v2 | 1.70% <ø> (ø) |
| opensearch-1.x-v1 | 16.66% <ø> (ø) |
| opensearch-2.x-v1 | 16.66% <ø> (ø) |
| opensearch-2.x-v2 | 1.70% <ø> (-0.09%) ⬇️ |
| opensearch-3.x-v2 | 1.70% <ø> (-0.09%) ⬇️ |
| query | 1.70% <ø> (ø) |
| tailsampling-processor | 0.46% <ø> (ø) |
| unittests | 95.62% <ø> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

- Jaeger (all-in-one) for tracing
- OpenSearch and OpenSearch Dashboards
- OpenTelemetry Demo application (multi-service web store)
- HotRod application
Member

is this meant to subsume examples/oci which also deploys jaeger and hotrod?

Contributor Author

> is this meant to subsume examples/oci which also deploys jaeger and hotrod?

Yes, this demo includes the OTel Demo along with the HotROD app. In examples/oci, Jaeger uses in-memory storage with Prometheus as the metrics backend. As per your suggestion in the last meeting, this demo uses OpenSearch as the metrics backend and removes Prometheus completely.

github-actions bot commented Sep 20, 2025

Metrics Comparison Summary

Total changes across all snapshots: 199

Detailed changes per snapshot

summary_metrics_snapshot_opensearch

📊 Metrics Diff Summary

Total Changes: 73

  • 🆕 Added: 73 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics

🆕 Added Metrics

  • jaeger_storage_latency_seconds (18 variants)
View diff sample
+jaeger_storage_latency_seconds{kind="opensearch",le="+Inf",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="0",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="10",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="100",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="1000",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="10000",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="25",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
...
  • jaeger_storage_requests (1 variant)
View diff sample
+jaeger_storage_requests{kind="opensearch",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
  • rpc_server_duration_milliseconds (18 variants)
View diff sample
+rpc_server_duration_milliseconds{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_requests_per_rpc (18 variants)
View diff sample
+rpc_server_requests_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_responses_per_rpc (18 variants)
View diff sample
+rpc_server_responses_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
summary_metrics_snapshot_opensearch

📊 Metrics Diff Summary

Total Changes: 73

  • 🆕 Added: 73 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics

🆕 Added Metrics

  • jaeger_storage_latency_seconds (18 variants)
View diff sample
+jaeger_storage_latency_seconds{kind="opensearch",le="+Inf",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="0",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="10",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="100",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="1000",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="10000",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
+jaeger_storage_latency_seconds{kind="opensearch",le="25",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
...
  • jaeger_storage_requests (1 variant)
View diff sample
+jaeger_storage_requests{kind="opensearch",name="some_storage",operation="find_traces",otel_scope_name="jaeger-v2",otel_scope_schema_url="",otel_scope_version="",result="err",role="tracestore"}
  • rpc_server_duration_milliseconds (18 variants)
View diff sample
+rpc_server_duration_milliseconds{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_duration_milliseconds{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_requests_per_rpc (18 variants)
View diff sample
+rpc_server_requests_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_requests_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
  • rpc_server_responses_per_rpc (18 variants)
View diff sample
+rpc_server_responses_per_rpc{le="+Inf",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="0",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="10",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="100",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="1000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="10000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
+rpc_server_responses_per_rpc{le="25",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc",otel_scope_schema_url="https://opentelemetry.io/schemas/1.34.0",otel_scope_version="0.62.0",rpc_grpc_status_code="2",rpc_method="FindTraces",rpc_service="jaeger.api_v3.QueryService",rpc_system="grpc"}
...
summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 53

  • 🆕 Added: 0 metrics
  • ❌ Removed: 53 metrics
  • 🔄 Modified: 0 metrics

❌ Removed Metrics

  • http_server_request_body_size_bytes (18 variants)
View diff sample
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_request_duration_seconds (17 variants)
View diff sample
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_response_body_size_bytes (18 variants)
View diff sample
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
-http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.62.0",server_address="localhost",server_port="13133",url_scheme="http"}
...


minor fix

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Signed-off-by: hippie-danish <133037056+danish9039@users.noreply.github.com>
persistence:
  enabled: true
  size: "10Gi"
  storageClass: "oci-bv" # Using Oracle Cloud Block Volume storage class
Contributor

Hardcoded cloud-specific storage class creates deployment failures. The storage class 'oci-bv' is specific to Oracle Cloud Infrastructure and will cause PVC creation failures on other Kubernetes platforms (AWS, GCP, Azure, minikube, etc.). Either remove this line to use the default storage class, or make it configurable via environment variable or parameter.

Suggested change
-  storageClass: "oci-bv" # Using Oracle Cloud Block Volume storage class
+  storageClass: "" # Set this to your cluster's storage class (e.g., "gp2" for AWS, "standard" for GCP, "oci-bv" for Oracle Cloud)

Spotted by Diamond
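
One way to make the storage class configurable, as the comment above suggests, is to leave it out of the values file and pass it at install time only when a value is supplied. The following is a minimal sketch, not the PR's implementation: `STORAGE_CLASS` and `storage_class_args` are hypothetical names, and the `persistence.storageClass` key is assumed from the values snippet under review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints extra Helm arguments, one per line, only when
# STORAGE_CLASS (an assumed variable name) is set to a non-empty value.
# When it prints nothing, no override is passed and the values file decides.
storage_class_args() {
  local sc="${STORAGE_CLASS:-}"
  if [ -n "$sc" ]; then
    printf '%s\n' "--set" "persistence.storageClass=${sc}"
  fi
}

# Example wiring (release and chart names are placeholders, not from the PR):
#   helm upgrade --install opensearch opensearch/opensearch \
#     -f opensearch-values.yaml $(storage_class_args)
```

Running the deploy script as `STORAGE_CLASS=oci-bv ./deploy-all.sh` would then select the Oracle Cloud class, while leaving the variable unset would keep whatever the values file (or the cluster default) provides.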


Comment on lines 169 to 171
while true; do
  sleep 10
done
Contributor

Infinite loop without proper signal handling creates resource leak. The while true; do sleep 10; done loop runs indefinitely and may not properly handle all termination signals, potentially leaving port-forward processes running as orphans. The trap on line 166 only handles INT signal. Add proper cleanup for other termination signals (TERM, QUIT) and consider using wait instead of infinite sleep to make the script more responsive to signals.

Spotted by Diamond
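
The signal-handling suggestion above can be sketched as follows: track the PIDs of background processes, clean them up from an `EXIT` trap, turn `INT`, `TERM`, and `QUIT` into a normal exit, and block on `wait` instead of a sleep loop. This is an illustrative sketch only; the function names are hypothetical, and the service name and ports in `run_forwards` are assumptions rather than values taken from the PR.

```shell
#!/usr/bin/env bash
# PIDs of background commands started through start_forward.
pids=()

# Run a command in the background and remember its PID.
start_forward() {
  "$@" &
  pids+=("$!")
}

# Stop every tracked process, then reap them so none are left behind.
cleanup() {
  local pid
  for pid in "${pids[@]}"; do
    kill "$pid" 2>/dev/null || true
  done
  for pid in "${pids[@]}"; do
    wait "$pid" 2>/dev/null || true
  done
}

# Install the traps, start the forwards, and block until they exit
# or a signal arrives. `wait` returns as soon as a trapped signal is received.
run_forwards() {
  trap cleanup EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
  trap 'exit 131' QUIT
  # Service name and ports are assumptions for illustration.
  start_forward kubectl port-forward -n jaeger svc/jaeger-query 16686:16686
  wait
}

# A script would end with a call to: run_forwards
```

Routing every signal through `exit` means the cleanup logic lives in one place, the `EXIT` trap, regardless of how the script terminates.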


Signed-off-by: danish9039 <danishsiddiqui040@gmail.com>
Comment on lines +107 to +113
  for i in {1..120}; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      break
    fi
    sleep 2
  done
done
Contributor

Race condition in namespace deletion wait loop. The loop uses a fixed iteration count (120) with 2-second sleeps, but if namespace deletion takes longer than 240 seconds, the script continues without ensuring namespaces are fully deleted. This can cause subsequent deployments to fail with resource conflicts. Should add explicit timeout handling and error checking after the loop completes.

Suggested change
-  for i in {1..120}; do
-    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
-      break
-    fi
-    sleep 2
-  done
-done
+  # Wait up to 4 minutes for namespace deletion
+  timeout_seconds=240
+  start_time=$(date +%s)
+  while true; do
+    current_time=$(date +%s)
+    elapsed_time=$((current_time - start_time))
+    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
+      echo "Namespace $ns successfully deleted."
+      break
+    fi
+    if [ $elapsed_time -ge $timeout_seconds ]; then
+      echo "ERROR: Timed out waiting for namespace $ns to be deleted after ${timeout_seconds} seconds."
+      echo "You may need to manually check and delete the namespace before retrying."
+      exit 1
+    fi
+    sleep 2
+  done
+done

Spotted by Diamond
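
The timeout logic in the suggestion above could also be factored into a small reusable helper, so the same deadline check serves every namespace. This is a hedged sketch: `wait_until_fails` is a hypothetical name, and the helper simply polls an arbitrary command until it stops succeeding or the deadline passes.

```shell
#!/usr/bin/env bash
# wait_until_fails TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Polls CMD until it fails (for example, until the namespace is gone).
# Returns 0 once CMD fails, or 1 if CMD still succeeds at the deadline.
wait_until_fails() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Example use inside the namespace loop:
#   if ! wait_until_fails 240 2 kubectl get namespace "$ns"; then
#     echo "ERROR: Timed out waiting for namespace $ns to be deleted." >&2
#     exit 1
#   fi
```

`kubectl wait --for=delete namespace/"$ns" --timeout=240s` may serve the same purpose without a hand-written loop, though its behavior when the namespace is already gone should be checked against the kubectl version in use.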


echo "Deleting namespaces..."
for ns in otel-demo jaeger opensearch; do
  if kubectl get namespace "$ns" > /dev/null 2>&1; then
    kubectl delete namespace "$ns" --force --grace-period=0 2>/dev/null || true
Contributor

Dangerous use of --force --grace-period=0 flags for namespace deletion. This forces immediate termination without allowing pods to shut down gracefully, which can lead to data corruption, incomplete cleanup of resources, and potential resource leaks. The --force flag should be removed to allow proper graceful shutdown, or used only as a last resort with additional safety checks.

Suggested change
-    kubectl delete namespace "$ns" --force --grace-period=0 2>/dev/null || true
+    kubectl delete namespace "$ns" 2>/dev/null || true

Spotted by Diamond
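
The "last resort" option mentioned above could look like the sketch below: attempt a graceful delete with a bounded wait first, and fall back to a forced delete only when the caller explicitly opts in. The names `delete_namespace`, `KUBECTL`, and `ALLOW_FORCE_DELETE` are hypothetical and introduced only for illustration; the 120-second default is likewise an assumption.

```shell
#!/usr/bin/env bash
# Allows the kubectl binary to be overridden (an assumption made for testing).
KUBECTL="${KUBECTL:-kubectl}"

# delete_namespace NAME [TIMEOUT]
# Tries a graceful delete first. Falls back to --force --grace-period=0 only
# when ALLOW_FORCE_DELETE=true (a hypothetical opt-in switch).
delete_namespace() {
  local ns="$1" timeout="${2:-120s}"
  if "$KUBECTL" delete namespace "$ns" --timeout="$timeout"; then
    return 0
  fi
  if [ "${ALLOW_FORCE_DELETE:-false}" = "true" ]; then
    echo "Graceful delete of $ns failed; forcing deletion." >&2
    "$KUBECTL" delete namespace "$ns" --force --grace-period=0
  else
    echo "Graceful delete of $ns failed; not forcing." >&2
    return 1
  fi
}
```

Keeping the forced path behind an explicit switch makes the destructive behavior a deliberate choice rather than the default.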



# Stop any existing port forwards first
echo " Stopping any existing port-forward processes..."
pkill -f "kubectl port-forward" 2>/dev/null || true
Contributor

Overly broad process killing with 'pkill -f kubectl port-forward' will terminate ALL kubectl port-forward processes on the system, not just those started by this script. This can disrupt other users' port-forwards or other applications. Should track PIDs of started processes and kill only those specific processes, or use more specific process identification.

Suggested change
-pkill -f "kubectl port-forward" 2>/dev/null || true
+# Kill any port-forward processes started by this script
+if [ -f ".port-forward-pids" ]; then
+  while read -r pid; do
+    kill "$pid" 2>/dev/null || true
+  done < ".port-forward-pids"
+  rm ".port-forward-pids"
+fi

Spotted by Diamond
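
The suggestion above reads PIDs from `.port-forward-pids` but does not show where they are written. A minimal sketch of both halves follows, assuming the same file name; `start_tracked` and `stop_tracked` are hypothetical helper names, not functions from the PR.

```shell
#!/usr/bin/env bash
# File that records the PIDs started by this script (name taken from the
# suggestion above; overridable here for illustration).
PID_FILE="${PID_FILE:-.port-forward-pids}"

# Start a command in the background and append its PID to the PID file.
start_tracked() {
  "$@" &
  echo "$!" >> "$PID_FILE"
}

# Kill only the processes recorded in the PID file, then remove the file.
# Note: a stale file could contain PIDs that the OS has since reused.
stop_tracked() {
  local pid
  [ -f "$PID_FILE" ] || return 0
  while read -r pid; do
    kill "$pid" 2>/dev/null || true
  done < "$PID_FILE"
  rm -f "$PID_FILE"
}

# Example (service name and ports are assumptions):
#   stop_tracked
#   start_tracked kubectl port-forward -n jaeger svc/jaeger-query 16686:16686
```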


Signed-off-by: danish9039 <danishsiddiqui040@gmail.com>
Comment on lines 159 to 162
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://jaeger-collector.jaeger.svc.cluster.local:4317
- name: OTEL_EXPORTER_OTLP_PROTOCOL
  value: grpc
Contributor

For gRPC connections, the OTLP endpoint for the checkout service should not include the http:// prefix. The correct format for gRPC endpoints is just the host and port without a protocol prefix. Please update to:

- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: jaeger-collector.jaeger.svc.cluster.local:4317

This ensures proper gRPC communication between the checkout service and the Jaeger collector.

Suggested change
-  - name: OTEL_EXPORTER_OTLP_ENDPOINT
-    value: http://jaeger-collector.jaeger.svc.cluster.local:4317
-  - name: OTEL_EXPORTER_OTLP_PROTOCOL
-    value: grpc
+  - name: OTEL_EXPORTER_OTLP_ENDPOINT
+    value: jaeger-collector.jaeger.svc.cluster.local:4317
+  - name: OTEL_EXPORTER_OTLP_PROTOCOL
+    value: grpc

Spotted by Diamond


- name: OTEL_TRACES_EXPORTER
  value: otlp

# Component-specific configurations using correct schema names
Member

Why do we need to redefine all of this? It's going to be very fragile - if upstream demo changes our settings will become stale.

Contributor Author

@yurishkuro The OTEL demo comes with a built-in OTEL Collector that receives telemetry data from each service and provides a connection point to access traces, metrics, and logs for other services.
Architecture of the OTEL demo out-of-the-box: https://opentelemetry.io/docs/demo/architecture/

In our earlier discussion of the architecture of the otel-demo-jaeger-opensearch stack (#7326 (comment)), we decided to drop the built-in OTEL Collector and instead use Jaeger v2, which can also act as a collector. Because of this, we had to manually configure each service in the OTEL demo to send traces directly to our Jaeger instance.

Had we chosen to keep the built-in OTEL Collector, we would only have needed to configure the collector with trace endpoints to forward traces to our Jaeger instance.

https://opentelemetry.io/docs/demo/kubernetes-deployment/#bring-your-own-backend

Member

If having the collector results in less configuration I would keep it. Does it also provide some enrichment of telemetry based on k8s information?

Contributor Author
@danish9039 danish9039 Sep 22, 2025

@yurishkuro Sure, we can use the OTel Collector. Regarding enrichment of telemetry: by default, the OTel Collector in the demo chart adds basic resource attributes (docker, environment, etc.) using the resourcedetection processor:
resourcedetection example in otel-demo config

We can also use something like the k8sattributes processor:
k8sattributes processor documentation
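
The collector-based wiring discussed in this thread might be expressed as a small values overlay instead of per-service environment variables. The sketch below writes such an overlay from a shell helper. It rests on several assumptions: the exporter name `otlp/jaeger`, the output file name, the helper name `write_collector_values`, and the exporter list in the traces pipeline are all illustrative and should be compared with the demo chart's defaults. Only the Jaeger endpoint comes from the values under review.

```shell
#!/usr/bin/env bash
# Writes a Helm values overlay that keeps the demo's built-in collector and
# adds an OTLP/gRPC exporter pointing at Jaeger.
# Assumptions: the exporter name "otlp/jaeger", the default output path, and
# the pipeline's exporter list (check the chart's defaults before use).
write_collector_values() {
  local out="${1:-otel-demo-collector-values.yaml}"
  cat > "$out" <<'EOF'
opentelemetry-collector:
  config:
    exporters:
      otlp/jaeger:
        endpoint: jaeger-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          exporters: [otlp/jaeger, spanmetrics]
EOF
}

# Example: write_collector_values ./otel-demo-collector-values.yaml
```

With an overlay like this, the individual demo services keep their upstream defaults and only the collector needs to know where Jaeger lives, which addresses the fragility concern raised above.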

@yurishkuro yurishkuro added the changelog:exprimental Change to an experimental part of the code label Sep 27, 2025
danish9039 and others added 2 commits September 29, 2025 18:00
Labels
area/otel changelog:exprimental Change to an experimental part of the code mentorship