Observability is the ability to understand the internal state of a system by examining its outputs. In software, those outputs are telemetry data: traces, metrics, and logs.
To make a system observable, it must be instrumented: the code must emit traces, metrics, or logs, and that telemetry must then be sent to an observability backend.
OpenTelemetry (OTel) is an open-source observability framework that provides a standardized way to collect metrics, logs, and traces from your applications and systems, so you can monitor performance and diagnose problems across distributed systems.
It is an observability framework and toolkit designed to facilitate the:
- Generation
- Export
- Collection

of telemetry data such as traces, metrics, and logs.
It is open source, as well as vendor- and tool-agnostic, meaning it can be used with a broad variety of observability backends, including open source tools like Jaeger and Prometheus as well as commercial offerings. OpenTelemetry is not an observability backend itself.
Before OTel, different vendors had their own proprietary SDKs for telemetry. This made it painful to switch or combine tools.
OTel solves this by providing vendor-neutral APIs and SDKs for:
- Tracing – Following a request across microservices.
- Metrics – Measuring performance and resource usage.
- Logging – Recording discrete events.
You can collect the data once and export it to any backend (Prometheus, Jaeger, Grafana Tempo, Elasticsearch, etc.) without rewriting instrumentation.
| Concept | Meaning |
|---|---|
| Instrumentation | Adding OTel SDK calls to your code to collect telemetry. |
| Span | A timed unit of work (e.g., a `GET /users` request). |
| Trace | A collection of spans that represents the journey of a single request. |
| Context Propagation | Passing trace context between services so traces can be correlated. |
| Exporter | Sends collected telemetry data to a backend (Jaeger, Prometheus, etc.). |
| Collector | A separate service that receives, processes, and exports telemetry data from multiple apps. |
Flow:
- Your app has the OTel SDK or auto-instrumentation.
- It emits logs, spans, and metrics.
- The data is sent to the OpenTelemetry Collector.
- The collector processes and sends it to your observability backend.
A trace is the full journey of one request or transaction through your system. Traces add further to the observability picture by telling you what happens at each step or action in a data pathway. Traces provide the map: they show where something is going wrong.
- In the restaurant analogy: one customer's entire visit, from entering the restaurant, ordering food, and eating, to paying and leaving.
- In OTel: a trace has a unique Trace ID and contains all the spans that happened as part of that request.
- Example in an HTTP API: a `POST /checkout` request triggers:
  - Web server receives request
  - Calls inventory service
  - Calls payment service
  - Writes to database

  → All of these are part of one trace.
A span is a single operation or unit of work inside a trace.
- In the restaurant analogy: one step in the visit, e.g., "Server takes the order", "Chef cooks main course", "Cashier processes payment".
- In OTel, a span:
  - Has a start time & end time
  - Can have attributes (`db.statement`, `http.method`, etc.)
  - Can be nested (parent/child relationship)
- Example in the API trace:
  - Span 1: `HTTP POST /checkout` (parent)
    - Span 1.1: `SELECT inventory` (child)
    - Span 1.2: `Process payment` (child)
    - Span 1.3: `INSERT order record` (child)
-
📌 Traces are made up of spans. Every span knows:
- Which trace it belongs to (
trace_id
) - Which span called it (
parent_span_id
)
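To make the parent/child relationship concrete, here is a minimal Rust sketch of the span tree above, using the `opentelemetry` crate's `in_span` helper (the `handle_checkout` function and the tracer name are hypothetical, not from this project):

```rust
use opentelemetry::global;
use opentelemetry::trace::{TraceContextExt, Tracer};
use opentelemetry::KeyValue;

// `in_span` makes each span active in the current context, so the nested
// calls become children of the parent span and all four spans share one trace_id.
fn handle_checkout() {
    let tracer = global::tracer("checkout");
    tracer.in_span("HTTP POST /checkout", |cx| {
        cx.span().set_attribute(KeyValue::new("http.method", "POST"));
        tracer.in_span("SELECT inventory", |_cx| { /* query inventory */ });
        tracer.in_span("Process payment", |_cx| { /* call payment service */ });
        tracer.in_span("INSERT order record", |_cx| { /* write the order */ });
    });
}
```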
A metric is a numerical measurement over time. Metrics provide a high-level picture of the state of a system. Metrics are the foundation of alerting because they are numeric values that can be compared against known thresholds.
- In the restaurant analogy, the scoreboard showing:
  - Number of customers served per hour
  - Average wait time
  - Revenue per day
- In OTel:
  - Common types: Counter, Gauge, Histogram
  - Examples: `http.server.request_count` (counter), `memory_usage_bytes` (gauge), `http.request.duration` (histogram)

Metrics are aggregated: you don't look at every single event, you look at totals, averages, and percentiles over time.
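As a rough sketch of what those three instrument types look like in Rust (the meter name and recorded values are illustrative, not from this project, and the gauge instrument assumes a recent `opentelemetry` release):

```rust
use opentelemetry::global;
use opentelemetry::KeyValue;

fn record_metrics() {
    let meter = global::meter("demo");

    // Counter: a running total that only ever goes up.
    let requests = meter.u64_counter("http.server.request_count").build();
    requests.add(1, &[KeyValue::new("http.method", "POST")]);

    // Gauge: a point-in-time value that can go up or down.
    let memory = meter.u64_gauge("memory_usage_bytes").build();
    memory.record(52_428_800, &[]);

    // Histogram: a distribution of values, enabling averages and percentiles.
    let duration = meter.f64_histogram("http.request.duration").build();
    duration.record(0.087, &[]);
}
```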
Logs provide an audit trail of activity from a single process, creating informational context. Logs act as atomic events, detailing what's occurring in the services of your application.
- In the restaurant analogy, the detailed diary entries:
  - "10:15 AM: Customer #42 requested extra cheese"
  - "10:16 AM: Kitchen started preparing order #42"
  - "10:17 AM: ERROR: Ran out of mozzarella cheese"
- In OTel:
  - Structured logs with timestamps and levels
  - Examples (using the `tracing` macros):

    ```rust
    info!("Starting Salvo server with OpenTelemetry");
    warn!("Database connection retrying...");
    error!("Failed to process payment: {}", error_msg);
    ```

  - Log levels: `TRACE` → `DEBUG` → `INFO` → `WARN` → `ERROR`
  - Correlation: logs can include `trace_id` and `span_id` to correlate with traces (see the sketch after this list)
  - Context: rich structured data (`user_id`, `request_id`, etc.)
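For context, here is one hedged way to wire those `tracing` macros into the OTel logs pipeline with `opentelemetry-appender-tracing`, assuming the crate versions from the Cargo.toml later in this README (the `init_logs` function name is hypothetical):

```rust
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::LogExporter;
use opentelemetry_sdk::logs::SdkLoggerProvider;
use tracing_subscriber::prelude::*;

fn init_logs() -> SdkLoggerProvider {
    // Export logs over OTLP/gRPC to the collector (defaults to
    // http://localhost:4317 unless an endpoint is configured).
    let exporter = LogExporter::builder()
        .with_tonic()
        .build()
        .expect("failed to build OTLP log exporter");

    let provider = SdkLoggerProvider::builder()
        .with_batch_exporter(exporter)
        .build();

    // Bridge: every info!/warn!/error! event becomes an OTel log record.
    let otel_layer = OpenTelemetryTracingBridge::new(&provider);
    tracing_subscriber::registry().with(otel_layer).init();
    provider
}
```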
Restaurant analogy summary:

| OTel Concept | Restaurant Analogy | Example |
|---|---|---|
| Trace | The full dining experience | Customer's entire dinner visit |
| Span | A single step in that visit | "Server takes order" |
| Metric | The stats across many visits | Avg. cooking time today |
| Log | The diary entries along the way | "Ran out of mozzarella cheese" |
Tech example (HTTP service):

| OTel Concept | Example |
|---|---|
| Trace | One `POST /checkout` journey |
| Span | `ValidateCart()` function call |
| Metric | Average request latency over 5 minutes |
| Log | `error!("Failed to process payment: ...")` event |
Tempo is an open source, easy-to-use, and high-scale distributed tracing backend. Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can ingest common open source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry.
Prometheus is a powerful open-source monitoring and alerting system designed for reliability and scalability. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays results, and can trigger alerts when specified conditions are met. Prometheus stores all data as time series and uses a powerful query language (PromQL) for analysis.
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate, as it does not index the contents of the logs, but rather a set of labels for each log stream.
Alloy is a flexible, high-performance, vendor-neutral distribution of the OpenTelemetry Collector. It's fully compatible with the most popular open source observability standards, such as OpenTelemetry and Prometheus. Alloy replaces the log collector/scraper role traditionally filled by Promtail, Grafana Agent, or an OTel Collector agent.
```mermaid
graph TB
    subgraph "Rust Application"
        App[Rust Server<br/>Port: 5800]
        App --> |Logs| OtelLogs[OpenTelemetry<br/>Logs Provider]
        App --> |Traces| OtelTraces[OpenTelemetry<br/>Trace Provider]
        App --> |Metrics| OtelMetrics[OpenTelemetry<br/>Metrics Provider]
    end

    subgraph "Data Collection"
        Alloy[Grafana Alloy<br/>OTLP Receiver<br/>Ports: 4317/4318]
        OtelLogs --> |HTTP/4318| Alloy
        OtelTraces --> |HTTP/4318| Alloy
        OtelMetrics --> |HTTP/4318| Alloy
    end

    subgraph "Storage Backends"
        Alloy --> |Forward Traces| Tempo[Tempo<br/>Trace Storage<br/>Port: 4317]
        Alloy --> |Forward Logs| Loki[Loki<br/>Log Storage<br/>Port: 3100]
        Alloy --> |Forward Metrics| Prometheus[Prometheus<br/>Metrics Storage<br/>Port: 9090]
    end

    subgraph "Visualization"
        Grafana[Grafana Dashboard<br/>Port: 3000]
        Grafana --> |Query| Tempo
        Grafana --> |Query| Loki
        Grafana --> |Query| Prometheus
    end

    subgraph "Kubernetes"
        K8s[K8s Cluster<br/>Namespace: monitoring]
        K8s -.-> Alloy
        K8s -.-> Tempo
        K8s -.-> Loki
        K8s -.-> Prometheus
        K8s -.-> Grafana
    end
```
The stack creates a complete observability pipeline:
- Applications → Alloy:4318 (OTLP HTTP port)
- Alloy → Tempo:4317 (forwarded traces)
- Alloy → Loki:3100 (forwarded logs)
- Alloy → Prometheus:9090 (forwarded metrics)
- Grafana ↔ All backends (unified observability dashboard)
Ports 4317 (OTLP/gRPC) and 4318 (OTLP/HTTP) are the OpenTelemetry standard: like port 80 for HTTP, they are the well-known ports for telemetry collection.
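A small sketch of how the application side might resolve that collector endpoint, using the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable and the `dotenv` crate from the dependency list below (the `alloy.monitoring` hostname is a hypothetical in-cluster service name, not a value from this project):

```rust
use dotenv::dotenv;
use std::env;

// Resolve the OTLP endpoint the app exports to. The fallback matches the
// Alloy OTLP/HTTP port from the diagram above.
fn otlp_endpoint() -> String {
    dotenv().ok(); // load .env if present
    env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
        .unwrap_or_else(|_| "http://alloy.monitoring:4318".to_string())
}
```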
```
observability/
├── 📁 backend/                # Rust application with OpenTelemetry
│   ├── Cargo.toml             # Dependencies and project configuration
│   ├── Cargo.lock             # Lock file for reproducible builds
│   ├── Dockerfile             # Container image for the Rust server
│   └── src/
│       └── main.rs            # Main application with OTel instrumentation
├── 📁 k8s/                    # Kubernetes manifests
│   ├── deployment.yml         # Rust server deployment configuration
│   └── service.yml            # Kubernetes service for rust-server
├── 📁 helm/                   # Helm values for observability stack
│   ├── alloy/
│   │   └── values.yml         # Alloy (OTel Collector) configuration
│   ├── grafana/
│   │   └── values.yml         # Grafana dashboard configuration
│   ├── loki/
│   │   └── values.yml         # Loki (logs) configuration
│   ├── prometheus/
│   │   └── values.yml         # Prometheus (metrics) configuration
│   └── tempo/
│       └── values.yml         # Tempo (traces) configuration
├── skaffold.yaml              # Development workflow automation
├── Makefile                   # Convenient commands for development
└── README.md                  # This documentation
```
| Component | Purpose | Port | Configuration |
|---|---|---|---|
| rust-server | Demo Rust app with OTel instrumentation | 5800 | backend/src/main.rs |
| Alloy | OpenTelemetry Collector (data pipeline) | 4317/4318 | helm/alloy/values.yml |
| Grafana | Visualization dashboard | 3000 | helm/grafana/values.yml |
| Loki | Log aggregation system | 3100 | helm/loki/values.yml |
| Prometheus | Metrics storage | 9090 | helm/prometheus/values.yml |
| Tempo | Distributed tracing backend | 3200 | helm/tempo/values.yml |
Add the following crates to your Cargo.toml file.
```toml
dotenv = "0.15.0"
opentelemetry = { version = "0.30.0", features = ["logs", "metrics", "trace"] }
opentelemetry-appender-tracing = "0.30.1"
opentelemetry-otlp = { version = "0.30.0", features = ["logs", "metrics", "trace", "tokio"] }
opentelemetry-semantic-conventions = "0.30.0"
opentelemetry_sdk = { version = "0.30.0", features = ["logs", "metrics", "trace"] }
salvo = { version = "0.82.0", features = ["cors", "logging", "otel", "session"] }
tokio = { version = "1.44.1", features = ["full"] }
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.19", features = ["env-filter", "fmt", "json", "registry", "tracing"] }
```
Note: Check `main.rs` for the full code.
- What it is: The `opentelemetry` crate is the API layer for OpenTelemetry in Rust, mirroring the language-agnostic OpenTelemetry specification.
- Role: Defines the traits, data types, and basic functions for creating traces, spans, and metrics, but doesn't decide how they're exported or processed.

Think of this as the interface that knows what a Trace/Span/Metric is, but not where it goes.
- What it is: The `opentelemetry_sdk` crate is the default SDK implementation of the OTel API for Rust.
- Role (sketched below):
  - Actually stores spans in memory until export.
  - Handles batching, sampling, and aggregation.
  - Lets you configure resources (service name, version, etc.).
  - Connects API calls from `opentelemetry` to exporters like Jaeger or OTLP.
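As a sketch of what that SDK-level configuration looks like (the sampler ratio and `build_provider` function are illustrative assumptions, not this project's settings):

```rust
use opentelemetry_sdk::trace::{Sampler, SdkTracerProvider};
use opentelemetry_sdk::Resource;

// Sampling, batching, and resource attributes are all SDK concerns; the
// `opentelemetry` API crate never sees them.
fn build_provider(resource: Resource) -> SdkTracerProvider {
    SdkTracerProvider::builder()
        .with_sampler(Sampler::TraceIdRatioBased(0.25)) // keep ~25% of traces
        .with_resource(resource)
        .build()
}
```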
- What it is: The `opentelemetry-otlp` crate is an exporter implementation that sends telemetry data to an OTLP (OpenTelemetry Protocol) endpoint, usually the OTel Collector.
- Role: Converts spans/metrics into OTLP gRPC or HTTP Protobuf format and ships them off.

Without this crate, you could still create spans in memory, but they'd never leave your app.
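A minimal sketch of an exporter plus tracer-provider setup, assuming OTLP over gRPC (the crate's default transport) and a collector on localhost; the `init_traces` name and endpoint are placeholders, so swap in your Alloy address as needed:

```rust
use opentelemetry::global;
use opentelemetry_otlp::{SpanExporter, WithExportConfig};
use opentelemetry_sdk::trace::SdkTracerProvider;

fn init_traces() -> SdkTracerProvider {
    // Build the OTLP exporter that ships spans to the collector.
    let exporter = SpanExporter::builder()
        .with_tonic()
        .with_endpoint("http://localhost:4317")
        .build()
        .expect("failed to build OTLP span exporter");

    // Batch spans in the SDK before export.
    let provider = SdkTracerProvider::builder()
        .with_batch_exporter(exporter)
        .build();

    // Register globally so `global::tracer(...)` uses this provider.
    global::set_tracer_provider(provider.clone());
    provider
}
```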
- What it is: The `opentelemetry-semantic-conventions` crate contains standard attribute names & values defined by the OpenTelemetry spec.
- Role: Ensures your telemetry data is consistent and portable between systems and languages.
- Example: instead of:

  ```rust
  KeyValue::new("service", "salvo-app")
  ```

  you'd use:

  ```rust
  use opentelemetry_semantic_conventions::resource::SERVICE_NAME;

  KeyValue::new(SERVICE_NAME, "salvo-app")
  ```

→ This way, Jaeger/Prometheus/Grafana knows exactly how to interpret `service.name`, `http.method`, `net.peer.ip`, etc.

If you make up your own attribute keys (`"foo"`), they may not show up in dashboards or get special treatment.
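And a small sketch of using those constants to describe the service as an SDK `Resource` ("salvo-app" and the version string are placeholders):

```rust
use opentelemetry::KeyValue;
use opentelemetry_sdk::Resource;
use opentelemetry_semantic_conventions::resource::{SERVICE_NAME, SERVICE_VERSION};

// Standard keys mean every backend renders these as service.name / service.version.
fn service_resource() -> Resource {
    Resource::builder()
        .with_attributes([
            KeyValue::new(SERVICE_NAME, "salvo-app"),
            KeyValue::new(SERVICE_VERSION, "0.1.0"),
        ])
        .build()
}
```

Such a resource would then be handed to the SDK provider builders (e.g., via `with_resource`) so it is attached to every exported span, metric, and log.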
```rust
use opentelemetry::global;
use opentelemetry::trace::{Span, SpanKind, Tracer};
use opentelemetry::KeyValue;

// First, we need to get a tracer object.
let tracer = global::tracer("my-tracer");

// With tracer, we can now start new spans.
let mut _span = tracer
    .span_builder("Call to /myendpoint")
    .with_kind(SpanKind::Internal)
    .start(&tracer);
_span.set_attribute(KeyValue::new("http.method", "GET"));
_span.set_attribute(KeyValue::new("net.protocol.version", "1.1"));

// TODO: Your code goes here

_span.end();
```
In the above code, we:
- Create a new span and name it "Call to /myendpoint"
- Add two attributes, following the semantic naming convention, specific to the action of this span: information on the HTTP method and protocol version
- Add a TODO in place of the eventual business logic
- Call the span's end() method to complete the span
```rust
use opentelemetry::global;

// First, we need to get a meter object.
let meter = global::meter("request_counter");

// With meter, we can now create individual instruments, such as an up/down counter.
let updown_counter = meter.i64_up_down_counter("request_counter").build();

// We can now invoke the add() method of updown_counter to record new values with the counter.
updown_counter.add(1, &[]);
```
```bash
# Start minikube and deploy everything
make start
make dev
```
- Application: http://localhost:5800
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
```bash
# Deploy observability stack with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
kubectl create namespace monitoring

# Deploy components (see skaffold.yaml for full configuration)
skaffold run
```
Skaffold automates builds, deployments, and port forwarding:
```bash
# Development with hot reload
make dev

# Deploy once
skaffold run

# Clean up
skaffold delete
```
Edit `backend/src/main.rs` → Skaffold rebuilds → auto-deploys to K8s.

Customize observability components via `helm/*/values.yml`:
- Alloy: OTLP receivers, processors, exporters
- Grafana: Data sources, dashboards, auth
- Loki/Prometheus/Tempo: Storage, retention, resources
```bash
# Quick start
make start   # Start minikube
make dev     # Deploy with hot reload
make status  # Show cluster status

# Cleanup
make stop    # Stop cluster
make delete  # Delete cluster
```
- Resources: minimum 4 CPUs, 8 GB RAM
- Security: enable RBAC, use secrets, TLS
- Scaling: use HPA (Horizontal Pod Autoscaler), distributed deployments