Service Discovery by brett0000FF · Pull Request #29046 · DataDog/documentation

Service Discovery #29046

Open · wants to merge 9 commits into base: master
55 changes: 30 additions & 25 deletions config/_default/menus/main.en.yaml
@@ -3865,31 +3865,36 @@ menu:
identifier: monitor_endpoints
parent: endpoint_observability
weight: 803
- name: Service Discovery
url: /tracing/service_discovery/
identifier: service_discovery
parent: tracing
weight: 9
- name: Dynamic Instrumentation
url: dynamic_instrumentation/
identifier: dyninst
parent: tracing
weight: 9
weight: 10
- name: Enabling
url: dynamic_instrumentation/enabling
identifier: dyninst_enable
parent: dyninst
weight: 901
weight: 1001
- name: Expression Language
url: dynamic_instrumentation/expression-language
identifier: dyninst_explang
parent: dyninst
weight: 902
weight: 1002
- name: Live Debugger
url: tracing/live_debugger/
identifier: live_debugger
parent: tracing
weight: 10
weight: 11
- name: Error Tracking
url: tracing/error_tracking/
parent: tracing
identifier: tracing_error_tracking
weight: 11
weight: 12
- name: Error Tracking Explorer
url: tracing/error_tracking/explorer
parent: tracing_error_tracking
@@ -3899,102 +3904,102 @@ menu:
url: tracing/error_tracking/issue_states
parent: tracing_error_tracking
identifier: tracing_error_tracking_states
weight: 1102
weight: 1202
- name: Error Grouping
url: tracing/error_tracking/error_grouping
parent: tracing_error_tracking
identifier: tracing_error_tracking_error_grouping
weight: 1103
weight: 1203
- name: Monitors
url: tracing/error_tracking/monitors
parent: tracing_error_tracking
identifier: tracing_error_tracking_monitors
weight: 1105
weight: 1205
- name: Identify Suspect Commits
url: tracing/error_tracking/suspect_commits
parent: tracing_error_tracking
identifier: tracing_error_tracking_suspect_commits
weight: 1106
weight: 1206
- name: Exception Replay
url: tracing/error_tracking/exception_replay
parent: tracing_error_tracking
identifier: tracing_error_tracking_exception_replay
weight: 1107
weight: 1207
- name: Troubleshooting
url: error_tracking/troubleshooting
parent: tracing_error_tracking
identifier: tracing_error_tracking_troubleshooting
weight: 1108
weight: 1208
- name: Data Security
url: tracing/configure_data_security/
parent: tracing
identifier: tracing_data_security
weight: 12
weight: 13
- name: Guides
url: tracing/guide/
parent: tracing
identifier: tracing_guides
weight: 13
weight: 14
- name: Troubleshooting
url: tracing/troubleshooting/
parent: tracing
identifier: tracing_troubleshooting
weight: 14
weight: 15
- name: Tracer Startup Logs
url: tracing/troubleshooting/tracer_startup_logs
identifier: tracing_troubleshooting_startup_logs
parent: tracing_troubleshooting
weight: 1401
weight: 1501
- name: Tracer Debug Logs
url: tracing/troubleshooting/tracer_debug_logs
identifier: tracing_troubleshooting_debug_logs
parent: tracing_troubleshooting
weight: 1402
weight: 1502
- name: Connection Errors
url: tracing/troubleshooting/connection_errors
identifier: tracing_troubleshooting_connection_errors
parent: tracing_troubleshooting
weight: 1403
weight: 1503
- name: Agent Rate Limits
url: tracing/troubleshooting/agent_rate_limits
identifier: tracing_troubleshooting_rate_limits
parent: tracing_troubleshooting
weight: 1404
weight: 1504
- name: Agent APM metrics
url: tracing/troubleshooting/agent_apm_metrics
identifier: tracing_troubleshooting_apm_metrics
parent: tracing_troubleshooting
weight: 1405
weight: 1505
- name: Agent Resource Usage
url: tracing/troubleshooting/agent_apm_resource_usage
identifier: tracing_troubleshooting_agent_usage
parent: tracing_troubleshooting
weight: 1406
weight: 1506
- name: Correlated Logs
url: tracing/troubleshooting/correlated-logs-not-showing-up-in-the-trace-id-panel
identifier: tracing_troubleshooting_correlated_logs
parent: tracing_troubleshooting
weight: 1407
weight: 1507
- name: PHP 5 Deep Call Stacks
url: tracing/troubleshooting/php_5_deep_call_stacks
identifier: tracing_troubleshooting_php_5_deep_call_stacks
parent: tracing_troubleshooting
weight: 1408
weight: 1508
- name: .NET diagnostic tool
url: tracing/troubleshooting/dotnet_diagnostic_tool
identifier: tracing_troubleshooting_dotnet_diagnostic_tool
parent: tracing_troubleshooting
weight: 1409
weight: 1509
- name: APM Quantization
url: tracing/troubleshooting/quantization
identifier: tracing_troubleshooting_quantization
parent: tracing_troubleshooting
weight: 1410
weight: 1510
- name: Go Compile-Time Instrumentation
url: /tracing/troubleshooting/go_compile_time
identifier: tracing_troubleshooting_go_instrumentation
parent: tracing_troubleshooting
weight: 1411
weight: 1511
- name: Continuous Profiler
url: profiler/
pre: profiling-1
138 changes: 138 additions & 0 deletions content/en/tracing/service_discovery/_index.md
@@ -0,0 +1,138 @@
---
title: Service Discovery
further_reading:
- link: "/agent/fleet_automation/"
  tag: "Documentation"
  text: "Fleet Automation"
- link: "/tracing/trace_collection/automatic_instrumentation/single-step-apm/"
  tag: "Documentation"
  text: "Single Step APM Instrumentation"
- link: "/data_streams/"
  tag: "Documentation"
  text: "Data Streams Monitoring"
---

## Overview

Service Discovery provides visibility into the monitoring coverage of your services within Datadog. It automatically discovers services running across your infrastructure, helps you identify potential observability gaps, and provides relevant information to triage and take action.

Access Service Discovery from the **[Fleet Automation > Services][1]** page.

{{< img src="/tracing/service_discovery/service_discovery.png" alt="Datadog Fleet Automation Services page showing a list of discovered services not sending traces. Each service entry includes tags, container count, CPU and memory usage, and network activity." style="width:100%;" >}}

### Key benefits

- **Discover all services**: View both monitored and unmonitored services running in your fleet in one centralized location.
- **Demystify services**: Understand the potential importance of unmonitored services using contextual information like infrastructure footprint, resource consumption, network activity, configuration details, and tags.
- **Close observability gaps**: Receive recommendations and guided instructions to instrument services with APM or enable other relevant Datadog products such as [Data Streams Monitoring (DSM)][3].
- **Triage effectively**: Prioritize which services to monitor using sortable metadata columns, facet filtering, and an organization-wide ignore list for noisy or irrelevant services.

### How it works

1. The Datadog Agent inspects running processes and container metadata on supported hosts.
2. Processes are automatically grouped into services based on a defined naming hierarchy (see [Discovered service naming](#discovered-service-naming)).
3. Datadog checks if discovered services are emitting traces to Datadog APM.
4. Services are displayed in **Fleet Automation > Services** under either **Monitored with APM** or **Not Sending Traces**.
5. Contextual metadata (CPU, memory, network I/O, tags, infrastructure links) is associated with each service to aid prioritization.
6. Short-lived processes (running for less than 1 minute) are automatically ignored to reduce noise.
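
The steps above can be sketched in a few lines of Python. This is an illustrative model only, not the Agent's implementation: the `DiscoveredProcess` type, the `group_by_view` function, and the rule that a service counts as monitored when any of its processes sends traces are all assumptions made for the example.

```python
from dataclasses import dataclass

# Processes running for less than 1 minute are ignored (step 6).
MIN_LIFETIME_SECONDS = 60


@dataclass
class DiscoveredProcess:
    """Hypothetical record for one process seen by the Agent."""
    service: str           # name already resolved by the naming hierarchy
    uptime_seconds: float  # how long the process has been running
    sends_traces: bool     # whether traces from it reach Datadog APM


def group_by_view(processes):
    """Group processes into services and split them into the two views."""
    traced = {}
    for proc in processes:
        if proc.uptime_seconds < MIN_LIFETIME_SECONDS:
            continue  # short-lived process: skipped to reduce noise
        # Assumption: one traced process is enough to mark the service as monitored.
        traced[proc.service] = traced.get(proc.service, False) or proc.sends_traces

    views = {"Monitored with APM": [], "Not Sending Traces": []}
    for service in sorted(traced):
        view = "Monitored with APM" if traced[service] else "Not Sending Traces"
        views[view].append(service)
    return views
```

In this sketch, a 12-second batch job never appears in either view, while two long-running `billing` processes collapse into a single entry under **Not Sending Traces**.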

### Discovered service naming

Service Discovery automatically identifies and names services based on a priority order of available identifiers:

1. Existing Datadog service name set using [Unified Service Tagging][5].
2. Container labels (for containerized services).
3. Language-specific manifest files (for example, `package.json` for Node.js).
4. Command-line arguments and process information.
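
As a rough illustration of this priority order, the following Python sketch returns the first identifier that is available. The `ProcessInfo` fields, the `resolve_service_name` function, and the fallback to the executable name are hypothetical and chosen for the example; the Agent's actual logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of the identifiers known for one process."""
    ust_service: Optional[str] = None      # service name from Unified Service Tagging
    container_label: Optional[str] = None  # name taken from a container label
    manifest_name: Optional[str] = None    # for example, "name" in package.json
    command_line: str = ""                 # raw command line, used as a last resort


def resolve_service_name(proc: ProcessInfo) -> str:
    """Return the first available identifier, in priority order."""
    for name in (proc.ust_service, proc.container_label, proc.manifest_name):
        if name:
            return name
    # Assumption: fall back to the executable's base name from the command line.
    parts = proc.command_line.split()
    return parts[0].rsplit("/", 1)[-1] if parts else "unknown"
```

With this model, a process that has both a Unified Service Tagging name and a container label is named after the former, and a process with no other identifiers is named after its executable.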


For Java enterprise web applications (specifically JBoss, WebSphere, Tomcat, Jetty, and WebLogic), multiple web applications running in the same process are displayed as individual services in the unmonitored services list, with visual indicators showing they belong to the same process.

## Requirements

Service Discovery requires the Datadog Agent and is supported on specific operating environments:

| Environment | Minimum Agent Version |
|-------------------------------|-----------------------|
| Linux Hosts (x86-64) | `[7.xx.x+]` |
| Docker Containers (on Linux) | `[7.xx.x+]` |
| Kubernetes (Helm Chart) | `[7.xx.x+]` |
| Kubernetes (Datadog Operator) | `[7.xx.x+]` |

<div class="alert alert-info">Service Discovery supports Linux (x86-64) environments only. Windows, ARM architectures, and serverless environments are not supported.</div>

## Setup

To enable Service Discovery on Linux hosts, install the latest version of the Datadog Agent with the Service Discovery feature turned on, using the toggle in the installation UI:

1. Navigate to [**Fleet Automation > Install Agents**][6].
2. Select your platform.
3. Follow the installation instructions, and ensure that the **Service Discovery** toggle is turned on.

Allow a few minutes for data to appear on the **Fleet Automation > Services** page.

## Explore discovered services

Navigate to **[Fleet Automation > Services][1]** in Datadog to view discovered services. The page presents two main views:

- **Not Sending Traces**: Services discovered by the Agent that are not sending trace data to Datadog APM.
- **Monitored with APM**: Services actively sending trace data to Datadog APM.

### Monitored with APM

This view lists services already instrumented and sending trace data to Datadog APM. It provides insights into the current monitoring state:

- **Type**: Shows the service type with its corresponding icon.
- **Service**: Displays the service name.
- **APM SDK**: Shows the versions of Datadog tracing libraries used by the service instances.
- **Telemetry**: Indicates the types of telemetry being collected (Traces, Logs, USM, DSM, Profiling).

### Not sending traces

This view lists services detected on your infrastructure that are not monitored by Datadog APM. Understanding these services is the first step toward closing observability gaps.

For each unmonitored service, Datadog provides contextual information to help you assess its importance and prioritize instrumentation:

- **Service**: The name assigned to the service, based on the naming hierarchy described above.
- **Infra**: An overview of the hosts or containers the service is running on, displayed as container icons with counts. Clicking this provides links to the relevant infrastructure components in Datadog.
- **CPU Usage**: Shows the number of CPU cores used.
- **Memory Usage**: Shows memory usage.
- **Bytes Received**: Shows incoming network traffic.
- **Bytes Sent**: Shows outgoing network traffic.

#### Triaging unmonitored services

- **Sorting**: By default, services are sorted by infrastructure footprint (descending). You can sort by other columns like CPU, Memory, or Network Activity to prioritize based on resource usage or traffic.
- **Filtering**: Use the search bar and facets to narrow down the list.
- **Ignoring Services**: If a discovered service is noisy or not relevant for monitoring (such as temporary utilities or test workloads), hover over the service and click **Ignore**. Ignored services are hidden from the main list but can be viewed and restored from the **Ignored Services** toggle.
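
The triage behavior described above can be modeled as a filter followed by a descending sort. The sketch below is a simplified illustration, not product code: the `UnmonitoredService` type, its field names, and the `triage` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnmonitoredService:
    """Hypothetical row in the Not Sending Traces list."""
    name: str
    infra_count: int       # number of hosts or containers running the service
    cpu_cores: float       # CPU usage, in cores
    ignored: bool = False  # True if the service is on the ignore list


def triage(services, sort_key="infra_count"):
    """Hide ignored services, then sort the rest in descending order."""
    visible = [s for s in services if not s.ignored]
    return sorted(visible, key=lambda s: getattr(s, sort_key), reverse=True)
```

Calling `triage(services)` mirrors the default ordering by infrastructure footprint, and `triage(services, sort_key="cpu_cores")` mirrors sorting by the CPU column.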

#### Enabling APM

For services you decide to monitor, click the **Enable APM** button. Datadog provides instructions to enable APM using **Single Step Instrumentation** for supported languages.

## Troubleshooting

- **No services listed under "Not Sending Traces"**:
  - Verify the Datadog Agent version meets the minimum requirement.
  - Confirm that Service Discovery is correctly enabled during Agent installation.
  - Ensure services have been running for more than 1 minute on supported platforms.
  - Allow a few minutes after Agent installation for data to populate.
- **Some expected services are missing**:
  - Check if the service runs for less than 1 minute.
  - Confirm the service is running on a supported platform (Linux x86-64).

If issues persist, collect an [Agent flare][4] and contact [Datadog Support][2].

## Further reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: https://app.datadoghq.com/fleet/services
[2]: /help/
[3]: /data_streams/
[4]: /agent/troubleshooting/send_a_flare/
[5]: /getting_started/tagging/unified_service_tagging/
[6]: https://app.datadoghq.com/fleet/install-agent/latest?platform=overview

5 changes: 1 addition & 4 deletions content/en/tracing/trace_collection/_index.md
@@ -53,10 +53,7 @@ Capture observability data from in-house code or complex functions that aren't c

To learn more, see [custom instrumentation][6].

{{< callout url="https://www.datadoghq.com/product-preview/service-discovery/" btn_hidden="false" header="Service discovery is in Preview">}}
Service discovery provides complete visibility into the current state of application monitoring, highlighting any major gaps or broken traces in your system.
{{< /callout >}}

<div class="alert alert-info"><a href="/tracing/service_discovery/">Service Discovery</a> provides visibility into the monitoring coverage of your services within Datadog. It automatically discovers services running across your infrastructure, helps you identify potential observability gaps, and provides relevant information to triage and take action.</div>

## APM setup tutorials
