From eed830f631a9cec40c3b3e1e89d148201778f69a Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Fri, 2 May 2025 10:38:18 -0400
Subject: [PATCH 1/6] added failure to answer configuration option explanation

---
 content/en/llm_observability/terms/_index.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/content/en/llm_observability/terms/_index.md b/content/en/llm_observability/terms/_index.md
index 0a62f76964bf4..84f65b58fcd7d 100644
--- a/content/en/llm_observability/terms/_index.md
+++ b/content/en/llm_observability/terms/_index.md
@@ -183,6 +183,16 @@ This check identifies instances where the LLM fails to deliver an appropriate re
 |---|---|---|
 | Evaluated on Output | Evaluated using LLM | Failure To Answer flags whether each prompt-response pair demonstrates that the LLM application has provided a relevant and satisfactory answer to the user's question. |
 
+##### Failure to Answer Configuration
+The types of Failure to Answer are defined below and can be configured when the Failure to Answer evaluation is enabled.
+| Configuration Option | Description | Example(s) |
+|---|---|
+| Empty Code Response | An empty code object like an empty list or tuple, signifiying no data or results | (), [], {} |
+| Empty Response | No meaningful response, returning only whitespace | whitespace |
+| No Content Response | An empty output accompanied by a message indicated no content is available | Not found, N/A |
+| Redirection Response | Redirects the user to another source of suggests an alternative approach | If you have additional details, I’d be happy to include them|
+| Refusal Response | Explicitly declines to provide an answer or the complete the request | Sorry, I can't answer this question |
+
 #### Language Mismatch
 
 This check identifies instances where the LLM generates responses in a different language or dialect than the one used by the user, which can lead to confusion or miscommunication. This check ensures that the LLM's responses are clear, relevant, and appropriate for the user's linguistic preferences and needs.

From 33d840908c2af5f158be2e42db0adcc6c0ef3111 Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Fri, 2 May 2025 11:09:38 -0400
Subject: [PATCH 2/6] hopefully table renders

---
 content/en/llm_observability/terms/_index.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/en/llm_observability/terms/_index.md b/content/en/llm_observability/terms/_index.md
index 84f65b58fcd7d..b9a449269e655 100644
--- a/content/en/llm_observability/terms/_index.md
+++ b/content/en/llm_observability/terms/_index.md
@@ -185,8 +185,9 @@ This check identifies instances where the LLM fails to deliver an appropriate re
 
 ##### Failure to Answer Configuration
 The types of Failure to Answer are defined below and can be configured when the Failure to Answer evaluation is enabled.
+
 | Configuration Option | Description | Example(s) |
-|---|---|
+|---|---|---|
 | Empty Code Response | An empty code object like an empty list or tuple, signifiying no data or results | (), [], {} |
 | Empty Response | No meaningful response, returning only whitespace | whitespace |
 | No Content Response | An empty output accompanied by a message indicated no content is available | Not found, N/A |

From 76b5c5ba4f625aded5c1b10ce95ea8358e015b3c Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Fri, 2 May 2025 12:31:29 -0400
Subject: [PATCH 3/6] added reference to failure to answer configuration in configuration page

---
 content/en/llm_observability/configuration/_index.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content/en/llm_observability/configuration/_index.md b/content/en/llm_observability/configuration/_index.md
index dc91c256df685..4d2c35a0a304e 100644
--- a/content/en/llm_observability/configuration/_index.md
+++ b/content/en/llm_observability/configuration/_index.md
@@ -102,6 +102,7 @@ Connect your Amazon Bedrock account to LLM Observability with your AWS Account.
 1. Select the span names you would like your evaluation to run on. (Optional if traces is selected).
 1. Optionally, specify the tags you want this evaluation to run on and choose whether to apply the evaluation to spans that match any of the selected tags (Any of), or all of the selected tags (All of).
 1. Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than 0 and less than or equal to 100. A Sampling Percentage of 100% means that the evaluation runs on all valid spans, whereas a sampling percentage of 50% means that the evaluation runs on 50% of valid spans.
+1. (Optional) For Failure to Answer, configure the evaluation by selecting what types of answers should be considered Failure to Answer. This configuration is detailed in [Failure to Answer Configuration][5]
 
 After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
 
@@ -138,3 +139,4 @@ Topics can contain multiple words and should be as specific and descriptive as p
 [2]: https://app.datadoghq.com/llm/settings/evaluations
 [3]: /llm_observability/terms/#topic-relevancy
 [4]: https://app.datadoghq.com/llm/applications
+[5]: /llm_observability/terms/#failure-to-answer-configuration

From 0f9bbac7400d0d1e7dbeb1c6a7471a3b27b8ba67 Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Fri, 2 May 2025 13:28:38 -0400
Subject: [PATCH 4/6] added a period

---
 content/en/llm_observability/configuration/_index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/configuration/_index.md b/content/en/llm_observability/configuration/_index.md
index 4d2c35a0a304e..d8314dbfa9bbd 100644
--- a/content/en/llm_observability/configuration/_index.md
+++ b/content/en/llm_observability/configuration/_index.md
@@ -102,7 +102,7 @@ Connect your Amazon Bedrock account to LLM Observability with your AWS Account.
 1. Select the span names you would like your evaluation to run on. (Optional if traces is selected).
 1. Optionally, specify the tags you want this evaluation to run on and choose whether to apply the evaluation to spans that match any of the selected tags (Any of), or all of the selected tags (All of).
 1. Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than 0 and less than or equal to 100. A Sampling Percentage of 100% means that the evaluation runs on all valid spans, whereas a sampling percentage of 50% means that the evaluation runs on 50% of valid spans.
-1. (Optional) For Failure to Answer, configure the evaluation by selecting what types of answers should be considered Failure to Answer. This configuration is detailed in [Failure to Answer Configuration][5]
+1. (Optional) For Failure to Answer, configure the evaluation by selecting what types of answers should be considered Failure to Answer. This configuration is detailed in [Failure to Answer Configuration][5].
 
 After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
 

From ac41f763c136800349045d136fa5ffcffdec37da Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Fri, 2 May 2025 14:16:06 -0400
Subject: [PATCH 5/6] added specification that this is only for OpenAI and Azure OpenAI

---
 content/en/llm_observability/configuration/_index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/configuration/_index.md b/content/en/llm_observability/configuration/_index.md
index d8314dbfa9bbd..26b95143245f8 100644
--- a/content/en/llm_observability/configuration/_index.md
+++ b/content/en/llm_observability/configuration/_index.md
@@ -102,7 +102,7 @@ Connect your Amazon Bedrock account to LLM Observability with your AWS Account.
 1. Select the span names you would like your evaluation to run on. (Optional if traces is selected).
 1. Optionally, specify the tags you want this evaluation to run on and choose whether to apply the evaluation to spans that match any of the selected tags (Any of), or all of the selected tags (All of).
 1. Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than 0 and less than or equal to 100. A Sampling Percentage of 100% means that the evaluation runs on all valid spans, whereas a sampling percentage of 50% means that the evaluation runs on 50% of valid spans.
-1. (Optional) For Failure to Answer, configure the evaluation by selecting what types of answers should be considered Failure to Answer. This configuration is detailed in [Failure to Answer Configuration][5].
+1. (Optional) For Failure to Answer, if OpenAI or Azure OpenAI is selected, configure the evaluation by selecting what types of answers should be considered Failure to Answer. This configuration is detailed in [Failure to Answer Configuration][5].
 
 After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
 

From 95db6365e622d6ca8bfb809411bfb7358a665d32 Mon Sep 17 00:00:00 2001
From: Jennifer Mickel
Date: Tue, 13 May 2025 15:51:42 -0400
Subject: [PATCH 6/6] added two more examples of empty code response

---
 content/en/llm_observability/terms/_index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/terms/_index.md b/content/en/llm_observability/terms/_index.md
index b9a449269e655..de0fadf45b97d 100644
--- a/content/en/llm_observability/terms/_index.md
+++ b/content/en/llm_observability/terms/_index.md
@@ -188,7 +188,7 @@ The types of Failure to Answer are defined below and can be configured when the
 
 | Configuration Option | Description | Example(s) |
 |---|---|---|
-| Empty Code Response | An empty code object like an empty list or tuple, signifiying no data or results | (), [], {} |
+| Empty Code Response | An empty code object like an empty list or tuple, signifiying no data or results | (), [], {}, "", '' |
 | Empty Response | No meaningful response, returning only whitespace | whitespace |
 | No Content Response | An empty output accompanied by a message indicated no content is available | Not found, N/A |
 | Redirection Response | Redirects the user to another source of suggests an alternative approach | If you have additional details, I’d be happy to include them|
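
The five configuration options added by this series can be read as a small taxonomy of non-answers. The sketch below is illustrative only: per the docs table, the real evaluation is performed by an LLM, whereas this code swaps in plain string matching seeded with the example values from the table. The function `classify_failure_to_answer`, its constants, and the `enabled` parameter are hypothetical names and are not part of any Datadog API.

```python
from typing import Iterable, Optional

# Hypothetical constants built only from the example column of the docs table.
ALL_TYPES = (
    "Empty Code Response",
    "Empty Response",
    "No Content Response",
    "Redirection Response",
    "Refusal Response",
)
EMPTY_CODE_LITERALS = {"()", "[]", "{}", '""', "''"}
NO_CONTENT_MARKERS = {"not found", "n/a"}
REFUSAL_PREFIXES = ("sorry, i can't",)
REDIRECTION_PREFIXES = ("if you have additional details",)


def classify_failure_to_answer(
    response: str, enabled: Iterable[str] = ALL_TYPES
) -> Optional[str]:
    """Return the matching Failure to Answer type, or None.

    `enabled` mimics the per-type configuration: a type that is not
    enabled is never reported, even if the response matches it.
    """
    # Normalize whitespace and curly apostrophes before matching.
    text = response.strip().replace("\u2019", "'")
    lowered = text.lower()

    if not text:
        label = "Empty Response"  # only whitespace
    elif text in EMPTY_CODE_LITERALS:
        label = "Empty Code Response"  # empty list, tuple, dict, or string
    elif lowered.rstrip(".") in NO_CONTENT_MARKERS:
        label = "No Content Response"
    elif lowered.startswith(REFUSAL_PREFIXES):
        label = "Refusal Response"
    elif lowered.startswith(REDIRECTION_PREFIXES):
        label = "Redirection Response"
    else:
        return None  # treated as a real answer by this heuristic

    return label if label in set(enabled) else None
```

Passing a narrower `enabled` collection, for example `enabled={"Refusal Response"}`, models the configuration step described in patches 3 through 5, where only the selected types count as Failure to Answer.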