From 83cee22dc7de9d066eef139da15909f63d214dc1 Mon Sep 17 00:00:00 2001
From: EdwardAngert
Date: Fri, 14 Feb 2025 20:43:38 +0000
Subject: [PATCH 1/6] add oom/ood to notifications

---
 docs/admin/monitoring/notifications/index.md | 50 ++++++++++++++++++--
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/docs/admin/monitoring/notifications/index.md b/docs/admin/monitoring/notifications/index.md
index eb077e13b38ed..4193bab4b3fa8 100644
--- a/docs/admin/monitoring/notifications/index.md
+++ b/docs/admin/monitoring/notifications/index.md
@@ -29,14 +29,14 @@ These notifications are sent to the workspace owner:
 
 ### User Events
 
-These notifications sent to users with **owner** and **user admin** roles:
+These notifications are sent to users with **owner** and **user admin** roles:
 
 - User account created
 - User account deleted
 - User account suspended
 - User account activated
 
-These notifications sent to users themselves:
+These notifications are sent to users themselves:
 
 - User account suspended
 - User account activated
@@ -48,6 +48,8 @@ These notifications are sent to users with **template admin** roles:
 
 - Template deleted
 - Template deprecated
+- Out of memory (OOM) / Out of disk (OOD)
+  - [Configure](#configure-oomood-notifications) in the template `main.tf`.
 - Report: Workspace builds failed for template
   - This notification is delivered as part of a weekly cron job and summarizes
     the failed builds for a given template.
@@ -63,6 +65,48 @@ flags.
 | ✔️ | `--notifications-method` | `CODER_NOTIFICATIONS_METHOD` | `string` | Which delivery method to use (available options: 'smtp', 'webhook'). See [Delivery Methods](#delivery-methods) below. | smtp |
 | -️ | `--notifications-max-send-attempts` | `CODER_NOTIFICATIONS_MAX_SEND_ATTEMPTS` | `int` | The upper limit of attempts to send a notification. | 5 |
 
+### Configure OOM/OOD notifications
+
+You can alert users when they overutilize memory and disk.
+
+This can help prevent agent disconnects due to OOM/OOD issues.
+
+To enable OOM/OOD notifications on a template, use the
+[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
+block on the
+[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
+resource in our Terraform provider.
+
+You can specify one or more volumes to monitor for OOD alerts.
+OOM alerts are reported per-agent.
+
+Add the following example to the template's `main.tf`.
+Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
+deployment:
+
+```hcl
+resource "coder_agent" "main" {
+  arch = data.coder_provisioner.dev.arch
+  os   = data.coder_provisioner.dev.os
+  resources_monitoring {
+    memory {
+      enabled   = true
+      threshold = 90
+    }
+    volume {
+      path      = "/volume1"
+      enabled   = true
+      threshold = 80
+    }
+    volume {
+      path      = "/volume2"
+      enabled   = true
+      threshold = 95
+    }
+  }
+}
+```
+
 ## Delivery Methods
 
 Notifications can currently be delivered by either SMTP or webhook. Each message
@@ -135,7 +179,7 @@ for more options.
 
 After setting the required fields above:
 
-1. Setup an account on Microsoft 365 or outlook.com
+1. Set up an account on Microsoft 365 or outlook.com
 1. Set the following configuration options:
 
    ```text
From 682655fa7e04d4c4fed156616e7f1d56d14d2ae0 Mon Sep 17 00:00:00 2001
From: EdwardAngert
Date: Thu, 20 Feb 2025 20:23:17 +0000
Subject: [PATCH 2/6] move tf block to new resource-monitoring page

---
 docs/admin/monitoring/notifications/index.md | 40 ++----------------
 .../resource-monitoring.md | 42 +++++++++++++++++++
 docs/manifest.json | 5 +++
 3 files changed, 51 insertions(+), 36 deletions(-)
 create mode 100644 docs/admin/templates/extending-templates/resource-monitoring.md

diff --git a/docs/admin/monitoring/notifications/index.md b/docs/admin/monitoring/notifications/index.md
index 4193bab4b3fa8..7cd2b02d4908a 100644
--- a/docs/admin/monitoring/notifications/index.md
+++ b/docs/admin/monitoring/notifications/index.md
@@ -67,45 +67,13 @@ flags.
 
 ### Configure OOM/OOD notifications
 
-You can alert users when they overutilize memory and disk.
+You can monitor out of memory (OOM) and out of disk (OOD) erros and alert users
+when they overutilize memory and disk.
 
 This can help prevent agent disconnects due to OOM/OOD issues.
 
-To enable OOM/OOD notifications on a template, use the
-[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
-block on the
-[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
-resource in our Terraform provider.
-
-You can specify one or more volumes to monitor for OOD alerts.
-OOM alerts are reported per-agent.
-
-Add the following example to the template's `main.tf`.
-Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
-deployment:
-
-```hcl
-resource "coder_agent" "main" {
-  arch = data.coder_provisioner.dev.arch
-  os   = data.coder_provisioner.dev.os
-  resources_monitoring {
-    memory {
-      enabled   = true
-      threshold = 90
-    }
-    volume {
-      path      = "/volume1"
-      enabled   = true
-      threshold = 80
-    }
-    volume {
-      path      = "/volume2"
-      enabled   = true
-      threshold = 95
-    }
-  }
-}
-```
+To enable OOM/OOD notifications on a template, follow the steps in the
+[resource monitoring guide](../../templates/extending-templates/resource-monitoring.md).
 
 ## Delivery Methods
 
diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
new file mode 100644
index 0000000000000..c3bdef387efc0
--- /dev/null
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -0,0 +1,42 @@
+# Resource monitoring
+
+Use the
+[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
+block on the
+[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
+resource in our Terraform provider to monitor out of memory (OOM) and out of
+disk (OOD) erros and alert users when they overutilize memory and disk.
+
+This can help prevent agent disconnects due to OOM/OOD issues.
+
+You can specify one or more volumes to monitor for OOD alerts.
+OOM alerts are reported per-agent.
+
+## Example
+
+Add the following example to the template's `main.tf`.
+Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
+deployment:
+
+```hcl
+resource "coder_agent" "main" {
+  arch = data.coder_provisioner.dev.arch
+  os   = data.coder_provisioner.dev.os
+  resources_monitoring {
+    memory {
+      enabled   = true
+      threshold = 90
+    }
+    volume {
+      path      = "/volume1"
+      enabled   = true
+      threshold = 80
+    }
+    volume {
+      path      = "/volume2"
+      enabled   = true
+      threshold = 95
+    }
+  }
+}
+```
diff --git a/docs/manifest.json b/docs/manifest.json
index 3b49c2321ccef..af477f0f71d1d 100644
--- a/docs/manifest.json
+++ b/docs/manifest.json
@@ -389,6 +389,11 @@
     "description": "Display resource state in the workspace dashboard",
     "path": "./admin/templates/extending-templates/resource-metadata.md"
   },
+  {
+    "title": "Resource Monitoring",
+    "description": "Monitor resources in the workspace dashboard",
+    "path": "./admin/templates/extending-templates/resource-monitoring.md"
+  },
   {
     "title": "Resource Ordering",
     "description": "Design the UI of workspaces",
From d80568a819283538e8a4a2d34ea6ad90754d22ed Mon Sep 17 00:00:00 2001
From: EdwardAngert
Date: Thu, 20 Feb 2025 20:27:24 +0000
Subject: [PATCH 3/6] typos

---
 docs/admin/monitoring/notifications/index.md | 2 +-
 docs/admin/templates/extending-templates/resource-monitoring.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/admin/monitoring/notifications/index.md b/docs/admin/monitoring/notifications/index.md
index 7cd2b02d4908a..42330e821bd11 100644
--- a/docs/admin/monitoring/notifications/index.md
+++ b/docs/admin/monitoring/notifications/index.md
@@ -67,7 +67,7 @@ flags.
 
 ### Configure OOM/OOD notifications
 
-You can monitor out of memory (OOM) and out of disk (OOD) erros and alert users
+You can monitor out of memory (OOM) and out of disk (OOD) errors and alert users
 when they overutilize memory and disk.
 
 This can help prevent agent disconnects due to OOM/OOD issues.
diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
index c3bdef387efc0..500e20623a7ea 100644
--- a/docs/admin/templates/extending-templates/resource-monitoring.md
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -5,7 +5,7 @@ Use the
 block on the
 [`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
 resource in our Terraform provider to monitor out of memory (OOM) and out of
-disk (OOD) erros and alert users when they overutilize memory and disk.
+disk (OOD) errors and alert users when they overutilize memory and disk.
 
 This can help prevent agent disconnects due to OOM/OOD issues.
 
From 33f41644548fb62bf7b9684262007bb77f67bf56 Mon Sep 17 00:00:00 2001
From: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Date: Mon, 24 Feb 2025 18:41:05 +0000
Subject: [PATCH 4/6] add smtp prereq

---
 .../templates/extending-templates/resource-monitoring.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
index 500e20623a7ea..309abb16dfa7f 100644
--- a/docs/admin/templates/extending-templates/resource-monitoring.md
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -12,6 +12,10 @@ This can help prevent agent disconnects due to OOM/OOD issues.
 You can specify one or more volumes to monitor for OOD alerts.
 OOM alerts are reported per-agent.
 
+## Prerequisites
+
+Configure Coder to [use an SMTP server](../../monitoring/notifications.md#smtp-email).
+
 ## Example
 
 Add the following example to the template's `main.tf`.
From 7a43571cda41a3c23c255c5c6cce2b6c1edecede Mon Sep 17 00:00:00 2001
From: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Date: Mon, 24 Feb 2025 18:42:46 +0000
Subject: [PATCH 5/6] notifications are sent through

---
 docs/admin/templates/extending-templates/resource-monitoring.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
index 309abb16dfa7f..0106cc43d5ff3 100644
--- a/docs/admin/templates/extending-templates/resource-monitoring.md
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -14,6 +14,7 @@ OOM alerts are reported per-agent.
 
 ## Prerequisites
 
+Notifications are sent through SMTP.
 Configure Coder to [use an SMTP server](../../monitoring/notifications.md#smtp-email).
 
 ## Example
From 36dcc071a8d2d795f7bd1f94576d15bb36b2c065 Mon Sep 17 00:00:00 2001
From: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Date: Mon, 24 Feb 2025 18:49:26 +0000
Subject: [PATCH 6/6] fix link

---
 docs/admin/templates/extending-templates/resource-monitoring.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
index 0106cc43d5ff3..78ce1b61278e0 100644
--- a/docs/admin/templates/extending-templates/resource-monitoring.md
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -15,7 +15,7 @@ OOM alerts are reported per-agent.
 ## Prerequisites
 
 Notifications are sent through SMTP.
-Configure Coder to [use an SMTP server](../../monitoring/notifications.md#smtp-email).
+Configure Coder to [use an SMTP server](../../monitoring/notifications/index.md#smtp-email).
 
 ## Example
 