diff --git a/docs/admin/monitoring/notifications/index.md b/docs/admin/monitoring/notifications/index.md
index eb077e13b38ed..42330e821bd11 100644
--- a/docs/admin/monitoring/notifications/index.md
+++ b/docs/admin/monitoring/notifications/index.md
@@ -29,14 +29,14 @@ These notifications are sent to the workspace owner:
 
 ### User Events
 
-These notifications sent to users with **owner** and **user admin** roles:
+These notifications are sent to users with **owner** and **user admin** roles:
 
 - User account created
 - User account deleted
 - User account suspended
 - User account activated
 
-These notifications sent to users themselves:
+These notifications are sent to users themselves:
 
 - User account suspended
 - User account activated
@@ -48,6 +48,8 @@ These notifications are sent to users with **template admin** roles:
 
 - Template deleted
 - Template deprecated
+- Out of memory (OOM) / Out of disk (OOD)
+  - [Configure](#configure-oomood-notifications) in the template `main.tf`.
 - Report: Workspace builds failed for template
   - This notification is delivered as part of a weekly cron job and summarizes
     the failed builds for a given template.
@@ -63,6 +65,16 @@ flags.
 | ✔️ | `--notifications-method` | `CODER_NOTIFICATIONS_METHOD` | `string` | Which delivery method to use (available options: 'smtp', 'webhook'). See [Delivery Methods](#delivery-methods) below. | smtp |
 | -️ | `--notifications-max-send-attempts` | `CODER_NOTIFICATIONS_MAX_SEND_ATTEMPTS` | `int` | The upper limit of attempts to send a notification. | 5 |
 
+### Configure OOM/OOD notifications
+
+You can monitor out of memory (OOM) and out of disk (OOD) errors and alert users
+when they overutilize memory and disk.
+
+This can help prevent agent disconnects due to OOM/OOD issues.
+
+To enable OOM/OOD notifications on a template, follow the steps in the
+[resource monitoring guide](../../templates/extending-templates/resource-monitoring.md).
+
 ## Delivery Methods
 
 Notifications can currently be delivered by either SMTP or webhook. Each message
@@ -135,7 +147,7 @@ for more options.
 
 After setting the required fields above:
 
-1. Setup an account on Microsoft 365 or outlook.com
+1. Set up an account on Microsoft 365 or outlook.com
 1. Set the following configuration options:
 
    ```text
diff --git a/docs/admin/templates/extending-templates/resource-monitoring.md b/docs/admin/templates/extending-templates/resource-monitoring.md
new file mode 100644
index 0000000000000..78ce1b61278e0
--- /dev/null
+++ b/docs/admin/templates/extending-templates/resource-monitoring.md
@@ -0,0 +1,47 @@
+# Resource monitoring
+
+Use the
+[`resources_monitoring`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#resources_monitoring-1)
+block on the
+[`coder_agent`](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent)
+resource in our Terraform provider to monitor out of memory (OOM) and out of
+disk (OOD) errors and alert users when they overutilize memory and disk.
+
+This can help prevent agent disconnects due to OOM/OOD issues.
+
+You can specify one or more volumes to monitor for OOD alerts.
+OOM alerts are reported per-agent.
+
+## Prerequisites
+
+Notifications are sent through SMTP.
+Configure Coder to [use an SMTP server](../../monitoring/notifications/index.md#smtp-email).
+
+## Example
+
+Add the following example to the template's `main.tf`.
+Change the `90`, `80`, and `95` to a threshold that's more appropriate for your
+deployment:
+
+```hcl
+resource "coder_agent" "main" {
+  arch = data.coder_provisioner.dev.arch
+  os   = data.coder_provisioner.dev.os
+  resources_monitoring {
+    memory {
+      enabled   = true
+      threshold = 90
+    }
+    volume {
+      path      = "/volume1"
+      enabled   = true
+      threshold = 80
+    }
+    volume {
+      path      = "/volume2"
+      enabled   = true
+      threshold = 95
+    }
+  }
+}
+```
diff --git a/docs/manifest.json b/docs/manifest.json
index 3b49c2321ccef..af477f0f71d1d 100644
--- a/docs/manifest.json
+++ b/docs/manifest.json
@@ -389,6 +389,11 @@
       "description": "Display resource state in the workspace dashboard",
       "path": "./admin/templates/extending-templates/resource-metadata.md"
     },
+    {
+      "title": "Resource Monitoring",
+      "description": "Monitor resources in the workspace dashboard",
+      "path": "./admin/templates/extending-templates/resource-monitoring.md"
+    },
     {
       "title": "Resource Ordering",
       "description": "Design the UI of workspaces",