Closed
Description
Current Status
- ONGOING - waiting for approval to deploy extra resources
Error looks like
- Long queue times for macos-m2-15 instances
Incident timeline (all times pacific)
- [25-05-13 08:00] Queue times started to become significant
- [25-05-13 10:00] The team started taking remediation actions
- [25-05-14 08:00] Started efforts to increase the fleet, given that the load increase appears legitimate
- [25-05-14 14:08] Instances fully deployed; efforts standing down
User impact
- Long queue times for macos-m2-15 instances
Root cause
- Weak monitoring for m2-15 instances and the decision to suppress alerts when instance usage is too high
Mitigation
NA
Prevention/followups
- Improve alerting when instance usage is high
- Improve general monitoring for m2 instances
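
One way the alerting follow-up could look is sketched below. It is only an illustration, not the team's actual alerting code: the `RunnerPoolSample` type, the `alert_level` function, and both thresholds are hypothetical. The idea is that high instance usage escalates an alert instead of suppressing it, and that queue time is the primary signal.

```python
from dataclasses import dataclass


@dataclass
class RunnerPoolSample:
    """One monitoring sample for a runner pool (hypothetical shape)."""
    label: str                # runner label, e.g. "macos-m2-15"
    busy: int                 # runners currently executing jobs
    total: int                # runners in the fleet
    max_queue_minutes: float  # longest wait among currently queued jobs


# Hypothetical thresholds, chosen only for illustration.
UTILIZATION_THRESHOLD = 0.9
QUEUE_MINUTES_THRESHOLD = 30.0


def alert_level(sample: RunnerPoolSample) -> str:
    """Return "none", "warn", or "page" for a sample.

    High utilization never suppresses an alert here; combined with a
    long queue it raises the severity instead.
    """
    if sample.total <= 0:
        # No runners available at all: always page.
        return "page"

    utilization = sample.busy / sample.total
    saturated = utilization >= UTILIZATION_THRESHOLD
    queue_backed_up = sample.max_queue_minutes >= QUEUE_MINUTES_THRESHOLD

    if queue_backed_up:
        # Long queue on a saturated fleet suggests a capacity shortage.
        # Long queue with idle runners suggests a scheduling problem.
        return "page" if saturated else "warn"
    return "none"
```

With these assumed thresholds, a fully busy fleet with a 45-minute queue would page, while a fully busy fleet with a short queue would stay quiet.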