A curated list of Site Reliability and Production Engineering resources.
Updated Jun 10, 2024
A collection of postmortem templates
A role-playing game for incident management training
A party card game for engineers who care about reliability. Based on Cards Against Humanity.
A curated list of awesome Site Reliability and Production Engineering resources.
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
A collection of templates ported from the SRE Workbook
A list of common Disaster Recovery (DR) scenarios for software companies
An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical tools for the field of Site Reliability Engineering (SRE)
🔖 Daily-updated reading list for designing High Scalability 🍒, High Availability 🔥, High Stability 🗻 back-end systems - Pull requests are very welcome 👬 I hope you find this project helpful 🍀 Please help me share it with more people ❤️ Thank you - 谢谢 - धन्यवाद - ধন্যবাদ - Спасибо - شكرا - Merci - Gracias - Danke - Cảm ơn! 🙇
Overall map of topics to cover for my “Engineering for Site Reliability” blog series.
Smartshield Infrastructure Guide
A .NET Standard library for working with the Uptime Robot API.
A guide to installing and configuring the Prometheus Blackbox Exporter. Includes configuration files and example usage scenarios for testing the reachability of services over HTTP, HTTPS, DNS, TCP, and ICMP.
Gerd by Onyx is a lightweight chaos monkey implementation for Kubernetes (k8s)
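
One entry in the list above describes calculating how much downtime a Service Level Agreement or Objective permits. As a rough illustration of that arithmetic (not the listed project's code), the sketch below converts an availability target into a downtime budget. The function name `allowed_downtime_seconds` and the 30-day default window are assumptions made for this example.

```python
def allowed_downtime_seconds(slo_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime budget, in seconds, for an availability target.

    slo_percent: availability target as a percentage, e.g. 99.9.
    period_days: length of the measurement window (assumed default: 30 days).
    """
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    period_seconds = period_days * 24 * 60 * 60
    # The budget is the fraction of the window that may be unavailable.
    return period_seconds * (1.0 - slo_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_seconds(target) / 60
        print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over a 30-day window leaves roughly 43 minutes of permitted downtime; a different window length changes the budget proportionally.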