A curated list of Site Reliability and Production Engineering resources.
Updated Aug 28, 2025
A collection of postmortems. Sorry for the delay in merging PRs!
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Compilation of public failure/horror stories related to Kubernetes
A curated list of Site Reliability and Production Engineering Tools
A collection of postmortem templates
Postmortem debugging tools for MinGW.
A curated list of awesome Site Reliability and Production Engineering resources.
An Incident Management Process / Post Mortem Template
Analysis of the major exploits that took place on the Ethereum blockchain
A Claude Code skill for structured retrospective analysis that transforms incidents into systematic improvements
How to run effective incident post-mortems
Selection of Development Templates
💀 🔥 ❄️ A basic analyzer for memory dumps containing managed code
Compilation of public incident/interesting/horror stories related to Kafka operations
Perform post-mortem Linux baselining and forensic analysis.
Post-mortem debugging tool for Python that provides direct access to variables and frames after exceptions occur. Rich tracebacks, frame inspection, and context execution without separate interactive shells.
AI CLI tool that generates highly professional, blame-free post-mortems from your git history in seconds.
Shell_basics