emergency-kexec

Okay, your system is completely broken, and you need to umount / or something like that. What do you do?

Motivation

One of our servers had a broken root filesystem (btrfs, don't judge me). Online recovery was not possible, so the filesystem had to be unmounted, which is not possible for the root fs. On top of that, because errors had been detected, the kernel decided to mount it read-only and didn't let me remount it as rw. IPMI? Yes, I had the password in my password store, but not the username. So the only logical solution was to kexec into an emergency system. This code is what I used. It recovers all IP addresses as well as SSH host and user keys from the old system and kexecs into a new one, entirely in memory.

What it does

The emergency script (found in the repository root) will SSH over and execute the following things:

Build the recovery image (a .tar.xz with a small nix store and a kexec script) from the files in this repository locally on the machine you're executing this code on
1. The system configuration is found in configuration.nix
2. Some kexec-related features are imported from kexec.nix
3. The scripts will be included to be used in the kexec script (see below)
Try to mkdir /nix and /tmp. If they don't already exist and your root fs is read-only, you have a problem this project can't fix
Mount a fresh tmpfs on /tmp because there might not be one already
scp the emergency image over and extract it
Mount the nix store from the emergency image over /nix using overlayfs
Run the kexec script
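
To make the remote steps above more concrete, here is a rough sketch of the commands involved. It is not the actual emergency script: the function name, the image path, the extraction directory, and the read-only overlay with two lower directories are all assumptions chosen for illustration. The function only prints the commands, so nothing is changed on the machine that runs it.

```shell
#!/bin/sh
# Print (do not run) the preparation commands for the broken host.
# Every path below is a hypothetical placeholder, not taken from the repository.
plan_remote_prep() {
    image=${1:-/tmp/emergency.tar.xz}   # assumed location of the copied image
    workdir=${2:-/tmp/emergency}        # assumed extraction directory
    # The image would be copied over with scp after the tmpfs mount
    # and before the tar step.
    cat <<EOF
mkdir -p /nix /tmp
mount -t tmpfs tmpfs /tmp
mkdir -p $workdir
tar -xJf $image -C $workdir
mount -t overlay overlay -o lowerdir=$workdir/nix:/nix /nix
EOF
}

plan_remote_prep "$@"
```

An overlay without an upperdir is read-only, which should be enough when the store only needs to be read before the kexec. Whether the real script uses a writable upper layer instead is not something this sketch claims to know.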

The kexec script (found in kexec.nix) will do the following:

Prepare a second initrd
Put your SSH host keys into the initrd
Put all of your SSH user keys into the initrd
Fetch all your IP addresses and routes and put them into the initrd
Pack the second initrd and append it to the default NixOS initrd from the emergency image
kexec into the kernel from the emergency image while using the new initrd
In case you didn't already notice: This will crash your currently running system, so maybe it's a good idea to gracefully shut down remaining daemons if that's still possible
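
The "pack the second initrd and append it" step relies on the kernel accepting several cpio archives concatenated into one initramfs file. A minimal sketch of that idea follows, assuming a directory that already holds the extra files. The function name and all paths are hypothetical, and the real kexec script in kexec.nix may do this differently.

```shell
#!/bin/sh
# append_initrd BASE EXTRA_DIR OUT
# Packs EXTRA_DIR as a gzip-compressed newc cpio archive and writes
# BASE followed by that archive to OUT. Names are illustrative only.
append_initrd() {
    base=$1
    extra_dir=$2
    out=$3
    extra=$(mktemp) || return 1
    # newc is the cpio format the kernel expects for an initramfs.
    (cd "$extra_dir" && find . | cpio -o -H newc 2>/dev/null) | gzip -9 > "$extra" \
        || { rm -f "$extra"; return 1; }
    # Later archives are unpacked on top of earlier ones at boot.
    cat "$base" "$extra" > "$out"
    rm -f "$extra"
}
```

The combined file could then be handed to kexec, for example with `kexec -l <kernel> --initrd=<combined file> --command-line=<kernel parameters>` followed by `kexec -e`. The placeholders are left unfilled on purpose, since the correct values come from the emergency image.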

The script that is packed into the initrd of the new system will do the following:

Place the SSH host key
Place the SSH user keys
Place a script for the IP addresses which will be executed using networking.localCommands so the interfaces are available
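
As an illustration of how a script for the IP addresses could be generated, the sketch below turns the one-line-per-address output of `ip -o addr show` into `ip` commands that restore those addresses. It is an assumption about the general approach, not a copy of ips.awk from the repository. The function name is made up, and it simply skips the loopback interface.

```shell
#!/bin/sh
# Read `ip -o addr show` output on stdin and print commands that would
# bring each interface up and re-add its addresses.
# In that output, field 2 is the interface, field 3 the address family
# (inet or inet6), and field 4 the address with its prefix length.
addrs_to_commands() {
    awk '($3 == "inet" || $3 == "inet6") && $2 != "lo" {
        print "ip link set dev " $2 " up"
        print "ip addr add " $4 " dev " $2
    }'
}

# Example: ip -o addr show | addrs_to_commands
```

A real version would probably also want to handle link-local IPv6 addresses and routes separately. This one only shows the shape of the transformation.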

If you set the environment variable EMERGENCY_DUMP_NETWORK to 1, all IPs, routes, and nameservers will be placed in the emergency_ips, emergency_routes, and emergency_nameservers files, respectively.

How to use

$ ./emergency root@somehost
# or
$ ./emergency somebody@somehost

Disclaimer and license

If it doesn't work for you, I'm sorry. I probably can't help you, but if you're able to fix something, feel free to create a PR.

The code is based on clever's kexec nix-test (found here).

The code is licensed under the LGPL3.
