NixOS in Production
Gabriella Gonzalez
This book is available at http://leanpub.com/nixos-in-production
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools
and many iterations to get reader feedback, pivot until you have the right book and build
traction once you do.
1. Introduction
10. Flakes
Motivation
Flakes, step-by-step
Flake-related commands
• What real-world use cases does NixOS address better than the alternatives?
• What does a mature NixOS enterprise look like?
• How do I smoothly migrate an organization to adopt NixOS?
• What potential pitfalls of NixOS should I be mindful to avoid?
• How can I effectively support and debug NixOS when things go wrong?
I’m writing this book because I cultivated years of professional experience doing all of the above,
back when no such resource existed. I learned NixOS the hard way and I’m writing this book so
that you don’t have to make the same mistakes I did.
Currently, most educational resources for NixOS (including the NixOS manual) are written with
desktop users in mind, whereas I view NixOS as far better suited as a production operating
system. This book attempts to fill that documentation gap by catering to professional NixOS
users instead of hobbyists.
Continue reading on if you want to use NixOS “for real” and build a career around one of the
hottest emerging DevOps technologies. This book will improve your NixOS proficiency and
outline a path towards using NixOS to improve your organization’s operational maturity and
reliability.
2. What is NixOS for?
Some NixOS users might try to “convert” others to NixOS using a pitch that goes something like
this:
NixOS is a Linux distribution built on top of the Nix package manager. It uses
declarative configuration and allows reliable system upgrades.
Source: Wikipedia - NixOS¹
This sort of feature-oriented description explains what NixOS does, but does not quite explain
what NixOS is for. What sort of useful things can you do with NixOS? When is NixOS the best
solution? What types of projects, teams, or organizations should prefer using NixOS over
the alternatives?
Come to think of it, what are the alternatives? Is NixOS supposed to replace Debian? Or Docker?
Or Ansible? Or Vagrant? Where does NixOS fit in within the modern software landscape?
In this chapter I’ll help you better understand when you should recommend NixOS to others
and (just as important!) when you should gently nudge people away from NixOS. Hopefully this
chapter will improve your overall understanding of NixOS’s “niche”.
• NixOS expects users to be developers who are more hands-on with their system
NixOS does not come preinstalled on most computers and the installation guide assumes
quite a bit of technical proficiency. For example, NixOS is typically configured via text files
and upgrades are issued from the command line.
• The NixOS user experience differs from what most desktop users expect
Most desktop users (especially non-technical users) expect to install packages by either
downloading the package from the publisher’s web page or by visiting an “app store” of
some sort. They don’t expect to modify a text configuration file in order to install a package.
¹https://en.wikipedia.org/wiki/NixOS
However, the above limitations don’t apply when using NixOS as a server
operating system:
• End users can more easily self-serve if they stray from the beaten path
Server-oriented software is more likely to be open source than desktop-oriented software
and therefore easier to package.
NixOS is better suited for SaaS than on-prem deployments, because NixOS fares worse in
restricted network environments where network access is limited or unavailable.
You can still deploy NixOS for on-prem deployments and I will cover that in a later chapter, but
you will have a much better time using NixOS for SaaS deployments.
Virtualization
You might be interested in how NixOS fares with respect to virtualization or containers, so I’ll
break things down into these four potential use cases:
• Application containers
Containers technically do not need to run an entire operating system and can instead run
a single process (e.g. one service). You can do this using Nixpkgs, which provides support
for building application containers.
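For example, here is a minimal sketch of what building such an application container with Nixpkgs might look like (this assumes pkgs is a Nixpkgs package set, and GNU hello is just a stand-in for a real service):

pkgs.dockerTools.buildLayeredImage {
  name = "hello-service";                        # illustrative image name
  config.Cmd = [ "${pkgs.hello}/bin/hello" ];    # the single process the container runs
}

Building that expression produces an image archive that a container runtime can load (e.g. with docker load), without the container ever running a full operating system.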
So which use cases are NixOS/Nixpkgs well-suited for? If I had to rank these deployment models
then my preference (in descending order) would be:
If your deployment model matches that outline then NixOS is not only a safe choice, but likely
the best choice! You will be in great company if you use NixOS in this way.
You can still use NixOS in other capacities, but the further you depart from the above “killer app”
the more you will need to roll up your sleeves.
DevOps is more of a set of cultural practices than a team, but some organizations
explicitly create a DevOps team or hire engineers for their DevOps expertise in order to
support tools (like NixOS) that enable those cultural practices.
You can use NixOS in conjunction with Docker containers since NixOS supports declaratively
launching containers, but you probably want to avoid buying further into the broader Docker
ecosystem if you use NixOS. You don’t want to be in a situation where your engineering
organization fragments and does everything in two different ways: the NixOS way and the
Docker way.
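For reference, NixOS’s declarative container support looks something like this minimal sketch (the image name and port mapping are illustrative):

{ virtualisation.oci-containers.containers.todo-app = {
    image = "nginx:latest";    # illustrative image
    ports = [ "8080:80" ];     # host:container port mapping
  };
}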
For those familiar with the Gentoo Linux distribution, NixOS is like Gentoo, but for
Docker⁶. Similar to Gentoo, NixOS is an operating system that provides unparalleled
control over the machine while targeting use cases and workflows similar to the Docker
ecosystem.
⁶The reason why is that writing a (meaningful) test for our TODO list example would require executing JavaScript using something
like Selenium, which will significantly increase the size of the example integration test. postgrest, on the other hand, is easier to test
from the command line.
3. The big picture
Before diving in further you might want to get some idea of what a “real” NixOS software
enterprise looks like. Specifically:
Here I’ll do my best to answer those questions so that you can get a better idea of what you
would be signing up for.
I say “at most one command” because some activities (like continuous deployment) should ideally
require no human intervention at all. However, activities that do require human intervention
should in principle be compressible into a single Nix command.
I can explain this by providing an example of a development workflow that disregards this master
cue:
Suppose that you want to test your local project’s changes within the context of some larger
system at work (i.e. an integration test¹). Your organization’s process for testing your code might
hypothetically look like this:
Now what if I told you that the entire integration testing process from start to finish could be:
¹https://en.wikipedia.org/wiki/Integration_testing
In other words:
Some of these potential improvements are not specific to the Nix ecosystem. After all, you could
attempt to create a script that automates the more painstaking multi-step process. However, you
would likely need to reinvent large portions of the Nix ecosystem for this automation to be
sufficiently robust and efficient. For example:
• Do you generate unique labels for build products to isolate parallel workflows?
In the best case scenario, you label build products by a hash of their dependencies and
you’ve reinvented the Nix store’s hashing scheme. In the worst case scenario, you’re doing
something less accurate (e.g. using timestamps in the labels instead of hashes); see the sketch after this list.
• Do you have a custom script that updates references to these build products?
This would be reinventing Nix’s language support for automatically updating dependency
references.
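As a small illustration of the first point, the store path of every Nix derivation already embeds a hash computed from all of its inputs. Here is a minimal sketch using builtins.derivation (the attribute values are arbitrary placeholders):

builtins.derivation {
  name = "example";
  system = "x86_64-linux";                # placeholder platform
  builder = "/bin/sh";                    # placeholder builder
  args = [ "-c" "echo hello > $out" ];
}
# The derivation's .outPath looks like /nix/store/<hash>-example, and changing any
# attribute above (or any dependency) changes <hash>, so parallel builds never collide.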
You can save yourself a lot of headaches by taking time to learn and use the Nix ecosystem as
idiomatically as possible instead of learning these lessons the hard way.
GitOps
NixOS exemplifies the Infrastructure as Code (IaC)² paradigm, meaning that every aspect of your
organization (including hardware/systems/software) is stored in code or configuration files that
are the source of truth for how everything is built. In particular, you don’t make undocumented
changes to your infrastructure that cause it to diverge from what is recorded within those files.
This book will espouse a specific flavor of Infrastructure as Code known as GitOps³ where:
DevOps
NixOS also exemplifies the DevOps⁴ principle of breaking down boundaries between software
developers (“Dev”) and operations (“Ops”). Specifically, NixOS goes further in this regard than
most other tools by unifying both software configuration and system configuration underneath
the NixOS option system. These NixOS options fall into roughly three categories:
• Systems configuration
These are options that are mostly interesting to operations engineers, such as:
– log rotation policies
– kernel boot parameters
– disk encryption settings
• Software configuration
These are options that are mostly interesting to software engineers, such as:
– Patches
– Command-line arguments
– Environment variables
In extreme cases, you can even embed non-Nix code inside of Nix and do “pure software
development”. In other words, you can author inline code written within another language inside
of a NixOS configuration file. I’ll include one example of this later on in the “Our first web server”
chapter.
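As a tiny preview (a hypothetical service, not the example from that chapter), a shell script can be authored directly inside a NixOS configuration:

{ systemd.services.greeting = {
    wantedBy = [ "multi-user.target" ];

    # The body of `script` is shell code embedded inline within the Nix file
    script = ''
      echo "Hello from an inline script" > /var/log/greeting
    '';
  };
}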
Architecture
A NixOS-centric architecture tends to have the following key pieces of infrastructure:
• Version control
If you’re going to use GitOps then you had better use git! More specifically, you’ll likely use
a git hosting provider like GitHub⁵ or GitLab⁶ which supports pull requests and continuous
integration.
Most companies these days use version control, so this is not a surprising requirement.
• Product servers
These are the NixOS servers that actually host your product-related services.
• A cache
In simpler setups the “hub” can double as a cache, but as you grow you will likely want to
upload build products to a dedicated cache.
⁵https://github.com/
⁶https://about.gitlab.com/
Moreover, you will either need a cloud platform (e.g. AWS⁷) or data center for hosting these
machines. In this book we’ll primarily focus on hosting infrastructure on AWS.
These are not the only components you will need to build out your product, but these should be
the only components necessary to support DevOps workflows, including continuous integration
and continuous deployment.
Notably absent from the above list are:
• Container-specific infrastructure
A NixOS-centric architecture already mitigates some of the need for containerizing services,
but the architecture doesn’t change much even if you do use containers, because containers
can be built by Nixpkgs, distributed via the cache, and declaratively deployed to any NixOS
machine.
• Programming-language-specific infrastructure
If Nixpkgs supports a given language then we require no additional infrastructure to support
building and deploying that language. However, we might still host language-specific
amenities on our utility server, such as generated documentation.
• Continuous-deployment services
NixOS provides out-of-the-box services that we can use for continuous deployment, which
we will cover in a later chapter.
Scope
So far I’ve explained NixOS in high-level terms, but you might prefer a more down-to-earth
picture of the day-to-day requirements and responsibilities for a professional NixOS user.
To that end, here is a checklist that will summarize what you would need to understand in order
to effectively introduce and support NixOS within an organization:
• Infrastructure setup
– Continuous integration
– Builders
– Caching
• Development
– NixOS module system
– Project organization
– NixOS best practices
– Quality controls
• Testing
– Running virtual machines
– Automated testing
• Deployment
– Provisioning a new system
– Upgrading a system
– Dealing with restricted networks
• System administration
– Infrastructure as code
– Disk management
– Filesystem
– Networking
– Users and authentication
– Limits and quotas
• Security
– System hardening
– Patching dependencies
• Diagnostics and Debugging
– Nix failures
– Test failures
– Production failures
– Useful references
• Fielding inquiries
– System settings
– Licenses
– Vulnerabilities
• Non-NixOS Integrations
– Images
– Containers
This book will cover all of the above topics and more, although they will not necessarily be
grouped or organized in that exact order.
4. Setting up your development environment
I’d like you to be able to follow along with the examples in this book, so this chapter provides
a quick setup guide to bootstrap from nothing to deploying a blank NixOS system that you can
use for experimentation.
Installing Nix
In order to follow along with this book you will need the following requirements:
You’ve likely already installed Nix if you’re reading this book, but I’ll still cover how to do this
because I have a few tips to share that can help you author a more reliable installation script for
your colleagues.
Needless to say, if you or any of your colleagues are using NixOS as your development operating
system then you don’t need to install Nix and you can skip to the Running a NixOS Virtual
Machine section below.
Default installation
If you go to the download page for Nix² it will tell you to run something similar to this:
$ sh <(curl --location https://nixos.org/nix/install)
Throughout this book I’ll consistently use long option names instead of short names (e.g.
--location instead of -L), for two reasons:
For example, tar --extract --file is clearer and a better mnemonic than tar xf.
You may freely use shorter option names if you prefer, but I still highly
recommend using long option names, at least for non-interactive scripts.
¹https://nixos.org/manual/nix/stable/command-ref/conf-file.html
²https://nixos.org/download.html
Depending on your platform the download instructions might also tell you to pass the --daemon
or --no-daemon option to the installation script to specify a single-user or multi-user installation.
For simplicity, the instructions in this chapter will omit the --daemon / --no-daemon flag, but
keep in mind the following platform-specific advice:
$ VERSION='2.18.1'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ sh <(curl --location "${URL}")
… and you can find the full set of available releases by visiting the release file server³.
Feel free to use a Nix version newer than 2.18.1 if you want. The above example
installation script only pins the version 2.18.1 because that’s what happened to be the
latest stable version at the time of this writing. That’s also the Nix version that the
examples from this book have been tested against.
The only really important thing is that everyone within your organization uses the same
version of Nix, if you want to minimize your support burden.
However, there are a few more options that the script accepts that we’re going to make good use
of, and we can list those options by supplying --help to the script:
$ VERSION='2.18.1'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ sh <(curl --location "${URL}") --help
³https://releases.nixos.org/?prefix=nix/
--daemon: Installs and configures a background daemon that manages the store,
providing multi-user support and better isolation for local builds.
Both for security and reproducibility, this method is recommended if
supported on your platform.
See https://nixos.org/manual/nix/stable/installation/installing-binary.html#multi-user-installation
--no-daemon: Simple, single-user installation that does not require root and is
trivial to uninstall.
(default)
You might wonder if you can use the --tarball-url-prefix option for distributing a
custom build of Nix, but that’s not what this option is for. You can only use this option
to download Nix from a different location (e.g. an internal mirror), because the new
download still has to match the same integrity check as the old download.
Don’t worry, though; there still is a way to distribute a custom build of Nix, and we’ll
cover that in a later chapter.
• --nix-extra-conf-file
This lets you extend the installed nix.conf if you want to make sure that all users within
your organization share the same settings.
• --no-channel-add
You can (and should) enable this option within a professional organization to disable the
preinstallation of any channels.
These two options are crucial because we are going to use them to systematically replace Nix
channels with flakes.
Nix channels are a trap and I treat them as a legacy Nix feature poorly suited for
professional development, despite how ingrained they are in the Nix ecosystem.
The issue with channels is that they essentially introduce impurity into your builds by
depending on the NIX_PATH and there aren’t great solutions for enforcing that every Nix
user or every machine within your organization has the exact same NIX_PATH.
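For example, the classic channel-style idiom below resolves <nixpkgs> through the NIX_PATH environment variable, so the same expression can evaluate to different things on different machines:

let
  pkgs = import <nixpkgs> { };   # whichever Nixpkgs happens to be on this machine's NIX_PATH
in
  pkgs.hello                     # … so the resulting package can differ from machine to machine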
Moreover, Nix now supports flakes, which you can think of as a more modern alternative
to channels. Familiarity with flakes is not a precondition to reading this book, though:
I’ll teach you what you need to know.
$ VERSION='2.18.1'
$ URL="https://releases.nixos.org/nix/nix-${VERSION}/install"
$ CONFIGURATION="
extra-experimental-features = nix-command flakes repl-flake
extra-trusted-users = ${USER}
"
$ sh <(curl --location "${URL}") \
--no-channel-add \
--nix-extra-conf-file <(echo "${CONFIGURATION}")
The prior script only works if your shell is Bash or Zsh and all shell commands
throughout this book assume the use of one of those two shells.
For example, the above command uses support for process substitution (which is not
available in a POSIX shell environment) because otherwise we’d have to create a
temporary file to store the CONFIGURATION and clean up the temporary file afterwards
(which is tricky to do 100% reliably). Process substitution is also more reliable than a
temporary file because it happens entirely in memory and the intermediate result can’t
be accidentally deleted.
macOS-specific instructions
If you are using macOS, then follow the instructions in the Nixpkgs manual⁴ to set up a local
Linux builder. We’ll need this builder to create other NixOS machines, since they require Linux
build products.
In particular, you will need to leave that builder running in the background while following the
remaining examples in this chapter. In other words, in one terminal window you will need to
run:
… and you will need that to be running whenever you need to build a NixOS system. However,
you can shut down the builder when you’re not using it by giving the builder the shutdown now
command.
If you are using Linux (including NixOS or the Windows Subsystem for Linux) you can skip to
the next step.
Platform-independent instructions
Run the following command to generate your first project:
{ inputs = {
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
};
# https://github.com/utmapp/UTM/issues/2353
networking.nameservers = lib.mkIf pkgs.stdenv.isDarwin [ "8.8.8.8" ];
⁴https://nixos.org/manual/nixpkgs/stable/#sec-darwin-builder
virtualisation = {
graphics = false;
machine = nixpkgs.lib.nixosSystem {
system = builtins.replaceStrings [ "darwin" ] [ "linux" ] system;
${machine.config.system.build.vm}/bin/run-nixos-vm
'';
in
{ packages = { inherit machine; };
apps.default = {
type = "app";
program = "${program}";
};
}
);
}
# module.nix
{ services.getty.autologinUser = "root";
}
Then run this command within the same directory to run our test virtual machine:
$ nix run
warning: creating lock file '…/flake.lock'
trace: warning: system.stateVersion is not set, defaulting to 23.11. …
…
[root@nixos:~]#
You can then shut down the virtual machine by entering shutdown now.
If you’re unable to shut down the machine gracefully for any reason you can shut down
the machine non-gracefully by typing Ctrl-a + c to open the qemu prompt and then
entering quit to exit.
Also, don’t worry about the system.stateVersion warning for now. We’ll fix that later.
If you were able to successfully launch and shut down the virtual machine then you’re ready to
follow along with the remaining examples throughout this book. If you see an example in this
book that begins with this line:
# module.nix
… then that means that I want you to save that example code to the module.nix file and then
restart the virtual machine by running nix run.
For example, let’s test that right now; save the following file to module.nix:
# module.nix
{ services.getty.autologinUser = "root";
services.postgresql.enable = true;
}
… then start the virtual machine and log into the machine. As the root user, run:
postgres=#
5. Our first web server
Hello, world!
We’ll begin from the template project from “Setting up your development environment”. You can
either begin from the previous chapter by running the following command (if you haven’t done
so already):
… or if you want to skip straight to the final result at the end of this chapter you can run:
Let’s modify module.nix to specify a machine that serves a simple static “Hello, world!” page on
http://localhost:
# module.nix
{ pkgs, ... }:
{ services = {
getty.autologinUser = "root";
nginx = {
enable = true;
virtualHosts.localhost.locations."/" = {
index = "index.html";
networking.firewall.allowedTCPPorts = [ 80 ];
virtualisation.forwardPorts = [
{ from = "host"; guest.port = 80; host.port = 8080; }
];
system.stateVersion = "23.11";
}
You always want to specify a system state version that matches the starting version of
Nixpkgs for that machine and never change it afterwards. In other words, even if you
upgrade Nixpkgs later on you would keep the state version the same.
Nixpkgs uses the state version to migrate your NixOS system, because each migration
needs to know what version your system started from.
Two common mistakes NixOS users sometimes make are:
If you deploy that using nix run you can open the web page in your browser by visiting
http://localhost:8080, which should display the following contents:
Hello, world!
In general I don’t recommend testing things by hand like this. Remember the “master
cue”:
In a later chapter we’ll cover how to automate this sort of testing using NixOS’s support
for integration tests. These tests will also take care of starting up and tearing down the
virtual machine for you so that you don’t have to do that by hand either.
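To give a rough idea of what that looks like, here is a minimal sketch of such a test built with the pkgs.testers.runNixOSTest helper from Nixpkgs (the test name is illustrative and curl is added explicitly for the check):

pkgs.testers.runNixOSTest {
  name = "hello-world-webserver";

  nodes.machine = { pkgs, ... }: {
    imports = [ ./module.nix ];

    environment.systemPackages = [ pkgs.curl ];
  };

  testScript = ''
    machine.wait_for_unit("nginx.service")
    machine.wait_for_open_port(80)
    machine.succeed("curl --fail http://localhost/ | grep 'Hello, world!'")
  '';
}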
DevOps
The previous example illustrates how NixOS promotes DevOps on a small scale. If the inline
web page represents the software development half of the project (the “Dev”) and the nginx
configuration represents the operational half of the project (the “Ops”) then we can in principle
store both the “Dev” and the “Ops” halves of our project within the same file. As an extreme
example, we can even template the web page with system configuration options!
# module.nix
{ config, lib, pkgs, ... }:

{ services = {
getty.autologinUser = "root";
nginx = {
enable = true;
virtualHosts.localhost.locations."/" = {
index = "index.html";
root = pkgs.writeTextDir "index.html" ''
<html>
<body>
<ul>
${
let
renderPort = port: "<li>${toString port}</li>\n";
in
lib.concatMapStrings renderPort config.networking.firewall.allowedTCPPorts
}
</ul>
</body>
</html>
'';
};
};
};
networking.firewall.allowedTCPPorts = [ 80 ];
virtualisation.forwardPorts = [
{ from = "host"; guest.port = 80; host.port = 8080; }
];
system.stateVersion = "23.11";
}
If you restart the machine and refresh http://localhost:8080² the page should now display:
• 80
There are less roundabout ways to query our system’s configuration that don’t involve
serving a web page. For example, using the same flake.nix file we can more directly
query the open ports using:
TODO list
Now we’re going to create the first prototype of a toy web application: a TODO list implemented
entirely in client-side JavaScript (later on we’ll add a backend service).
Create a subdirectory named www within your current directory:
$ mkdir www
… and then save a file named index.html with the following contents underneath that subdirectory:
²http://localhost:8080
<html>
<body>
<button id='add'>+</button>
</body>
<script>
let add = document.getElementById('add');
function newTask() {
let subtract = document.createElement('button');
subtract.textContent = "-";
let input = document.createElement('input');
input.setAttribute('type', 'text');
let div = document.createElement('div');
div.replaceChildren(subtract, input);
function remove() {
div.replaceChildren();
div.remove();
}
subtract.addEventListener('click', remove);
add.before(div);
}
add.addEventListener('click', newTask);
</script>
</html>
In other words, the above file should be located at www/index.html relative to the directory
containing your module.nix file.
Now save the following NixOS configuration to module.nix:
# module.nix
{ services = {
getty.autologinUser = "root";
nginx = {
enable = true;
virtualHosts.localhost.locations."/" = {
index = "index.html";
root = ./www;
};
};
};
networking.firewall.allowedTCPPorts = [ 80 ];
virtualisation.forwardPorts = [
{ from = "host"; guest.port = 80; host.port = 8080; }
];
system.stateVersion = "23.11";
}
If you restart the virtual machine and refresh the web page you’ll see a single + button:
Each time you click the + button it will add a TODO list item consisting of:
virtualisation.sharedDirectories.www = {
source = "$WWW";
target = "/var/www";
};
virtualHosts.localhost.locations."/" = {
index = "index.html";
root = "/var/www";
};
Finally, restart the machine, except with a slightly modified version of our original nix run
command:
Now, we only need to refresh the page to view any changes we make to index.html and we no
longer need to restart the virtual machine.
Exercise: Add a “TODO list” heading (i.e. <h1>TODO list</h1>) to the web page and refresh the
page to confirm that your changes took effect.
6. NixOS option definitions
By this point in the book you may have copied and pasted some NixOS code, but perhaps you
don’t fully understand what is going on, especially if you’re not an experienced NixOS user. This
chapter will slow down and help you solidify your understanding of the NixOS module system
so that you can improve your ability to read, author, and debug modules.
Throughout this book I’ll consistently use the following terminology to avoid ambiguity:
In this chapter and the next chapter we’ll focus mostly on option definitions and later
on we’ll cover option declarations in more detail.
# Module arguments which our system can use to refer to its own configuration
{ config, lib, pkgs, ... }:
In other words, in the fully general case a NixOS module is a function whose output is an attribute
set with three attributes named imports, options, and config.
Nix supports data structures known as “attribute sets”, which are analogous to “maps” or
“records” in other programming languages.
To be precise, Nix uses the following terminology:
I’m explaining all of this because I’ll use the terms “attribute set”, “attribute”, and
“attribute path” consistently throughout the text to match Nix’s official terminology
(even though no other language uses those terms).
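As a quick illustration of those terms:

let
  x = { services = { openssh = { enable = true; }; }; };   # `x` is an attribute set
in
  x.services.openssh.enable   # `services`, `openssh`, and `enable` are attributes;
                              # `services.openssh.enable` is an attribute path, and this evaluates to true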
Syntactic sugar
All elements of a NixOS module are optional and NixOS supports “syntactic sugar” to simplify
several common cases. For example, you can omit the module arguments if you don’t use them:
{ imports = [
…
];
options = {
…
};
config = {
…
};
}
You can also omit any of the imports, options, or config attributes, too, like in this module,
which only imports other modules:
{ imports = [
./physical.nix
./logical.nix
];
}
{ config = {
services = {
apache-kafka.enable = true;
zookeeper.enable = true;
};
};
}
Additionally, the NixOS module system provides special support for modules which only define
options by letting you elide the config attribute and promote the options defined within to the
“top level”. As an example, we can simplify the previous NixOS module to this:
{ services = {
apache-kafka.enable = true;
zookeeper.enable = true;
};
}
You might wonder if there should be some sort of coding style which specifies whether
people should include or omit these elements of a NixOS module. For example, perhaps
you might require that all elements are present, for consistency, even if they are empty
or unused.
My coding style for NixOS modules is:
The NixOS module system is a domain-specific language implemented within the Nix pro-
gramming language. Specifically, the NixOS module system is (mostly) implemented within the
lib/modules.nix file included in Nixpkgs¹. If you ever receive a stack trace related to the NixOS
module system you will often see functions from modules.nix show up in the stack trace, because
they are ordinary functions and not language features.
In fact, a NixOS module in isolation is essentially “inert” from the Nix language’s point of view.
For example, if you save the following NixOS module to a file named example.nix:
{ config = {
services.openssh.enable = true;
};
}
… and you evaluate that, the result will be the same, just without the syntactic sugar:
The Nix programming language provides “syntactic sugar” for compressing nested
attributes by chaining them using a dot (.). In other words, this Nix expression:
{ config = {
services.openssh.enable = true;
};
}
{ config = {
services = {
openssh = {
enable = true;
};
};
};
}
… and they are both also the same thing as this Nix expression:
{ config.services.openssh.enable = true; }
Note that this syntactic sugar is a feature of the Nix programming language, not the
NixOS module system. In other words, this feature works even for Nix expressions that
are not destined for use as NixOS modules.
¹https://github.com/NixOS/nixpkgs/blob/23.11/lib/modules.nix
{ config, ... }:
{ config = {
services.apache-kafka.enable = config.services.zookeeper.enable;
};
}
… is just a function. If we save that to example.nix and then evaluate it, the interpreter will
simply say that the file evaluates to a “lambda” (an anonymous function):
… although we can get a more useful result within the nix repl by calling our function on a
sample argument:
$ nix repl
…
nix-repl> example = import ./example.nix

nix-repl> output = example { config = { services.zookeeper.enable = true; }; }
nix-repl> :p output
{ config = { services = { apache-kafka = { enable = true; }; }; }; }
nix-repl> output.config.services.apache-kafka.enable
true
This illustrates that our NixOS module really is just a function whose input is an attribute set
and whose output is also an attribute set. There is nothing special about this function other than
it happens to be the same shape as what the NixOS module system accepts.
NixOS
So if NixOS modules are just pure functions or pure attribute sets, what turns those functions
or attribute sets into a useful operating system? In other words, what puts the “NixOS” in the
“NixOS module system”?
The answer is that this actually happens in two steps:
• All NixOS modules your system depends on are combined into a single, composite
attribute set
In other words all of the imports, options declarations, and config settings are fully
resolved, resulting in one giant attribute set. The code for combining these modules lives
in lib/modules.nix² in Nixpkgs.
²https://github.com/NixOS/nixpkgs/blob/23.11/lib/modules.nix
• The final composite attribute set contains a special attribute that builds
the system
Specifically, there will be a config.system.build.toplevel attribute path which contains
a derivation you can use to build a runnable NixOS system. The top-level code for assem-
bling an operating system lives in nixos/modules/system/activation/top-level.nix³ in
Nixpkgs.
This will probably make more sense if we use the NixOS module system ourselves to create a
fake placeholder value that will stand in for a real operating system.
First, we’ll create our
own top-level.nix module that will include a fake
config.system.build.toplevel attribute path that is a string instead of a derivation for
building an operating system:
# top-level.nix
{ config, lib, ... }:

{ imports = [ ./other.nix ];
options = {
system.build.toplevel = lib.mkOption {
description = "A fake NixOS, modeled as a string";
type = lib.types.str;
};
};
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
};
}
# other.nix
{ lib, ... }:
{ options = {
system.nixos.release = lib.mkOption {
description = "The NixOS version";
type = lib.types.str;
};
};
config = {
system.nixos.release = "23.11";
};
}
We can then materialize the final composite attribute set like this:
³https://github.com/NixOS/nixpkgs/blob/23.11/nixos/modules/system/activation/top-level.nix
nix-repl> :p result.config
{ system = { build = { toplevel = "Fake NixOS - version 23.11"; }; nixos = { release = "23.11"; }; }; }
nix-repl> result.config.system.build.toplevel
"Fake NixOS - version 23.11"
In other words, lib.evalModules is the magic function that combines all of our NixOS modules
into a composite attribute set.
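As a rough sketch of that invocation (assuming the lib from Nixpkgs is in scope, for example inside a nix repl session):

let
  result = lib.evalModules { modules = [ ./top-level.nix ]; };
in
  result.config.system.build.toplevel   # evaluates to "Fake NixOS - version 23.11"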
NixOS essentially does the same thing as in the above example, except on a much larger scale.
Also, in a real NixOS system the final config.system.build.toplevel attribute path stores a
buildable derivation instead of a string.
Recursion
The NixOS module system lets modules refer to the final composite configuration using the
config function argument that is passed into every NixOS module. For example, this is how our
top-level.nix module was able to refer to the system.nixos.release option that was set in the
other.nix module:
{ …
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
# |
# … which we can use within our configuration
};
}
You’re not limited to referencing configuration values set in other NixOS modules; you can even
reference configuration values set within the same module. In other words, NixOS modules
support recursion⁴ where modules can refer to themselves.
As a concrete example of recursion, we can safely merge the other.nix module into the top-
level.nix module:
⁴https://en.wikipedia.org/wiki/Recursion
{ config, lib, ... }:

{ options = {
system.build.toplevel = lib.mkOption {
description = "A fake NixOS, modeled as a string";
type = lib.types.str;
};
system.nixos.release = lib.mkOption {
description = "The NixOS version";
type = lib.types.str;
};
};
config = {
system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
system.nixos.release = "23.11";
};
}
… and this would still work, even though this module now refers to its own configuration values.
The Nix interpreter won’t go into an infinite loop because the recursion is still well-founded.
We can better understand why this recursion is well-founded by simulating how
lib.evalModules works by hand. Conceptually what lib.evalModules does is:
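Roughly speaking, it merges all of the modules into a single function and then computes that function’s fixed point, feeding the module’s eventual output back in as its config argument. Here is a drastically simplified sketch of the fixed-point idea in plain Nix (an illustration only, not the real implementation):

let
  fix = f: let result = f result; in result;   # compute a fixed point lazily

  module = self: {
    release = "23.11";
    toplevel = "Fake NixOS - version ${self.release}";   # refers back to the final result
  };
in
  fix module   # => { release = "23.11"; toplevel = "Fake NixOS - version 23.11"; }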
We’ll walk through this by performing the same steps as lib.evalModules. First, to simplify
things we’ll consolidate the prior example into a single flake that we can evaluate as we go:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/23.11";
⁵https://en.wikipedia.org/wiki/Fixed_point_(mathematics)
topLevel =
{ config, lib, ... }:
{ imports = [ other ];
options.system.build.toplevel = lib.mkOption {
description = "A fake NixOS, modeled as a string";
type = lib.types.str;
};
config.system.build.toplevel =
"Fake NixOS - version ${config.system.nixos.release}";
};
in
nixpkgs.lib.evalModules { modules = [ topLevel ]; };
}
Various nix commands (like nix eval) take a flake reference as an argument which has
the form:
${URI}#${ATTRIBUTE_PATH}
In the previous example, the URI was ./evalModules (a file path in this case) and the
ATTRIBUTE_PATH was config.system.build.toplevel.
However, if you use zsh as your shell with EXTENDED_GLOB glob support (i.e. setopt
extended_glob) then zsh interprets # as a special character. This is why all of the
examples from this book quote the flake reference as a precaution, but if you’re not
using zsh or its extended globbing support then you can remove the quotes, like this:
… this can happen because you created the ./evalModules directory inside of a git
repository. When you use flakes inside of a repository you need to explicitly add them
and all files they depend on to the repository using:
… which comes in handy if you don’t plan to ever actually commit the file.
The first thing that lib.evalModules does is to merge the other module into the topLevel
module, which we will simulate by hand by performing the same merge ourselves:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/23.11";
in
nixpkgs.lib.evalModules { modules = [ topLevel ]; };
}
After that we compute the fixed point of our module by passing the module’s output as its own
input, the same way that evalModules would:
{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/23.11";
result = topLevel {
inherit (result) config options;
inherit (nixpkgs) lib;
};
in
result;
}
This walkthrough grossly oversimplifies what evalModules does. For starters, we’ve
completely ignored how evalModules uses the options declarations to:
The last step is that when nix eval accesses the config.system.build.toplevel field of the
result, the Nix interpreter conceptually performs the following substitutions:
result.config.system.build.toplevel
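Roughly speaking (a sketch that elides the options machinery), that expansion proceeds like this:

result.config.system.build.toplevel
# = (topLevel { inherit (result) config options; inherit (nixpkgs) lib; }).config.system.build.toplevel
# = "Fake NixOS - version ${result.config.system.nixos.release}"
# = "Fake NixOS - version 23.11"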
So even though our NixOS module is defined recursively in terms of itself, that recursion is still
well-founded and produces an actual result.
7. Advanced option definitions
NixOS option definitions are actually much more sophisticated than the previous chapter let on,
and in this chapter we’ll cover some common tricks and pitfalls.
Make sure that you followed the instructions from the “Setting up your development environ-
ment” chapter if you would like to test the examples in this chapter.
Imports
The NixOS module system lets you import other modules by their path, which merges their
option declarations and option definitions with the current module. But, did you know that the
elements of an imports list don’t have to be paths?
You can put inline NixOS configurations in the imports list, like these:
{ imports = [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}
… and they will behave as if you had imported files with the same contents as those inline
configurations.
In fact, anything that is a valid NixOS module can go in the import list, including NixOS modules
that are functions:
{ imports = [
{ services.openssh.enable = true; }

# A module that is a function also works here (illustrative example):
({ pkgs, ... }: { environment.systemPackages = [ pkgs.hello ]; })
];
}
I will make use of this trick in a few examples below, so that we can simulate modules importing
other modules within a single file.
lib utilities
Nixpkgs provides several utility functions for NixOS modules that are stored underneath the
“lib” hierarchy, and you can find the source code for those functions in lib/modules.nix¹.
¹https://github.com/NixOS/nixpkgs/blob/23.11/lib/modules.nix
If you want to become a NixOS module system expert, take the time to read and
understand all of the code in lib/modules.nix.
Remember that the NixOS module system is implemented as a domain-specific language
in Nix and lib/modules.nix contains the implementation of that domain-specific
language, so if you understand everything in that file then you understand essentially
all that there is to know about how the NixOS module system works under the hood.
That said, this chapter will still try to explain things enough so that you don’t have to
read through that code.
You do not need to use or understand all of the functions in lib/modules.nix, but you do need
to familiarize yourself with the following four primitive functions:
• lib.mkMerge
• lib.mkOverride
• lib.mkIf
• lib.mkOrder
By “primitive”, I mean that these functions cannot be implemented in terms of other functions.
They all hook into special behavior built into lib.evalModules.
mkMerge
The lib.mkMerge function merges a list of “configuration sets” into a single “configuration
set” (where “configuration set” means a potentially nested attribute set of configuration option
settings).
For example, the following NixOS module:
{ lib, ... }:
{ config = lib.mkMerge [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}

… is equivalent to this NixOS module:
{ config = {
services.openssh.enable = true;
services.getty.autologinUser = "root";
};
}
You might wonder whether you should merge modules using lib.mkMerge or merge
them using the imports list. After all, we could have also written the previous mkMerge
example as:
{ imports = [
{ services.openssh.enable = true; }
{ services.getty.autologinUser = "root"; }
];
}
… and that would have produced the same result. So which is better?
The short answer is: lib.mkMerge is usually what you want.
The long answer is that the main trade-off between imports and lib.mkMerge is:
• The imports section can merge NixOS modules that are functions
lib.mkMerge can only merge configuration sets and not functions.
The latter point is why you should typically prefer using lib.mkMerge.
Merging options
You can merge configuration sets that define the same option multiple times, like this:
{ lib, ... }:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 ]; }
{ networking.firewall.allowedTCPPorts = [ 443 ]; }
];
}
… and the outcome of merging two identical attribute paths depends on the option’s “type”.
For example, the networking.firewall.allowedTCPPorts option’s type is:
If you specify a list-valued option twice, the lists are combined, so the above example reduces to
this:
{ lib, ... }:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 443 ]; }
];
}
… and we can even prove that by querying the final value of the option from the command line:
However, you might find the nix repl more convenient if you prefer to interactively browse the
available options. Run this command:
… which will load your NixOS system into the REPL and now you can use tab-completion to
explore what is available:
nix-repl> config.<TAB>
config.appstream config.nix
config.assertions config.nixops
…
nix-repl> config.networking.<TAB>
config.networking.bonds
config.networking.bridges
…
nix-repl> config.networking.firewall.<TAB>
config.networking.firewall.allowPing
config.networking.firewall.allowedTCPPortRanges
…
nix-repl> config.networking.firewall.allowedTCPPorts
[ 80 443 ]
Exercise: Try to save the following NixOS module to module.nix, which specifies the
same option twice without using lib.mkMerge:
{ lib, ... }:
{ config = {
networking.firewall.allowedTCPPorts = [ 80 ];
networking.firewall.allowedTCPPorts = [ 443 ];
};
}
This will fail to deploy. Do you understand why? Specifically, is the failure a limitation
of the NixOS module system or the Nix programming language?
You can also nest lib.mkMerge underneath an attribute. For example, this:
{ config = lib.mkMerge [
{ networking.firewall.allowedTCPPorts = [ 80 ]; }
{ networking.firewall.allowedTCPPorts = [ 443 ]; }
];
}

… is the same as this:
{ config.networking = lib.mkMerge [
{ firewall.allowedTCPPorts = [ 80 ]; }
{ firewall.allowedTCPPorts = [ 443 ]; }
];
}

… which is the same as this:
{ config.networking.firewall = lib.mkMerge [
{ allowedTCPPorts = [ 80 ]; }
{ allowedTCPPorts = [ 443 ]; }
];
}

… all of which reduce to this:
{ config.networking.firewall.allowedTCPPorts = [ 80 443 ]; }
Conflicts
Duplicate options cannot always be merged. For example, if you merge two
configuration sets that disagree on whether to enable a service:
{ lib, ... }:
{ config = {
services.openssh.enable = lib.mkMerge [ true false ];
};
}
… then the merge will fail with an error, because services.openssh.enable is declared to have a boolean type, and you can only
merge multiple boolean values if all occurrences agree. You can verify this yourself by changing
both occurrences to true, which will fix the error.
As a general rule of thumb:
• Most complex option types will successfully merge in the obvious way
e.g. lists will be concatenated and attribute sets will be combined.
• Most scalar option types (e.g. booleans or plain strings) will fail to merge distinct values
In other words, they only merge successfully if every definition agrees on the same value.
The most common exception to this rule of thumb is the “lines” type (lib.types.lines), which
is a string option type that you can define multiple times. services.zookeeper.extraConf is an
example of one such option that has this type:
{ lib, ... }:
{ config = {
services.zookeeper = {
enable = true;

extraConf = lib.mkMerge [
# illustrative settings (not necessarily the ones from the original example)
"autopurge.purgeInterval=1"
"autopurge.snapRetainCount=3"
];
};
};
}
… and merging multiple occurrences of that option concatenates them, inserting an intervening
newline character between each occurrence.
mkOverride
The lib.mkOverride function specifies the “priority” of an option definition, which comes in
handy if you want to override a configuration value that another NixOS module already defined.
This most commonly comes up when we need to override an option that was already defined by
one of our dependencies (typically a NixOS module provided by Nixpkgs). One example would
be overriding the restart frequency of nginx:
{ config = {
services.nginx.enable = true;
systemd.services.nginx.serviceConfig.RestartSec = "5s";
};
}
The problem is that enabling nginx automatically defines a whole bunch of other
NixOS options, including systemd.services.nginx.serviceConfig.RestartSec². This option is
a scalar string option that disallows multiple distinct values, because the NixOS module system
by default has no way to know which one to pick to resolve the conflict.
However, we can use mkOverride to annotate our value with a higher priority so that it overrides
the other conflicting definition:
{ lib, ... }:
{ config = {
services.nginx.enable = true;

systemd.services.nginx.serviceConfig.RestartSec = lib.mkOverride 50 "5s";
};
}
… and now that works, since we specified a new priority of 50 that takes priority over the default
priority of 100. There is also a pre-existing utility named lib.mkForce which sets the priority to
50, so we could have also used that instead:
{ lib, ... }:
{ config = {
services.nginx.enable = true;

systemd.services.nginx.serviceConfig.RestartSec = lib.mkForce "5s";
};
}
²https://github.com/NixOS/nixpkgs/blob/23.11/nixos/modules/services/web-servers/nginx/default.nix#L1234
However, be careful: you might be tempted to instead apply lib.mkForce higher up the attribute
path, like this:

{ lib, ... }:
{ config = {
services.nginx.enable = true;

systemd.services.nginx.serviceConfig = lib.mkForce { RestartSec = "5s"; };
};
}
That is not equivalent, because it overrides not only the RestartSec attribute, but also all
other attributes underneath the serviceConfig attribute (like Restart, User, and Group,
all of which are now gone).
You always want to narrow your use of lib.mkForce as much as possible to protect
against this common mistake.
The default priority is 100 and lower numeric values actually represent higher priority. In other
words, an option definition with a priority of 50 takes precedence over an option definition with
a priority of 100.
Yes, the NixOS module system confusingly uses lower numbers to indicate higher priorities, but
in practice you will rarely see explicit numeric priorities. Instead, people tend to use derived
utilities like lib.mkForce or lib.mkDefault which select the appropriate numeric priority for
you.
In extreme cases you might still need to specify an explicit numeric priority. The most common
example is when one of your dependencies already defines an option using lib.mkForce and
you need to override that. In that scenario you could use lib.mkOverride 49, which would take
precedence over lib.mkForce:
{ lib, ... }:
{ config = {
services.nginx.enable = true;
systemd.services.nginx.serviceConfig.RestartSec = lib.mkMerge [
(lib.mkForce "5s")
(lib.mkOverride 49 "3s")
];
};
}
The default values for options also have a priority, which is 1500, and there’s a
lib.mkOptionDefault utility that sets a configuration value to that same priority.
That means that a NixOS module like this:
{ lib, ... }:
{ options.foo = lib.mkOption {
default = 1;
};
}

… is the same as this module:
{ lib, ... }:
{ options.foo = lib.mkOption { };
config.foo = lib.mkOptionDefault 1;
}
However, you will more commonly use lib.mkDefault which defines a configuration option
with priority 1000. Typically you’ll use lib.mkDefault if you want to override the default value
of an option, while still allowing a downstream user to override the option yet again at the
normal priority (100).
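For instance, a hypothetical reusable module might set a soft default like this:

{ lib, ... }:
{ config = {
    services.nginx.enable = lib.mkDefault true;   # priority 1000: a default, not a demand
  };
}

A downstream module can then write services.nginx.enable = false; (an ordinary definition at priority 100) to override it, without reaching for lib.mkForce.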
mkIf
mkIf is far-and-away the most widely used NixOS module primitive, because you can use mkIf
to selectively enable certain options based on the value of another option.
An extremely common idiom from Nixpkgs is to use mkIf in conjunction with an enable option,
like this:
# module.nix
let
# Pretend that this came from another file
cowsay =
{ config, lib, pkgs, ... }:
{ options.services.cowsay = {
enable = lib.mkEnableOption "cowsay";
greeting = lib.mkOption {
description = "The phrase the cow will greet you with";
type = lib.types.str;

default = "Hello, world!"; # assumed default value, for illustration
};
};

config = lib.mkIf config.services.cowsay.enable {
systemd.services.cowsay = {
wantedBy = [ "multi-user.target" ];

script = "${pkgs.cowsay}/bin/cowsay ${config.services.cowsay.greeting}";
};
};
}
in
{ imports = [ cowsay ];
config = {
services.cowsay.enable = true;
services.getty.autologinUser = "root";
};
}
If you launch the above NixOS configuration you should be able to verify that the cowsay service
is running like this:
You might wonder why we need a mkIf primitive at all. Couldn’t we use an if expression like
this instead?
{ …

config =
if config.services.cowsay.enable
then {
systemd.services.cowsay = {
wantedBy = [ "multi-user.target" ];

script = "${pkgs.cowsay}/bin/cowsay ${config.services.cowsay.greeting}";
};
}
else { };
}
The most important reason why this doesn’t work is because it triggers an infinite loop:
error: infinite recursion encountered

at /nix/store/vgicc88fhmlh7mwik7gqzzm2jyfva9l9-source/lib/modules.nix:259:21:
… and the reason why lib.mkIf doesn’t share the same problem is because evalModules pushes
mkIf conditions to the “leaves” of the configuration tree, as if we had instead written this:
{ …
config = {
systemd.services.cowsay = {
wantedBy = lib.mkIf config.services.cowsay.enable [ "multi-user.target" ];
script =
lib.mkIf config.services.cowsay.enable
"${pkgs.cowsay}/bin/cowsay ${config.services.cowsay.greeting}";
};
};
}
let
kafkaSynonym =
{ config, lib, ... }:
{ options.services.kafka.enable = lib.mkEnableOption "kafka";

config.services.apache-kafka.enable = config.services.kafka.enable;
};
in
{ imports = [ kafkaSynonym ];
config.services.apache-kafka.enable = true;
}
The above example leads to a conflict because the kafkaSynonym module defines
services.apache-kafka.enable to false (at priority 100, since services.kafka.enable defaults
to false), and the downstream module defines services.apache-kafka.enable to true (also at
priority 100).
However, if the synonym module instead guards its definition with lib.mkIf:

let
kafkaSynonym =
{ config, lib, ... }:
{ options.services.kafka.enable = lib.mkEnableOption "kafka";

config.services.apache-kafka.enable =
lib.mkIf config.services.kafka.enable true;
};
in
{ imports = [ kafkaSynonym ];
config.services.apache-kafka.enable = true;
}
… then that would do the right thing because in the default case services.apache-kafka.enable
would remain undefined, which would be the same thing as being defined as false at priority
1500. That avoids defining the same option twice at the same priority.
mkOrder
The NixOS module system strives to make the behavior of our system depend as little as possible
on the order in which we import or mkMerge NixOS modules. In other words, if we import two
modules that we depend on:
… then ideally the behavior shouldn’t change if we import those same two modules in a different
order:
… and in most cases that is true. 99% of the time you can safely sort your import list and either
your NixOS system will be exactly the same as before (producing the exact same Nix store build
product) or essentially the same as before, meaning that the difference is irrelevant. However,
for those 1% of cases where order matters we need the lib.mkOrder function.
Here’s one example of where ordering matters:
let
moduleA = { pkgs, ... }: {
environment.defaultPackages = [ pkgs.gcc ];
};

moduleB = { pkgs, ... }: {
environment.defaultPackages = [ pkgs.clang ];
};
in
{ imports = [ moduleA moduleB ]; }
Both the gcc package and clang package add a cc executable to the PATH, so the order matters
here because the first cc on the PATH wins.
In the above example, clang’s cc is the first one on the PATH, because we imported moduleB
second:
This sort of order-sensitivity frequently arises for “list-like” option types, including actual lists
or string types like lines that concatenate multiple definitions.
Fortunately, we can fix situations like these with the lib.mkOrder function, which specifies a
numeric ordering that NixOS will respect when merging multiple definitions of the same option.
Every option’s numeric order is 1000 by default, so if we set the numeric order of clang to 1500:
let
moduleA = { pkgs, ... }: {
environment.defaultPackages = [ pkgs.gcc ];
};

moduleB = { lib, pkgs, ... }: {
environment.defaultPackages = lib.mkOrder 1500 [ pkgs.clang ];
};
in
{ imports = [ moduleA moduleB ]; }
… then gcc will always come first on the PATH, no matter which order we import the modules.
You can also use lib.mkBefore and lib.mkAfter, which provide convenient synonyms for
numeric order 500 and 1500, respectively:
let
moduleA = { pkgs, ... }: {
environment.defaultPackages = [ pkgs.gcc ];
};

moduleB = { lib, pkgs, ... }: {
environment.defaultPackages = lib.mkAfter [ pkgs.clang ];
};
in
{ imports = [ moduleA moduleB ]; }
8. Deploying to AWS using Terraform
Up until now we’ve been playing things safe and test-driving everything locally on our own
machine. We could even prolong this for quite a while because NixOS has advanced support
for building and testing clusters of NixOS machines locally using virtual machines. However, at
some point we need to dive in and deploy a server if we’re going to use NixOS for real.
In this chapter we’ll deploy our TODO app to our first “production” server in AWS, meaning that
you will need to create an AWS account¹ to follow along.
AWS prices and offers will vary so this book can’t provide any strong guarantees about
what this would cost you. However, at the time of this writing the examples in this
chapter would fit well within the current AWS free tier, which is 750 hours of a t3.micro
instance.
Even if there were no free tier, the cost of a t3.micro instance is currently ≈1¢ / hour or
≈ $7.50 / month if you never shut it off (and you can shut it off when you’re not using
it). So at most this chapter should only cost you a few cents from start to finish.
Throughout this book I’ll take care to minimize your expenditures by showing you how
to develop and test locally as much as possible.
In the spirit of Infrastructure as Code, we’ll be using Terraform to declaratively provision AWS
resources, but before doing so we need to generate AWS access keys for programmatic access.
The above AWS documentation also recommends generating temporary access creden-
tials instead of long-term credentials. However, setting this up properly and ergonom-
ically requires setting up the IAM Identity Center which is only permitted for AWS
accounts that have set up an AWS Organization. That is way outside of the scope of this
book so instead you should just generate long-term credentials for a non-root admin
account.
If you haven’t already, configure your development environment to use these tokens by running:
If you’re not sure what region to use, pick the one closest to you based on
the list of AWS service endpoints³.
• module.nix + www/index.html
The NixOS configuration for our TODO list web application, except adapted to run on AWS
instead of inside of a qemu VM.
• flake.nix
A Nix flake that wraps our NixOS configuration so that we can refer to the configuration
using a flake URI.
• main.tf
The Terraform specification for deploying our NixOS configuration to AWS.
• backend/main.tf
This Terraform configuration provisions an S3 bucket for use with Terraform’s S3 backend⁴.
We won’t use this until the very end of this chapter, though, so we’ll ignore it for now.
³https://docs.aws.amazon.com/general/latest/gr/rande.html
⁴https://developer.hashicorp.com/terraform/language/settings/backends/s3
… and when prompted to enter the region, use the same AWS region you specified earlier when
running aws configure:
var.region
Enter a value: …
After that, terraform will display the execution plan and ask you to confirm the plan:
module.ami.data.external.ami: Reading...
module.ami.data.external.ami: Read complete after 1s [id=-]
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
+ create
<= read (data resources)
… and if you confirm then terraform will deploy that execution plan:
Outputs:
public_dns = "ec2-….compute.amazonaws.com"
The final output will include the URL for your server. If you open that URL in your browser you
will see the exact same TODO server as before, except now running on AWS instead of inside of a
qemu virtual machine. If this is your first time deploying something to AWS then congratulations!
Cleaning up
Once you verify that everything works you can destroy all deployed resources by running:
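$ terraform destroy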
terraform will prompt you for the same information (i.e. the same region) and also prompt for
confirmation just like before:
var.region
Enter a value: …
Now you can read the rest of this chapter in peace knowing that you are no longer being billed
for this example.
Terraform walkthrough
The key file in our Terraform project is main.tf containing the Terraform logic for how to deploy
our TODO list application.
You can think of a Terraform module as being sort of like a function with side effects, meaning:
• input variables play the role of function arguments
• output values play the role of the function’s return value
• resources behave like let bindings that provision infrastructure as a side effect
• child modules behave like calls to other functions that the module imports
Our starting main.tf file provides examples of all of the above concepts.
⁵https://developer.hashicorp.com/terraform/language/values/variables
⁶https://developer.hashicorp.com/terraform/language/values/outputs
⁷https://developer.hashicorp.com/terraform/language/resources/syntax
⁸https://developer.hashicorp.com/terraform/language/modules/syntax#calling-a-child-module
Input variables
For example, the beginning of the module declares one input variable:
variable "region" {
type = string
nullable = false
}
… which is analogous to a Nix function like this one that takes the following attribute set as an
input:
{ region }:
…
When you run terraform apply you will be automatically prompted to supply all input variables:
$ terraform apply
var.region
Enter a value: …
… but you can also provide the same values on the command line, too, if you don’t want to
supply them interactively:
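$ terraform apply -var region=us-east-1

(us-east-1 here is only an illustrative value; use whatever region you configured earlier.)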
… and if you really want to make the whole command non-interactive you can also add the
-auto-approve flag:
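$ terraform apply -var region=us-east-1 -auto-approve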
… so that you don’t have to manually confirm the deployment by entering “yes”.
Output variables
The end of the Terraform module declares one output value:
output "public_dns" {
value = aws_instance.todo.public_dns
}
… which would be like our function returning an attribute set with one attribute:
{ region }:
let
…
in
{ output = aws_instance.todo.public_dns; }
… and when the deploy completes Terraform will render all output values:
Outputs:
public_dns = "ec2-….compute.amazonaws.com"
Resources
In between the input variables and the output values the Terraform module declares several
resources. For now, we’ll highlight the resource that provisions the EC2 instance:
resource "aws_instance" "todo" {
  ami = module.ami.ami
  instance_type = "t3.micro"
  …
  root_block_device {
    volume_size = 7
  }
}
… and you can think of resources sort of like let bindings that provision infrastructure as a side
effect:
{ region }:
let
…;
aws_security_group.todo = aws_security_group { … };
tls_private_key.nixos-in-production = tls_private_key { … };
local_sensitive_file.ssh_private_key = local_sensitive_file { … };
local_file.ssh_public_key = local_file { … };
aws_key_pair.nixos-in-production = aws_key_pair { … };
aws_instance.todo = aws_instance {
ami = module.ami.ami;
instance_type = "t3.micro";
security_groups = [ aws_security_group.todo.name ];
key_name = aws_key_pair.nixos-in-production.key_name;
root_block_device.volume_size = 7;
}
null_resource.wait = null_resource { … };
in
{ output = aws_instance.todo.public_dns; }
Our Terraform deployment declares six resources, the first of which declares a security group
(basically like a firewall):
resource "aws_security_group" "todo" {
  …

  # We need to open port 80 so that we can view our TODO list web page.
  ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = [ "0.0.0.0/0" ]
}
}
The next four resources generate an SSH key pair that we’ll use to manage the machine:
# Synchronize the SSH private key to a local file that the "nixos" module can
# use
resource "local_sensitive_file" "ssh_private_key" {
filename = "${path.module}/id_ed25519"
content = tls_private_key.nixos-in-production.private_key_openssh
}
# Mirror the SSH public key to EC2 so that we can later install the public key
# as an authorized key for our server
resource "aws_key_pair" "nixos-in-production" {
public_key = tls_private_key.nixos-in-production.public_key_openssh
}
The tls_private_key resource⁹ is currently not secure because the deployment state
is stored locally unencrypted. We will fix this later on in this chapter by storing the
deployment state using Terraform’s S3 backend¹⁰.
# We could use a smaller instance size, but at the time of this writing the
# t3.micro instance type is available for 750 hours under the AWS free tier.
instance_type = "t3.micro"
⁹https://registry.terraform.io/providers/hashicorp/tls/latest/docs/resources/private_key
¹⁰https://developer.hashicorp.com/terraform/language/settings/backends/s3
volume_size = 7
}
Finally, we declare a resource whose sole purpose is to wait until the EC2 instance is reachable
via SSH so that the “nixos” module knows how long to wait before deploying the NixOS
configuration:
# This ensures that the instance is reachable via `ssh` before we deploy NixOS
resource "null_resource" "wait" {
provisioner "remote-exec" {
connection {
host = aws_instance.todo.public_dns
private_key = tls_private_key.nixos-in-production.private_key_openssh
}
Modules
Our Terraform module also invokes two other Terraform modules (which I’ll refer to as “child
modules”) and we’ll highlight here the module that deploys the NixOS configuration:
module "ami" {
…;
}
module "nixos" {
source = "github.com/Gabriella439/terraform-nixos-ng//nixos?ref=af1a0af57287851f957be2b524fcdc008a21d9ae"
host = "root@${aws_instance.todo.public_ip}"
flake = ".#default"
arguments = [ "--build-host", "root@${aws_instance.todo.public_ip}" ]
ssh_options = "-o StrictHostKeyChecking=accept-new"
depends_on = [ null_resource.wait ]
}
You can liken child modules to Nix function calls for imported functions:
{ region }:
let
module.ami = …;
module.nixos =
let
source = fetchFromGitHub {
owner = "Gabriella439";
repo = "terraform-nixos-ng";
rev = "af1a0af57287851f957be2b524fcdc008a21d9ae";
hash = …;
};
in
import source {
host = "root@${aws_instance.todo_public_ip}";
flake = ".#default";
arguments = [ "--build-host" "root@${aws_instance.todo.public_ip}" ];
ssh_options = "-o StrictHostKeyChecking=accept-new";
depends_on = [ null_resource.wait ];
};
aws_security_group.todo = aws_security_group { … };
tls_private_key.nixos-in-production = tls_private_key { … };
local_sensitive_file.ssh_private_key = local_sensitive_file { … };
local_file.ssh_public_key = local_file { … };
aws_key_pair.nixos-in-production = aws_key_pair { … };
aws_instance.todo = aws_instance { … };
null_resource.wait = null_resource { … };
in
{ output = aws_instance.todo.public_dns; }
The first child module selects the correct NixOS AMI to use:
module "ami" {
source = "github.com/Gabriella439/terraform-nixos-ng//ami?ref=af1a0af57287851f957be2b524fcdc008a21d9ae"
release = "23.05"
region = var.region
system = "x86_64-linux"
}
… and the second child module deploys our NixOS configuration to our EC2 instance:
module "nixos" {
source = "github.com/Gabriella439/terraform-nixos-ng//nixos?ref=af1a0af57287851f957be2b524fcdc008a21d9ae"
host = "root@${aws_instance.todo.public_ip}"
# Build our NixOS configuration on the same machine that we're deploying to
arguments = [ "--build-host", "root@${aws_instance.todo.public_ip}" ]
depends_on = [ null_resource.wait ]
}
In this example we build our NixOS configuration on our web server so that this example
can be deployed without any supporting infrastructure. However, you typically will
want to build the NixOS configuration on a dedicated builder rather than building on
the target server for two reasons:
A future chapter will cover how to provision a dedicated builder for this purpose.
S3 Backend
The above Terraform deployment doesn’t properly protect the key pair used to ssh into and
manage the NixOS machine. By default the private key of the key pair is stored in a world-
readable terraform.tfstate file. However, even if we were to restrict that file’s permissions we
wouldn’t be able to easily share our Terraform deployment with colleagues. In particular, we
wouldn’t want to add the terraform.tfstate file to version control in a shared repository since
it contains sensitive secrets.
The good news is that we can fix both of those problems by setting up an S3 backend¹¹ for
Terraform which allows the secret to be securely stored in an S3 bucket that can be shared by
multiple people managing the same Terraform deployment.
¹¹https://developer.hashicorp.com/terraform/language/settings/backends/s3
The template for this chapter’s Terraform configuration already comes with a backend/ subdirec-
tory containing a Terraform specification that provisions a suitable S3 bucket and DynamoDB
table for an S3 backend. All you have to do is run:
$ cd ./backend
$ terraform apply
var.region
Enter a value: …
Just make sure to use the same region as our original Terraform deployment when prompted.
When the deployment succeeds it will output the name of the randomly-generated S3 bucket,
which will look something like this (with a timestamp in place of the Xs):
…
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Outputs:
bucket = "nixos-in-productionXXXXXXXXXXXXXXXXXXXXXXXXXX"
Then switch back to the original Terraform deployment in the parent directory:
$ cd ..
… and modify that deployment’s main.tf to reference the newly-created bucket like this:
terraform {
required_version = ">= 1.3.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.56"
}
}
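  # The backend stanza is what you add here. This is only a sketch: the bucket
  # name comes from the backend deployment's output above, and the key, region,
  # and dynamodb_table values shown are illustrative placeholders rather than
  # the template's exact values.
  backend "s3" {
    bucket         = "nixos-in-productionXXXXXXXXXXXXXXXXXXXXXXXXXX"
    key            = "terraform.tfstate"
    region         = "…"
    dynamodb_table = "…"
  }
}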
These last few manual steps to update the S3 backend are a bit gross but this is primarily
to work around limitations in Terraform. In particular, Terraform doesn’t provide a
way for our main deployment to automatically reference the S3 backend we created.
Terraform specifically prohibits backend stanzas from referencing variables so all of the
backend options have to be hard-coded values.
Then you can upgrade your existing deployment to reference the S3 backend you just provisioned
by re-running terraform init with the -migrate-state option:
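$ terraform init -migrate-state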
… and once that’s done you can verify that nothing broke by running terraform apply again,
which should report that no new changes need to be deployed:
$ terraform apply
var.region
Enter a value: …
The difference is that now the terraform state is securely stored in an S3 bucket instead of on
your filesystem so you’d now be able to store your Terraform configuration in version control
and let other developers manage the same deployment. There’s just one last thing you need to do,
which is to remove the terraform.tfstate.backup file, which contains the old (pre-S3-backend)
Terraform state, including the secrets:
$ rm terraform.tfstate.backup
You can also remove the terraform.tfstate file, too, since it’s empty and no longer used:
$ rm terraform.tfstate
Future Terraform examples in this book won’t include the S3 backend code to keep them
shorter, but feel free to reuse the same S3 bucket created in this chapter to upgrade any
of those examples with an S3 backend. However, if you do that then keep in mind that
you need to use a different key for storing Terraform’s state if you want to keep those
examples separate.
In other words, when adding the S3 backend to the terraform clause, specify a different
key for each separate deployment:
terraform {
…
backend "s3" {
…
key = "…" # This is what needs to be unique per deployment
…
}
}
This key is used by Terraform to record where to store the deployment’s state within
the S3 bucket, so if you use the same key for two different deployments they will
interfere with one another.
Version control
Once you create the S3 backend you can safely store your Terraform configuration in version
control. Specifically, these are the files that you want to store in version control:
• flake.lock
It’s also worth keeping this in version control even though it’s not strictly necessary. The
lock file slightly improves the determinism of the deployment, although the flake included
in the template is already fairly deterministic even without the lockfile because it references
a specific tag from Nixpkgs.
• main.tf, backend/main.tf
We definitely want to keep the Terraform deployments for our main deployment and the
S3 backend.
• terraform.tfstate
You don’t need to keep this in version control (after migrating to the S3 backend it’s an
empty file that is no longer used).
Just as important, you do NOT want to keep the id_ed25519 file in version control (since this
contains the private key). In fact, the provided template includes a .gitignore file to prevent
you from accidentally adding the private key to version control.
Terraform will recreate this private key file locally for each developer that manages the
deployment. For example, if another developer were to apply the deployment for the first time,
they would see this diff:
… indicating that Terraform will download the private key from the S3 backend and create a
secure local copy in order to ssh into the machine.
However, it’s completely fine to add the public key to version control if you want.
9. Continuous Integration and
Deployment
This chapter will cover how to use both continuous integration (a.k.a. “CI”) and continuous
deployment (a.k.a. “CD”), beginning with a brief explanation of what those terms mean.
Both continuous integration and continuous deployment emphasize continuously incorporating
code changes into your product. Continuous integration¹ emphasizes continuously incorporating
code changes into the trunk development branch of your version control repository whereas
Continuous deployment² emphasizes continuously incorporating code changes into production.
Continuous Integration
Colloquially developers often understand “continuous integration” to mean automatically testing
pull requests before they are merged into version control. However, continuous integration is
about more than just automated tests and is really about ensuring that changes are regularly
being integrated into the trunk development branch (and automated tests help with that). For
example, if you have long-lived development branches that’s not really adhering to the spirit of
continuous integration, even if you do put them through automated tests.
I’m mentioning this because this book will offer opinionated guidance that works better if you’re
not supporting long-lived development branches. You can still modify this book’s guidance to
your tastes, but in my experience sticking to only one long-lived development branch (the trunk
branch) will simplify your architecture, reduce communication overhead between developers,
and improve your release frequency. In fact, this specific flavor of continuous integration has a
name: trunk-based development³.
That said, this chapter will focus on how to set up automated tests since that’s the part of
continuous integration that’s NixOS-specific.
The CI solution I endorse for most Nix/NixOS projects is garnix⁴ because with garnix you don’t
have to manage secrets and you don’t have to set up your own build servers or cache. In other
words, garnix is architecturally simple to install and manage.
However, garnix only works with GitHub (it’s a GitHub app⁵) so if you are using a different
version control platform then you’ll have to use a different CI solution. The two major
alternatives that people tend to use are:
• Hydra
¹https://en.wikipedia.org/wiki/Continuous_integration
²https://en.wikipedia.org/wiki/Continuous_deployment
³https://trunkbaseddevelopment.com/
⁴https://garnix.io/
⁵https://github.com/apps/garnix-ci
Like garnix, Hydra is a Nix-aware continuous integration service but unlike garnix, Hydra
is self-hosted⁶. Hydra’s biggest strength is deep visibility into builds in progress and ease
of scaling out build capacity but Hydra’s biggest weakness is that it is high maintenance to
support, and difficult to debug when things go wrong.
The reason why non-Nix-aware CI solutions tend to do worse at scale is because they
typically have their own notion of available builders/agents/slots which does not map
cleanly onto Nix’s notion of available builders. This means that you have to waste time
tuning the two sets of builders to avoid wasting available build capacity and even after
tuning you’ll probably end up with wasted build capacity.
The reason Hydra doesn’t have this problem is because Hydra uses Nix’s native
notion of build capacity (remote builds⁷) configured via the nix.distributedBuilds
and nix.buildMachines NixOS options. That means that you can easily scale out build
capacity by adding more builders⁸.
This chapter will focus on setting up garnix since it’s dramatically simpler than the alternatives.
Also, we’re going to try to minimize the amount of logic that needs to live outside of Nix. For
example:
• checks that you’d normally run in a non-Nix-aware job can be incorporated into a Nix
build’s check phase
garnix
garnix already has official documentation¹⁰ for how to set it up, but I’ll mention here the relevant
bits for setting up CI for our own production deployment. We’re going to configure this CI to
⁶The reason why is that writing a (meaningful) test for our TODO list example would require executing JavaScript using something
like Selenium, which will significantly increase the size of the example integration test. postgrest, on the other hand, is easier to test
from the command line.
⁷https://nixos.org/manual/nix/stable/advanced-topics/distributed-builds.html
⁸Okay, there is actually a limit to how much you can scale out build capacity. After a certain point you will begin to hit bottlenecks
in instantiating derivations at scale, but even in this scenario Hydra still has a higher performance ceiling than the non-Nix-aware
alternatives.
⁹https://thenewstack.io/push-vs-pull-in-gitops-is-there-really-a-difference/
¹⁰https://garnix.io/docs
build and cache the machine that we deploy to production, which will also ensure that we don’t
merge any changes that break the build.
This exercise will build upon the same example as the previous chapter on Terraform, and you
can reuse the example from that chapter or you can generate the example if you haven’t already
by running these commands:
$ mkdir todo-app
$ cd todo-app
$ nix flake init --template 'github:Gabriella439/nixos-in-production/0.10#terraform'
… or you can skip straight to the final result (minus the secrets file) by running:
garnix requires the use of Nix flakes in order to support efficient evaluation caching¹¹ and the
good news is that we can already build our NixOS system without any changes to our flake, but
it might not be obvious how at first glance.
If we wanted to build our system, we would run:
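$ nix build .#nixosConfigurations.default.config.system.build.toplevel

(This is the attribute path we step through in the next section.)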
If your development system is Apple Silicon (i.e. aarch64-darwin) you will not yet be
able to build that locally. Even if you use the Linux builder from the Setting up your
development environment chapter that won’t work because the builder’s architecture
(aarch64-linux) won’t match the architecture of the system we’re deploying (x86_-
64-linux).
In a future chapter we’ll cover how to set up an x86_64-linux remote builder that
you can use for testing builds like these, but until then you will have to settle for just
evaluating the system configuration instead of building it, like this:
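$ nix eval --raw '.#nixosConfigurations.default.config.system.build.toplevel.outPath'

(A sketch modeled on the nix eval invocation that appears later in this chapter; evaluating
the outPath forces full evaluation of the system configuration without building anything.)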
This will catch errors at the level of Nix evaluation (e.g. Nix syntax errors or bad function
calls) but this won’t catch errors related to actually building the system.
In fact, if all you care about is evaluation, you can simplify that latter command even
further by just running:
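$ nix flake check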
… which does exactly the same thing (among other things). However, typically we want
to build and cache our NixOS system, which is why we don’t just run nix flake check
in CI.
Attributes
Let’s step through this attribute path:
¹¹https://www.tweag.io/blog/2020-06-25-eval-cache/
nixosConfigurations.default.config.system.build.toplevel
… to see where each attribute comes from because that will come in handy if you choose to
integrate Nix into a non-Nix-aware CI solution:
• nixosConfigurations
This is one of the “standard” output attributes¹² for flakes where we store NixOS con-
figurations that we want to build. This attribute name is not just a convention; NixOS
configurations stored under this attribute enjoy special support from Nix tools. Specifically:
– nixos-rebuild only supports systems underneath the nixosConfigurations output
We use nixos-rebuild indirectly as part of our Terraform deployment because the
terraform-nixos-ng module uses nixos-rebuild under the hood¹³. In our project’s
main.tf file the module.nixos.flake option is set to .#default which nixos-rebuild
replaces with .#nixosConfigurations.default¹⁴.
– nix flake check automatically checks the nixosConfigurations flake output
… as noted in the previous aside.
– garnix’s default configuration¹⁵ builds all of the nixosConfigurations flake outputs
… so if we stick to using that output then we don’t need to specify a non-default
configuration.
• default
We can store more than one NixOS system configuration underneath the nixosConfigu-
rations output. We can give each system any attribute name, but typically if you only
have one system to build then the convention is to name that the default system. The
command-line tooling does not give this default attribute any special treatment, though.
• config
The output of the nixpkgs.lib.nixosSystem function is similar in structure to a NixOS
module, which means that it has attributes like config and options. The config attribute
lets you access the finalized values for all NixOS options.
• system.build.toplevel
This is a NixOS option that stores the final derivation for building our NixOS system. For
more details, see the NixOS option definitions chapter.
¹²https://nixos.wiki/wiki/Flakes#Output_schema
¹³https://www.haskellforall.com/2023/01/terraform-nixos-ng-modern-terraform.html
¹⁴I have no idea why nixos-rebuild works this way and doesn’t accept the full attribute path including the nixosConfigurations
attribute.
¹⁵https://garnix.io/docs/yaml_config
$ nix repl .#
Welcome to Nix 2.18.1. Type :? for help.
… and then you can use autocompletion within the REPL to see what’s available. For
example:
nix-repl> nixosConfigurations.<TAB>
nix-repl> nixosConfigurations.default.<TAB>
nixosConfigurations.default._module        nixosConfigurations.default.extendModules
nixosConfigurations.default.config         nixosConfigurations.default.extraArgs
nixosConfigurations.default.options        nixosConfigurations.default.pkgs
nixosConfigurations.default.type
Enabling garnix CI
The only thing you’ll need in order to enable garnix CI for your project is to:
• host your project in a GitHub repository¹⁶ (if it isn’t there already)
• install the garnix GitHub app¹⁷ on that repository
… and you’re mostly done! You won’t see any activity, though, until you create your first pull
request so you can verify that garnix is working by creating a pull request to make the following
change to the flake.nix file:
¹⁶https://docs.github.com/en/migrations/importing-source-code/using-the-command-line-to-import-source-code/adding-locally-
hosted-code-to-github
¹⁷https://github.com/apps/garnix-ci
--- a/flake.nix
+++ b/flake.nix
@@ -7,4 +7,12 @@
modules = [ ./module.nix ];
};
};
+
+ nixConfig = {
+ extra-substituters = [ "https://cache.garnix.io" ];
+
+ extra-trusted-public-keys = [
+ "cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g="
+ ];
+ };
}
Once you create that pull request, garnix will report two status checks on that pull request:
• “Evaluate flake.nix”
This verifies that your flake.nix file is well-formed and also serves as a fallback status
check you can use if your flake has no outputs (for whatever reason).
• “nixosConfig default”
This status check verifies that our nixosConfigurations.default output builds correctly
and caches it.
The next thing you need to do is to enable branch protection settings so that those new
status checks gate merge into your main branch. To do that, visit the “Settings → Branches
→ Add branch protection rule” page of your repository (which you can also find at
https://github.com/${OWNER}/${REPOSITORY}/settings/branch_protection_rules/new
where ${OWNER} is your username and ${REPOSITORY} is the repository name you chose). Then
select the following options:
Since this is a tutorial project we won’t enable any other branch protection settings, but for a
real project you would probably want to enable some other settings (like requiring at least one
approval from another contributor).
Once you’ve made those changes, merge the pull request you just created. You’ve just set up
automated tests for your repository!
The above command does not work on other systems (e.g. aarch64-darwin), even though
the complete build product is cached! You would think that Nix would just download
(“substitute”) the complete build product even if there were a system mismatch, but
this does not work because Nix refuses to substitute certain derivations¹⁸. The above
nix build command will only work if your local system is x86_64-linux or you have a
remote builder configured to build x86_64-linux build products because Nix will insist
on building some of the build products instead of substituting them.
It is possible to work around this by adding the following two nix.conf options (and
restarting your Nix daemon):
extra-substituters = https://cache.garnix.io
extra-trusted-public-keys = cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g=
$ FLAKE='.#nixosConfigurations.default.config.system.build.toplevel'
$ nix-store --realise "$(nix eval --raw "${FLAKE}.outPath")"
Continuous Deployment
We’re going to be using “pull-based” continuous deployment to manage our server, meaning that
our server will periodically fetch the desired NixOS configuration and install that configuration.
NixOS already has a set of system.autoUpgrade¹⁹ options for managing a server in this way.
What we want is to be able to set at least the following two NixOS options:
system.autoUpgrade = {
enable = true;
flake = "github:${username}/${repository}#default";
};
However, there’s a catch: this means that our machine will need to be able to access our private
git repository. Normally the way you’d do this is to specify an access token in nix.conf like
this:
¹⁸https://github.com/NixOS/nix/issues/8677
¹⁹https://search.nixos.org/options?query=system.autoUpgrade
access-tokens = github.com=${SECRET_ACCESS_TOKEN}
… but we don’t want to save this access token in version control in order to deploy our machine.
Another way we could fetch from a private git repository is to specify a flake like this:
flake = "git+ssh://git@github.com/${username}/${repository}#default";
… which would allow us to access the private git repository using an SSH key pair instead
of using a GitHub access token (assuming that we configure GitHub to grant that key
pair access to the repository). Either way, we’d need some sort of secret to be present on
the machine in order to access the private repository.
So we need some way to securely transmit or install secrets (such as personal access tokens) to
our machine, but how do we bootstrap all of that?
For the examples in this book, we’re going to reuse the SSH key pair generated for our Terraform
deployment as a “primary key pair”. In other words, we’re going to install the private key of
our SSH key pair on the target machine and then use the corresponding public key (which we
can freely share) to encrypt other secrets (which only our target machine can decrypt, using the
private key). In fact, our original Terraform template already does this:
We are now living in the future and we’re going to use the SSH private key mirrored to
/var/lib/id_ed25519 as the primary key that bootstraps all our other secrets.
This implies that our “admin” (the person deploying our machine using Terraform) will be able
to transitively access all other secrets that the machine depends on because the admin has access
to the same private key. However, there’s no real good way to prevent this sort of privilege
escalation, because the admin has root access to the machine and good luck granting the machine
access to a secret without granting the root user access to the same secret.²⁰
sops-nix
We’re going to use sops-nix²¹ (a NixOS wrapper around sops²²) to securely distribute all other
secrets we need to our server. The way that sops-nix works is:
²⁰There are some ways you can still prevent privilege escalation by the root user, like multi-factor authentication, but do you really
want some other person to have to multi-factor authenticate every time one of your machines polls GitHub for the latest configuration?
It’s much simpler to just trust your admin.
²¹https://github.com/Mic92/sops-nix
²²https://github.com/getsops/sops
• You install the private key on the target machine without using sops
There is no free lunch here. You can’t bootstrap secrets on the target machine out of thin
air. The private key of our primary key pair needs to already be present on the machine so
that the machine can decrypt secrets encrypted by the public key.
You might wonder what is the point of using sops to distribute secrets to the machine if
it requires already having a secret present on the machine (the primary key).
The purpose of sops is to provide a uniform interface for adding, versioning, and
installing all other secrets. Otherwise, you’d have to roll your own system for doing this
once you realize that it’s kind of a pain to implement a secret distribution mechanism
for each new secret you need.
So sops doesn’t completely solve the problem of secrets management (you still have to
figure out how to install the primary key), but it does make it easier to manage all the
other secrets.
age keys
To use the sops command-line tool we’ll need to convert our SSH primary key pair into an age
key pair. This step is performed by the admin who has access to both the SSH public key and the
SSH private key and requires the ssh-to-age command-line tool, which you can obtain like this:
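$ nix shell 'github:NixOS/nixpkgs/23.11#ssh-to-age'

(An assumption on my part: ssh-to-age is packaged in Nixpkgs, though you could just as well
obtain it from the upstream Mic92/ssh-to-age project.)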
²³https://www.gnupg.org/gph/en/manual/c14.html
²⁴https://man.openbsd.org/ssh-keygen
²⁵https://github.com/FiloSottile/age#readme
The public key of our age key pair will be stored in a .sops.yaml configuration file which lives
in version control. To create the age public key, run:
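$ ssh-to-age < id_ed25519.pub
age1…

(I’m assuming the public key file is named id_ed25519.pub to match the private key; adjust the
path to whatever local_file.ssh_public_key actually writes. The age1… line it prints is the age
public key you record in .sops.yaml.)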
The private key of our age key pair is stored locally by the admin so that they can edit secrets.
To store the age private key, run:
$ # On Linux
$ KEY_FILE=~/.config/sops/age/keys.txt
$ # On MacOS
$ KEY_FILE=~/Library/'Application Support'/sops/age/keys.txt
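$ # … then convert the SSH private key and append it to that file (the
$ # -private-key and -i flags are ssh-to-age's documented options; the key
$ # path here is an assumption)
$ mkdir -p "$(dirname "$KEY_FILE")"
$ ssh-to-age -private-key -i id_ed25519 >> "$KEY_FILE"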
… and then click the “Generate token” button. Keep this page with the generated token open for
just a second.
Fetch the sops command-line tool by running:
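$ nix shell 'github:NixOS/nixpkgs/23.11#sops'

(Again assuming you fetch it from Nixpkgs, like the other tools in this book.)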
²⁶https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#
creating-a-fine-grained-personal-access-token
²⁷https://github.com/settings/personal-access-tokens/new
$ sops secrets.yaml
That will open a new file in your editor with the following contents:
We’re going to do what the file says and edit the file how we please. Delete the entire file and replace
it with:
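github-access-token: github_pat_…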
… replacing github_pat_… with the personal access token you just generated.
Now if you save, exit, and view the file (without sops) you will see something like this:
github-access-token: …
sops:
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: …
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
…
-----END AGE ENCRYPTED FILE-----
lastmodified: "…"
mac: ENC[AES256_GCM,data:…,iv:…,tag:…,type:str]
pgp: []
unencrypted_suffix: _unencrypted
version: 3.7.3
… and since you’re the admin you can still decrypt the file using sops to view the secret:
$ sops secrets.yaml
Anyone who doesn’t have access to the private key would instead get an error message like this:
Failed to get the data key required to decrypt the SOPS file.
Group 0: FAILED
…: FAILED
- | failed to open file: open
| …/sops/age/keys.txt: no such file or directory
Recovery failed because no master key was able to decrypt the file. In
order for SOPS to recover the file, at least one key has to be successful,
but none were.
To use sops-nix, we first add it as an input to our flake.nix:

{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
sops-nix.url = "github:Mic92/sops-nix/bd695cc4d0a5e1bead703cc1bec5fa3094820a81";
};
nixConfig = {
extra-substituters = [ "https://cache.garnix.io" ];
extra-trusted-public-keys = [
"cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g="
];
};
}
Then we’ll enable continuous deployment by adding the following lines to module.nix:
{ modulesPath, ... }:
{ …
sops = {
defaultSopsFile = ./secrets.yaml;
age.sshKeyPaths = [ "/var/lib/id_ed25519" ];
secrets.github-access-token = { };
};
That will:
• use the SSH private key installed at /var/lib/id_ed25519 to decrypt our secrets.yaml file
• install the decrypted github-access-token secret on the machine (sops-nix places secrets
under /run/secrets by default)
Next, we configure the machine to upgrade itself from our repository by adding these options
to module.nix:
{ modulesPath, ... }:
{ …
  system.autoUpgrade = {
    enable = true;
    flake = "github:${username}/${repository}#default";
    …
  };
but don’t deploy this configuration just yet! First, add all the changes we’ve made so far to version
control:
Then create a pull request from those changes and merge the pull request once it passes CI.
Once you’ve merged your changes, checkout the main branch of your repository:
… and deploy those changes using terraform, the same way we did in the previous chapter:
$ terraform apply
Once you’ve applied those changes your machine will begin automatically pulling its configura-
tion from the main branch of your repository.
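You can watch the upgrade service to confirm this. A sketch, assuming system.autoUpgrade’s
default service name (nixos-upgrade.service) and the SSH key generated by our Terraform
deployment:

$ ssh -i id_ed25519 root@ec2-….compute.amazonaws.com journalctl -f -u nixos-upgrade.service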
… and if things are working then every minute you should see the service output something like
this:
Now let’s test that our continuous deployment is working by making a small change so that we
don’t need to add the --extra-experimental-features option to every nix command.
Add the following option to module.nix:
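  # One way to do this: enable the flake-related experimental features globally
  # so that `nix` commands no longer need --extra-experimental-features
  nix.settings.experimental-features = [ "nix-command" "flakes" ];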
… and create and merge a pull request for that change. Once your change is merged the machine
will automatically pick up the change on the next minute boundary and you can verify the
change worked by running:
10. Flakes
Motivation
You can think of flakes as a package manager for Nix. In other words, if we use Nix to build
and distribute packages written in other programming languages (e.g. Go, Haskell, Python), then
flakes are how we “build” and distribute Nix packages.
Here are some example Nix packages that are distributed as flakes:
• nixpkgs
This is the most widely used Nix package of all. Nixpkgs is a giant git repository hosted
on GitHub² containing the vast majority of software packaged for Nix. Nixpkgs also
includes several important helper functions that you’ll need for building even the simplest
of packages, so you pretty much can’t get anything done in Nix without using Nixpkgs to
some degree.
• flake-utils
This is a Nix package containing useful utilities for creating flakes and is itself distributed
as a flake.
• sops-nix
This is a flake we just used in the previous chapter to securely distribute secrets.
All three of the above packages provide reusable Nix code that we might want to incorporate into
downstream Nix projects. Flakes provide a way for us to depend on and integrate Nix packages
like these into our own projects.
Flakes, step-by-step
We can build a better intuition for how flakes work by starting from the simplest possible flake
you can write:
¹https://nixos.wiki/wiki/Flakes#Flake_schema
²https://github.com/NixOS/nixpkgs
# ./flake.nix
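# (A sketch of the file's contents, reconstructed from the attribute paths
# discussed later in this chapter; the template's exact file may differ.)
{ outputs = { nixpkgs, ... }: {
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.hello;
  };
}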
You can then build and run that flake with this command:
$ nix run
Hello, world!
Flake references
We could have also run the above command as:
$ nix run .
… or like this:
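$ nix run .#packages.x86_64-linux.default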
• The first half (the flake reference) specifies where a flake is located
In the above example the flake reference is “.” (a shorthand for our current directory).
• The second half (the attribute path) specifies which output attribute to use
In the above example, it is packages.x86_64-linux.default and nix run uses that output
attribute path to select which executable to run.
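For example, if you instead specify a shorter attribute path, like this:

$ nix run .#default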
… then nix run will attempt to expand .#default to a fully qualified attribute path of
.#apps."${system}".default and if the flake does not have that output attribute path then nix
run will fall back to a fully qualified attribute path of .#packages."${system}".default.
Different Nix commands will expand the attribute path differently. For example:
• nix build expands .#foo to .#packages."${system}".foo
• nix run expands .#foo to .#apps."${system}".foo, falling back to .#packages."${system}".foo
• nix develop expands .#foo to .#devShells."${system}".foo
In each case the "${system}" in the expanded attribute path corresponds to your current system,
which you can query using this command:
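$ nix eval --impure --raw --expr builtins.currentSystem
x86_64-linux

(An assumption on my part: this impure builtins.currentSystem query may not be the exact
command the book uses, but it reports the same value.)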
You can even omit the attribute path, in which case it will default to an attribute path of default.
For example, if you run:
$ nix run .
… then nix run will expand . to .#default (which will in turn expand to
.#packages.${system}.default for our flake).
Furthermore, you can omit the flake reference, which will default to ., so if you run:
$ nix run
… then that expands to a flake reference of . (which will then continue to expand according to
the above rules).
Flake URIs
So far these examples have only used a flake reference of . (the current directory), but in this
book we’ll be using several types of flake references, including:
• paths
These can be relative paths (like . or ./utils or ../bar), home-anchored paths (like
~/workspace), or absolute paths (like /etc/nixos). In all three cases the path must be a
directory containing a flake.nix file.
• GitHub URIs
These take the form github:${OWNER}/${REPOSITORY} or github:${OWNER}/${REPOSITORY}/${REFERENCE}
(where ${REFERENCE} can be a branch, tag, or revision). Nix will take care of cloning
the repository for you in a cached and temporary directory and (by default) look for a
flake.nix file within the root directory of the repository.
• indirect URIs
An indirect URI is one that refers to an entry in Nix’s “flake registry”. If you run nix
registry list you’ll see a list of all your currently configured indirect URIs.
Flake inputs
Normally the way flakes work is that you specify both inputs and outputs, like this:
{ inputs = {
    foo.url = "${FLAKE_REFERENCE}";
    bar.url = "${FLAKE_REFERENCE}";
  };

  outputs = { foo, bar }: { baz = …; qux = …; };
}
In the above example, foo and bar would be the flake inputs while baz and qux would be the
flake outputs. In other words, the sub-attributes nested underneath the inputs attribute are the
flake inputs and the attributes generated by the outputs function are the flake outputs.
Notice how the outputs function takes input arguments which share the same name as the flake
inputs because the flakes machinery resolves each input and then passes each resolved input as
a function argument of the same name to the outputs function.
To illustrate this, if you were to build the baz output of the above flake using:
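$ nix build .#baz

… then that would correspond, roughly, to evaluating Nix pseudo-code along these lines: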
let
  flake = import ./flake.nix;
  self = flake.outputs {
    foo = resolveFlakeURI flake.inputs.foo.url;
    bar = resolveFlakeURI flake.inputs.bar.url;
  };
in
  self.baz
… where resolveFlakeURI would be sort of like a function from an input’s flake reference to the
Nix code packaged by that flake reference.
If you’re curious how flake inputs and outputs are resolved, it’s actually
implemented as a function in Nix, which you can find here in the NixOS/nix repository³.
However, if you were paying close attention you might have noticed that our original example
flake does not have any inputs:
… and the outputs function references a nixpkgs input which we never specified. The reason this
works is because flakes automatically convert missing inputs to “indirect” URIs that are resolved
using Nix’s flake registry. In other words, it’s as if we had written:
{ inputs = {
nixpkgs.url = "nixpkgs"; # Example of an indirect flake reference
};
An indirect flake reference is resolved by doing a lookup in the flake registry, which you can
query yourself like this:
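$ nix registry list
…
global flake:nixpkgs github:NixOS/nixpkgs/nixpkgs-unstable
…

(The output shown is a sketch; your registry may contain different entries.) Based on that
registry entry, we could equivalently have specified the input explicitly: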
³https://github.com/NixOS/nix/blob/2.18.1/src/libexpr/flake/call-flake.nix
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
};
… which would have produced the same result: both flake references will attempt to fetch the
nixpkgs-unstable branch of the nixpkgs repository to resolve the nixpkgs flake input.
Throughout the rest of this chapter (and book) I’m going to try to make flake references
as pure as possible, meaning:
Neither of these precautions is strictly necessary when using flakes because flakes lock
their dependencies using a flake.lock file which you can (and should) store in version
control. However, it’s still a good idea to take these precautions anyway even if you
include the flake.lock file alongside your flake.nix file. The more reproducible your
flake references, the better you document how to regenerate or update your lock file.
Suppose we were to use our own local git checkout of nixpkgs instead of a remote nixpkgs
branch: we’d have to change the nixpkgs input to our flake to reference the path to our local
repository (since paths are valid flake references), like this:
{ inputs = {
nixpkgs.url = ~/repository/nixpkgs;
};
… and then we also need to build the flake using the --impure flag:
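$ nix build --impure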
To elaborate on the latter point, instead of specifying a nixpkgs input like this:
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs";
};
…
}
… you could specify the same flake reference using its structured attribute representation,
like this:

{ inputs = {
    nixpkgs = {
      type = "github";
      owner = "NixOS";
      repo = "nixpkgs";
    };
  };
  …
}
Throughout this book I’ll consistently use the non-structured (string) representation for
flake references to keep things simple.
Flake outputs
We haven’t yet covered what we actually get when we resolve a flake input. For example, what
Nix expression does a flake reference like github:NixOS/nixpkgs/23.11 resolve to?
The answer is that a flake reference will resolve to the output attributes of the corresponding
flake. For a flake like github:NixOS/nixpkgs/23.11 that means that Nix will:
• fetch the NixOS/nixpkgs repository at the 23.11 tag
• import the flake.nix file from the root directory of that repository
• resolve that flake’s inputs and call its outputs function
In other words, it behaves roughly as if we had written a Nix file like this:
# nixpkgs.nix
let
pkgs = import <nixpkgs> { };
nixpkgs = pkgs.fetchFromGitHub {
owner = "NixOS";
repo = "nixpkgs";
rev = "23.11";
hash = "sha256-btHN1czJ6rzteeCuE/PNrdssqYD2nIA4w48miQAFloM=";
};

flake = import "${nixpkgs}/flake.nix";

self = flake.outputs { inherit self; };
in
self
… except that with flakes we wouldn’t have to figure out what hash to use since that would be
transparently managed for us by the flake machinery.
If you were to load the above file into the REPL:
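$ nix repl --file ./nixpkgs.nix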
… you would get the exact same result as if you had loaded the equivalent flake into the REPL:
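… which you can do by passing the flake reference directly to nix repl (assuming a Nix version
new enough to accept flake references here, which the 2.18 series is):

$ nix repl github:NixOS/nixpkgs/23.11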
In both cases the REPL would now have the lib, checks, htmlDocs, legacyPackages, and
nixosModules attributes in scope since those are the attributes returned by the outputs function:
⁴https://github.com/NixOS/nixpkgs
⁵https://github.com/NixOS/nixpkgs/tree/23.11
⁶https://github.com/NixOS/nixpkgs/blob/23.11/flake.nix
⁷https://github.com/NixOS/nixpkgs/blob/23.11/flake.nix#L6
⁸https://github.com/NixOS/nixpkgs/blob/23.11/flake.nix#L16-L74
nix-repl> legacyPackages.x86_64-linux.hello
«derivation /nix/store/zjh5kllay6a2ws4w46267i97lrnyya9l-hello-2.12.1.drv»
This legacyPackages.x86_64-linux.hello attribute path is the same attribute path that our
original flake output uses:
{ …
There’s actually one more thing you can do with a flake, which is to access the original path to
the flake. The following flake shows an example of this feature in action:
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
};
overlays = [ overlay ];
This flake customizes Nixpkgs using an overlay instead of using the “stock” package set, but
in order to create a new package set from that overlay we have to import the original source
directory for Nixpkgs. In the above example, that happens here when we import nixpkgs:
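pkgs = import nixpkgs {
  inherit system;

  overlays = [ overlay ];
};

(A sketch reconstructed from the overlays line shown above; the binding names come from the
surrounding flake.)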
Normally the import keyword expects either a file or (in this case) a directory containing a
default.nix file, but here nixpkgs is neither: it’s an attribute set containing all of the nixpkgs
flake’s outputs. However, the import keyword can still treat nixpkgs like a path because it also
comes with an outPath attribute, so we could have also written:
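pkgs = import nixpkgs.outPath {
  inherit system;

  overlays = [ overlay ];
};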
All flake inputs come with this outPath attribute, meaning that you can use a flake input
anywhere Nix expects a path and the flake input will be replaced with the path to the directory
containing the flake.nix file.
Platforms
All of the above examples hard-coded a single system (x86_64-linux), but usually you want to
support building a package for multiple systems. People typically use the flake-utils flake for
this purpose, which you can use like this:
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
};
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
};
We’ll be using flake-utils throughout the rest of this chapter and you’ll see almost all flakes
use this, too.
Flake-related commands
The Nix command-line interface provides several commands that are flake-aware, and for the
purpose of this chapter we’ll focus on the following commands:
• nix build
• nix run
• nix shell
• nix develop
• nix flake check
• nix flake init
• nix eval
• nix repl
• nixos-rebuild
We’ll be using the following flake as the running example for our commands:
{ inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/23.11";
flake-utils.url = "github:numtide/flake-utils/v1.0.0";
};
in
{ packages.default = pkgs.cowsay;
apps = …;
checks = …;
devShells = …;
}) // {
templates.default = …;
};
}
One of the things you might notice is that some of the output attributes are nested inside of
the call to eachDefaultSystem. Specifically, the packages, apps, checks, and devShells outputs:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages."${system}";
in
{ packages.default = pkgs.cowsay;
apps = …;
checks = …;
devShells = …;
}) // …
For each of these outputs we want to generate system-specific build products, which is why they
go inside the call to eachDefaultSystem. However, some flake outputs (like templates) are not
system-specific, so they would go outside of the call to eachDefaultSystem, like this:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages."${system}";
in
{ …
}) // {
templates.default = …;
};
You can always consult the flake output schema⁹ if you’re not sure which outputs are system-
specific and which ones are not. For example, the sample schema will show something like this:
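packages."<system>"."<name>"
apps."<system>"."<name>"
checks."<system>"."<name>"
devShells."<system>"."<name>"
templates."<name>"
…

(An abbreviated excerpt; the full schema lists more outputs than the ones our flake uses.)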
… and the ."<system>" component of the first four attribute paths indicates that these outputs
are system-specific, whereas the templates."<name>" attribute path has no system-specific path
component.
⁹https://nixos.wiki/wiki/Flakes#Output_schema
The same sample schema also explains which outputs are used by which Nix commands, but
we’re about to cover that anyway:
nix build
The nix build command builds output attributes underneath the packages attribute path.
For example, if we run:
$ nix build
… that will build the .#packages."${system}".default output, which in our flake is just a
synonym for the cowsay package from Nixpkgs:
…
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages."${system}";
in
{ packages.default = pkgs.cowsay;
…
})
$ tree ./result
./result
├── bin
│ ├── cowsay
│ └── cowthink -> cowsay
└── share
└── cowsay
├── cows
│ ├── DragonAndCow.pm
│ ├── Example.pm
│ ├── Frogs.pm
│ ├── …
│ ├── vader-koala.cow
│ ├── vader.cow
│ └── www.cow
└── site-cows
5 directories, 58 files
$ ./result/bin/cowsay howdy
_______
< howdy >
-------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
So far this isn’t very interesting because we can already build the cowsay executable from Nixpkgs
directly like this:
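$ nix build github:NixOS/nixpkgs/23.11#cowsay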
In fact, we don’t even need to create a local copy of the cowsay template flake. We could also
have run the flake directly from the GitHub repository where it’s hosted:
This works because flakes support GitHub URIs, so all of the flake operations in this chapter work
directly on the GitHub repository without having to clone or template the repository locally.
However, for simplicity all of the following examples will still assume you templated the flake
locally.
nix run
Typically we won’t invoke a program by first building it and then running the result. Instead,
we’ll more commonly use nix run to do both in one go:
$ nix run . -- howdy # The "." is necessary if the command takes arguments
_______
< howdy >
-------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
By default, nix run will expand out . to .#apps.${system}.default, falling back to .#pack-
ages.${system}.default if that’s not present. Our flake happens to provide the former attribute
path:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages."${system}";
in
{ packages.default = pkgs.cowsay;
apps = {
default = self.apps."${system}".cowsay;
cowsay = {
type = "app";
program = "${self.packages."${system}".default}/bin/cowsay";
};
cowthink = {
type = "app";
program = "${self.packages."${system}".default}/bin/cowthink";
};
};
…
}
This time our flake has three available apps (default, cowsay, and cowthink) and the default
app is just a synonym for the cowsay app. Each app has to have:
Notice that flake outputs can reference other flake outputs (via the self flake input). All flakes
get this self flake input for free. We could have also used the rec language keyword instead,
like this:
apps = rec {
default = cowsay;
cowsay = {
type = "app";
program = "${self.packages."${system}".default}/bin/cowsay";
};
cowthink = {
type = "app";
program = "${self.packages."${system}".default}/bin/cowthink";
};
};
… which would define the default attribute to match the cowsay attribute within the same record.
This works in small cases, but doesn’t scale well to more complicated cases; you should prefer
using the self input to access other attribute paths.
You can use output attributes other than the default one by specifying their attribute paths. For
example, if we want to use the cowthink program then we can run:
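$ nix run .#cowthink -- howdy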
Apparently, the cowthink program produces the exact same result as the cowsay program.
Since the cowthink app is indistinguishable from the cowsay app, let’s replace it with a more
interesting kittysay app that automatically adds the -f hellokitty flag. However, we can’t do
something like this:
apps = {
  …
  kittysay = {
    type = "app";
    program = "${self.packages."${system}".default}/bin/cowsay -f hellokitty";
  };
};
… because Nix expects the program attribute to be an executable path, not including any
command-line arguments. If you want to wrap an executable with arguments then you need
to do something like this:
{ packages = {
    default = pkgs.cowsay;

    kittysay = …;  # a wrapper package whose bin/kittysay runs cowsay with -f hellokitty
  };

  apps = {
    …
    kittysay = {
      type = "app";
      program = "${self.packages."${system}".kittysay}/bin/kittysay";
    };
  };
Here we define a kittysay package (which wraps cowsay with the desired command-line option)
and a matching kittysay app.
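The kittysay package itself isn’t shown above. One way you could define such a wrapper (an
assumption on my part, using pkgs.writeShellScriptBin, not necessarily what the template
actually does) is:

kittysay = pkgs.writeShellScriptBin "kittysay" ''
  # Forward all arguments to cowsay, adding the hellokitty cow file
  exec ${pkgs.cowsay}/bin/cowsay -f hellokitty "$@"
'';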
Note that if the name of the app is the same as the default executable for a package then
we can just omit the app entirely. In the above kittysay example, we could delete the
kittysay app and the example would still work because Nix will fall back to running
${self.packages.${system}.kittysay}/bin/kittysay. You can use nix run --help to see the
full set of rules for how Nix decides what attribute to use and what path to execute.
nix shell
If you plan on running the same command (interactively) over and over then you probably don’t
want to have to type nix run before every use of the command. Not only is this less ergonomic
but it’s also slower since the flake has to be re-evaluated every time you run the command.
The nix shell command comes in handy for use cases like this: it will take the flake outputs that
you specify and add them to your executable search path (e.g. your $PATH in bash) for ease of
repeated use. We can add our cowsay to our search path in this way:
$ nix shell
$ cowsay howdy
_______
< howdy >
-------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
nix shell creates a temporary subshell providing the desired commands and you can exit from
this subshell by entering an exit command or typing Ctrl-D.
nix shell can be useful for pulling in local executables from your flake, but it’s even more
useful for pulling in executables temporarily from Nixpkgs (since Nixpkgs provides a large array
of useful programs). For example, if you wanted to temporarily add vim and tree to your shell
you could run:
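$ nix shell github:NixOS/nixpkgs/23.11#{vim,tree}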
Note that the {vim,tree} syntax in the previous command is a Bash/Zsh feature. Both
shells expand the previous command to:
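$ nix shell github:NixOS/nixpkgs/23.11#vim github:NixOS/nixpkgs/23.11#tree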
This feature is a convenient way to avoid having to type out the github:NixOS/nixpkgs/23.11
flake reference twice when adding multiple programs to your shell environment.
nix develop
You can even create a reusable development shell if you find yourself repeatedly using the same
temporary executables. Our sample flake illustrates this by providing two shells:
devShells = {
default = self.packages."${system}".default;
with-dev-tools = pkgs.mkShell {
inputsFrom = [ self.packages."${system}".default ];
packages = [
pkgs.vim
pkgs.tree
];
};
};
$ nix develop
You can exit from this development shell by either entering exit or typing Ctrl-D.
You might wonder what’s the difference between nix develop and nix shell. The difference
between the two is that:
• nix shell adds the specified programs to your executable search PATH
• nix develop adds the development dependencies of the specified programs to your exe-
cutable search PATH
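For example, if you run:

$ nix shell github:NixOS/nixpkgs/23.11#vim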
That adds vim to your executable search path. In contrast, if you run:
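$ nix develop github:NixOS/nixpkgs/23.11#vim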
That provides a development environment necessary to build the vim executable (like a C compiler
or the ncurses package), but does not provide vim itself.
In our sample flake, we used the mkShell utility:
with-dev-tools = pkgs.mkShell {
inputsFrom = [ self.packages."${system}".default ];
packages = [
pkgs.vim
pkgs.tree
];
};
• inputsFrom
mkShell inherits the development dependencies of any package that you list here. Since
self.packages."${system}".default is just our cowsay package then that means that all
development dependencies of the cowsay package also become development dependencies
of mkShell.
• packages
All packages listed in the packages argument to mkShell become development dependencies
of mkShell. So if we add vim and tree here then those will be present on the executable
search path of our synthetic environment if we call nix develop on our mkShell.
This means that the with-dev-tools shell is essentially the same as the default shell, except also
extended with the vim and tree packages added to the executable search path.
nix flake check
You can add tests to a flake that the nix flake check command will run. These tests go under
the checks output attribute and our sample flake provides a simple functional test for the cowsay
package:
checks = {
  default = self.packages."${system}".default;

  diff = pkgs.runCommand "diff" { } ''
    …
    touch $out
  '';
};
The default check is a synonym for our cowsay package and just runs cowsay’s (non-existent)
test suite. The diff check is a functional test that compares some sample cowsay output against
a golden result.
We can run both checks (the default check and the diff check) using:
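$ nix flake check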
The nix flake check command also performs other hygiene checks on the given flake and you
can learn more about the full set of checks by running:
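$ nix flake check --help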
nix flake init
You can template a project using nix flake init, which we’ve already used a few times
throughout this book (including this chapter). Our cowsay flake contains the following templates
output:
}) // {
templates.default = {
path = ./.;
… that copied the directory pointed to by the templates.default.path flake output to our local
directory.
Note that this flake output is not system-specific, which is why it’s not nested inside the call
to eachDefaultSystem in our flake. This is because there’s nothing system-dependent about
templating some text files.
nix eval
nix eval is another command we’ve already used a few times throughout this book to query
information about our flakes without building anything. For example, if we wanted to query the
version of our cowsay package, we could run:
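$ nix eval .#packages.x86_64-linux.default.version
"3.7.0"

(Using the x86_64-linux attribute path; substitute your own system.)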
This is because flake outputs are “just” attribute sets and Nix derivations are also “just” attribute
sets, so we can dig into useful information about them by accessing the appropriate attribute
path.
However, you might not necessarily know what attribute paths are even available to query, which
brings us to the next extremely useful Nix command:
nix repl
You can use the nix repl command to easily interactively explore what attributes are available
using REPL auto-completion. For example, if you run:
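$ nix repl .#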
That will load all of the flake outputs (e.g. packages, apps, checks, devShells, templates) as
identifiers of the same name into the REPL. Then you can use tab completion to dig further into
their available fields:
nix-repl> pac<TAB>
nix-repl> packages.<TAB>
packages.x86_64-linux packages.i686-linux packages.x86_64-linux
packages.aarch64-linux packages.x86_64-darwin
nix-repl> packages.x86_64-linux.<TAB>
nix-repl> packages.x86_64-linux.default.<TAB>
packages.x86_64-linux.default.__darwinAllowLocalNetworking
packages.x86_64-linux.default.__ignoreNulls
packages.x86_64-linux.default.__impureHostDeps
…
packages.x86_64-linux.default.updateScript
packages.x86_64-linux.default.userHook
packages.x86_64-linux.default.version
nix-repl> packages.x86_64-linux.default.version
"3.7.0"
Remember that flake-utils (specifically the eachDefaultSystem function) adds system
attributes underneath each of these top-level attributes, so even though we don't explicitly
specify a system attribute in our flake.nix file they're still going to be there when we navigate
the flake outputs in the REPL. That's why we have to specify packages.x86_64-linux.default.version
in the REPL instead of just packages.default.version.
However, you can skip having to specify the system if you specify the package you want to load
into the REPL. For example, if we load the default package output like this:
… that’s the same as loading .#packages.${system}.default into the REPL, meaning that all of
the default package’s attributes are now top-level identifiers in the REPL, including version:
nix-repl> version
"3.7.0"
The nix repl command comes in handy if you want to explore Nix code interactively, whereas
the nix eval command is more useful for non-interactive use (e.g. scripting).
nixos-rebuild
Last, but not least, the nixos-rebuild command also accepts flake outputs that specify the
system to deploy. We already saw an example of this in the Deploying to AWS using Terraform
chapter where we specified our system to deploy as .#default which expands out to the
.#nixosConfigurations.default flake output.
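For example, assuming a flake in the current directory, a local deployment of that configuration looks something like:

$ nixos-rebuild switch --flake .#default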
Similar to the templates flake outputs, nixosConfigurations are not system-specific. There's
no particularly good reason why this is the case, since NixOS can (in theory) be built for multiple
systems (e.g. x86_64-linux or aarch64-linux), but in practice most NixOS systems are only
defined for a single architecture.
Our sample cowsay flake doesn’t provide any nixosConfigurations output, but the flake
from our Terraform chapter has an example nixosConfigurations output.
11. Integration testing
In Our first web server we covered how to test a server manually and in this chapter we’ll go
over how to use NixOS to automate this testing process. Specifically, we’re going to be authoring
a NixOS test, which you can think of as the NixOS-native way of doing integration testing¹.
However, in this chapter we’re going to depart from our running “TODO list” example² and
instead use NixOS tests to automate the Getting Started instructions from an open source tutorial.
Specifically, we’re going to be testing the PostgREST tutorial³. You can read through the tutorial
if you want, but the relevant bits are:
• Launch PostgREST
… with this configuration file:
db-uri = "postgres://authenticator:mysecretpassword@localhost:5432/postgres"
db-schemas = "api"
db-anon-role = "web_anon"
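• Query the todos endpoint (e.g. curl http://localhost:3000/todos), which should return: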
[
  {
    "id": 1,
    "done": false,
    "task": "finish tutorial 0",
    "due": null
  },
  {
    "id": 2,
    "done": false,
    "task": "pat self on back",
    "due": null
  }
]
NixOS test
You can clone the equivalent NixOS test by running:
One of the included files is setup.sql, which includes the database commands from the
tutorial verbatim:
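Those commands create an api schema with a todos table, a read-only web_anon role, and an authenticator login role; they look roughly like this:

create schema api;

create table api.todos (
  id serial primary key,
  done boolean not null default false,
  task text not null,
  due timestamptz
);

insert into api.todos (task) values
  ('finish tutorial 0'), ('pat self on back');

create role web_anon nologin;

grant usage on schema api to web_anon;
grant select on api.todos to web_anon;

create role authenticator noinherit login password 'mysecretpassword';
grant web_anon to authenticator;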
Similarly, another file is tutorial.conf which includes the PostgREST configuration from the
tutorial verbatim:
db-uri = "postgres://authenticator:mysecretpassword@localhost:5432/postgres"
db-schemas = "api"
db-anon-role = "web_anon"
Now we need to wrap these two into a NixOS module which runs Postgres (with those setup
commands) and PostgREST (with that configuration file), which is what server.nix does:
{ pkgs, ... }: {
  config = {
    networking.firewall.allowedTCPPorts = [ 3000 ];

    services.postgresql = {
      enable = true;

      initialScript = ./setup.sql;
    };

    systemd.services.postgrest = {
      wantedBy = [ "multi-user.target" ];

      after = [ "postgresql.service" ];

      path = [ pkgs.postgrest ];

      # Run PostgREST against the tutorial's configuration file
      script = "postgrest ${./tutorial.conf}";

      serviceConfig.User = "authenticator";
    };

    users = {
      groups.database = { };

      users = {
        authenticator = {
          isSystemUser = true;

          group = "database";
        };
      };
    };
  };
}
The main extra thing we do here (that's not mentioned in the tutorial) is that we create an
authenticator user and a database group to match the database user of the same name.
Additionally, we open up port 3000 in the firewall, which we’re going to need to do to test the
PostgREST API (served on port 3000 by default).
We’re also going to create a client.nix file containing a pretty bare NixOS configuration for
our test client machine:
{ pkgs, ... }: {
  environment.defaultPackages = [ pkgs.curl ];
}
Next, we’re going to write a Python script (script.py) to orchestrate our integration test:
import json

start_all()

expected = [
    {"id": 1, "done": False, "task": "finish tutorial 0", "due": None},
    {"id": 2, "done": False, "task": "pat self on back", "due": None},
]

actual = json.loads(
    client.wait_until_succeeds(
        "curl --fail --silent http://server:3000/todos",
        55,
    )
)

assert expected == actual
This Python script logs into the client machine to run a curl command and compares the JSON
output of the command against the expected output from the tutorial.
Finally, we tie this all together in flake.nix:
{ inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils/v1.0.0";
  };

  outputs = { flake-utils, nixpkgs, ... }:
    flake-utils.lib.eachDefaultSystem (system: {
      checks.default = nixpkgs.legacyPackages."${system}".nixosTest {
        name = "test";
        nodes = {
          server = import ./server.nix;
          client = import ./client.nix;
        };
        testScript = builtins.readFile ./script.py;
      };
    });
}
Here we’re using the nixosTest function, which is our one-stop shop for integration testing. This
function takes two main arguments that we care about:
• nodes
This is an attribute set with one attribute per machine that we want to test and each
attribute contains a NixOS configuration for the corresponding machine. Here, we’re going
to be testing two machines (named “server” and “client”) whose NixOS configurations are
going to be imported from server.nix and client.nix respectively. We don’t have to store
these NixOS configurations in separate files (we could store them inline within this same
flake.nix file), but the code is a bit easier to follow if we keep things separate.
• testScript
This is our Python script that we’re also going to store in a separate file (script.py).
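You can then run the test by building the flake's checks:

$ nix flake check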
If you’re on macOS you will need to follow the macOS-specific setup instructions from
the Setting up your development environment chapter before you can run the above
command. In particular, you will need to have a Linux builder running in order to build
the virtual machine image for the above NixOS test.
Interactive testing
You can also interactively run a NixOS test using a Python REPL that has access to the same
commands available within our script.py test script. To do so, run:
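One way to do this is to build the test's interactive driver and then run it (shown here for an x86_64-linux machine; adjust the system as needed):

$ nix build .#checks.x86_64-linux.default.driverInteractive
$ ./result/bin/nixos-test-driver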
This will then open up a Python REPL with autocompletion support and the first thing we’re
going to do in this REPL is to launch all of the machines associated with our NixOS test (server
and client in this case):
>>> start_all()
Note that if you do this you won’t notice the prompt (because it will be clobbered by the log
output from the server), but it’s still there. Alternatively, you can prevent that by temporarily
silencing the machine’s log output like this:
>>> serial_stdout_off()
>>> start_all()
start all VMs
client: starting vm
mke2fs 1.47.0 (5-Feb-2023)
client: QEMU running (pid 21160)
server: starting vm
mke2fs 1.47.0 (5-Feb-2023)
server: QEMU running (pid 21169)
(finished: start all VMs, in 0.38 seconds)
>>> serial_stdout_on()
These serial_stdout_{on,off} functions come in handy if you find the machine log output too
noisy.
Once you've started up the machines you can begin running commands that interact with each
machine by invoking methods on Python objects of the same name.
For example, you can run a simple echo "Hello, world!" command on the server machine like
this:
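>>> server.succeed('echo "Hello, world!"')
'Hello, world!\n'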
… and the succeed method will capture the command’s output and return it as a string which
you can then further process within Python.
Now let’s step through the same logic as the original test script so we can see for ourselves the
intermediate values computed along the way:
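>>> import json
>>> output = client.wait_until_succeeds("curl --fail --silent http://server:3000/todos", 55)
>>> actual = json.loads(output)
>>> actual[0]
{'id': 1, 'done': False, 'task': 'finish tutorial 0', 'due': None}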
This sort of interactive exploration really comes in handy when authoring the test for the first
time since it helps you understand the shape of the data and figure out which commands you
need to run.
You can consult the NixOS test section of the NixOS manual⁴ if you need a full list of available
methods that you can invoke on machine objects. Some really common methods are:
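• succeed and fail, which run a shell command on the machine and assert that it exits with (respectively) a zero or non-zero status
• wait_for_unit, which waits for a systemd unit to become active
• wait_for_open_port and wait_until_succeeds, which poll until a port is open or a command succeeds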
… but there are also some really cool methods you can use like:
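• screenshot, which saves a screenshot of the machine's virtual display
• send_key and send_chars, which simulate keyboard input
• wait_for_text, which uses optical character recognition to wait for text to appear on the screen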
Shared constants
There are several constants that we use repeatedly throughout our integration test, like:
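• the authenticator database user
• the mysecretpassword database password
• the port (3000) that PostgREST listens on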
One advantage of codifying the tutorial as a NixOS test is that we can define constants like
these in one place instead of copying them repeatedly and hoping that they remain in sync.
For example, we wouldn’t want our integration test to break just because we changed the
⁴https://nixos.org/manual/nixos/stable/#ssec-machine-objects
user’s password in setup.sql and forgot to make the matching change to the password in
tutorial.conf. Integration tests can often be time consuming to run and debug, so we want
our test to break for more meaningful reasons (an actual bug in the system under test⁵) and not
because of errors in the test code.
However, we’re going to need to restructure things a little bit in order to share constants between
the various test files. In particular, we’re going to be using NixOS options to store shared
constants for reuse throughout the test. To keep this example short, we won't factor out all of the
shared constants; instead, we'll focus on turning a couple of representative constants into NixOS
options.
First, we'll factor out the "authenticator" username into a shared constant, which we'll store
as a tutorial.user NixOS option in server.nix:
{ options = {
    tutorial = {
      user = lib.mkOption {
        type = lib.types.str;
      };
    };
  };

  config = {
    tutorial.user = "authenticator";

    systemd.services.postgrest = {
      …
      serviceConfig.User = config.tutorial.user;
    };

    users = {
      users = {
        "${config.tutorial.user}" = {
          isSystemUser = true;
          group = "database";
        };
      };
    };
  };
}
… and that fixes all of the occurrences of the authenticator user in server.nix but what about
setup.sql or tutorial.conf?
One way to do this is to inline setup.sql and tutorial.conf into our server.nix file so that we
can interpolate the NixOS options directly into the generated files, like this:
⁵https://en.wikipedia.org/wiki/System_under_test
{ …
  config = {
    services.postgresql = {
      …
    };

    systemd.services.postgrest = {
      …
      script =
        let
          configurationFile = pkgs.writeText "tutorial.conf" ''
            db-uri = "postgres://${config.tutorial.user}:mysecretpassword@localhost:5432/postgres"
            db-schemas = "api"
            db-anon-role = "web_anon"
          '';
        in
          "postgrest ${configurationFile}";
      …
    };
    …
  };
}
This solution isn’t great, though, because it gets cramped pretty quickly and it’s harder to edit
inline Nix strings than standalone files. For example, when setup.sql is a separate file many
editors will enable syntax highlighting for those SQL commands, but that syntax highlighting
won’t work when the SQL commands are instead stored within an inline Nix string.
Alternatively, we can keep the files separate and use the Nixpkgs substituteAll utility⁶ to
interpolate the Nix variables into the file. The way it works is that instead of using ${user}
to interpolate a variable you use @user@, like this new tutorial.conf file does:
db-uri = "postgres://@user@:mysecretpassword@localhost:5432/postgres"
db-schemas = "api"
db-anon-role = "web_anon"
Similarly, we change our setup.sql file to also substitute in @user@ where necessary:
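create role @user@ noinherit login password 'mysecretpassword';

grant web_anon to @user@;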
⁶https://nixos.org/manual/nixpkgs/stable/#fun-substituteAll
Once we’ve done that we can use the pkgs.substituteAll utility to template those files with
Nix variables of the same name:
{ …
  config = {
    …
    services.postgresql = {
      …
      initialScript = pkgs.substituteAll {
        name = "setup.sql";
        src = ./setup.sql;
        inherit (config.tutorial) user;
      };
    };

    systemd.services.postgrest = {
      …
      script =
        let
          configurationFile = pkgs.substituteAll {
            name = "tutorial.conf";
            src = ./tutorial.conf;
            inherit (config.tutorial) user;
          };
        in
          "postgrest ${configurationFile}";
      …
    };
    …
  };
}
The downside to using pkgs.substituteAll is that it’s easier for there to be a mismatch between
the variable names in the template and the variable names in Nix. Even so, this is usually the
approach that I would recommend.
We can do something fairly similar to also thread through the PostgREST port everywhere it’s
needed. The original PostgREST tutorial doesn’t specify the port in the tutorial.conf file, but
we can add it for completeness:
db-uri = "postgres://@user@:mysecretpassword@localhost:5432/postgres"
db-schemas = "api"
db-anon-role = "web_anon"
server-port = @port@
… and then we can make matching changes to server.nix to define and use this port:
{ options = {
    tutorial = {
      port = lib.mkOption {
        type = lib.types.port;
      };
      …
    };
  };

  config = {
    tutorial.port = 3000;

    networking.firewall.allowedTCPPorts = [ config.tutorial.port ];

    systemd.services.postgrest = {
      …
      script =
        let
          configurationFile = pkgs.substituteAll {
            name = "tutorial.conf";
            src = ./tutorial.conf;
            inherit (config.tutorial) port user;
          };
        in
          "postgrest ${configurationFile}";
      …
    };
    …
  };
}
… but we’re not done! We also need to thread this port to script.py, which references this same
port in the curl command.
This might seem trickier because the place where script.py is referenced (in flake.nix):
{ …
checks.default = nixpkgs.legacyPackages."${system}".nixosTest {
…
… is not inside of any NixOS module. So how do we access NixOS option definitions when
defining our testScript?
The trick is that the testScript argument to the nixosTest function can be a function:
testScript = { nodes }:
  let
    inherit (nodes.server.config.tutorial) port;
  in
    ''
    import json

    start_all()

    expected = [
        {"id": 1, "done": False, "task": "finish tutorial 0", "due": None},
        {"id": 2, "done": False, "task": "pat self on back", "due": None},
    ]

    actual = json.loads(
        client.wait_until_succeeds(
            "curl --fail --silent http://server:${toString port}/todos",
            7,
        )
    )

    assert expected == actual
    '';
This function takes one argument (nodes) which is an attribute set containing one attribute for
each machine in our integration test (e.g. server and client for our example). Each of these
attributes in turn has all of the output attributes generated by evalModules⁷, including:
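• config, the fully merged configuration values for that machine
• options, the corresponding option declarations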
Moreover, every NixOS configuration also has a nixpkgs.pkgs option storing the NixOS package
set used by that machine. This means that instead of adding curl to our client machine’s
environment.defaultPackages, we could instead do something like this:
testScript = { nodes }:
  let
    inherit (nodes.client.config.nixpkgs.pkgs) curl;
  in
    ''
    …
    actual = json.loads(
        client.wait_until_succeeds(
            "${curl}/bin/curl --fail --silent http://server:${toString port}/todos",
            7,
        )
    )
    …
    '';
12. Containers
The previous chapter on Integration Testing translated the PostgREST tutorial¹ to an equivalent
NixOS test, but that translation left out one important detail: the original tutorial asks the user
to run Postgres inside of a Docker container:
If Docker is not installed, you can get it here. Next, let’s pull and start the database
image:
$ sudo docker run --name tutorial -p 5432:5432 \
-e POSTGRES_PASSWORD=mysecretpassword \
-d postgres
This will run the Docker instance as a daemon and expose port 5432 to the host system
so that it looks like an ordinary PostgreSQL server to the rest of the system.
We don’t have to use Docker to run Postgres; in fact, I’d normally advise against it and
recommend using the Postgres NixOS module instead. However, we can still use this as an
illustrative example of how to translate Docker idioms to NixOS.
More generally, in this chapter we’re going to cover container management in the context of
NixOS and introduce a spectrum of options ranging from more Docker-native to more NixOS-
native.
Docker registry
The most Docker-native approach is to fetch a container from the Docker registry. To illustrate
that, we're going to begin from the previous chapter's example integration test. Instead of
running Postgres directly on the host with this code in server.nix:
¹https://postgrest.org/en/v12/tutorials/tut0.html
# server.nix
services.postgresql = {
  enable = true;

  initialScript = ./setup.sql;
};
… we’re going to replace that code with a NixOS configuration that runs the official postgres
image obtained from the Docker registry in almost the same way as the tutorial:
virtualisation.oci-containers = {
  backend = "docker";

  containers = {
    # We really should call this container "postgres" but we're going to
    # call it "tutorial" just for fun to match the original instructions.
    tutorial = {
      image = "postgres:16.2";

      environment.POSTGRES_PASSWORD = "mysecretpassword";

      extraOptions = [ "--network=host" ];
    };
  };
};
Among other things, this configuration takes care of:
• installing and running the docker service for our NixOS machine
In particular, the backend = "docker"; option is what specifies to use Docker as the
backend for running our container (instead of the default backend, which is Podman).
We also still need to run the setup commands from setup.sql after our container starts up, but
we no longer have a convenient services.postgresql.initialScript option that we can use
for this purpose when going the Docker route. Instead, we’re going to create our own “one shot”
Systemd service to take care of this setup process for us:
systemd.services.setup-postgresql =
  let
    uri = "postgresql://postgres:mysecretpassword@localhost";
  in
    { wantedBy = [ "multi-user.target" ];

      path = [ pkgs.docker ];

      preStart = ''
        until docker exec tutorial pg_isready --dbname ${uri}; do
            sleep 1
        done
      '';

      script = ''
        docker exec --interactive tutorial psql ${uri} < ${./setup.sql}
      '';

      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = "yes";
      };
    };
We can then sequence our PostgREST service after that setup service by changing its after
dependency:
systemd.services.postgrest = {
  …
- after = [ "postgresql.service" ];
+ after = [ "setup-postgresql.service" ];
  …
};
The upside of this approach is that it requires the least buy-in to the NixOS ecosystem. However,
there’s one major downside: it only works if the system has network access to a Docker registry,
which can be a non-starter for a few reasons:
Podman
You might notice that the NixOS option hierarchy for running Docker containers is called
virtualisation.oci-containers and not virtualisation.docker-containers. This is because
Docker containers are actually OCI containers (short for “Open Container Initiative”) and OCI
containers can be run by any OCI-compatible backend.
Moreover, NixOS supports two OCI-compatible backends: Docker and Podman. In fact, you often
might prefer to use Podman (the default) instead of Docker for running containers for a few
reasons:
• Improved security
Podman’s “daemonless” operation also implies “rootless” operation. In other words, you
don’t need root privileges to run an OCI container using Podman. Docker, on the other hand,
requires elevated privileges by default to run containers (unless you run Docker in rootless
mode), which is an enormous security risk. For example, a misconfigured or compromised
container can mount the host's root filesystem, which in the best case corrupts the host's
filesystem with the guest container's files and in the worst case enables total compromise of
the host by an attacker.
Switching from Docker to Podman is pretty easy: we only need to change the
virtualisation.oci-containers.backend option from "docker" to "podman" (or just delete the
option, since "podman" is the default):
virtualisation.oci-containers = {
- backend = "docker";
+ backend = "podman";
  …
};

systemd.services.setup-postgresql = …
- path = [ pkgs.docker ];
+ path = [ pkgs.podman ];

  preStart = ''
-   until docker exec tutorial pg_isready --dbname ${uri}; do
+   until podman exec tutorial pg_isready --dbname ${uri}; do
      sleep 1
    done
  '';

  script = ''
-   docker exec --interactive tutorial psql ${uri} < ${./setup.sql}
+   podman exec --interactive tutorial psql ${uri} < ${./setup.sql}
  '';
This works because the podman command-line tool provides the exact same interface as the
docker command-line tool, so it's a drop-in replacement.
streamLayeredImage
If you’re willing to lean more into NixOS, there are even better options at your disposal. For
example, you can build the Docker image using NixOS, too! In fact, Docker images built with
NixOS tend to be leaner than official Docker images for two main reasons:
Nixpkgs provides several utilities for building Docker images³ using Nix, but we’re only going
to concern ourselves with one of those utilities: pkgs.dockerTools.streamLayeredImage⁴. This
is the most efficient utility at our disposal that will ensure the best caching and least disk churn
out of all the available options.
We’ll delete the old postgrest service and instead use this streamLayeredImage utility to
build an application container wrapping postgrest. We can then reference that container in
virtualisation.oci-containers.containers, like this:
virtualisation.oci-containers = {
  backend = "docker";

  containers = {
    …

    postgrest = {
      image = "postgrest:nix";

      imageStream = pkgs.dockerTools.streamLayeredImage {
        name = "postgrest";
        tag = "nix";
        contents = [ pkgs.postgrest ];
      };

      extraOptions = [ "--network=host" ];
    };
  };
};
You can also clone an example containing all changes up to this point by running:
This creates a new postgrest container that doesn’t depend on the Docker registry at all. Note
that the Docker registry does host an official postgrest image but we’re not going to use that
image. Instead, we’re using a postgrest Docker image built entirely using Nix.
²https://grahamc.com/blog/nix-and-layered-docker-images/
³https://nixos.org/manual/nixpkgs/stable/#sec-pkgs-dockerTools
⁴https://nixos.org/manual/nixpkgs/stable/#ssec-pkgs-dockerTools-streamLayeredImage
Moreover, this Nix-built docker image integrates efficiently with Nix. If we add or remove
dependencies from our Docker image then we’ll only build and store what changed (the “diff”),
instead of building and storing an entirely new copy of the whole Docker image archive.
Of course, your next thought might be: “if we’re using Nix/NixOS to build and consume Docker
images, then do we still need Docker?". Can we cut out Docker as an intermediary and still
preserve most of the same benefits of containerization?
Yes!
NixOS containers
NixOS actually supports a more NixOS-native alternative to Docker, known as NixOS containers⁵.
Under the hood, these use systemd-nspawn as the container engine but that's essentially
invisible to the end user (you). The user interface for NixOS containers is much simpler than the
Docker-based alternatives, so if you don’t need Docker specifically but you still want some basic
isolation guarantees then this is the way to go.
The easiest way to illustrate how NixOS containers work is to redo our postgrest example to put
both Postgres and PostgREST in separate NixOS containers. We’re going to begin by resetting
our example back to the non-container example from the previous chapter:
… and then we’ll make two changes. First, instead of running Postgres on the host machine like
this:
services.postgresql = {
  enable = true;

  initialScript = ./setup.sql;
};
… we’re going to change that code to run it inside of a NixOS container (still named tutorial)
like this:
⁵https://nixos.org/manual/nixos/stable/#ch-containers
containers.tutorial = {
  autoStart = true;

  config = {
    services.postgresql = {
      enable = true;

      initialScript = ./setup.sql;
    };
  };
};
This change illustrates what’s neat about NixOS containers: we can configure them using the
same NixOS options that we use to configure the host machine. All we have to do is wrap
the options inside containers.${NAME}.config but otherwise we configure NixOS options the
same way whether inside or outside of the container. This is why it’s worth trying out NixOS
containers if you don’t need any Docker-specific functionality but you still want some basic
isolation in place. NixOS containers are significantly more ergonomic to use.
We can also wrap our PostgREST service in the exact same way, replacing this:
systemd.services.postgrest = {
  wantedBy = [ "multi-user.target" ];

  after = [ "postgresql.service" ];

  path = [ pkgs.postgrest ];

  serviceConfig.User = "authenticator";
};

users = {
  groups.database = { };

  users.authenticator = {
    isSystemUser = true;

    group = "database";
  };
};
… with this:
containers.postgrest = {
  autoStart = true;

  config = {
    systemd.services.postgrest = {
      wantedBy = [ "multi-user.target" ];
      after = [ "postgresql.service" ];
      path = [ pkgs.postgrest ];
      serviceConfig.User = "authenticator";
    };

    users = {
      groups.database = { };

      users.authenticator = {
        isSystemUser = true;
        group = "database";
      };
    };
  };
};
… and that’s it! In both cases, we just took our existing NixOS configuration options and wrapped
them in something like:
containers."${name}" = {
  autoStart = true;

  config = {
    …
  };
};
Just like the Docker example, these NixOS containers use the host network to connect
to one another, meaning that they don't set privateNetwork = true; (which creates a
private network for the given NixOS container). At the time of this writing there isn't an
easy way to network NixOS containers isolated in this way without carefully selecting
a bunch of magic strings (IP addresses and network interfaces). That's a poor user
experience and not one that I feel comfortable documenting or endorsing.